From: dylan_nicholson (dylan_nicholson_at_[hidden])
Date: 2002-03-05 21:48:42
--- In boost_at_y..., "dietmar_kuehl" <dietmar_kuehl_at_y...> wrote:
> The specification above, however, allows users to possible encode
> names using UTF-8 if they want to do so (well, I think UTF-8 would
> create certain reserved characters, notably '/', but this is an issue
> users of the code have to address).
>
Actually, UTF-8 only uses bytes in the 128-255 range to encode
multibyte characters, so a '/' byte can't be "generated" from a UTF-16
or UTF-32 string that didn't already include '/'.
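
To make that concrete, here is a minimal sketch of a UTF-8 encoder. The names encode_utf8 and contains_byte are made up for illustration, and the sketch assumes the input is a valid code point (it does not reject surrogates). Every byte of a multibyte sequence has its high bit set, so none of them can equal '/' (0x2F).

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical helper: encode one Unicode code point as UTF-8.
// Assumes cp is a valid scalar value; surrogates are not rejected.
std::string encode_utf8(std::uint32_t cp)
{
    assert(cp <= 0x10FFFF);
    std::string out;
    if (cp < 0x80) {
        // ASCII: one byte in the range 0x00-0x7F.
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        // Two bytes: lead 0xC0-0xDF, trail 0x80-0xBF.
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        // Three bytes: lead 0xE0-0xEF, trails 0x80-0xBF.
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        // Four bytes: lead 0xF0-0xF4, trails 0x80-0xBF.
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}

// Hypothetical helper: true if any byte of s equals c.
bool contains_byte(const std::string& s, char c)
{
    return s.find(c) != std::string::npos;
}
```

For example, U+042F has 0x2F in its low bits, but encode_utf8(0x042F) yields the bytes 0xD0 0xAF, so contains_byte(encode_utf8(0x042F), '/') is false.
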
If only Windows' MBCS encoding schemes were like this. There, '\' can
turn up as the second byte of a multibyte character, and writing code
that avoids mistaking it for a real '\' is a pain in the ass. But it
*is* necessary. The basic idea is to write code something like this:
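#include <stdlib.h>  /* mblen, MB_CUR_MAX */

/* Length in bytes of the character at cptr, never less than 1, so the
   scan still advances when mblen returns 0 (terminator) or -1
   (invalid sequence). mblen follows the current LC_CTYPE locale. */
inline int safe_mblen(const char* cptr)
{
    int l = mblen(cptr, MB_CUR_MAX);
    return l > 1 ? l : 1;
}

/* Index of the first single-byte character equal to c, or -1 if none.
   Trail bytes are stepped over, so they are never compared against c. */
inline int find(const char* scan, char c)
{
    for (const char* cptr = scan; *cptr; cptr += safe_mblen(cptr))
        if (c == *cptr)
            return (int)(cptr - scan);
    return -1;
}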
This works at least with MSVC's CRT implementation.
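
The same idea can be sketched without depending on the CRT's locale. The version below is an illustration only: it hard-codes the lead-byte ranges of Shift-JIS (code page 932) rather than calling mblen, and the names is_sjis_lead, naive_find and sjis_find are made up. It shows a byte-wise search being fooled by the trail byte 0x5C while a character-wise search is not.

```cpp
#include <cstddef>

// Assumption: Shift-JIS / code page 932 lead-byte ranges, hard-coded so
// the sketch does not depend on the current locale.
inline bool is_sjis_lead(unsigned char b)
{
    return (b >= 0x81 && b <= 0x9F) || (b >= 0xE0 && b <= 0xFC);
}

// Byte-wise search: can report a trail byte as a match.
inline std::ptrdiff_t naive_find(const char* s, char c)
{
    for (const char* p = s; *p; ++p)
        if (*p == c)
            return p - s;
    return -1;
}

// Character-wise search: steps over both bytes of a double-byte
// character, so only single-byte characters are compared against c.
inline std::ptrdiff_t sjis_find(const char* s, char c)
{
    for (const char* p = s; *p; ) {
        if (is_sjis_lead(static_cast<unsigned char>(*p)) && p[1] != '\0') {
            p += 2;  // skip lead byte and trail byte together
            continue;
        }
        if (*p == c)
            return p - s;
        ++p;
    }
    return -1;
}
```

With the three-byte string "\x83\x5C" "\\" (the Shift-JIS encoding of the katakana "so", 0x83 0x5C, followed by a real backslash), naive_find reports index 1, which is the trail byte, while sjis_find reports index 2.
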
Dylan