From: dylan_nicholson (dylan_nicholson_at_[hidden])
Date: 2002-03-03 21:36:52
--- In boost_at_y..., Jan Langer <jan_at_l...> wrote:
> the filesystem library will need a mechanism for converting
> char types into other char types. the same problem also occurs with
> basic_string. i think this is a quite general case and it is worth a
> general solution. another example is a program using wstring and
> wanting to print to a normal char-stream.
> a reasonable way of solving this is to get a ctype facet and to
> narrow or widen to the character.
However you can't assume that will do the correct mapping.
The example I've given before is that with Win9x, even though long
filenames are stored in UTF-16 format inside the FAT file system, the
only access to these names is via Windows-only proprietary MBCS
encodings. To convert to and from those encodings you must use
MultiByteToWideChar(CP_ACP, ...) and WideCharToMultiByte(CP_ACP, ...).
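
A minimal sketch of the conversion described above, assuming the usual
two-call pattern for MultiByteToWideChar (first call asks for the required
length, second call fills the buffer). The function name ansi_to_wide is
hypothetical, and the non-Windows branch is only a placeholder built on
std::mbstowcs so that the sketch compiles elsewhere; it is not a claim about
how any filesystem library does this.

```cpp
#include <cstdlib>
#include <stdexcept>
#include <string>
#ifdef _WIN32
#include <windows.h>
#endif

// Hypothetical helper: convert a string in the system ANSI/MBCS code page
// to a wide string.
std::wstring ansi_to_wide(const std::string& narrow)
{
    if (narrow.empty()) return std::wstring();
#ifdef _WIN32
    const int length = static_cast<int>(narrow.size());
    // First call: ask how many wide characters are needed.
    const int needed = MultiByteToWideChar(CP_ACP, 0, narrow.data(), length,
                                           NULL, 0);
    if (needed <= 0) throw std::runtime_error("MultiByteToWideChar failed");
    std::wstring wide(static_cast<std::size_t>(needed), L'\0');
    // Second call: perform the conversion into the buffer.
    MultiByteToWideChar(CP_ACP, 0, narrow.data(), length, &wide[0], needed);
    return wide;
#else
    // Placeholder (assumption): use the current C locale's multibyte
    // encoding. The result never has more wide characters than input bytes.
    std::wstring wide(narrow.size(), L'\0');
    const std::size_t n = std::mbstowcs(&wide[0], narrow.c_str(), wide.size());
    if (n == static_cast<std::size_t>(-1))
        throw std::runtime_error("mbstowcs failed");
    wide.resize(n);
    return wide;
#endif
}
```

The reverse direction would use WideCharToMultiByte(CP_ACP, ...) with the
same two-call pattern.
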
The same is essentially true under NT except that of course NT *can*
handle both Unicode and MBCS filenames internally, so there really
should be no need for library code to do any conversions.
In fact MS do provide a "Unicode" layer for Win9x that does these
conversions too, and it would not be (IMHO) unreasonable to simply
require that if you *wish* to use std::wstring to hold filenames and you
want to support Win9x then you must use MS's supplied library (as far as
I understand it, you simply download it, link it into your application,
and redistribute the DLL with your application - it has some magic
to continue working correctly under NT). That way at least for the
Win32 implementation *no* wstring <-> string conversions should be
necessary.
For POSIX however, assuming you go the ctype-narrow/widen approach,
the main issue is of course which locale to request. I would say
locale("") (ie the default "system" locale), but there probably needs
to be a once-off method of overriding this.
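
A rough sketch of the ctype narrow/widen approach with an overridable
locale, assuming the default is locale("") as suggested above. The function
names narrow_with_locale and widen_with_locale and the '?' fallback
character are hypothetical choices for illustration only.

```cpp
#include <locale>
#include <string>
#include <vector>

// Narrow a wide string one character at a time with the ctype<wchar_t>
// facet of `loc`. Characters with no narrow equivalent become `fallback`.
std::string narrow_with_locale(const std::wstring& wide,
                               const std::locale& loc = std::locale(""),
                               char fallback = '?')
{
    if (wide.empty()) return std::string();
    const std::ctype<wchar_t>& facet =
        std::use_facet<std::ctype<wchar_t> >(loc);
    std::vector<char> buffer(wide.size());
    facet.narrow(wide.data(), wide.data() + wide.size(), fallback, &buffer[0]);
    return std::string(buffer.begin(), buffer.end());
}

// Widen a narrow string one character at a time with the same facet.
std::wstring widen_with_locale(const std::string& narrow,
                               const std::locale& loc = std::locale(""))
{
    if (narrow.empty()) return std::wstring();
    const std::ctype<wchar_t>& facet =
        std::use_facet<std::ctype<wchar_t> >(loc);
    std::vector<wchar_t> buffer(narrow.size());
    facet.widen(narrow.data(), narrow.data() + narrow.size(), &buffer[0]);
    return std::wstring(buffer.begin(), buffer.end());
}
```

Passing a locale explicitly, e.g. narrow_with_locale(name,
std::locale::classic()), is one way to provide the override. Note that
ctype maps exactly one character to one character, so this cannot produce
or consume a multibyte encoding, and that constructing locale("") can throw
if the environment names a locale the system does not support.
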
Does the latest Mac interface have any Unicode support?
One thing that might be generically useful is UTF-8 <-> UTF-16 <->
UTF-32 conversion. Not much use for filesystem support seeing as
very few filesystems use these standards (fair enough...they didn't
exist until a few years ago), but extremely useful for internet-based
protocols. The problem is deciding whether you are using wstring as
UTF-16 or UTF-32. Some people on c.l.c++.m have claimed that UTF-16
wouldn't be allowed because wstring isn't supposed to allow any multi-
char characters, but in fact even UTF-32 uses multi-char "combining
sequences" (esp for diacritics), so this argument doesn't hold with
me*. On the other hand UTF-32 is patently excessively expensive for
the vast majority of languages, and probably even the majority of
cases for languages that really do need 4 billion
* See http://www.unicode.org/unicode/faq/char_combmark.html#7
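
To make the UTF-8 <-> UTF-16 <-> UTF-32 idea concrete, here is a minimal
sketch of one direction only (UTF-32 to UTF-8 and to UTF-16). It assumes
the char16_t/char32_t string types from later C++ standards rather than
wstring, precisely to sidestep the question of what wstring holds; the
function names are hypothetical.

```cpp
#include <stdexcept>
#include <string>

// Reject values that are not Unicode scalar values (surrogates, > U+10FFFF).
inline void check_scalar(char32_t cp)
{
    if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
        throw std::invalid_argument("not a Unicode scalar value");
}

// Encode each code point as 1 to 4 UTF-8 bytes.
std::string utf32_to_utf8(const std::u32string& in)
{
    std::string out;
    for (char32_t cp : in) {
        check_scalar(cp);
        if (cp < 0x80) {
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}

// Encode each code point as one UTF-16 unit, or a surrogate pair above U+FFFF.
std::u16string utf32_to_utf16(const std::u32string& in)
{
    std::u16string out;
    for (char32_t cp : in) {
        check_scalar(cp);
        if (cp < 0x10000) {
            out += static_cast<char16_t>(cp);
        } else {
            const char32_t v = cp - 0x10000;
            out += static_cast<char16_t>(0xD800 + (v >> 10));   // high surrogate
            out += static_cast<char16_t>(0xDC00 + (v & 0x3FF)); // low surrogate
        }
    }
    return out;
}
```

Even in UTF-32 a user-visible character can span several elements: 'e'
followed by the combining acute accent U+0301 is two code points, which is
the combining-sequence point made above.
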