From: Jan Langer (jan_at_[hidden])
Date: 2002-03-04 06:12:27


On Mon, 4 Mar 2002, dylan_nicholson wrote:
>--- In boost_at_y..., Jan Langer <jan_at_l...> wrote:
>> the filesystem library will need a mechanism for converting different
>> char types into other char types. the same problem also occurs with
>> basic_string. i think this is a quite general case and it is worth a
>> general solution. another example is a program using wstring and
>> wanting to print to a normal char-stream.
>> a reasonable way of solving this is to get a ctype facet and to apply
>> narrow or widen to the character.
>>
>However you can't assume that will do the correct mapping.

yes, exactly. it only does narrow or widen. whether that is the correct
thing depends on the platform and the application.

>The same is essentially true under NT except that of course NT *can*
>handle both Unicode and MBCS filenames internally, so there really
>should be no need for library code to do any conversions.
>In fact MS do provide a "Unicode" layer for Win9x that does these
>too, and it would not be (IMHO) unreasonable to simply require that
>if you *wish* to use std::wstring to hold filenames and you want to
>support Win9x then you must use MS's supplied library (as far as I
>understand it, you simply download it, link it in your application,
>and redistribute the DLL with your application - it has some magic
>to continue working correctly under NT). That way at least for the
>Win32 implementation *no* wstring <-> string conversions should be
>needed.

yes, so the win32 implementation should not use string_cast. an
application would use string_cast only if it decides that it needs
conversion by means of ctype's narrow and widen.

>For POSIX however, assuming you go the ctype-narrow/widen approach,
>the main issue is of course which locale to request. I would say
>locale("") (ie the default "system" locale), but there probably needs
>to be a once-off method of overriding this.

by default the cast takes locale () but another locale can be passed as
the second parameter
  string_cast <char> (L"äÖüß", std::locale ("de_DE"));
does the correct thing on my platform.
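
a minimal sketch of what the narrow/widen approach amounts to, in case it helps the discussion. this is not the actual string_cast code: narrow_string, widen_string and the fallback parameter are names made up for illustration. only std::ctype<wchar_t>::narrow and ::widen are standard. the result depends entirely on the locale passed in, which is exactly the caveat raised above.

```cpp
#include <locale>
#include <string>

// Hypothetical helper (not part of any library): narrows every wchar_t
// with the ctype facet of `loc`. Characters the facet cannot represent
// in a single char are replaced by `fallback`.
inline std::string narrow_string(const std::wstring& in,
                                 const std::locale& loc = std::locale(),
                                 char fallback = '?')
{
    const std::ctype<wchar_t>& ct =
        std::use_facet<std::ctype<wchar_t> >(loc);
    std::string out(in.size(), '\0');
    if (!in.empty())
        ct.narrow(in.data(), in.data() + in.size(), fallback, &out[0]);
    return out;
}

// Hypothetical helper: widens every char with the ctype facet of `loc`.
inline std::wstring widen_string(const std::string& in,
                                 const std::locale& loc = std::locale())
{
    const std::ctype<wchar_t>& ct =
        std::use_facet<std::ctype<wchar_t> >(loc);
    std::wstring out(in.size(), L'\0');
    if (!in.empty())
        ct.widen(in.data(), in.data() + in.size(), &out[0]);
    return out;
}
```

this works one character at a time, so it cannot handle multibyte encodings: each wchar_t becomes exactly one char and vice versa.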

>One thing that might be generically useful is UTF-8 <-> UTF-16 <->
>UTF-32 conversion. Not much use for filesystem support seeing as
>very few filesystems use these standards (fair enough...they didn't
>exist until a few years ago), but extremely useful for internet based
>protocols. The problem is deciding whether you are using wstring as
>UTF-16 or UTF-32. Some people on c.l.c++.m have claimed that UTF-16
>wouldn't be allowed because wstring isn't supposed to allow any multi-
>char characters, but in fact even UTF-32 uses multi-char "combining
>sequences" (esp for diacritics), so this argument doesn't hold with
>me*. On the other hand UTF-32 is patently excessively expensive for
>the vast majority of languages, and probably even the majority of
>cases for languages that really do need 4 billion
>different "characters".

i'm no character-encoding expert at all. i just want a proper solution
for converting chars. perhaps it would be useful to provide a third
template parameter which does the actual conversion. by default it would
be the narrow-widen string conversion class.
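
roughly what i have in mind, as a sketch only. the names ctype_conversion, apply and ascii_only_conversion are invented here, the interface is an assumption and not a worked-out design, and the default template argument on a function template needs a newer compiler than most people have today.

```cpp
#include <locale>
#include <string>

// Hypothetical default policy: per-character conversion via std::ctype.
// Only the char <-> wchar_t pairs are defined in this sketch.
template <typename To, typename From>
struct ctype_conversion;

template <>
struct ctype_conversion<char, wchar_t>
{
    static std::string apply(const std::wstring& in, const std::locale& loc)
    {
        const std::ctype<wchar_t>& ct =
            std::use_facet<std::ctype<wchar_t> >(loc);
        std::string out(in.size(), '\0');
        if (!in.empty())
            ct.narrow(in.data(), in.data() + in.size(), '?', &out[0]);
        return out;
    }
};

template <>
struct ctype_conversion<wchar_t, char>
{
    static std::wstring apply(const std::string& in, const std::locale& loc)
    {
        const std::ctype<wchar_t>& ct =
            std::use_facet<std::ctype<wchar_t> >(loc);
        std::wstring out(in.size(), L'\0');
        if (!in.empty())
            ct.widen(in.data(), in.data() + in.size(), &out[0]);
        return out;
    }
};

// Sketch of string_cast with the conversion as third template parameter.
template <typename To, typename From,
          typename Conversion = ctype_conversion<To, From> >
std::basic_string<To> string_cast(const std::basic_string<From>& in,
                                  const std::locale& loc = std::locale())
{
    return Conversion::apply(in, loc);
}

// Example of a user-supplied policy (hypothetical): ignores the locale and
// keeps only 7-bit ASCII, replacing everything else with '?'.
struct ascii_only_conversion
{
    static std::string apply(const std::wstring& in, const std::locale&)
    {
        std::string out;
        out.reserve(in.size());
        for (std::wstring::size_type i = 0; i < in.size(); ++i)
        {
            const bool ascii = static_cast<unsigned long>(in[i]) < 0x80UL;
            out.push_back(ascii ? static_cast<char>(in[i]) : '?');
        }
        return out;
    }
};
```

with this, string_cast <char> (some_wstring) uses the ctype policy, while string_cast <char, wchar_t, ascii_only_conversion> (some_wstring) swaps in the other one. a UTF-8 policy could be plugged in the same way.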

-- 
jan langer ... jan_at_[hidden]
"pi ist genau drei"
