Subject: Re: [boost] [C++0x] Emulate C++0x char16_t, char32_t, std::u16string, and std::u32string
From: Beman Dawes (bdawes_at_[hidden])
Date: 2011-07-20 21:56:10


On Wed, Jul 20, 2011 at 6:12 PM, Artyom Beilis <artyomtnk_at_[hidden]> wrote:
>> ...
>> Does this make sense?
>>
>> --Beman
>>
>
> No, you can't emulate them.

Sorry, but people have been doing this for at least ten years,
although the specific type names used vary.

> Emulation of char16_t/char32_t is useless for any real use.

People have been using char16_t/char32_t or equivalent to handle
UTF-16/UTF-32 for years. It does work, it is useful, it is used in
production code, and Boost.Filesystem users are asking for it. So
there are existence proofs that this sort of emulation does work for
enough purposes to be quite useful.

> You can't create working
>
>  std::basic_ostringstream<char16_t> stream;
>
> Because stream << 1245 would not work due to lack of std::locale facets.
>
> You can't create the required facets because, for example, they are
> specialized in many standard libraries.

Emulation via simple uint16_t and uint32_t typedefs doesn't work for
all use cases. So only use it when it does work.

> Even Microsoft's existing VC2010 does not work if you compile the
> application with /MD or /MDd

I'll retest just to be sure, but I'm fairly sure that some of my tests
have used those switches.

> Note: char, wchar_t, char16_t and char32_t are much more than basic types
> that can be distinguished, they bring character information with them.
>
> If you want to represent a UTF-16 or UTF-32 code unit just use uint16_t
> or uint32_t, like for example ICU does for UChar and Qt does for QChar,
> but this isn't something that is supposed to work with the standard
> library in places where characters are expected.

Ummmm? Did you look at the attachment? That's what it does. Uses
uint16_t and uint32_t if the compiler does not supply the new
character types, otherwise just uses the supplied standard library
unchanged.
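
For readers without the attachment, here is a minimal sketch of that kind
of fallback. It is not the attached header: the macro
EMU_NO_NATIVE_UNICODE_CHAR_TYPES and the names in namespace emu are
hypothetical, chosen only for illustration. Note that the fallback branch
relies on the standard library accepting std::basic_string over an integer
type, which the standard does not require; that is one reason the
emulation does not cover every use case.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical configuration macro (an assumption, not a real Boost macro):
// define it only for compilers that lack native char16_t/char32_t.
// #define EMU_NO_NATIVE_UNICODE_CHAR_TYPES

namespace emu {
#ifdef EMU_NO_NATIVE_UNICODE_CHAR_TYPES
  // Emulation: plain integer typedefs stand in for the missing types.
  typedef std::uint16_t u16_char;
  typedef std::uint32_t u32_char;
#else
  // Native types are available: use them unchanged.
  typedef char16_t u16_char;
  typedef char32_t u32_char;
#endif

  typedef std::basic_string<u16_char> u16_string;
  typedef std::basic_string<u32_char> u32_string;

  // Build a string of n copies of one code unit; works in both branches
  // because it never depends on u"..." literal syntax.
  inline u16_string repeat16(u16_char unit, std::size_t n) {
    return u16_string(n, unit);
  }

  // Small self-check of the properties user code can rely on.
  inline void self_check() {
    assert(sizeof(u16_char) == 2);
    assert(sizeof(u32_char) == 4);
    assert(repeat16(static_cast<u16_char>(0x41), 3).size() == 3);
  }
}
```

Code written against emu::u16_string compiles in either configuration as
long as it sticks to operations both branches support.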

> Also for Filesystem? Please, don't try to make it more complicated than it
> is now.

There is no change to the filesystem interface; char16_t and char32_t
were designed into V3 right from the beginning. It is just a case of
adding char16_t/char32_t overloads to some implementation code. (That
may not be entirely correct for POSIX systems when the native char
encoding for filenames is not UTF-8. I'm just about to work that out.)
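
To give a feel for what such an overload has to do on a system whose
native narrow encoding is UTF-8, here is a rough sketch of a UTF-16 to
UTF-8 conversion. The function name utf16_to_utf8 is hypothetical, and
this is not the Boost.Filesystem implementation, which goes through
codecvt facets; it only illustrates the transcoding step.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper (not part of Boost.Filesystem): transcode a UTF-16
// string to UTF-8, throwing on unpaired surrogates.
inline std::string utf16_to_utf8(const std::u16string& in) {
  std::string out;
  for (std::size_t i = 0; i < in.size(); ++i) {
    char32_t cp = in[i];
    if (cp >= 0xD800 && cp <= 0xDBFF) {
      // High surrogate: must be followed by a low surrogate.
      if (i + 1 >= in.size())
        throw std::invalid_argument("truncated surrogate pair");
      char32_t lo = in[i + 1];
      if (lo < 0xDC00 || lo > 0xDFFF)
        throw std::invalid_argument("invalid low surrogate");
      cp = 0x10000 + ((cp - 0xD800) << 10) + (lo - 0xDC00);
      ++i;  // consume the low surrogate as well
    } else if (cp >= 0xDC00 && cp <= 0xDFFF) {
      throw std::invalid_argument("unpaired low surrogate");
    }
    // Emit 1 to 4 UTF-8 bytes depending on the code point's magnitude.
    if (cp < 0x80) {
      out += static_cast<char>(cp);
    } else if (cp < 0x800) {
      out += static_cast<char>(0xC0 | (cp >> 6));
      out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
      out += static_cast<char>(0xE0 | (cp >> 12));
      out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
      out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
      out += static_cast<char>(0xF0 | (cp >> 18));
      out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
      out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
      out += static_cast<char>(0x80 | (cp & 0x3F));
    }
  }
  return out;
}
```

On POSIX systems whose filename encoding is not UTF-8, a second
conversion step would still be needed, which is the open question
mentioned above.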

> You want to make boost.filesystem better? Make it use UTF-8 on Windows
> by default and drop all "wide-crap" (sorry windows users).

Well, that certainly would be exciting :-) But more seriously,
Boost.Filesystem and the C++ standard library are designed to work
with native encodings as well as UTF-8, UTF-16, and UTF-32. Users will
do what best serves their interests, and that means a plethora of
encodings for years and years to come. Get over it.

> All operating systems around (with one exception) use a char * API and
> one operating system uses a utf-16/wchar_t API.
>
> So adding an arbitrary character type that no operating system uses
> seems to be a waste of effort.

The issue isn't what the operating system uses, it is what users want
and the standard library demands. We are moving from a C++ world that
only supports char and wchar_t to a C++ world that supports char,
wchar_t, char16_t, and char32_t.

> I **personally** don't see any benefit in adding char16_t/char32_t emulation
> to Boost, and especially to Boost.Filesystem.
>
> Today Boost.Filesystem has enough problems besides char16_t/char32_t.

Lack of char16_t/char32_t support is seen as a problem by some users,
plus I'm working on that portion of the code now in an effort to clear
tickets related to locale, codecvt, and character encoding issues.

--Beman

