From: Daryle Walker (darylew_at_[hidden])
Date: 2000-08-22 21:11:48


on 8/22/00 8:49 AM, Karl Nelson at kenelson_at_[hidden] wrote:

[SNIP]
>> To simplify matters, let's use the popular computer convention of "char"
>> being one octet and "wchar_t" being two octets in size. If they're not
>> those sizes, use more appropriate types (look at cstdint.hpp). The UTF-8
>> stream you have works in "char," but you should really be working with the
>> final representation in "wchar_t;" by the time we get to your formatting
>> problem, the UTF-8 stuff should already be taken care of with your
>> formatting solution ignorant of it.
>
> Fortunately, UTF-8 is close enough to ASCII that my code is basically
> ignorant of it. If this was Shift-JIS or some other format where
> the meaning of the text may change based on switches then the format
> would likely have to be more aware.

Is the code you're wrapping ASCII-based, accepting Unicode only via UTF-8
tricks, or is it Unicode-based but inconveniently working in UTF-8? I
thought it was the latter (and my solution assumes so), but you speak as
if it is the former. If it is the former, you luck out in that all the
format commands are regular low-ASCII characters, so confusion between a
command and a UTF-8 part is minimized.
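
To illustrate why the low-ASCII commands are safe: every octet of a
multi-octet UTF-8 sequence has its high bit set, so a plain byte-wise search
for a 7-bit command character cannot match in the middle of an encoded
character. A minimal sketch follows; the function name find_commands and the
'%' command character are my own assumptions for illustration, not taken
from your library.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: returns the byte offsets of every occurrence of the
// ASCII command character `cmd` in a UTF-8 encoded string. All lead and
// continuation octets of multi-octet UTF-8 sequences lie in 0x80..0xFF,
// so comparing each byte against a 7-bit character never hits inside one.
std::vector<std::size_t> find_commands(const std::string& utf8, char cmd = '%')
{
    std::vector<std::size_t> hits;
    for (std::size_t i = 0; i < utf8.size(); ++i)
        if (utf8[i] == cmd)
            hits.push_back(i);
    return hits;
}
```

Note that the offsets are byte positions, not character positions; that is
enough for splitting the text around a command, but not for counting
characters.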

But the format problem is still separate from the UTF-8 stuff. Your format
code could be used with, for example, a system that is neither ASCII- nor
Unicode-based.

>> To get the true Unicode stream, make a basic_streambuf based on "wchar_t"
>> and (possibly) a Unicode-customized char_traits class. Have this streambuf
>> take any "char"-based streambuf as a parameter member. Your custom
>> streambuf should read "char" data from the inner streambuf, assumed to be
>> UTF-8 octets, and convert it to Unicode "wchar_t" data. Make the reverse
>> provisions for writing. You could also add a locale with "wchar_t"-based
>> and Unicode-customized facets.
>
> Neither of these are good solutions at this time in gcc/Unix world.
> There are other STL implementations that you can drop in but
> the STL the majority of users have is totally defective. This
> is temporary in that gcc within the next year will have charT streams.
> (or at least that is my current understanding.)
[TRUNCATE]

Maybe you should just drop the users with the "bad" default standard
library. I think the guy doing the regex stuff dumped people whose compilers
lack exception handling. There is only so much bending over backwards we can
do; we shouldn't greatly inconvenience more-compliant compilers for the sake
of a broken, but popular, compiler. Fortunately, your users are using the
free GCC and can drop in improved & free STLs. At least the GCC guys are
aware of the problem and are trying to fix it. (This list is talking about a
compiler for another platform that has a worse form of this problem, and,
worse still, the creators don't seem to care about fixing it.)
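
For reference, the wrapping streambuf I described in the quoted text above
could be sketched roughly as follows. This is only an outline under stated
assumptions: the class name utf8_wstreambuf is made up, it relies on a
standard library that does provide wchar_t streams, it handles only
sequences of one to three octets (matching the two-octet "wchar_t"
convention from earlier), and it does not reject overlong encodings or
surrogate values.

```cpp
#include <istream>
#include <ostream>
#include <sstream>
#include <streambuf>
#include <string>

// Hypothetical wrapper: a wchar_t streambuf layered over any char-based
// streambuf whose contents are assumed to be UTF-8 octets.
class utf8_wstreambuf : public std::wstreambuf
{
public:
    explicit utf8_wstreambuf(std::streambuf* inner) : inner_(inner), current_(0) {}

protected:
    // Reading: pull octets from the inner buffer and decode one code point.
    int_type underflow() override
    {
        typedef std::char_traits<char> ctraits;
        const ctraits::int_type first = inner_->sbumpc();
        if (ctraits::eq_int_type(first, ctraits::eof()))
            return traits_type::eof();

        unsigned long cp = static_cast<unsigned long>(first);
        int extra = 0;                                   // continuation octets expected
        if (cp < 0x80)                { extra = 0; }
        else if ((cp & 0xE0) == 0xC0) { cp &= 0x1F; extra = 1; }
        else if ((cp & 0xF0) == 0xE0) { cp &= 0x0F; extra = 2; }
        else return traits_type::eof();                  // 4-octet or malformed lead

        for (; extra > 0; --extra) {
            const ctraits::int_type next = inner_->sbumpc();
            if (ctraits::eq_int_type(next, ctraits::eof()) || (next & 0xC0) != 0x80)
                return traits_type::eof();               // truncated or malformed
            cp = (cp << 6) | static_cast<unsigned long>(next & 0x3F);
        }

        // Expose the decoded character through a one-element get area.
        current_ = static_cast<wchar_t>(cp);
        setg(&current_, &current_, &current_ + 1);
        return traits_type::to_int_type(current_);
    }

    // Writing: no put area is set, so every character arrives here and is
    // encoded straight into the inner buffer.
    int_type overflow(int_type ch) override
    {
        if (traits_type::eq_int_type(ch, traits_type::eof()))
            return traits_type::not_eof(ch);

        const unsigned long cp =
            static_cast<unsigned long>(traits_type::to_char_type(ch));
        char out[3];
        int n = 0;
        if (cp < 0x80) {
            out[n++] = static_cast<char>(cp);
        } else if (cp < 0x800) {
            out[n++] = static_cast<char>(0xC0 | (cp >> 6));
            out[n++] = static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {
            out[n++] = static_cast<char>(0xE0 | (cp >> 12));
            out[n++] = static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out[n++] = static_cast<char>(0x80 | (cp & 0x3F));
        } else {
            return traits_type::eof();                   // outside this sketch's range
        }
        return inner_->sputn(out, n) == n ? ch : traits_type::eof();
    }

    int sync() override { return inner_->pubsync(); }

private:
    std::streambuf* inner_;   // not owned; must outlive this object
    wchar_t current_;         // the single decoded character being read
};
```

Hand a pointer to one of these to a std::wistream or std::wostream and the
formatting code on top only ever sees "wchar_t" data. Decoding one character
at a time keeps the sketch short; a real version would probably buffer, and
the locale facets I mentioned are a separate piece not shown here.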