From: Daryle Walker (darylew_at_[hidden])
Date: 2004-04-11 17:12:43

On 4/9/04 3:54 AM, "Vladimir Prus" <ghost_at_[hidden]> wrote:

> Daryle Walker wrote:
>> On 4/6/04 3:27 AM, "Vladimir Prus" <ghost_at_[hidden]> wrote:
>>> it seems that Unicode support is the last issue that should be addressed
>>> before the library can be added to CVS. Since the issue is somewhat
>>> tricky, I'd appreciate some comments before I start coding.
>> What about:
>> * There's no guarantee that "char" is based on ASCII
>> * There's no guarantee that "wchar_t" is based on Unicode
>> Since other text-related parts of Boost don't really deal with Unicode
>> issues, maybe you should address it after putting it in CVS.
> It was specifically requested that some Unicode/wchar_t support be added
> before putting to CVS.

That doesn't mean that you _have_ to do it. You can give the person who
made the request a (temporary) rejection notice.

>> Maybe after
>> discussions on how Unicode can fit in Boost-wide. (Other posts in this
>> thread have admitted that the problem is big and difficult. I don't think
>> it's worth delaying the library over. Sometimes, cool-sounding ideas in
>> the abstract turn out to be bad ones in practice.)
> What 'cool-sounding idea' do you mean? What I proposed was that Unicode data
> is just passed through, without modification.

I read messages in this thread about doing full-blown Unicode handling, and
I've read about doing nothing (being as Unicode-ignorant as other
text-processing Boost libraries). I wouldn't mind adding "wchar_t" support,
without necessarily assuming that it's Unicode.
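To make "wchar_t support without assuming Unicode" concrete, here is a minimal
sketch. The name split_option is hypothetical and is not taken from
program-options or any other Boost library. It is templated on the character
type and only compares code units for equality, so it never needs to know
which encoding is in use. It does assume that the separator is a single code
unit that cannot appear inside a multi-unit character.

```cpp
#include <string>
#include <utility>

// Hypothetical helper (not an existing Boost API): split "name=value" at the
// first separator. The text on either side is passed through unmodified.
template <class CharT>
std::pair<std::basic_string<CharT>, std::basic_string<CharT> >
split_option(const std::basic_string<CharT>& arg, CharT separator)
{
    typedef std::basic_string<CharT> string_type;

    // Only equality of code units is used; no code-point values are assumed.
    typename string_type::size_type pos = arg.find(separator);
    if (pos == string_type::npos)
        return std::make_pair(arg, string_type());  // no value part

    return std::make_pair(arg.substr(0, pos), arg.substr(pos + 1));
}
```

The same template works for std::string with '=' and for std::wstring with
L'=', whatever the platform's narrow and wide character sets happen to be.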

However, the Unicode "problem" is so big that it could take more time and
effort than what you have done on program-options so far. _That_ is what I
don't want to delay the library for. Also, a solution should be applicable
for all of Boost's text libraries, not just this one.

>> Even if you do come up with some grand Unicode plan, you would have to make
>> sure your library works with platforms that don't use ASCII/Unicode.
> Do you know a specific case where wchar_t does not implicitly mean Unicode?

Not personally, but that's about as relevant as asking for a platform whose
"char" isn't 8 bits. (I've heard that platforms like that have existed.) Just
because all the common platforms do it a certain way (and/or there are no
known counter-examples) doesn't mean you can portably rely on that behavior.
The identities and code-points of the members of the (narrow and wide)
character sets are implementation-defined. C++ source code can name a
character by its ISO 10646 (Unicode) number with a universal-character-name
such as \u00E9, but the implementation is supposed to map it to the
platform's code-point for that character, not necessarily keep the Unicode
value.
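
As a rough illustration of the difference between what is guaranteed and what
is merely common, here is a small sketch. All of the function names are my own
and hypothetical. The "looks_like" checks can only show that an implementation
is consistent with ASCII or Unicode for a few characters; they prove nothing
about the whole character set.

```cpp
#include <cassert>
#include <climits>
#include <cstddef>

// Bits in a "char". The standard guarantees at least 8, not exactly 8.
inline int bits_per_char() { return CHAR_BIT; }

// Bits in a "wchar_t". Implementation-defined; 16 and 32 are both common.
inline std::size_t bits_per_wchar() { return sizeof(wchar_t) * CHAR_BIT; }

// Properties that every conforming implementation must satisfy.
inline void check_guaranteed_properties()
{
    assert(CHAR_BIT >= 8);
    assert('9' - '0' == 9);  // decimal digits are contiguous and ascending
}

// True if a few narrow literals have their ASCII values here.
// Not guaranteed: an EBCDIC-based implementation would return false.
inline bool narrow_looks_like_ascii()
{
    return 'A' == 65 && 'a' == 97 && '0' == 48;
}

// True if a few wide literals have their Unicode values here.
// Not guaranteed: the implementation may map \u00E9 to another code-point.
inline bool wide_looks_like_unicode()
{
    return L'A' == 0x41 && L'\u00E9' == 0xE9;
}
```

Calling these from a small test program shows what a particular compiler
does. Code that wants to be portable can rely on check_guaranteed_properties()
but should treat the other results as per-platform facts.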

Daryle Walker
Mac, Internet, and Video Game Junkie
darylew AT hotmail DOT com
