From: Daryle Walker (darylew_at_[hidden])
Date: 2004-04-11 17:12:43
On 4/9/04 3:54 AM, "Vladimir Prus" <ghost_at_[hidden]> wrote:
> Daryle Walker wrote:
>> On 4/6/04 3:27 AM, "Vladimir Prus" <ghost_at_[hidden]> wrote:
>>> It seems that Unicode support is the last issue that should be addressed
>>> before the library can be added to CVS. Since the issue is somewhat
>>> tricky, I'd appreciate some comments before I start coding.
>> What about:
>> * There's no guarantee that "char" is based on ASCII
>> * There's no guarantee that "wchar_t" is based on Unicode
>> Since other text-related parts of Boost don't really deal with Unicode
>> issues, maybe you should address it after putting it in CVS.
> It was specifically requested that some Unicode/wchar_t support be added
> before putting it in CVS.
That doesn't mean that you _have_ to do it. You can give the person who
made the request a (temporary) rejection notice.
>> Maybe after
>> discussions on how Unicode can fit in Boost-wide. (Other posts in this
>> thread have admitted that the problem is big and difficult. I don't think
>> it's worth delaying the library over. Sometimes, cool-sounding ideas in
>> the abstract turn out to be bad ones in practice.)
> What 'cool-sounding idea' do you mean? What I proposed was that Unicode data
> is just passed through, without modification.
I read messages in this thread about doing full-blown Unicode handling, and
I've read about doing nothing (being as Unicode-ignorant as other
text-processing Boost libraries). I wouldn't mind adding "wchar_t" support,
without necessarily assuming that it's Unicode.
However, the Unicode "problem" is so big that it could take more time and
effort than what you have done on program-options so far. _That_ is what I
don't want to delay the library for. Also, a solution should be applicable
to all of Boost's text libraries, not just this one.
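To make "wchar_t support without assuming Unicode" concrete, here is a minimal sketch under my own assumptions. The helper names has_long_option_prefix and option_name are hypothetical and are not part of program-options. The code is templated on the character type, finds the "--" prefix by asking the locale's std::ctype facet to widen the narrow '-' (instead of comparing against the ASCII/Unicode value 0x2D), and hands everything after the prefix back untouched.

```cpp
#include <cassert>
#include <locale>
#include <string>

// Hypothetical helper (not a program_options API): does the argument
// start with "--"? The dash is obtained by widening the narrow '-'
// through the locale, so no particular code-point is assumed for CharT.
template <typename CharT>
bool has_long_option_prefix(const std::basic_string<CharT>& arg,
                            const std::locale& loc = std::locale::classic())
{
    const CharT dash = std::use_facet<std::ctype<CharT> >(loc).widen('-');
    return arg.size() >= 2 && arg[0] == dash && arg[1] == dash;
}

// Hypothetical helper: returns the option name after the "--" prefix.
// The remaining characters are passed through as-is; their encoding is
// never interpreted.
template <typename CharT>
std::basic_string<CharT> option_name(const std::basic_string<CharT>& arg,
                                     const std::locale& loc = std::locale::classic())
{
    assert(has_long_option_prefix(arg, loc));  // precondition
    return arg.substr(2);
}
```

The same templates accept std::string and std::wstring alike; nothing in them depends on which code-points the implementation uses for either character set.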
>> Even if you do come up with some grand Unicode plan, you would have to make
>> sure your library works with platforms that don't use ASCII/Unicode.
> Do you know of a specific case where wchar_t does not implicitly mean Unicode?
Not personally, but that's about as relevant as asking for a platform whose
"char" isn't 8 bits. (I've heard platforms like that have existed.) Just
because all the common platforms do it a certain way (and/or there are no
counterexamples) doesn't mean you can portably assume that the common
case is all that matters. The identities and code-points of the
members of the (narrow and wide) character sets are implementation-defined.
C++ lets you name characters by their ISO 10646 (Unicode) number, through
universal-character-names like \u00E9, but the compiler is supposed to map
each one to the platform's code-point for that character, not necessarily
keep it in Unicode.
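For what it's worth, a program can at least ask some of these questions. A minimal sketch (the function names are made up for illustration): CHAR_BIT gives the number of bits in a char, comparing character literals against ASCII values shows whether the narrow set looks ASCII-based, and the conditionally defined __STDC_ISO_10646__ macro is the implementation's promise that wchar_t values are ISO 10646 code points. Without that macro the language makes no such promise, so L'\u00E9' need not equal 0xE9.

```cpp
#include <cassert>
#include <climits>

// CHAR_BIT is guaranteed to be at least 8, but it may be larger.
const int bits_per_char = CHAR_BIT;

// True only where the narrow set puts these letters and digits at their
// ASCII positions (false on EBCDIC, for example, where 'A' is 0xC1).
inline bool narrow_is_ascii_like()
{
    return 'A' == 0x41 && 'a' == 0x61 && '0' == 0x30;
}

// __STDC_ISO_10646__ is a conditionally defined predefined macro; when it
// is defined, wchar_t values are ISO 10646 (Unicode) code points.
inline bool wide_is_iso10646()
{
#ifdef __STDC_ISO_10646__
    return true;
#else
    return false;
#endif
}

// A universal-character-name is mapped to the execution wide character
// set, so this comparison is only guaranteed to hold when
// wide_is_iso10646() is true.
inline bool ucn_kept_as_code_point()
{
    return static_cast<unsigned long>(L'\u00E9') == 0xE9UL;
}
```

Note that a false answer from wide_is_iso10646() doesn't mean wchar_t isn't Unicode; it only means the implementation hasn't promised it.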
--
Daryle Walker
Mac, Internet, and Video Game Junkie
darylew AT hotmail DOT com