From: Miro Jurisic (macdev_at_[hidden])
Date: 2004-04-13 14:32:04
In article <87ptabsq2h.fsf_at_[hidden]>, Jeremy Maitin-Shepard <jbms_at_[hidden]> wrote:
> What it comes down to is that basic_string is designed with fixed-width
> character representations in mind.
> I would be more in favor of creating a separate type to represent
> Unicode strings.
I agree with this completely.
> > 4) define low level access to the core Unicode data properties (in
> > unidata.txt).
> Reuse of the ICU library would probably be very helpful in this.
> > 5) Begin to add locale support - a big job, probably a few facets at a
> > time.
> The issue is that, despite what you say, most or all of the standard
> library facets are not suitable for use with Unicode strings. For
> instance, the character classification and toupper-like operations need
> not be tied to a locale. Furthermore, many of the operations such as
> toupper on a single character are not well defined, and rather must be
> defined as a string to string mapping. Finally, the single-character
> type must be a 32-bit integer, while the code unit type will probably
> not be (since UTF-32 as the internal representation would be
You are forgetting that abstract Unicode characters are defined as sequences of
code points (even if those code points are 32-bit), and string manipulation has
to take this into account: there are numerous combinations of characters and
combining marks that must be treated as single units for the purposes of
searching, collation, etc. A single encoded character type may be 32 bits, but
encoded characters are often not the level on which the clients need to manipulate
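To make the point about combining marks concrete, here is a minimal sketch of grouping code points into "base plus combining marks" units. The names is_combining_mark and split_combining_sequences are hypothetical and come from no existing library. The sketch assumes a 32-bit code point type (char32_t), and it treats only the Combining Diacritical Marks block (U+0300..U+036F) as combining, which is a deliberate simplification; a real implementation would consult the Unicode Character Database or a library such as ICU.

```cpp
#include <string>
#include <vector>

// Hypothetical helper. Simplifying assumption: only code points in the
// Combining Diacritical Marks block (U+0300..U+036F) count as combining.
// Real code would look up the properties in the Unicode Character Database.
inline bool is_combining_mark(char32_t cp) {
    return cp >= 0x0300 && cp <= 0x036F;
}

// Hypothetical helper. Splits a sequence of code points into units, where
// each unit is one code point followed by any combining marks that
// immediately follow it. A combining mark at the very start of the input
// has no base to attach to, so it forms a unit of its own.
inline std::vector<std::u32string>
split_combining_sequences(const std::u32string& text) {
    std::vector<std::u32string> units;
    for (char32_t cp : text) {
        if (units.empty() || !is_combining_mark(cp)) {
            // Start a new unit with this code point.
            units.push_back(std::u32string(1, cp));
        } else {
            // Attach the combining mark to the unit in progress.
            units.back().push_back(cp);
        }
    }
    return units;
}
```

Given the three code points U+0065 (e), U+0301 (combining acute accent), and U+0061 (a), this yields two units rather than three, which is the level at which searching or collation would usually want to operate.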
I am happy to see that there is someone here who knows more about locales in C++
than I do; I haven't had the time to research that as thoroughly as I would like