From: Miro Jurisic (macdev_at_[hidden])
Date: 2004-04-13 14:27:52
In article <00c101c42167$8d0a7f40$1b440352_at_fuji>,
"John Maddock" <john_at_[hidden]> wrote:
> > - The standard facets (and the locale class itself, in that it is a
> > functor for comparing basic_strings) are tied to facilities such as
> > std::basic_string and std::ios_base which are not suitable for
> > Unicode support.
> Why not? Once the locale facets are provided, the std iostreams will "just
> work", that was the whole point of templating them in the first place.
I have already gone over this in other posts, but, in short, std::basic_string
makes performance guarantees that are at odds with Unicode strings.
> However I think we're getting ahead of ourselves here: I think a Unicode
> library should be handled in stages:
> 1) define the data types for 8/16/32 bit Unicode characters.
The fact that you believe this is a reasonable first step leads me to believe
that you have not given much thought to the fact that even if you use a 32-bit
Unicode encoding, a character can take up more than 32 bits (and likewise for
16-bit and 8-bit encodings). Unicode characters are not fixed-width data in any
encoding.
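A minimal sketch of that point, assuming the C++11 string types std::u16string and std::u32string (which postdate this thread) and hypothetical helper names. The single user-visible character "é", written as a base letter plus a combining accent, needs more than one code unit in every encoding form:

```cpp
#include <string>

// "é" written as U+0065 LATIN SMALL LETTER E followed by
// U+0301 COMBINING ACUTE ACCENT: one character to the reader.
// The helper names below are illustrative, not from any library.

// UTF-8: 'e' is 1 byte, U+0301 is the 2 bytes 0xCC 0x81.
inline std::string e_acute_utf8() { return "e\xCC\x81"; }

// UTF-16: one 16-bit code unit per code point here.
inline std::u16string e_acute_utf16() { return u"e\u0301"; }

// UTF-32: one 32-bit code unit per code point, but still two of them,
// so the character occupies 64 bits.
inline std::u32string e_acute_utf32() { return U"e\u0301"; }
```

Calling size() on the three results gives 3, 2, and 2 code units respectively, for what a reader sees as one character.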
> 2) define iterator adapters to convert a sequence of one Unicode character
> type to another.
This is also not as easy as you seem to believe that it is, because even within
one encoding many strings can have multiple representations.
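As an illustration, assuming C++11's std::u32string and a deliberately toy normalizer (toy_compose is a hypothetical name that handles exactly one pair; a real implementation would be driven by the Unicode normalization data), here are two UTF-32 strings that represent the same text but compare unequal code unit by code unit:

```cpp
#include <cstddef>
#include <string>

// "é" as one precomposed code point (U+00E9).
inline std::u32string precomposed() { return U"\u00E9"; }

// The same text as base letter + combining mark (U+0065 U+0301).
inline std::u32string decomposed() { return U"e\u0301"; }

// Hypothetical toy normalizer: folds ONLY the pair e + U+0301 into U+00E9.
// It exists to show why comparison needs a normalization step at all.
inline std::u32string toy_compose(const std::u32string& in) {
    std::u32string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        if (in[i] == U'e' && i + 1 < in.size() && in[i + 1] == U'\u0301') {
            out.push_back(U'\u00E9');
            ++i;  // skip the combining mark that was just folded in
        } else {
            out.push_back(in[i]);
        }
    }
    return out;
}
```

A plain operator== on precomposed() and decomposed() reports them as different; only after passing both through a normalization step do they compare equal. An iterator adapter that converts between encodings has to decide what to do about this.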
> 3) define char_traits specialisations (as necessary) in order to get
> basic_string working with Unicode character sequences, typedef the
> appropriate string types:
> typedef basic_string<utf8_t> utf8_string; // etc
This is not a good idea. If you do this, you will produce a basic_string which
can violate well-formedness of Unicode strings when you use any mutation
algorithm other than concatenation, or you will violate the performance
guarantees of basic_string.
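A sketch of the well-formedness problem, using plain std::string as a stand-in for the proposed utf8_string. The checker looks_like_utf8 is a hypothetical, simplified helper: it verifies only the lead/continuation byte structure and does not reject overlong forms or surrogates, which a full validator must.

```cpp
#include <cstddef>
#include <string>

// Simplified structural check of UTF-8 byte patterns (see caveats above).
inline bool looks_like_utf8(const std::string& s) {
    std::size_t i = 0;
    while (i < s.size()) {
        const unsigned char lead = static_cast<unsigned char>(s[i]);
        std::size_t trail = 0;  // number of continuation bytes expected
        if (lead < 0x80)                trail = 0;
        else if ((lead & 0xE0) == 0xC0) trail = 1;
        else if ((lead & 0xF0) == 0xE0) trail = 2;
        else if ((lead & 0xF8) == 0xF0) trail = 3;
        else return false;  // stray continuation byte or invalid lead
        if (trail > s.size() - i - 1) return false;  // truncated sequence
        for (std::size_t k = 1; k <= trail; ++k) {
            const unsigned char c = static_cast<unsigned char>(s[i + k]);
            if ((c & 0xC0) != 0x80) return false;  // not a continuation byte
        }
        i += trail + 1;
    }
    return true;
}

// "café" in UTF-8: the final "é" is the two bytes 0xC3 0xA9.
inline std::string cafe_utf8() { return "caf\xC3\xA9"; }

// An ordinary basic_string mutation that knows nothing about UTF-8:
// it drops the last code unit, splitting the two-byte sequence.
inline std::string drop_last_code_unit(std::string s) {
    if (!s.empty()) s.erase(s.size() - 1);
    return s;
}
```

Here cafe_utf8() passes the check, while drop_last_code_unit(cafe_utf8()) does not: erase() happily leaves a lone lead byte behind. Making erase() respect sequence boundaries would mean scanning the string, which is where the performance guarantees come in.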
> 7) Anything I've forgotten :-)
I think you have forgotten to read the Unicode spec (or any of the books that
discuss the spec less tersely, such as Unicode Demystified) and to understand
the complexity of Unicode, because I think that some of the suggestions you made
here are incompatible with how Unicode actually works. Please correct me if I am
wrong; I would love to be wrong :-)
> The main goal would be to define a good clean interface, the implementation
> could be:
We can't define a good clean interface until we understand the problems.