From: Sundell Software (sundell.software_at_[hidden])
Date: 2005-03-18 09:48:15


On Wed, 16 Mar 2005 18:13:36 +0100, Erik Wien <wien_at_[hidden]> wrote:

> Not entirely, but certainly less than optimal. basic_string (and the
> iostreams) make assumptions that don't necessarily apply to Unicode text.
> One of them is that strings can be represented as a sequence of equally
> sized characters. Unicode can be represented that way, but that would
> mean you'd have to use 32 bits per character to be able to represent all
> the code points assigned in the Unicode standard. In most cases, that is
> way too much overhead for a string, and usually also a waste, since
> Unicode code points rarely require more than 16 bits to be encoded. You
> could of course implement Unicode for 16-bit characters in basic_string,
> but that would require that the user know about things like surrogate
> pairs, and also know how to correctly handle them. An unlikely scenario.

Looking at the code, it seems to duplicate a lot of what basic_string
does. AFAIK, though I haven't looked that closely at Unicode, you have
two ways of viewing the string: as a string of UTF-* elements(?), or
as a string of characters. The former has the same properties as
basic_string; the latter doesn't.
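
To make the two views concrete, here is a minimal sketch, assuming UTF-16
stored in a std::u16string (a C++11 type, used here only for illustration)
and taking "character" to mean a Unicode code point. The function names
are hypothetical and are not taken from the library under discussion.

```cpp
#include <cstddef>
#include <string>

// High (leading) surrogates occupy 0xD800..0xDBFF.
inline bool is_high_surrogate(char16_t u) { return u >= 0xD800 && u <= 0xDBFF; }

// Low (trailing) surrogates occupy 0xDC00..0xDFFF.
inline bool is_low_surrogate(char16_t u) { return u >= 0xDC00 && u <= 0xDFFF; }

// View 1: the string as UTF-16 code units. This is what basic_string
// itself reports, with constant-time size and indexing.
inline std::size_t code_unit_count(const std::u16string& s) { return s.size(); }

// View 2: the string as code points. A well-formed surrogate pair counts
// once, so the answer needs a linear scan (assumption: an unpaired
// surrogate is counted as one code point rather than rejected).
inline std::size_t code_point_count(const std::u16string& s) {
    std::size_t n = 0;
    for (std::size_t i = 0; i < s.size(); ++i) {
        if (is_high_surrogate(s[i]) && i + 1 < s.size() && is_low_surrogate(s[i + 1]))
            ++i;  // skip the trailing half of the pair
        ++n;
    }
    return n;
}
```

For a string holding 'a', U+1D11E and 'b', the first view sees four
elements and the second sees three, which is exactly where the
basic_string properties stop applying.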

It seems to me, then, that a possible design would be to make it a
basic_string and provide special iterators etc. that view the string
as characters. This would require the iterator to hold a reference to
the basic_string to be able to support assignment. Maybe it would
require a whole wrapper class around basic_string to provide the
required functionality.
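
A rough sketch of such an iterator, assuming UTF-16 in a std::u16string
(C++11) and treating "character" as a code point. The class name
code_point_view and its members are hypothetical, and the view is
read-only: assignment is left out because replacing a code point can
change the number of code units, which is why a writable version would
need access to the string itself.

```cpp
#include <cstddef>
#include <iterator>
#include <string>

// Hypothetical read-only wrapper that presents a UTF-16 std::u16string
// as a sequence of code points. The string must outlive the view.
class code_point_view {
public:
    class iterator {
    public:
        // Dereferencing yields a value, not a reference, so the iterator
        // is tagged as an input iterator.
        using iterator_category = std::input_iterator_tag;
        using value_type = char32_t;
        using difference_type = std::ptrdiff_t;
        using pointer = const char32_t*;
        using reference = char32_t;

        iterator() : s_(nullptr), pos_(0) {}
        iterator(const std::u16string* s, std::size_t pos) : s_(s), pos_(pos) {}

        // Decode the code point that starts at the current code unit.
        char32_t operator*() const {
            char16_t hi = (*s_)[pos_];
            if (is_pair()) {
                char16_t lo = (*s_)[pos_ + 1];
                return 0x10000 + ((char32_t(hi) - 0xD800) << 10)
                               + (char32_t(lo) - 0xDC00);
            }
            return hi;  // BMP code point; an unpaired surrogate passes through as is
        }

        // Advance by one code point: two code units for a pair, else one.
        iterator& operator++() { pos_ += is_pair() ? 2 : 1; return *this; }
        iterator operator++(int) { iterator t = *this; ++*this; return t; }

        bool operator==(const iterator& o) const { return s_ == o.s_ && pos_ == o.pos_; }
        bool operator!=(const iterator& o) const { return !(*this == o); }

        // Offset of the current code point in code units.
        std::size_t unit_offset() const { return pos_; }

    private:
        // True if a well-formed surrogate pair starts at pos_.
        bool is_pair() const {
            return pos_ + 1 < s_->size()
                && (*s_)[pos_] >= 0xD800 && (*s_)[pos_] <= 0xDBFF
                && (*s_)[pos_ + 1] >= 0xDC00 && (*s_)[pos_ + 1] <= 0xDFFF;
        }

        const std::u16string* s_;
        std::size_t pos_;
    };

    explicit code_point_view(const std::u16string& s) : s_(&s) {}
    iterator begin() const { return iterator(s_, 0); }
    iterator end() const { return iterator(s_, s_->size()); }

private:
    const std::u16string* s_;
};
```

The underlying string keeps all of its basic_string operations, and
code that wants characters iterates over code_point_view(s) instead of
over s directly.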

Rakshasa


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk