From: Miro Jurisic (macdev_at_[hidden])
Date: 2004-04-14 20:05:06


In article <022101c4220f$a0c363f0$a8500352_at_fuji>,
 "John Maddock" <john_at_[hidden]> wrote:

> > > > - The standard facets (and the locale class itself, in that it is a
> > > > functor for comparing basic_strings) are tied to facilities such as
> > > > std::basic_string and std::ios_base which are not suitable for
> > > > Unicode support.
> > >
> > > Why not? Once the locale facets are provided, the std iostreams will "just
> > > work", that was the whole point of templating them in the first place.
> >
> > I have already gone over this in other posts, but, in short, std::basic_string
> > makes performance guarantees that are at odds with Unicode strings.
>
> Basic_string is a sequence of code points, no more no less, all performance
> guarantees for basic_string can be met as such.

If all you want basic_string for is a sequence of code points, you should use a
vector<codePointT> instead, as vector does not provide additional methods that
would be at best deceptive and at worst dangerous when applied to Unicode
strings.
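
A minimal sketch of what "deceptive" can mean here, assuming C++11's
std::u32string as the sequence of code points (the names precomposed,
decomposed, equal_as_code_points and find_plain_cafe are hypothetical, chosen
only for this illustration):

```cpp
#include <cstddef>
#include <string>

// The same visible text, "café", stored as two different code point sequences.
const std::u32string precomposed = U"caf\u00E9";   // U+00E9 LATIN SMALL LETTER E WITH ACUTE
const std::u32string decomposed  = U"cafe\u0301";  // 'e' followed by U+0301 COMBINING ACUTE ACCENT

// basic_string's operator== compares code point by code point,
// with no notion of canonical equivalence.
bool equal_as_code_points() { return precomposed == decomposed; }

// basic_string::find matches raw code points, so it can match the base
// letter of a combining sequence and ignore the accent that follows it.
std::size_t find_plain_cafe(const std::u32string& text) {
    return text.find(U"cafe");
}
```

With these definitions, equal_as_code_points() is false even though both
strings display identically, size() reports 4 for one and 5 for the other, and
find_plain_cafe() reports a match at position 0 in the decomposed string but no
match in the precomposed one. None of these answers is wrong for a sequence of
code points; they are just not the answers a user of a Unicode string would
expect.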

> Iterator adapters for normalisation / composition / compression would also be
> useful additions.
>
> Likewise adapters for iterating "characters" and "glyphs".

Leaving compression out, as I don't see what it has to do with Unicode strings
per se, I don't think they would merely be useful additions; I think they would
be required in order for a boost Unicode library to meet my expectations.
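
As a rough sketch of the kind of "character" iteration under discussion, here
is a deliberately simplified splitter. It assumes C++11, and it assumes that
only code points in U+0300..U+036F are combining marks, which is far from the
full Unicode rules; is_combining_mark and split_characters are hypothetical
names, not part of any existing library:

```cpp
#include <string>
#include <vector>

// Simplifying assumption: only the Combining Diacritical Marks block
// (U+0300..U+036F) is treated as combining. Real segmentation needs the
// Unicode character database.
bool is_combining_mark(char32_t c) { return c >= 0x0300 && c <= 0x036F; }

// Groups each base code point with the combining marks that follow it.
std::vector<std::u32string> split_characters(const std::u32string& s) {
    std::vector<std::u32string> out;
    for (char32_t c : s) {
        if (!out.empty() && is_combining_mark(c)) {
            out.back().push_back(c);              // attach mark to the previous base
        } else {
            out.push_back(std::u32string(1, c));  // start a new character
        }
    }
    return out;
}
```

Under those assumptions, splitting U"cafe\u0301s" yields five characters, the
fourth of which holds two code points. Even this toy version needs knowledge
that a plain sequence of code points does not carry.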

> Working on sequences of code points always requires care: clearly one could
> erase a low surrogate and leave a high surrogate "orphaned" behind, for
> example. One would need to make it clear in the documentation that potential
> problems like this can occur.

It is precisely because this interface is dangerous that I believe that it
should not be the default interface to a Unicode string. It is rarely useful and
often harmful. It does not make it easy to do things right.
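
The orphaned surrogate case can be sketched as follows, assuming C++11's
std::u16string as the storage (has_only_paired_surrogates and erase_unit are
hypothetical helpers written for this example):

```cpp
#include <cassert>
#include <cstddef>
#include <string>

bool is_lead_surrogate(char16_t c)  { return c >= 0xD800 && c <= 0xDBFF; }
bool is_trail_surrogate(char16_t c) { return c >= 0xDC00 && c <= 0xDFFF; }

// Returns true if every surrogate in s belongs to a well-formed lead/trail pair.
bool has_only_paired_surrogates(const std::u16string& s) {
    for (std::size_t i = 0; i < s.size(); ++i) {
        if (is_lead_surrogate(s[i])) {
            if (i + 1 == s.size() || !is_trail_surrogate(s[i + 1])) return false;
            ++i;  // skip the trail unit of this pair
        } else if (is_trail_surrogate(s[i])) {
            return false;  // trail unit with no lead before it
        }
    }
    return true;
}

// Erases a single element by index, exactly as basic_string::erase permits.
std::u16string erase_unit(std::u16string s, std::size_t index) {
    assert(index < s.size());
    s.erase(index, 1);
    return s;
}
```

Starting from u"\U0001D11E" (a single code point stored as two units),
erase_unit on either index compiles and runs without complaint and returns a
string for which has_only_paired_surrogates is false. The interface offers no
resistance to producing ill-formed text.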

> Unicode is such a large and complex issue that it's actually pretty hard to
> keep even a small fraction of the issues in one's mind at a time, hence my
> suggestion to split the issue up into a series of steps.

The problem is that I think that some of the steps you propose do not take us in
the direction of a useful Unicode string abstraction in boost, but merely
provide convenient wrappers for the simple problems without tackling the
complicated problems. I don't have a problem with solving simple problems first,
but I would like to have a reason to believe that solving those simple problems
gets us closer to solving the hard problems at a later time; I am not convinced
the approach you propose fits that bill.

meeroh

-- 
If this message helped you, consider buying an item
from my wish list: <http://web.meeroh.org/wishlist>
