Subject: Re: [boost] [gsoc] unicode tools and an unicode string type
From: Mathias Gaunard (mathias.gaunard_at_[hidden])
Date: 2009-03-30 13:06:39


Stefan Seefeld wrote:

> If this is indeed going to be more about design than actual code, I'd
> suggest to make the review and discussion of previous attempts (that did
> have broad support in this community), such as the already mentioned
> work by Erik Wien
> (see http://lists.boost.org/Archives/boost/2004/10/74349.php) a central
> part of the work.

I haven't read the whole thread, since it's quite big and I am limited
in time at the moment.
I was never able to get hold of Erik Wien's final code, either.

I have gone through most of the unicode threads, and have noted most of
the issues that were raised.

Note that my proposal is somewhat more restricted than Erik's.
I'm not planning any locale-specific work, collation support, or
integration with codecvt facets or standard locales.

Locale support could eventually be added given time (and I'd personally
rather do it using a custom-made locale system, like ICU does), but I
was asked to restrict the scope of the library.

It will be purely iterators (or rather, ranges) and algorithms, which
are much simpler to deal with than the whole standard locale subsystem.
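
To illustrate what "iterators and algorithms" could mean here, this is a
minimal sketch of a UTF-8 decoding algorithm written against plain
iterator ranges. It is not code from the proposal: the names decode_utf8,
utf8_to_code_points and code_points_of are hypothetical, and the input is
assumed to already be well-formed UTF-8.

```cpp
#include <cassert>
#include <cstdint>
#include <iterator>
#include <string>
#include <vector>

// Hypothetical helper: decodes one code point from the UTF-8 range
// [it, end) and advances `it` past it.
// Assumption: the input is well-formed UTF-8; no validation is done.
template <typename Iterator>
std::uint32_t decode_utf8(Iterator& it, Iterator end)
{
    assert(it != end);  // precondition: the range is not empty
    unsigned char lead = static_cast<unsigned char>(*it++);
    if (lead < 0x80) return lead;  // single byte: 0xxxxxxx

    // Number of continuation bytes announced by the lead byte.
    int trailing = (lead >= 0xF0) ? 3 : (lead >= 0xE0) ? 2 : 1;

    // Keep only the payload bits of the lead byte (5, 4 or 3 bits).
    std::uint32_t cp = lead & (0x3F >> trailing);

    // Each continuation byte (10xxxxxx) contributes 6 payload bits.
    for (int i = 0; i < trailing && it != end; ++i)
        cp = (cp << 6) | (static_cast<unsigned char>(*it++) & 0x3F);
    return cp;
}

// Hypothetical algorithm: works on any input range of bytes and any
// output iterator, independently of the container holding the data.
template <typename InputIt, typename OutputIt>
OutputIt utf8_to_code_points(InputIt first, InputIt last, OutputIt out)
{
    while (first != last)
        *out++ = decode_utf8(first, last);
    return out;
}

// Convenience wrapper for a std::string source.
inline std::vector<std::uint32_t> code_points_of(const std::string& s)
{
    std::vector<std::uint32_t> result;
    utf8_to_code_points(s.begin(), s.end(), std::back_inserter(result));
    return result;
}
```

Nothing in the sketch touches locales or codecvt facets; the only
requirement on the input is that it models an iterator range.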

On top of that will be layered a unicode string type, which is nothing
more than a glorified container wrapper, possibly with type erasure,
whose purpose is to maintain invariants and thus accurately represent a
unicode string.
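
As a rough illustration of a container wrapper that maintains an
invariant, here is a minimal sketch. It is not the proposed design: the
names is_valid_utf8 and utf8_string are hypothetical, the only invariant
enforced is well-formed UTF-8, and type erasure is left out entirely.

```cpp
#include <cstdint>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical check: true if [first, last) is well-formed UTF-8.
template <typename Iterator>
bool is_valid_utf8(Iterator first, Iterator last)
{
    while (first != last) {
        unsigned char lead = static_cast<unsigned char>(*first++);
        if (lead < 0x80) continue;  // ASCII byte

        int trailing;               // continuation bytes expected
        std::uint32_t cp;           // code point being assembled
        std::uint32_t smallest;     // smallest value this length may encode
        if ((lead & 0xE0) == 0xC0)      { trailing = 1; cp = lead & 0x1F; smallest = 0x80; }
        else if ((lead & 0xF0) == 0xE0) { trailing = 2; cp = lead & 0x0F; smallest = 0x800; }
        else if ((lead & 0xF8) == 0xF0) { trailing = 3; cp = lead & 0x07; smallest = 0x10000; }
        else return false;          // stray continuation or invalid lead byte

        for (int i = 0; i < trailing; ++i) {
            if (first == last) return false;          // truncated sequence
            unsigned char c = static_cast<unsigned char>(*first++);
            if ((c & 0xC0) != 0x80) return false;     // not a continuation byte
            cp = (cp << 6) | (c & 0x3F);
        }
        if (cp < smallest) return false;                 // overlong encoding
        if (cp > 0x10FFFF) return false;                 // beyond Unicode range
        if (cp >= 0xD800 && cp <= 0xDFFF) return false;  // surrogate
    }
    return true;
}

// Hypothetical wrapper: owns any byte container and guarantees that its
// contents are well-formed UTF-8 for the lifetime of the object.
template <typename Container = std::string>
class utf8_string
{
public:
    explicit utf8_string(Container data) : data_(std::move(data))
    {
        if (!is_valid_utf8(data_.begin(), data_.end()))
            throw std::invalid_argument("ill-formed UTF-8");
    }

    // Read-only access, so callers cannot break the invariant.
    const Container& bytes() const { return data_; }

private:
    Container data_;
};
```

Because the check happens once at construction and the data is only
exposed read-only, algorithms that receive a utf8_string can skip
validation.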

It's really aimed at being simple and non-intrusive. Components are
fairly separate and code is thus incremental, and the unicode string
just composes the work.

I personally believe basic_string, char_traits, codecvt facets and the
standard locale system are not really suitable for dealing with unicode,
which may have been the reason why previous proposals ended up the way
they did.
I think some people said the same in the various unicode discussions, too.

> As I already mentioned, I don't think a top-down approach is a good idea
> in this case, but it would be especially bad if all it did was to add
> yet another item to this
> potentially-good-but-unimplemented-unicode-designs bag.

Efficient algorithms are provided by the Unicode consortium, so it's
mostly just the design or glue code that needs work.
What the glue looks like depends on which other components are being
integrated with. Here, it's mostly just range concepts.

Furthermore, assuming design is what matters most for this project,
the documentation itself would be an integral part of it.
Lack of good documentation may be the reason why some previous unicode
projects failed, too.

