From: Edward Diener (eddielee_at_[hidden])
Date: 2004-10-19 00:38:53


Erik Wien wrote:
> Hi. I am in the process of planning a library for handling Unicode
> strings in C++, and would like to probe the interest in the Boost
> community for something like that. I read through the Unicode
> discussion that was up back in April, and from what I could gather
> there was some amount of interest, but no one felt comfortable taking
> on the task as of yet.
>
> I am hoping to be able to run this project as my Bachelor's Thesis in
> Computer Engineering (not sure if that is the correct translation from
> Norwegian), and if it gets approved by my college, two other
> programmers and I will spend one semester working exclusively on
> this (in collaboration with the Boost community, of course). At the
> end of that semester I hope the library (or at least parts of it) will
> be in such a state that it can be submitted for review by Boost.
>
> The library should ultimately have support for at least basic
> handling of Unicode strings (in all encodings), collation of strings
> and other locale-specific operations. The library should also be (to
> the extent that is possible) integrated with the standard C++ library
> (and Boost) to get as much functionality as possible "for free". I'm
> thinking here of, among other things, the std::locale class and
> compatibility with iostreams. How these requirements are fulfilled
> will be determined as the project (hopefully) moves forward.

A few points you probably already know:

1) Wide characters and Unicode characters are not necessarily the same thing
for any given implementation.
2) There are quite a few Unicode encodings.
3) The idea is to be able to plug a Unicode encoding into the same
standard library templates and Boost templates which now support 'char' and
'wchar_t'. In other words, ideally you want to treat your Unicode encoding as
just another character type, with extra smarts depending on the encoding.
The extra smarts would be used in specializations.
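
To make point 3 concrete, here is a minimal sketch of plugging a new
character type into std::basic_string by supplying a traits class. The names
utf32_unit, utf32_traits, utf32_string and from_ascii are made up for
illustration and are not part of any existing or proposed library; the sketch
treats one UTF-32 code unit as the character and adds no encoding-specific
smarts yet.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <cwchar>   // std::mbstate_t
#include <ios>      // std::streamoff, std::streampos
#include <string>

// Hypothetical character type: one UTF-32 code unit (assumption for illustration).
struct utf32_unit {
    std::uint32_t value;
};

// Hypothetical traits class providing the operations basic_string relies on.
struct utf32_traits {
    using char_type  = utf32_unit;
    using int_type   = std::uint64_t;  // wide enough that eof() is not a valid unit
    using off_type   = std::streamoff;
    using pos_type   = std::streampos;
    using state_type = std::mbstate_t;

    static void assign(char_type& dst, const char_type& src) noexcept { dst = src; }
    static bool eq(char_type a, char_type b) noexcept { return a.value == b.value; }
    static bool lt(char_type a, char_type b) noexcept { return a.value < b.value; }

    static int compare(const char_type* a, const char_type* b, std::size_t n) {
        for (std::size_t i = 0; i < n; ++i) {
            if (lt(a[i], b[i])) return -1;
            if (lt(b[i], a[i])) return 1;
        }
        return 0;
    }
    static std::size_t length(const char_type* s) {
        std::size_t n = 0;
        while (s[n].value != 0) ++n;  // strings are terminated by a zero unit
        return n;
    }
    static const char_type* find(const char_type* s, std::size_t n, const char_type& c) {
        for (std::size_t i = 0; i < n; ++i)
            if (eq(s[i], c)) return s + i;
        return nullptr;
    }
    static char_type* move(char_type* dst, const char_type* src, std::size_t n) {
        if (n != 0) std::memmove(dst, src, n * sizeof(char_type));
        return dst;
    }
    static char_type* copy(char_type* dst, const char_type* src, std::size_t n) {
        if (n != 0) std::memcpy(dst, src, n * sizeof(char_type));
        return dst;
    }
    static char_type* assign(char_type* dst, std::size_t n, char_type c) {
        for (std::size_t i = 0; i < n; ++i) dst[i] = c;
        return dst;
    }
    static int_type eof() noexcept { return ~int_type{0}; }
    static int_type not_eof(int_type i) noexcept { return i == eof() ? 0 : i; }
    static char_type to_char_type(int_type i) noexcept {
        return {static_cast<std::uint32_t>(i)};
    }
    static int_type to_int_type(char_type c) noexcept { return c.value; }
    static bool eq_int_type(int_type a, int_type b) noexcept { return a == b; }
};

// The standard string template, instantiated for the new character type.
using utf32_string = std::basic_string<utf32_unit, utf32_traits>;

// Helper for the example: widens ASCII text into the new string type.
utf32_string from_ascii(const std::string& text) {
    utf32_string out;
    for (unsigned char c : text) out.push_back(utf32_unit{c});
    return out;
}
```

With this in place, the ordinary members (size, find, comparison,
concatenation) work on utf32_string unchanged. A variable-width encoding such
as UTF-8 or UTF-16 would need more than a traits class, which is where the
specializations mentioned above would come in.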

In the past, in comp.std.c++, I attempted to promote the idea that all
standard library functionality which deals generally in characters and
strings should be parameterized on the character type, for the sake of
orthogonality and the future. While most is, there is still some
functionality which is not, e.g. exceptions, file names and locale
message files, and which assumes that only narrow characters exist. I
am still amazed that programmers from countries which would normally use
wide characters as Unicode encodings, such as the Japanese, have not made
more of an issue of this, but perhaps they are so used to their far more
difficult DBCS roots that pursuing wide characters everywhere, much less a
real Unicode encoding, is a minor issue for them.
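
As an illustration of the exception case: std::exception::what() returns a
const char*, so the message is tied to narrow characters. The sketch below
shows one way an exception could be parameterized on the character type. The
names basic_message_error, wmessage_error and demo_wide_error are made up for
this example; nothing like them exists in the standard library.

```cpp
#include <cassert>
#include <exception>
#include <string>
#include <utility>

// Hypothetical exception parameterized on the character type (not a standard class).
template <class CharT>
class basic_message_error : public std::exception {
public:
    explicit basic_message_error(std::basic_string<CharT> message)
        : message_(std::move(message)) {}

    // The inherited interface is fixed to narrow characters.
    const char* what() const noexcept override { return "basic_message_error"; }

    // The message in the caller's own character type.
    const std::basic_string<CharT>& message() const noexcept { return message_; }

private:
    std::basic_string<CharT> message_;
};

using wmessage_error = basic_message_error<wchar_t>;

// Throws and catches a wide-character error, returning its wide message.
std::wstring demo_wide_error() {
    try {
        throw wmessage_error(L"file not found");
    } catch (const wmessage_error& e) {
        return e.message();
    }
}
```

Code that only knows about std::exception still sees a narrow what() string,
while code aware of the template can retrieve the message in its own
character type.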

