From: Alberto Barbati (abarbati_at_[hidden])
Date: 2002-12-05 15:02:29


Vladimir Prus wrote:
> First interpretation is that you're interested in support for
> different Unicode encodings, via appropriate facets. Then
> Alberto Barbati is the last person who touches this matter,
> in
> news://news.gmane.org:119/aq72e4$pog$1@main.gmane.org
>
> I assume he's holding a lock on implementation work. Alberto,
> did you get anywhere?

Yes, despite the clear lack of interest from Boosters in this issue,
I'm still working on it (but I don't have any "lock" ;) ).

I had a few problems with the interpretation of the standard, but thanks
to a few guys from comp.std.c++ I can now say that I have a working
implementation of facets that convert from UTF-8/16/32 (external) to
UTF-16/32 (internal), with endian variants, for a total of 10 facets. The
implementation passes a basic suite of tests on VS.NET with both the
native STL and STLport.

The facets are conformant to Unicode 3.2 requirements about
non-characters, use of surrogates and non-shortest UTF-8 sequences.
After a private discussion with a field expert, I decided to drop the
UCS-2 facets, so surrogate support is no longer optional. I also decided
to drop facets with UTF-8 as the internal encoding because they are not
very useful and the current wording of the C++ standard de facto
disallows a portable implementation :(. I hope the LWG will consider
clarifying the issue.
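
To make the conformance checks mentioned above concrete, here is a minimal
sketch of a UTF-8 decoder that rejects non-shortest sequences, encoded
surrogates and values above U+10FFFF, plus a test for non-characters. It is
not the facet code itself: the names decode_utf8, is_surrogate and
is_noncharacter are hypothetical, and it assumes a C++11 compiler for
<cstdint> and nullptr.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helpers for illustration; not taken from the actual facets.

// True for code points in the UTF-16 surrogate range U+D800..U+DFFF.
inline bool is_surrogate(std::uint32_t cp)
{
    return cp >= 0xD800 && cp <= 0xDFFF;
}

// True for Unicode non-characters: U+FDD0..U+FDEF and the last two
// code points of every plane (U+xxFFFE and U+xxFFFF).
inline bool is_noncharacter(std::uint32_t cp)
{
    return (cp >= 0xFDD0 && cp <= 0xFDEF) || ((cp & 0xFFFE) == 0xFFFE);
}

// Decodes one UTF-8 sequence from [s, s + n).
// On success stores the code point in cp and returns the bytes consumed.
// Returns 0 (cp untouched) for empty, truncated, malformed, non-shortest,
// surrogate, or out-of-range input.
inline std::size_t decode_utf8(const unsigned char* s, std::size_t n,
                               std::uint32_t& cp)
{
    assert(s != nullptr || n == 0);
    if (n == 0) return 0;

    const unsigned char lead = s[0];
    if (lead < 0x80) { cp = lead; return 1; }

    std::size_t len;
    std::uint32_t min_value;  // smallest code point that needs len bytes
    std::uint32_t value;
    if ((lead & 0xE0) == 0xC0)      { len = 2; min_value = 0x80;    value = lead & 0x1F; }
    else if ((lead & 0xF0) == 0xE0) { len = 3; min_value = 0x800;   value = lead & 0x0F; }
    else if ((lead & 0xF8) == 0xF0) { len = 4; min_value = 0x10000; value = lead & 0x07; }
    else return 0;  // stray continuation byte or invalid lead byte

    if (n < len) return 0;  // truncated sequence
    for (std::size_t i = 1; i < len; ++i) {
        if ((s[i] & 0xC0) != 0x80) return 0;  // not a continuation byte
        value = (value << 6) | (s[i] & 0x3F);
    }

    if (value < min_value) return 0;  // non-shortest form
    if (value > 0x10FFFF) return 0;   // beyond the Unicode range
    if (is_surrogate(value)) return 0;

    cp = value;
    return len;
}
```

A caller would typically decode first and then decide what to do with
non-characters separately, since they are well-formed but not meant for
interchange.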

My next steps would be to polish the code, write the docs and prepare a
more complete test suite. If everything goes well, I think I could
submit the library for review by the end of the month.

> Second interpretation is conversion between all the 8-bit encodings
> out there. E.g. from koi8-r to windows-1251. Since there's GNU
> iconv already, I'd rather see a tiny wrapper over it. (GNU iconv works
> on Windows, too).

Here things become more complex. UTF conversions are purely algorithmic
and easy to do. Other conversions, like koi8-r to windows-1251, require
look-up tables, and simply gathering the data for all of them would be
equivalent to rewriting a part of ICU, which is a huge piece of work.
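
The difference can be sketched as follows. The first function encodes a
code point as UTF-16 with nothing but arithmetic; the second converts
koi8-r to windows-1251 and cannot avoid a table, because koi8-r does not
store Cyrillic letters in alphabetical order. This is only an illustration
under stated assumptions: both function names are hypothetical, the table
is a four-letter excerpt rather than real conversion data for the whole
encoding, and C++11 is assumed.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Algorithmic conversion: encode one code point as UTF-16.
// Returns the number of 16-bit units written to out (1 or 2).
// Precondition: cp is a valid scalar value (not a surrogate, <= U+10FFFF).
inline std::size_t encode_utf16(std::uint32_t cp, std::uint16_t out[2])
{
    assert(cp <= 0x10FFFF && !(cp >= 0xD800 && cp <= 0xDFFF));
    if (cp < 0x10000) {
        out[0] = static_cast<std::uint16_t>(cp);
        return 1;
    }
    cp -= 0x10000;  // 20 bits remain, split 10/10 across the pair
    out[0] = static_cast<std::uint16_t>(0xD800 | (cp >> 10));    // high surrogate
    out[1] = static_cast<std::uint16_t>(0xDC00 | (cp & 0x3FF));  // low surrogate
    return 2;
}

// Table-driven conversion: koi8-r byte to windows-1251 byte.
// EXCERPT ONLY: covers the lowercase letters U+0430..U+0433; a real
// converter needs an entry for each of the 128 high bytes.
// Returns 0 for bytes outside the excerpt.
inline unsigned char koi8r_to_cp1251_excerpt(unsigned char c)
{
    struct Entry { unsigned char koi8r; unsigned char cp1251; };
    static const Entry table[] = {
        {0xC1, 0xE0},  // U+0430 CYRILLIC SMALL LETTER A
        {0xC2, 0xE1},  // U+0431 CYRILLIC SMALL LETTER BE
        {0xD7, 0xE2},  // U+0432 CYRILLIC SMALL LETTER VE
        {0xC7, 0xE3},  // U+0433 CYRILLIC SMALL LETTER GHE
    };
    if (c < 0x80) return c;  // ASCII is shared by both encodings
    for (const Entry& e : table)
        if (e.koi8r == c) return e.cp1251;
    return 0;
}
```

Note how the first four letters of the alphabet map to 0xC1, 0xC2, 0xD7 and
0xC7 in koi8-r: no formula recovers that order, so the data has to come
from somewhere.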

The idea of wrapping ICU is very interesting. However, Boost policy
explicitly disallows dependencies on external libraries, so this
solution is out of the question. Moreover, the only things ICU is missing
are the conversion facets; I don't see any reason to wrap anything else.
Unfortunately, as I said before, not all conversions can be portably
expressed as a facet with the current C++ standard, so even writing
wrapping facets makes little sense.

Alberto Barbati


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk