From: Alberto Barbati (abarbati_at_[hidden])
Date: 2003-01-05 21:52:28


Hi Boosters,

I have put in the Boost file section the first version of my UTF
library. You can find it here:

http://groups.yahoo.com/group/boost/files/utf/

A couple of months ago, I posted a message to check whether there was
interest in such a library, and I got just one answer, from Vladimir Prus
(hi, Vladimir!). I hope that a full-blown, working library will get
more attention.

What you will find in the library:

* codecvt facets for the following external encodings: UTF-8, UTF-16LE,
UTF-16BE, UTF-32LE, UTF-32BE. The facets are templated, in order to
avoid any reference to the platform wchar_t type (if present).

The internal encoding can be either UTF-16 or UTF-32. A convenience
interface is provided to automatically select the internal encoding
according to the size (2 or 4 bytes) of the character type used internally.

The facets will perform correct handling of the following Unicode features:

   - all 17 character planes
   - non-characters (U+XFFFE, U+XFFFF, U+FDD0 - U+FDEF)
   - UTF-16 surrogate pairs (both externally and internally)
   - UTF-8 non-shortest forms (externally)

* a convenience interface to autodetect the correct facet according to
the file signature (BOM)

* a comprehensive test suite (with Jamfile)

* a little example (with Jamfile)
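
To make the BOM autodetection idea concrete, here is a minimal sketch of
how a file signature can be mapped to one of the five external encodings
listed above. The names Encoding and detect_bom are hypothetical and are
not the library's interface; the sketch only shows the byte comparisons
involved.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical names for illustration; not the library's actual interface.
enum class Encoding { Unknown, Utf8, Utf16LE, Utf16BE, Utf32LE, Utf32BE };

// Inspect the first bytes of a buffer and report which BOM, if any, it starts with.
inline Encoding detect_bom(const unsigned char* data, std::size_t size)
{
    assert(data != nullptr || size == 0);

    // UTF-32 must be tested before UTF-16: the UTF-32LE signature
    // FF FE 00 00 begins with the UTF-16LE signature FF FE.
    if (size >= 4) {
        if (data[0] == 0xFF && data[1] == 0xFE && data[2] == 0x00 && data[3] == 0x00)
            return Encoding::Utf32LE;
        if (data[0] == 0x00 && data[1] == 0x00 && data[2] == 0xFE && data[3] == 0xFF)
            return Encoding::Utf32BE;
    }
    if (size >= 3 && data[0] == 0xEF && data[1] == 0xBB && data[2] == 0xBF)
        return Encoding::Utf8;
    if (size >= 2) {
        if (data[0] == 0xFF && data[1] == 0xFE) return Encoding::Utf16LE;
        if (data[0] == 0xFE && data[1] == 0xFF) return Encoding::Utf16BE;
    }
    return Encoding::Unknown;  // no recognised signature
}
```

One caveat of this simple approach: a UTF-16LE file whose first character
after the BOM is U+0000 would be reported as UTF-32LE, so a real
implementation may want a way for the caller to override the guess.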

What you won't find in the library:

* documentation :-( I'm working on it!!! I swear. Give me some more
time! (and a little feedback)

* facets for UCS-2 or UCS-4 (these encodings are very similar to UTF-16
and UTF-32 but are *not* the same!)

* facets that use UTF-8 internally (this is too complex and won't work
portably, believe me!)
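
To illustrate why UCS-2 and UTF-16 are not the same: UCS-2 has no
surrogate mechanism and can only represent the first plane, while UTF-16
reaches the other 16 planes by splitting a code point into a pair of
16-bit units. The following is a minimal sketch of that arithmetic; the
names SurrogatePair, to_surrogates and from_surrogates are hypothetical
and are not taken from the library.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper type: one high (lead) and one low (trail) surrogate.
struct SurrogatePair { std::uint16_t high; std::uint16_t low; };

// Split a supplementary-plane code point (U+10000..U+10FFFF) into a pair.
inline SurrogatePair to_surrogates(std::uint32_t cp)
{
    assert(cp >= 0x10000 && cp <= 0x10FFFF);
    const std::uint32_t v = cp - 0x10000;  // 20-bit value
    return { static_cast<std::uint16_t>(0xD800 + (v >> 10)),     // top 10 bits
             static_cast<std::uint16_t>(0xDC00 + (v & 0x3FF)) }; // low 10 bits
}

// Recombine a valid high/low surrogate pair into the original code point.
inline std::uint32_t from_surrogates(std::uint16_t high, std::uint16_t low)
{
    assert(high >= 0xD800 && high <= 0xDBFF);
    assert(low >= 0xDC00 && low <= 0xDFFF);
    return 0x10000 + ((static_cast<std::uint32_t>(high) - 0xD800) << 10)
                   + (static_cast<std::uint32_t>(low) - 0xDC00);
}
```

For example, U+1D11E becomes the pair D834 DD1E, and feeding that pair
back through from_surrogates yields U+1D11E again.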

Compatibility

The test suite and the example have been tested with VS.NET with both
the native STL and STLport 4.5.3. However, STLport has major bugs in
the codecvt interface and in the basic_filebuf implementation, so in
order to compile and run the wchar_t and uint16_t tests you need to
apply a patch that is provided with the library and *rebuild* STLport.
The uint32_t test won't compile in any case due to an incomplete
implementation of the entire locale suite (I am going to contact Boris
Fomitchev in order to see how we can make a patch).
The test suite will compile and run correctly even in the presence of the
/Zc:wchar_t option (that's why there are a wchar_t and a uint16_t test
in the first place).
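
As a rough illustration of why both tests matter: with /Zc:wchar_t,
wchar_t is a distinct built-in type rather than a typedef for an unsigned
16-bit integer, so templated code has to work for both types. Selecting
the internal encoding from the size of the character type, as described
earlier, sidesteps the question of type identity. The sketch below uses
hypothetical names (InternalEncoding, encoding_for_size,
internal_encoding) and modern C++ syntax; it is not the library's
convenience interface.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical names for illustration; not the library's actual interface.
enum class InternalEncoding { Utf16, Utf32 };

// Primary template is declared but never defined, so any character
// size other than 2 or 4 bytes is rejected at compile time.
template <std::size_t Size> struct encoding_for_size;

template <> struct encoding_for_size<2> {
    static constexpr InternalEncoding value = InternalEncoding::Utf16;
};
template <> struct encoding_for_size<4> {
    static constexpr InternalEncoding value = InternalEncoding::Utf32;
};

// Choose the internal encoding from the size of the character type only,
// so wchar_t and a same-sized integer type get the same answer.
template <typename CharT>
struct internal_encoding : encoding_for_size<sizeof(CharT)> {};
```

With this, internal_encoding<std::uint16_t>::value is Utf16 and
internal_encoding<std::uint32_t>::value is Utf32, while the result for
wchar_t follows whatever size the platform gives it.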

The facets that use UTF-16 internally were a major challenge. I provided
two different implementations. The default one is a "compatibility" one
that should work with most STL implementations (including VS.NET and
STLport, which have a minor flaw in them :-( ). The other one should
perform a little better, but I don't know with how many compilers it will
work. The alternative implementation can be selected in the file
config.hpp. In that file you can also find a #define that should be
changed if your implementation correctly implements Library Issue 75
about the prototype of function do_length().

I hope you enjoy this library and find it useful. Depending on the
feedback I receive, I will go on to write decent documentation with a
view to a formal submission.

Thanks in advance for your time and help,

Alberto Barbati

