Subject: [boost] [rfc] Unicode GSoC project
From: Mathias Gaunard (mathias.gaunard_at_[hidden])
Date: 2009-05-12 14:50:55


Hi everyone. I'm in charge of the Unicode Google Summer of Code project.

For the past week I have been working on range adaptors that iterate
over the code points in a UTF-x string and that convert those code
points back to UTF-y.
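
To make the UTF-x to UTF-y idea concrete, here is a minimal sketch of
the underlying transformation for x = 8 and y = 16. It is not the
project's code or API: the function names next_code_point and
utf8_to_utf16 are made up for illustration, it works eagerly on whole
strings rather than as lazy range adaptors, it uses C++11 character
types, and it assumes the input is already valid UTF-8.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: decode one code point from the UTF-8 sequence that
// starts at s[i], and advance i past it.
// Assumption: s is valid UTF-8 (only truncation is caught, by the assert).
inline char32_t next_code_point(const std::string& s, std::size_t& i)
{
    unsigned char lead = static_cast<unsigned char>(s[i++]);
    if (lead < 0x80)
        return lead; // single-byte (ASCII) sequence

    // Number of continuation bytes implied by the lead byte.
    int extra = lead >= 0xF0 ? 3 : lead >= 0xE0 ? 2 : 1;

    // Keep only the payload bits of the lead byte (5, 4 or 3 bits).
    char32_t cp = lead & (0x3F >> extra);
    for (int k = 0; k < extra; ++k) {
        assert(i < s.size());
        cp = (cp << 6) | (static_cast<unsigned char>(s[i++]) & 0x3F);
    }
    return cp;
}

// Hypothetical helper: re-encode every code point of a UTF-8 string as UTF-16.
inline std::u16string utf8_to_utf16(const std::string& in)
{
    std::u16string out;
    for (std::size_t i = 0; i < in.size(); ) {
        char32_t cp = next_code_point(in, i);
        if (cp < 0x10000) {
            out.push_back(static_cast<char16_t>(cp));
        } else {
            // Code points above the BMP become a surrogate pair.
            cp -= 0x10000;
            out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
        }
    }
    return out;
}
```

A range adaptor would perform the same two steps lazily, one code point
at a time, as the underlying range is traversed.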

I stopped working on these for a bit to put together some short
documentation (which is my first quickbook document, so it may not be
very pretty).
This does not document the final work, but rather what I'm working on
at the moment.

I would like to know everyone's opinion of the concepts I am defining,
which assume the range that is being worked on is indeed a valid Unicode
range in a particular encoding, as well as the system used to enforce
those concepts.

Also, I made Normalization Form C part of the invariant, but maybe
that should be something orthogonal. I personally don't think it's
really useful for general-purpose text though.

While the system doesn't provide conversion from other character sets,
this can easily be added by using assume_utf32. For example, using an
ISO-8859-1 string as input to assume_utf32 just works, since the 256
ISO-8859-1 values coincide with the first 256 Unicode code points.
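
As an illustration of why this works, the sketch below treats each
ISO-8859-1 byte directly as a code point and encodes it as UTF-8. It
does not use assume_utf32 or any other part of the library; the names
append_utf8 and latin1_to_utf8 are hypothetical and only show the
principle.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: append the UTF-8 encoding of one code point to out.
inline void append_utf8(std::string& out, char32_t cp)
{
    assert(cp <= 0x10FFFF); // largest valid Unicode code point
    if (cp < 0x80) {
        out.push_back(static_cast<char>(cp));
    } else if (cp < 0x800) {
        out.push_back(static_cast<char>(0xC0 | (cp >> 6)));
        out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
    } else if (cp < 0x10000) {
        out.push_back(static_cast<char>(0xE0 | (cp >> 12)));
        out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
        out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
    } else {
        out.push_back(static_cast<char>(0xF0 | (cp >> 18)));
        out.push_back(static_cast<char>(0x80 | ((cp >> 12) & 0x3F)));
        out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
        out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
    }
}

// Hypothetical helper: every ISO-8859-1 byte value is numerically equal to
// its Unicode code point, so no lookup table is needed.
inline std::string latin1_to_utf8(const std::string& latin1)
{
    std::string out;
    for (char c : latin1)
        append_utf8(out, static_cast<unsigned char>(c));
    return out;
}
```

Other character sets, such as ISO-8859-15 or Windows-1252, do not have
this property and would need a mapping table before the same approach
applies.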

The documentation also contains some introductory Unicode material.

You can find the documentation online here:
http://mathias.gaunard.emi.u-bordeaux1.fr/unicode/doc/html/

