

From: Cory Nelson (phrosty_at_[hidden])
Date: 2008-03-10 16:31:29


On Mon, Mar 10, 2008 at 1:10 PM, Graham <Graham_at_[hidden]> wrote:
> Sebastian,
>
> As Unicode characters that are not in page zero can require more than 32 bits
> to encode them [yes really] this means that one 'character' can be very long

Unicode defines code points from U+0000 to U+10FFFF, so a single code point
never needs more than 32 bits: at most four bytes in UTF-8, or one surrogate
pair (two 16-bit units) in UTF-16.
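A minimal sketch of that arithmetic, using hand-written encoders (these are illustrative helpers, not any Boost API, and they assume the input is a valid scalar value, i.e. at most U+10FFFF and not a surrogate). The largest code point, U+10FFFF, comes out as four UTF-8 bytes or two UTF-16 units, which is 32 bits either way.

```cpp
#include <cassert>
#include <string>

// Encode one code point as UTF-8 (1 to 4 bytes).
std::string to_utf8(char32_t cp) {
    // Precondition (assumed): valid scalar value, not a surrogate.
    assert(cp <= 0x10FFFF && !(cp >= 0xD800 && cp <= 0xDFFF));
    std::string out;
    if (cp < 0x80) {                       // 7 bits -> 1 byte
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {               // 11 bits -> 2 bytes
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {             // 16 bits -> 3 bytes
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {                               // 21 bits -> 4 bytes
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}

// Encode one code point as UTF-16 (1 unit, or a surrogate pair).
std::u16string to_utf16(char32_t cp) {
    assert(cp <= 0x10FFFF && !(cp >= 0xD800 && cp <= 0xDFFF));
    std::u16string out;
    if (cp < 0x10000) {
        out += static_cast<char16_t>(cp);  // Basic Multilingual Plane
    } else {
        char32_t v = cp - 0x10000;         // 20 bits left to split
        out += static_cast<char16_t>(0xD800 + (v >> 10));   // high surrogate
        out += static_cast<char16_t>(0xDC00 + (v & 0x3FF)); // low surrogate
    }
    return out;
}
```

For example, `to_utf8(0x10FFFF).size()` is 4 and `to_utf16(0x10FFFF)` is the pair 0xDBFF 0xDFFF.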

> in UTF-8/16 encoding. It is even worse if you start looking at conceptual
> characters [graphemes] where you can easily have three characters make up a
> conceptual character.
>

Normalization support would be nice, but it is a huge task that is out of
the scope of this library (IMHO). This is where you have to decide whether
you want a full-blown Unicode library or just a small codec.
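To show why this sits outside a codec: "é" can be spelled as one code point (U+00E9) or as two ('e' followed by U+0301 COMBINING ACUTE ACCENT), and a plain codec sees two different sequences. The sketch below uses a hypothetical `toy_compose` that recomposes only that single pair. It is not a normalizer; real NFC needs the Unicode Character Database (canonical decompositions, combining classes, composition exclusions), which is exactly the "huge task" part.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: recomposes only 'e' + U+0301 into U+00E9.
// Every other code point is copied through unchanged.
std::u32string toy_compose(const std::u32string& in) {
    std::u32string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        if (in[i] == U'e' && i + 1 < in.size() && in[i + 1] == U'\u0301') {
            out += U'\u00E9';
            ++i;  // skip the combining mark we just absorbed
        } else {
            out += in[i];
        }
    }
    return out;
}

// Usage: both spellings display as "é" but differ until composed.
void demo() {
    const std::u32string precomposed = U"\u00E9";   // one code point
    const std::u32string decomposed  = U"e\u0301";  // two code points
    assert(precomposed != decomposed);
    assert(toy_compose(decomposed) == precomposed);
}
```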

-- 
Cory Nelson

Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk