Subject: Re: [boost] RFC: interest in Unicode codecs?
From: Esben Mose Hansen (boost_at_[hidden])
Date: 2009-02-14 06:48:38


On Saturday 14 February 2009 11:53:20 Graham wrote:
> Using UTF-8 can work well if you are only targeting American and Western
> Europe for non-literary use.
>
> If you need to support the rest of the world you really need to move to
> UTF-32 due to the large number of characters and the grapheme and glyph
> handling [e.g. in Urdu you can type 3 characters and they are displayed
> as a single combined glyph, and the cursor should never be placed
> between them].

I think you have mixed something up. UTF-8 and UTF-32 (aka UCS-4) are
just two encodings of the same character set, and that includes the combining
characters you mentioned. Those are really not that uncommon: e.g. mêlée
contains 2 characters which could each be written as a base letter plus a
combining mark. In practical terms, UTF-32 is somewhat useless. (A case might
be made for UTF-16, though.)
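
To make that concrete, here is a minimal sketch in Python (chosen only for
brevity; nothing here is a Boost API, and the helper name summarize is made up
for this illustration). It encodes mêlée both in precomposed form and as base
letters plus combining marks, and shows that UTF-8 and UTF-32 carry exactly
the same code points in either case. Only the byte counts differ.

```python
import unicodedata

# Precomposed form: ê (U+00EA) and é (U+00E9) are single code points.
composed = "m\u00eal\u00e9e"
# Decomposed form (NFD): each accented letter becomes 'e' + a combining mark.
decomposed = unicodedata.normalize("NFD", composed)


def summarize(text):
    """Hypothetical helper: (code points, UTF-8 bytes, UTF-32 bytes)."""
    utf8 = text.encode("utf-8")
    utf32 = text.encode("utf-32-be")  # big-endian variant, so no BOM is added
    # Both encodings decode back to the identical sequence of code points.
    assert utf8.decode("utf-8") == utf32.decode("utf-32-be") == text
    return len(text), len(utf8), len(utf32)


print(summarize(composed))    # (5, 7, 20)
print(summarize(decomposed))  # (7, 9, 28)
```

Note that the decomposed string has 7 code points for 5 user-visible
characters in UTF-32 just as in UTF-8, so a fixed-width encoding does not by
itself tell you where the cursor may be placed.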

-- 
Kind regards, Esben
