Subject: Re: [boost] [general] What will string handling in C++ look like in the future [was Always treat ... ]
From: Patrick Horgan (phorgan1_at_[hidden])
Date: 2011-01-19 23:29:58
On 01/19/2011 08:33 AM, Peter Dimov wrote:
> Edward Diener wrote:
>> Inevitably a Unicode standard will be adapted where every character
>> of every language will be represented by a single fixed length number
>> of bits.
>
> This was the prevailing thinking once. First this number of bits was
> 16, which incorrect assumption claimed Microsoft and Java as victims,
> then it became 21 (or 22?). Eventually, people realized that this will
> never happen even if we allocate 32 bits per character, so here we are.
At 32 bits we can encode all current languages, all extinct languages,
Klingon, and still have most of the space empty. You might want to read
the Unicode spec, which discusses this clearly. If you just read
through the end of Chapter 6 you'll have a great overall understanding
of Unicode. It's available as a compressed PDF file at:
http://www.unicode.org/versions/Unicode5.2.0/UnicodeStandard-5.2.zip
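To put rough numbers on the "most of the space empty" point, here is a
minimal sketch of the arithmetic. It assumes the Unicode code space
tops out at U+10FFFF, and the names (kMaxCodePoint, bits_needed, and so
on) are made up for illustration. Note that it counts code points, not
user-perceived characters, which can be built from several code points.

```cpp
#include <cstdint>

// Highest code point in the Unicode code space (U+10FFFF).
constexpr std::uint32_t kMaxCodePoint = 0x10FFFF;

// Number of bits needed to represent v (hypothetical helper; 0 needs 0 bits).
constexpr int bits_needed(std::uint32_t v) {
    return v == 0 ? 0 : 1 + bits_needed(v >> 1);
}

// Total number of code points in the code space, surrogates included.
constexpr std::uint64_t kCodeSpaceSize =
    static_cast<std::uint64_t>(kMaxCodePoint) + 1;

// Number of distinct values a 32-bit code unit can hold.
constexpr std::uint64_t k32BitValues = static_cast<std::uint64_t>(1) << 32;

// The whole code space fits in 21 bits...
static_assert(bits_needed(kMaxCodePoint) == 21, "code space needs 21 bits");

// ...and contains 1,114,112 code points...
static_assert(kCodeSpaceSize == 1114112, "17 planes of 65536 code points");

// ...which is exactly 17/65536 (about 0.026%) of the 32-bit range.
static_assert(kCodeSpaceSize * 65536 == k32BitValues * 17,
              "code space is 17/65536 of the 32-bit range");
```

So a 32-bit unit holds any single code point with room to spare, and
only a small part of that 21-bit code space is assigned so far.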
Patrick
Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk