Subject: Re: [boost] [general] What will string handling in C++ look like in the future [was Always treat ... ]
From: Edward Diener (eldiener_at_[hidden])
Date: 2011-01-19 12:18:48
On 1/19/2011 11:33 AM, Peter Dimov wrote:
> Edward Diener wrote:
>> Inevitably a Unicode standard will be adapted where every character of
>> every language will be represented by a single fixed length number of
>> bits.
>
> This was the prevailing thinking once. First this number of bits was 16,
> which incorrect assumption claimed Microsoft and Java as victims, then
> it became 21 (or 22?). Eventually, people realized that this will never
> happen even if we allocate 32 bits per character, so here we are.

"Eventually, people realized..." is just rhetoric, where "people"
stands for whatever your own opinion is.

I do not understand the technical reason why it can never happen. Are
human "alphabets" proliferating so fast that we cannot fit the notion
of a character in any alphabet into a fixed-size character? If so, a
multi-byte encoding will never represent every possible character in
every language either, and it is absurd to believe that. "Eventually
people realized that making a fixed-size character representing every
character in every language was doable, and they just did it." That
sounds fairly logical to me, aside from the practical difficulty of
getting diverse people from different nationalities/character-sets to
agree on things.

Of course you can argue that having a variable number of bytes
represent each possible character in any language is better than
having a single fixed-size character, and I am willing to listen to that
technical argument. But from a programming point of view, aside from the
"waste of space" issue, a fixed-size character has the obvious advantage
that any character can be accessed via an offset into the character
array, and all the algorithms for finding/inserting/deleting/changing
characters become much easier and quicker, as do display and input.
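
A minimal sketch of the offset argument, assuming C++11, well-formed
UTF-8 input, and that "character" here means a single Unicode code
point. The function names are hypothetical and are not taken from Boost
or any other library.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Fixed-width storage (UTF-32): the n-th code point is a plain array
// access at offset n, which takes constant time.
char32_t code_point_at_utf32(const std::u32string& s, std::size_t n) {
    assert(n < s.size());
    return s[n];
}

// Variable-width storage (UTF-8): the byte offset of the n-th code point
// is found by scanning from the start and skipping continuation bytes
// (bit pattern 10xxxxxx), which takes time linear in the string length.
// Assumes well-formed UTF-8; returns std::string::npos if the string
// holds fewer than n + 1 code points.
std::size_t byte_offset_of_code_point_utf8(const std::string& s, std::size_t n) {
    std::size_t seen = 0;
    for (std::size_t i = 0; i < s.size(); ++i) {
        const unsigned char b = static_cast<unsigned char>(s[i]);
        if ((b & 0xC0) != 0x80) {  // ASCII or lead byte: a code point starts here
            if (seen == n) {
                return i;
            }
            ++seen;
        }
    }
    return std::string::npos;
}
```

With the fixed-width form, s[n] is the whole lookup. With the
variable-width form, every indexed access pays for a scan unless a
separate index of offsets is maintained.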