Subject: Re: [boost] [general] What will string handling in C++ look like in the future [was Always treat ... ]
From: Chad Nelson (chad.thecomfychair_at_[hidden])
Date: 2011-01-20 09:21:15
On Thu, 20 Jan 2011 00:05:47 -0800
Patrick Horgan <phorgan1_at_[hidden]> wrote:
>>> Inevitably a Unicode standard will be adopted where every character
>>> of every language will be represented by a single fixed length
>>> number of bits. [...]
>>
>> I'm no Unicode expert, but the reason this hasn't happened might be
>> combinatorial explosion. In which case it might never happen. But I
>> could well be wrong. And I hope I am, the design you outline is
>> something I'd love to see.
>
> It's already here and has been for a long time. That's just UCS
> encoded as UTF-32. [...]
The problem, in my admittedly uninformed view, is combining characters.
UTF-32 gives you a fixed width per code point, not per character. Any
time a single character can require more than one code point, you can't
assume that a fixed number of bits will be able to represent every
character.
I may be wrong, and I hope I am. If a character is guaranteed never to
consist of more than X code-points, it would be simple to offer a
fixed-width character type, even if the width is huge by comparison to
the eight-bit char type. But from what I've seen, I don't think that's
the case.
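To make the concern concrete, here is a minimal sketch, assuming a C++11
compiler with char32_t support. The helper names are made up for
illustration, and is_basic_combining_mark only knows about the Combining
Diacritical Marks block (U+0300 to U+036F), so it is nowhere near a real
grapheme segmenter. It just shows that "e with acute" can be one code
point or two, and that more marks can be stacked on the same base.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// In UTF-32 every code point is one char32_t, so this is just the length.
std::size_t code_point_count(const std::u32string& s) {
    return s.size();
}

// Hypothetical helper: recognizes only U+0300..U+036F (Combining
// Diacritical Marks). Real Unicode has many more combining characters.
bool is_basic_combining_mark(char32_t c) {
    return c >= 0x0300 && c <= 0x036F;
}

// Rough count of user-perceived characters: a base code point and any
// marks from the block above that follow it are counted once.
std::size_t rough_character_count(const std::u32string& s) {
    std::size_t n = 0;
    for (char32_t c : s) {
        if (!is_basic_combining_mark(c)) {
            ++n;
        }
    }
    return n;
}

void demo() {
    const std::u32string precomposed = U"\u00E9";       // e-acute, one code point
    const std::u32string decomposed  = U"e\u0301";       // e + combining acute
    const std::u32string stacked     = U"e\u0301\u0323"; // + combining dot below

    assert(code_point_count(precomposed) == 1);
    assert(code_point_count(decomposed) == 2);
    assert(code_point_count(stacked) == 3);

    // All three are a single character to the reader.
    assert(rough_character_count(precomposed) == 1);
    assert(rough_character_count(decomposed) == 1);
    assert(rough_character_count(stacked) == 1);
}
```

So even with 32 bits per code point, the number of code points per
character varies from one string to the next, which is the part I don't
see a fixed-width type getting around.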
--
Chad Nelson
Oak Circle Software, Inc.