Subject: Re: [boost] [string] proposal
From: Patrick Horgan (phorgan1_at_[hidden])
Date: 2011-01-21 20:47:14


On 01/21/2011 09:50 AM, Beman Dawes wrote:
> ... elision by patrick ....
>
> IMO, Any serious Unicode string proposal has to address UTF-8 strings,
> UTF-16 strings, UTF-32 strings, and probably UTF strings where the
> particular UTF encoding is established at runtime. Applications that
> deal with Asian languages, do a lot of random access, or would pay a
> performance or storage penalty will demand more than just UTF-8
> strings. There might be other variants, too, such as a BMP-string. If
> a Unicode string library provides a strong design framework that is
> clearly articulated, then an initial implementation would only have to
> provide the most needed types; UTF-8 and UTF-16/BMP.
>
> I really doubt any proposal will get taken very seriously if it only
> supports one of the UTF encodings.

+1, with the caveat that many consider UTF-8 and UTF-32 to be the
most needed types and consider UTF-16 evil. (It seems to be a
Windows/non-Windows split. I like them all ;) So all three (four if you
want to differentiate between fixed-width UTF-16/BMP, which is really
UCS-2, and full UTF-16) would be needed to keep people from saying that
it doesn't meet their needs, so why did we bother. A UTF string whose
encoding is established at run time would carry a lot of extra code.
Wouldn't a programmer know at compile time which encoding he wanted to
use internally?
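
To make the compile-time versus run-time question concrete, here is a
rough sketch. Every name in it (utf8_tag, utf_string, any_utf_string,
units_for, and so on) is made up for illustration and is not part of
Boost or of any proposal in this thread. It only counts the code units
needed for one code point, and it assumes the code point is a valid
Unicode scalar value.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical encoding tags; these names are illustrative only.
struct utf8_tag  { using unit = char;     };
struct utf16_tag { using unit = char16_t; };
struct utf32_tag { using unit = char32_t; };

// Code units needed to store one code point in each encoding.
// Assumes cp is a valid Unicode scalar value (no validation here).
constexpr std::size_t units_for(utf8_tag, char32_t cp) {
    return cp < 0x80 ? 1 : cp < 0x800 ? 2 : cp < 0x10000 ? 3 : 4;
}
constexpr std::size_t units_for(utf16_tag, char32_t cp) {
    // Code points above the BMP need a surrogate pair. UCS-2 has no
    // surrogate pairs, so it cannot represent these code points at all.
    return cp < 0x10000 ? 1 : 2;
}
constexpr std::size_t units_for(utf32_tag, char32_t) {
    return 1;  // always fixed width
}

// Compile-time choice: the encoding is part of the type, so only the
// code for the selected encoding is instantiated.
template <class Encoding>
struct utf_string {
    std::basic_string<typename Encoding::unit> data;

    static constexpr std::size_t units(char32_t cp) {
        return units_for(Encoding{}, cp);
    }
};

// Run-time choice: one type carries a tag and must dispatch on every
// operation, so the code for all encodings is always present.
enum class utf_encoding { utf8, utf16, utf32 };

struct any_utf_string {
    utf_encoding enc;
    std::string bytes;  // raw storage, interpreted according to enc

    std::size_t units(char32_t cp) const {
        switch (enc) {
            case utf_encoding::utf8:  return units_for(utf8_tag{}, cp);
            case utf_encoding::utf16: return units_for(utf16_tag{}, cp);
            case utf_encoding::utf32: return units_for(utf32_tag{}, cp);
        }
        assert(false && "unknown encoding");
        return 0;
    }
};

// U+1F600 lies outside the BMP: 4 units in UTF-8, 2 in UTF-16, 1 in UTF-32.
static_assert(utf_string<utf8_tag>::units(0x1F600) == 4, "UTF-8");
static_assert(utf_string<utf16_tag>::units(0x1F600) == 2, "UTF-16");
static_assert(utf_string<utf32_tag>::units(0x1F600) == 1, "UTF-32");
```

In the templated version the dispatch disappears at compile time, while
the run-time version repeats a switch like the one above in every
operation. That is roughly the extra code I have in mind, although a
real design could hide it differently.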

Patrick

p.s. There is a nice quick description of the history of, and the
differences between, UCS-2, UCS-4, UTF-8, UTF-16, and UTF-32 at
http://en.wikipedia.org/wiki/Universal_Character_Set

