Subject: Re: [boost] [string] proposal
From: Beman Dawes (bdawes_at_[hidden])
Date: 2011-01-21 12:50:36


On Fri, Jan 21, 2011 at 6:25 AM, Matus Chochlik <chochlik_at_[hidden]> wrote:
> Dear list,
>
> following the whole string encoding discussion I would like
> to make some suggestions.
>
> From the whole debate it is becoming clear, that
> instant switch from encoding-agnostic/platform-native
> std::string to UTF-8-encoded std::string is not likely
> to happen.
>
> Then it was proposed that we create a utf8_t string type
> that would be used *together* (for all eternity) with
> the standard basic_string<>. While I see the advantages
> here, I (as I already said elsewhere) have the following
> problem with this approach:
>
> Using a name like utf8_t or u8string, string_utf8, etc.
> at least to me (and I've consulted this off the list,
> with several people) suggests, that UTF-8 is still
> something special and IMO also sends the message
> that it is OK to remain forever with the various encodings
> and std::string as it is today. We should *IMO* endorse
> the opposite.

IMO, any serious Unicode string proposal has to address UTF-8 strings,
UTF-16 strings, UTF-32 strings, and probably UTF strings where the
particular UTF encoding is established at runtime. Applications that
deal with Asian languages, do a lot of random access, or would pay a
performance or storage penalty will demand more than just UTF-8
strings. There might be other variants, too, such as a BMP-string. If
a Unicode string library provides a strong design framework that is
clearly articulated, then an initial implementation would only have to
provide the most needed types: UTF-8 and UTF-16/BMP.
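
A minimal sketch of what such a framework might look like, assuming a
single class template parameterized on an encoding tag. Every name here
(utf8_tag, utf16_tag, utf32_tag, unicode_string) is hypothetical and is
not taken from any existing Boost library; the code also assumes
well-formed input and does no validation.

```cpp
#include <cstddef>
#include <string>

// Hypothetical encoding tags: each one names the code unit type it stores.
struct utf8_tag  { typedef char     unit_type; };
struct utf16_tag { typedef char16_t unit_type; };
struct utf32_tag { typedef char32_t unit_type; };

// Hypothetical string type: one interface, encoding chosen at compile time.
template <class Encoding>
class unicode_string {
public:
    typedef typename Encoding::unit_type unit_type;
    typedef std::basic_string<unit_type> storage_type;

    explicit unicode_string(const storage_type& units) : units_(units) {}

    // Number of stored code units (bytes for UTF-8, 16-bit units for UTF-16).
    std::size_t unit_count() const { return units_.size(); }

    // Number of code points; the cost depends on the encoding.
    std::size_t code_point_count() const { return count(units_, Encoding()); }

private:
    // UTF-8: linear scan, counting every byte that is not a continuation byte.
    static std::size_t count(const storage_type& s, utf8_tag) {
        std::size_t n = 0;
        for (std::size_t i = 0; i < s.size(); ++i)
            if ((static_cast<unsigned char>(s[i]) & 0xC0) != 0x80) ++n;
        return n;
    }

    // UTF-16: linear scan, counting every unit that is not a low surrogate.
    static std::size_t count(const storage_type& s, utf16_tag) {
        std::size_t n = 0;
        for (std::size_t i = 0; i < s.size(); ++i)
            if ((s[i] & 0xFC00) != 0xDC00) ++n;
        return n;
    }

    // UTF-32: one unit per code point, so the answer is constant time.
    static std::size_t count(const storage_type& s, utf32_tag) {
        return s.size();
    }

    storage_type units_;
};
```

With this shape, unicode_string<utf8_tag> and unicode_string<utf16_tag>
could ship first, and a UTF-32 or runtime-selected variant could be
added later without changing the interface. The differing cost of
code_point_count() also hints at why random-access-heavy applications
may want something other than UTF-8.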

I really doubt any proposal will get taken very seriously if it only
supports one of the UTF encodings.

--Beman