Subject: Re: [boost] [string] proposal
From: Beman Dawes (bdawes_at_[hidden])
Date: 2011-01-21 12:50:36
On Fri, Jan 21, 2011 at 6:25 AM, Matus Chochlik <chochlik_at_[hidden]> wrote:
> Dear list,
> following the whole string encoding discussion I would like
> to make some suggestions.
> From the whole debate it is becoming clear that
> an instant switch from the encoding-agnostic/platform-native
> std::string to UTF-8-encoded std::string is not likely
> to happen.
> Then it was proposed that we create a utf8_t string type
> that would be used *together* (for all eternity) with
> the standard basic_string<>. While I see the advantages
> here, I (as I already said elsewhere) have the following
> problem with this approach:
> Using a name like utf8_t or u8string, string_utf8, etc.
> at least to me (and I've discussed this off the list
> with several people) suggests that UTF-8 is still
> something special and IMO also sends the message
> that it is OK to remain forever with the various encodings
> and std::string as it is today. We should *IMO* endorse
> the opposite.
IMO, any serious Unicode string proposal has to address UTF-8 strings,
UTF-16 strings, UTF-32 strings, and probably UTF strings where the
particular UTF encoding is established at runtime. Applications that
deal with Asian languages, do a lot of random access, or would pay a
performance or storage penalty with UTF-8 will demand more than just
UTF-8 strings. There might be other variants, too, such as a BMP-string. If
a Unicode string library provides a strong design framework that is
clearly articulated, then an initial implementation would only have to
provide the most needed types: UTF-8 and UTF-16/BMP.
I really doubt any proposal will be taken very seriously if it only
supports one of the UTF encodings.