Subject: Re: [boost] [general] What will string handling in C++ look like in the future [was Always treat ... ]
From: Robert Ramey (ramey_at_[hidden])
Date: 2011-01-19 15:56:07


Peter Dimov wrote:
> Alexander Lamaison wrote:
>> I was under the impression that Linux changed from interpreting
>> char* as being in a multitude of different encodings to being in
>> UTF-8 by default.
>
> Well, it probably depends on what part of Linux we're talking to, but
> most of the functions do not interpret char* as being in any encoding,
> neither do they have a default. They just treat it as a byte sequence.

hmmm - that's what I always considered std::string to be. There's
no notion of locale in there.

I'm still not seeing why we can't continue to consider std::string
just a sequence of bytes with some extra sauce ..

... and make a new class utf8_string, derived from it, which includes
a code point iterator and a function to return a UTF-8 "character or
code point or whatever it is".

I just can't see anything wrong with this. It doesn't redefine the
semantics (formal, intuitive, common usage) of std::string, and
utf8_string would let one use the special Unicode sauce when needed.
It could also be implicitly converted to std::string when passed as a
function argument. Finally, given the history of this, I don't believe
UTF-8 is the "end of the road". This approach still leaves open the
possibility of the next greatest thing - whatever that turns out to be.
To summarize:

std::string - a sequence of bytes
utf8_string - a sequence of "code points", implemented in terms of
              std::string (or at least convertible to std::string)
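
For concreteness, here is a minimal sketch of what such a class might
look like. It is an illustration only, not an existing Boost or standard
class: the names utf8_string, code_point_iterator and byte_length are
made up for this example. It assumes the stored bytes are already
well-formed UTF-8 and does no validation. It also uses composition plus
a conversion operator rather than public inheritance, since std::string
has no virtual destructor; that is one possible reading of "implemented
in terms of std::string".

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical sketch: a byte string that is viewed as code points.
// Assumption: the bytes are well-formed UTF-8 (nothing is validated).
class utf8_string {
public:
    // Explicit, because arbitrary bytes are not necessarily UTF-8.
    explicit utf8_string(const std::string& bytes) : bytes_(bytes) {}

    // Implicit conversion, so a utf8_string can be passed to any
    // function that expects a plain byte-sequence std::string.
    operator const std::string&() const { return bytes_; }

    // Forward iterator that yields one code point per step.
    class code_point_iterator {
    public:
        code_point_iterator(const std::string* s, std::size_t pos)
            : s_(s), pos_(pos) {}

        char32_t operator*() const {
            assert(pos_ < s_->size());
            const unsigned char lead = byte_at(pos_);
            const std::size_t len = sequence_length(lead);
            // Keep only the payload bits of the lead byte.
            char32_t cp = (len == 1) ? lead : (lead & (0xFF >> (len + 1)));
            // Each continuation byte contributes its low 6 bits.
            for (std::size_t i = 1; i < len; ++i)
                cp = (cp << 6) | (byte_at(pos_ + i) & 0x3F);
            return cp;
        }

        code_point_iterator& operator++() {
            pos_ += sequence_length(byte_at(pos_));
            return *this;
        }

        bool operator==(const code_point_iterator& o) const {
            return s_ == o.s_ && pos_ == o.pos_;
        }
        bool operator!=(const code_point_iterator& o) const {
            return !(*this == o);
        }

    private:
        unsigned char byte_at(std::size_t i) const {
            return static_cast<unsigned char>((*s_)[i]);
        }
        // Number of bytes in the sequence, derived from the lead byte.
        static std::size_t sequence_length(unsigned char lead) {
            if (lead < 0x80) return 1;          // 0xxxxxxx
            if ((lead >> 5) == 0x06) return 2;  // 110xxxxx
            if ((lead >> 4) == 0x0E) return 3;  // 1110xxxx
            return 4;                           // 11110xxx (assumed)
        }

        const std::string* s_;
        std::size_t pos_;
    };

    code_point_iterator begin() const {
        return code_point_iterator(&bytes_, 0);
    }
    code_point_iterator end() const {
        return code_point_iterator(&bytes_, bytes_.size());
    }

private:
    std::string bytes_;
};

// An ordinary function that knows nothing about encodings.
inline std::size_t byte_length(const std::string& s) { return s.size(); }
```

With this, code that only cares about bytes keeps taking std::string
(byte_length above accepts a utf8_string unchanged), while code that
cares about code points iterates with begin()/end() or a range-for.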

Robert Ramey


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk