Subject: Re: [boost] [general] What will string handling in C++ look like in the future [was Always treat ... ]
From: Matus Chochlik (chochlik_at_[hidden])
Date: 2011-01-21 09:06:10


On Fri, Jan 21, 2011 at 2:35 PM, Alexander Lamaison <awl03_at_[hidden]> wrote:
>>
>> Why not boost::string (explicitly stating in the docs that it is UTF-8-based) ?
>> the name u8string suggests to me that it is meant for some special case
>> of character encoding and the (encoding agnostic/native) std::string
>> is still the way
>> to go.
>
> That was the idea, was it not?  We should be encoding agnostic wherever
> possible.

If we can (globally) agree upon one encoding that can handle
all imaginable writing systems, is robust, etc.,
we *will* end up being encoding agnostic.

Today, what is called 'encoding agnostic' causes many
problems. For example, you save a file whose name contains
non-ASCII characters (even if it is just Latin with some accents)
on one version of Windows, you ship it to a machine with
another version of Windows that uses a different encoding,
and the name becomes garbled.
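
The garbling described above can be sketched in a few lines. This is
only an illustration under stated assumptions: decode_to_utf8 and
CodePage are hypothetical names, and the decoder knows just ASCII plus
the single byte 0xE8, which Windows-1250 maps to U+010D and
Windows-1252 maps to U+00E8. A real decoder needs the full code page
tables.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Append one code point from the Basic Multilingual Plane as UTF-8.
void append_utf8(std::string& out, std::uint32_t cp) {
    assert(cp < 0x10000 && "sketch only handles 1- to 3-byte sequences");
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

// Hypothetical names, for illustration only.
enum class CodePage { cp1250, cp1252 };

// Deliberately tiny decoder: ASCII plus the byte 0xE8.
// Every other non-ASCII byte becomes U+FFFD (replacement character).
std::string decode_to_utf8(const std::string& bytes, CodePage cp) {
    std::string out;
    for (unsigned char b : bytes) {
        if (b < 0x80) {
            append_utf8(out, b);
        } else if (b == 0xE8) {
            // The same byte means a different letter in each code page.
            append_utf8(out, cp == CodePage::cp1250 ? 0x010D : 0x00E8);
        } else {
            append_utf8(out, 0xFFFD);
        }
    }
    return out;
}
```

Given the stored name "\xE8" "aj.txt", decoding with CodePage::cp1250
yields "čaj.txt", while decoding the very same bytes with
CodePage::cp1252 yields "èaj.txt". Nothing in the bytes themselves says
which reading was intended.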

The same thing happens with applications
that use text files to exchange information. Either you
pick a single encoding and stick to it, or you use whatever
the current platform's native encoding is and do the encoding
detection and transcoding on demand, and usually you lose
some information in the process. In both cases you have to
transcode the text explicitly. I don't see (besides support for
legacy SW/HW) why so many people are saying that this is OK.
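
As a rough illustration of how transcoding loses information, here is
a minimal sketch that converts UTF-8 to ISO-8859-1 (Latin-1). The
function name utf8_to_latin1 is hypothetical, the choice of '?' as the
substitute is an assumption, and the input is assumed to be well-formed
UTF-8 (a real transcoder has to validate it).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Transcode UTF-8 to Latin-1. Code points above U+00FF cannot be
// represented in Latin-1 and are replaced by '?'; `lost` reports how
// many characters were replaced. Assumes well-formed UTF-8 input.
std::string utf8_to_latin1(const std::string& in, std::size_t& lost) {
    std::string out;
    lost = 0;
    for (std::size_t i = 0; i < in.size();) {
        const unsigned char lead = static_cast<unsigned char>(in[i]);
        std::size_t len = 1;
        std::uint32_t cp = lead;
        // The lead byte tells how many bytes the sequence has.
        if (lead >= 0xF0)      { len = 4; cp = lead & 0x07; }
        else if (lead >= 0xE0) { len = 3; cp = lead & 0x0F; }
        else if (lead >= 0xC0) { len = 2; cp = lead & 0x1F; }
        assert(i + len <= in.size() && "truncated UTF-8 sequence");
        // Each continuation byte contributes six more bits.
        for (std::size_t k = 1; k < len; ++k)
            cp = (cp << 6) | (static_cast<unsigned char>(in[i + k]) & 0x3F);
        if (cp <= 0xFF) {
            out += static_cast<char>(cp);
        } else {
            out += '?';   // no Latin-1 equivalent: information is lost
            ++lost;
        }
        i += len;
    }
    return out;
}
```

For the UTF-8 input "čaj é" the result is the Latin-1 bytes of "?aj é"
with lost == 1: the é survives, the č does not, and the original text
cannot be recovered from the output.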

>
>> IMO we should send the message that UTF-8 is
>> "normal"/"(semi-)standard"/"de-facto-standard"
>> and the other encodings like the native_t (or even ansi_t,
>> ibm_cp_xyz_t, string16_t,
>> string32_t, ...) are the special cases and they should be treated as such.
>
> Why?  When a string doesn't need to be converted, why force it to be?

Already on many platforms you don't have to do any transcoding,
precisely because those platforms have already adopted a single
encoding: UTF-8. I can't imagine why any new SW would choose
anything besides Unicode for text representation. To
support legacy apps and/or hardware that accepts commands
or prints output in a specific encoding, there are tools like iconv.
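
For the legacy case, a minimal sketch of calling the POSIX iconv API
from C++ could look like the following. The wrapper name transcode is
hypothetical, the encoding names accepted by iconv_open depend on the
platform's iconv implementation, and the fixed 4x output buffer is an
assumption that fits single-byte legacy encodings converted to UTF-8.
Stateful encodings would also need a final flushing call, which is
omitted here.

```cpp
#include <cassert>
#include <cstddef>
#include <iconv.h>
#include <stdexcept>
#include <string>

// Transcode `in` from encoding `from` to encoding `to` using iconv.
std::string transcode(const std::string& in, const char* from, const char* to) {
    iconv_t cd = iconv_open(to, from);   // note the order: target first
    if (cd == (iconv_t)-1)
        throw std::runtime_error("conversion not supported by this iconv");

    std::string src = in;                        // iconv takes a mutable char*
    std::string out(in.size() * 4 + 4, '\0');    // assumed upper bound
    char* src_ptr = &src[0];
    char* out_ptr = &out[0];
    std::size_t src_left = src.size();
    std::size_t out_left = out.size();

    const std::size_t rc = iconv(cd, &src_ptr, &src_left, &out_ptr, &out_left);
    iconv_close(cd);
    if (rc == static_cast<std::size_t>(-1))
        throw std::runtime_error("invalid input or output buffer too small");

    assert(src_left == 0);                       // all input was consumed
    out.resize(out.size() - out_left);           // trim the unused tail
    return out;
}
```

With an iconv that knows both names, transcode("\xE8" "aj", "CP1250",
"UTF-8") returns the UTF-8 bytes of "čaj". The conversion happens once,
at the boundary to the legacy component, and everything else can stay
in UTF-8.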

Matus

