Subject: Re: [boost] [general] What will string handling in C++ look like in the future [was Always treat ... ]
From: Matus Chochlik (chochlik_at_[hidden])
Date: 2011-01-20 11:52:58


On Thu, Jan 20, 2011 at 3:33 PM, Chad Nelson
<chad.thecomfychair_at_[hidden]> wrote:
>
>>>> Besides the ugly name and that it is a new class? No :)
>>>
>>> If you can think of a more-acceptable-but-still-descriptive name for
>>> it, I'm all ears. :-)
>>
>> I have an idea: what about boost::string, which could possibly become
>> the next std::string in the future.
>
> And string16 and string32? We'll have to support UTF-32, as the
> single-codepoint-per-element type, and UTF-16 (distasteful though it
> may be) is needed for Windows.
>
> Or are you suggesting the utf* types in addition to the boost::string
> type? If so, I believe the idea has merit.

If boost::string uses UTF-8 by default, and at some point in the
distant future I can run sed 's/boost::string/std::string/g' over all
my sources without breaking them (completely), then we can have
string16, string32, string_ucs2, string_ucs4, etc. for all I care :-).

I am not against alternative string representations and encodings,
but I would like to finally see a string class that I can, for example,
write to a file on a Windows machine using cp1250 and read back on
Linux using UTF-8 without doing any explicit transcoding; that allows
true code-point and character iteration; that supports the essential
algorithms (which ones is open for debate); and that I can use as the
type of my function parameters, member variables, etc.
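
To make "true code-point iteration" concrete, here is a minimal sketch,
assuming the string stores UTF-8 in a plain std::string. The name
decode_utf8 is hypothetical (it is not an existing Boost or standard
function), and replacing each malformed byte with U+FFFD is just one
possible error policy, chosen for illustration.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper, not part of Boost or the standard library.
// Decodes UTF-8 bytes into code points. Each byte that does not start a
// well-formed sequence (bad lead byte, truncated sequence, overlong form,
// surrogate, or value above U+10FFFF) is replaced by U+FFFD.
std::vector<char32_t> decode_utf8(const std::string& bytes)
{
    std::vector<char32_t> out;
    const std::size_t n = bytes.size();
    std::size_t i = 0;
    while (i < n) {
        const unsigned char lead = static_cast<unsigned char>(bytes[i]);
        std::size_t len = 0;   // sequence length implied by the lead byte
        char32_t cp = 0;       // code point being assembled
        char32_t min = 0;      // smallest value allowed for this length
        if (lead < 0x80)                { len = 1; cp = lead;        min = 0; }
        else if ((lead & 0xE0) == 0xC0) { len = 2; cp = lead & 0x1F; min = 0x80; }
        else if ((lead & 0xF0) == 0xE0) { len = 3; cp = lead & 0x0F; min = 0x800; }
        else if ((lead & 0xF8) == 0xF0) { len = 4; cp = lead & 0x07; min = 0x10000; }

        bool ok = (len != 0) && (i + len <= n);
        for (std::size_t k = 1; ok && k < len; ++k) {
            const unsigned char c = static_cast<unsigned char>(bytes[i + k]);
            if ((c & 0xC0) != 0x80) ok = false;          // not a continuation byte
            else cp = (cp << 6) | (c & 0x3F);
        }
        if (ok && (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)))
            ok = false;                                  // overlong, out of range, surrogate

        if (ok) { out.push_back(cp); i += len; }
        else    { out.push_back(0xFFFD); i += 1; }       // resynchronise on the next byte
    }
    return out;
}
```

Note that this only gives code points. Character iteration is a
separate layer on top: a single user-perceived character can consist of
several code points (a base letter followed by a combining mark, for
instance), so the two kinds of iteration are not interchangeable.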

>>
>> OK, if the long term plan is:
>>
>> 1) design and implement boost::string using UTF-8 doing all the things
>> like code-point iteration, character iteration, convenience stuff like
>> starts-with, ends-with, replace, trim, etc., etc. with as much
>> backward compatibility with std::string as possible without hindering
>> progress
>>
>> 2) try really hard to push it to the standard
>>
>> then I'm on board with that.
>
> Some of those could be problematic (I've run across references implying
> that 0x20 isn't the universal word-separation character, so trim would
> at least need some extra parameters), but for the most part, I'd agree
> with it.

This is *exactly* why I would like to see them in a standard string
(or string manipulation library), designed and implemented by true
experts and not reinvented by an "expert" like me :)
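
As an illustration of the "extra parameters" Chad mentions for trim,
here is a rough sketch, assuming the text is already available as code
points in a std::u32string. The function name, its signature and the
default separator set are all hypothetical; the default is a small,
deliberately non-exhaustive sample of code points that Unicode
classifies as white space, not a complete list.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper, not an existing Boost or standard function.
// Strips every leading and trailing code point that appears in `spaces`.
// The default set is only a sample: space, tab, LF, CR, NO-BREAK SPACE
// (U+00A0), EM SPACE (U+2003) and IDEOGRAPHIC SPACE (U+3000).
std::u32string trim(const std::u32string& text,
                    const std::u32string& spaces = U" \t\n\r\u00A0\u2003\u3000")
{
    const std::size_t first = text.find_first_not_of(spaces);
    if (first == std::u32string::npos)
        return std::u32string();            // empty, or nothing but separators
    const std::size_t last = text.find_last_not_of(spaces);
    return text.substr(first, last - first + 1);
}
```

A caller who wants different rules passes a different set, for example
trim(s, U"x"). Deciding what the right default is, and whether it should
depend on locale, is precisely the part I would rather leave to people
who know Unicode well.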

Matus

