Subject: Re: [boost] [string] proposal
From: Matus Chochlik (chochlik_at_[hidden])
Date: 2011-01-26 04:54:16

On Wed, Jan 26, 2011 at 10:37 AM, Yakov Galka <ybungalobill_at_[hidden]> wrote:
> Excuse my ignorance, but can someone explain to me why people are so keen on
> immutable strings? Aren't they basically the same as 'shared_ptr<const
> std::string>'?
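The equivalence the question draws can be written down directly. The following is a minimal C++ sketch, not a proposed Boost API; the alias immutable_string and the helper make_immutable are hypothetical names chosen only for illustration.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Sketch only: an "immutable string" modelled as shared_ptr<const std::string>.
// Copying the handle copies a pointer, so all copies share one buffer that
// none of them can modify through the handle.
using immutable_string = std::shared_ptr<const std::string>;

// Hypothetical helper: build an immutable_string from any std::string.
inline immutable_string make_immutable(std::string s) {
    // shared_ptr<std::string> converts implicitly to shared_ptr<const std::string>
    return std::make_shared<std::string>(std::move(s));
}
```

Copies share one buffer, and writing through *a does not compile because the pointee is const. Note that == on two such handles compares pointers, not characters, which is one thing a dedicated immutable string type would likely do differently.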

I'm fairly neutral on the immutability issue; I do not oppose it if
someone shows why it is a superior design, provided it does not
break everything horribly (from the backward-compatibility perspective).

> I follow these discussions, and I must admit that I already use std::string
> in my projects with utf8 encoding assumed by default. What matters for me is
> the lack of a "standard" way to manipulate those strings. I.e.:
> 1) Convert them to and from other APIs' encoding:
>    SetWindowTextW(to_utf16(my_string));
> 2) Iterate through the codepoints, characters, words, etc., like this:
>    for(char32_t cp : codepoints(my_string))
>        ...;
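The two operations listed above can be sketched in C++ as follows, assuming the std::string holds UTF-8. to_utf16 and codepoints are the names used in the message, but their implementations here are hypothetical; codepoints returns an eagerly built std::vector rather than a lazy range, and the decoder rejects only some malformed input (it does not check for overlong forms or encoded surrogates).

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Decode UTF-8 bytes into code points. Throws std::invalid_argument on a bad
// lead byte, a bad or missing continuation byte, or a value above U+10FFFF.
std::vector<char32_t> codepoints(const std::string& s) {
    std::vector<char32_t> out;
    std::size_t i = 0;
    while (i < s.size()) {
        unsigned char lead = static_cast<unsigned char>(s[i]);
        std::size_t extra;  // number of continuation bytes that follow
        char32_t cp;
        if (lead < 0x80)              { extra = 0; cp = lead; }
        else if ((lead >> 5) == 0x6)  { extra = 1; cp = lead & 0x1F; }
        else if ((lead >> 4) == 0xE)  { extra = 2; cp = lead & 0x0F; }
        else if ((lead >> 3) == 0x1E) { extra = 3; cp = lead & 0x07; }
        else throw std::invalid_argument("invalid UTF-8 lead byte");
        if (i + extra >= s.size())
            throw std::invalid_argument("truncated UTF-8 sequence");
        for (std::size_t k = 1; k <= extra; ++k) {
            unsigned char c = static_cast<unsigned char>(s[i + k]);
            if ((c & 0xC0) != 0x80)
                throw std::invalid_argument("invalid UTF-8 continuation byte");
            cp = (cp << 6) | (c & 0x3F);  // append 6 payload bits
        }
        if (cp > 0x10FFFF) throw std::invalid_argument("code point out of range");
        out.push_back(cp);
        i += extra + 1;
    }
    return out;
}

// Re-encode the code points as UTF-16, using surrogate pairs above U+FFFF.
std::u16string to_utf16(const std::string& s) {
    std::u16string out;
    for (char32_t cp : codepoints(s)) {
        if (cp < 0x10000) {
            out.push_back(static_cast<char16_t>(cp));
        } else {
            cp -= 0x10000;
            out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));   // high surrogate
            out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF))); // low surrogate
        }
    }
    return out;
}
```

With these, the loop from point 2 compiles as written. On Windows, where wchar_t is 16 bits wide, the std::u16string result would still have to be turned into wchar_t data before calling SetWindowTextW; that step is left out of the sketch.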


> The original proposal (in the other thread) was to use the type of the
> string to ensure at compile time that the above code is valid. I understand
> that it is needed in the current world where not everybody uses utf8. It's
> fine for me. But why
> On Fri, Jan 21, 2011 at 13:25, Matus Chochlik <chochlik_at_[hidden]> wrote:
>> create a class called boost::string that will have
>> all the properties that a string handling class in 2011+ A.D.
>> should have, basically what std::string should have been.

The original proposal was to keep the existing string but to
switch to UTF-8 as the default encoding. This is still my
long-term goal. The whole discussion changed my opinion
on how to get there. I personally would not have any problem
with making the switch instantly, but many other people would,
and with good reason.

> ?
> What are those properties? Isn't std::string *exactly* what it should have been?
> Do you mean that you want to put every algorithm you can
> imagine in there?

What I was talking about is basically adding some more convenience
member functions, many of which are currently implemented by the
Boost string_algo library, to the string's interface and, more importantly,
extending the string's interface with 'Unicode functionality', i.e. the ability
to traverse the string not just as a sequence of bytes but as a sequence
of Unicode code points and, if possible, even "logical characters".
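As a rough illustration of both parts (convenience members and code-point traversal), here is a hypothetical utf8_text class. The class name and its members are assumptions for this sketch, not an existing Boost interface, and it assumes the stored bytes are valid UTF-8.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical class: a std::string holder with a few string_algo-style
// convenience members plus a code-point view of its bytes.
class utf8_text {
public:
    explicit utf8_text(std::string bytes) : bytes_(std::move(bytes)) {}

    // Byte-level convenience members, similar in spirit to
    // boost::algorithm::starts_with / ends_with.
    bool starts_with(const std::string& prefix) const {
        return bytes_.compare(0, prefix.size(), prefix) == 0;
    }
    bool ends_with(const std::string& suffix) const {
        return bytes_.size() >= suffix.size() &&
               bytes_.compare(bytes_.size() - suffix.size(), suffix.size(), suffix) == 0;
    }

    std::size_t byte_count() const { return bytes_.size(); }

    // Every byte that is not a UTF-8 continuation byte (10xxxxxx)
    // starts a new code point.
    std::size_t codepoint_count() const {
        std::size_t n = 0;
        for (char c : bytes_)
            if ((static_cast<unsigned char>(c) & 0xC0) != 0x80) ++n;
        return n;
    }

private:
    std::string bytes_;
};
```

For "na\xC3\xAFve" ("naïve"), byte_count() is 6 while codepoint_count() is 5. Counting "logical characters" (grapheme clusters) is not shown; that needs the Unicode segmentation rules and character property tables, which is why it is considerably harder than code-point traversal.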

> IMO std::string is just a container of bytes with two useful convenience
> methods (c_str() and substr()) and a utf8 encoding that had to be assumed by
> default but unfortunately isn't. Everything else should be generic
> algorithms that work with sequences of characters in some encoding. So,
> maybe it's better to focus on designing something like boost::iterator_range
> with an encoding associated with it and algorithms that work with these
> ranges?
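A minimal sketch of that range-based idea, assuming C++17 std::string_view. The encoding tags utf8 and latin1, encoded_range, as_encoded, codepoint_length and same_text are all hypothetical names. The encoding lives in the type, so the existing std::string is left untouched and algorithms can choose behaviour, or reject a mismatch, at compile time.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical encoding tags: empty types used only for overload selection.
struct utf8 {};
struct latin1 {};

// Non-owning view over existing bytes; the encoding is part of the type.
template <class Encoding>
struct encoded_range {
    std::string_view bytes;  // the std::string it was made from must outlive it
};

// Wrap an existing std::string without copying its contents.
template <class Encoding>
encoded_range<Encoding> as_encoded(const std::string& s) {
    return encoded_range<Encoding>{std::string_view(s)};
}

// Latin-1: every byte is one code point.
std::size_t codepoint_length(encoded_range<latin1> r) {
    return r.bytes.size();
}

// UTF-8 (assumed valid): count bytes that are not continuation bytes (10xxxxxx).
std::size_t codepoint_length(encoded_range<utf8> r) {
    std::size_t n = 0;
    for (char c : r.bytes)
        if ((static_cast<unsigned char>(c) & 0xC0) != 0x80) ++n;
    return n;
}

// Generic algorithm: both arguments must carry the same encoding tag, so
// comparing a utf8 range with a latin1 range fails to compile. Within one
// encoding, byte equality is code-point equality (no Unicode normalization).
template <class Encoding>
bool same_text(encoded_range<Encoding> a, encoded_range<Encoding> b) {
    return a.bytes == b.bytes;
}
```

The same two bytes "\xC3\xA9" count as one code point when viewed as UTF-8 and two when viewed as Latin-1, and a call such as same_text(as_encoded<utf8>(s), as_encoded<latin1>(s)) is rejected by the compiler because the template argument cannot be deduced consistently.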

If that is to succeed, it has to be (backward) compatible with the existing APIs,
however borked they seem to us (me included). There are lots of string
implementations that are *cool* but unusable by anything except algorithms
specifically designed for them.
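One way to read that compatibility requirement, sketched in C++ with made-up names (compat_string and legacy_byte_length are hypothetical): the new type keeps a std::string inside and converts implicitly to const std::string&, so functions already written against std::string keep accepting it.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>

// Stand-in for an existing API that only knows std::string (hypothetical).
std::size_t legacy_byte_length(const std::string& s) { return s.size(); }

// Hypothetical new string type: room for new functionality, while still
// usable wherever a const std::string& or a C string is expected.
class compat_string {
public:
    compat_string(std::string s) : s_(std::move(s)) {}  // implicit on purpose
    operator const std::string&() const { return s_; }  // for std::string APIs
    const char* c_str() const { return s_.c_str(); }    // for C APIs
    // New members (e.g. code-point iteration) would be added here.

private:
    std::string s_;
};
```

The limit of this approach is that APIs taking a non-const std::string& or a std::string* still will not accept the new type, which is part of why full backward compatibility is hard to achieve.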

