Subject: Re: [boost] [string] proposal
From: Dean Michael Berris (mikhailberis_at_[hidden])
Date: 2011-01-27 05:52:22


On Thu, Jan 27, 2011 at 4:09 PM, Artyom <artyomtnk_at_[hidden]> wrote:
> Hello,
>
>
> To all discussing about how to create
> an "ultimate" string, I'd like to remind
> you following "ultimate" strings existing
> there:
>

Who are these people discussing how to create an "ultimate" string? Oh
you mean me who wants to create an "immutable" string?

I hardly called it an "ultimate" string so I think you're throwing a
strawman red herring here. At any rate, I'll indulge you.

[snip all the stupid non-ultimate strings quoted]

They're all broken. Is that what you wanted me to say? :D

>
> Now Questions:
> --------------
>
> 1. Why do YOU think you'll be able to create something "better"?
>

I don't know about "better". I do know "different" though. Better is
largely a matter of perspective.

> 2. Why do YOU think boost::string would be adopted in
>   favor of std::string or one of the current widely used
>   QString/ustring/wxString/UnicodeString/CString?
>

I don't. That wasn't the point though. It's not some illusion of
grandeur or some messianic vision that came from some voice in my head
asking me to etch it in stone. There's an opportunity to implement
strings in a different way and I think it's worth doing regardless of
whether it will be adopted in favor of anything that's already out
there. People said COBOL is ugly but still people to this day write
programs in it -- so I have no intentions of asking others to use the
immutable string if they won't want to.

> 3. What so painful problems are you going to solve that
>   would make it so much better then widely used and adopted
>   std::string? Iterators? Mutability? Performance?
>
>   (Clue: there is no painful problems with std::string)
>

Sorry, but as someone who's dealt with std::string for *a long time
(close to 8 years)*, here are a few real painful problems with it:

1. Because of COW implementations you can't deal with it properly in
multiple threads without explicitly creating a copy of the string. SSO
makes it a lot more unpredictable. There are all sorts of concurrency
problems with std::string that are not addressed by the interface.

2. Temporaries abound. Operations return temporaries and copies of
data in the string. Substring operations create new strings to avoid
the problem with concurrent mutation of the underlying data.

3. It has to be contiguous in memory because of the assumption that
data() and c_str() should give a view of the string's data at any
given time across mutations. That makes extending, shrinking, and
generally manipulating it a pain, not only from an
interface/performance perspective but also from the point of view of
an implementor. Then you look at the resource issues this raises with
potential fragmentation when you keep mutating strings and growing
them.

4. Because of the mutability of std::string your iterators *may* be
invalidated when the string changes. This is crucial for
efficiency-concerned code that deals with strings.

5. Because of the contiguous requirement, using it for any "text"
that's larger than a memory page's worth of data will kill your cache
coherency -- and when you modify parts of it you can thank your
virtual memory manager once the modifications are done. Then you see
that you would have to implement your own segmented data structure to
act as a string, and you realize you're better off not using
std::string for situations where the amount of data you're going to
deal with is potentially larger than a cache line.
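
To make points 2 and 4 concrete, here is a minimal sketch that can be
run against any standard library. The helper names substr_is_a_copy
and growth_moves_buffer are made up for this illustration. It only
shows that substr() hands back separate storage and that growing a
string past its capacity moves the buffer; it measures nothing about
cost.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <string>

// True when substr() hands back storage outside the source's buffer,
// i.e. the substring is a fresh copy rather than a view (point 2).
bool substr_is_a_copy(const std::string& source) {
    assert(!source.empty());
    const std::string piece = source.substr(0, source.size() / 2 + 1);
    std::less<const char*> before;  // total order, even for unrelated pointers
    const char* first = source.data();
    const char* last = first + source.size();
    const bool inside =
        !before(piece.data(), first) && before(piece.data(), last);
    return !inside;
}

// True when growing the string past its capacity moved the character
// buffer, which is what invalidates iterators obtained earlier (point 4).
bool growth_moves_buffer(std::string text) {
    const auto before = reinterpret_cast<std::uintptr_t>(text.data());
    text.append(text.capacity() + 1, 'x');  // always exceeds the old capacity
    return reinterpret_cast<std::uintptr_t>(text.data()) != before;
}
```

Both helpers should return true for a non-empty input on the
implementations I'm aware of. Any iterator or pointer taken before the
append() in the second helper would be dangling afterwards.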

> Now Suggestion:
> ---------------
>
> 1. Accept it that there is quite small chance that something
>   that is not std::string would be widely accepted
>

So, what if the chances are small that it'd be widely accepted? That
never stopped a lot of people -- heck it never stopped me.

> 2. Try to solve existing "string" problems by using same
>   std::string and adding few things to handle it better.
>

Sorry, but the existing string problems are precisely because of the
way std::string is designed/implemented. No ifs or buts about it. We
can agree to disagree on this one.

>   Clue: take a look on what Boost.Locale does.
>

I did, I like how it's designed, and it solves what it solves. However
I don't think an immutable string and Boost.Locale are mutually
exclusive.

You choose to deal with std::string while I OTOH would like to give an
alternative interpretation of strings. I'll leave it at that. ;)
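
One possible shape for such an alternative, as a rough sketch only: an
immutable string whose substrings share a single read-only buffer, so
taking a substring copies no characters and nothing can be invalidated
by a later mutation. The class name immutable_string and all of its
members are hypothetical. This is not a proposed Boost interface, and
it ignores segmentation, encodings, and allocators.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>

// Hypothetical illustration: a read-only string view that keeps its
// backing buffer alive through shared ownership.
class immutable_string {
public:
    explicit immutable_string(std::string text)
        : buffer_(std::make_shared<const std::string>(std::move(text))),
          offset_(0),
          length_(buffer_->size()) {}

    // O(1): the result refers to the same buffer; no characters are copied.
    // Out-of-range arguments are clamped instead of throwing.
    immutable_string substr(std::size_t pos, std::size_t count) const {
        if (pos > length_) pos = length_;
        if (count > length_ - pos) count = length_ - pos;
        return immutable_string(buffer_, offset_ + pos, count);
    }

    std::size_t size() const { return length_; }
    const char* begin() const { return buffer_->data() + offset_; }
    const char* end() const { return begin() + length_; }

    // Explicit copy-out for code that needs a std::string.
    std::string to_std_string() const { return std::string(begin(), end()); }

    bool shares_buffer_with(const immutable_string& other) const {
        return buffer_ == other.buffer_;
    }

private:
    immutable_string(std::shared_ptr<const std::string> buffer,
                     std::size_t offset, std::size_t length)
        : buffer_(std::move(buffer)), offset_(offset), length_(length) {
        assert(offset_ + length_ <= buffer_->size());
    }

    std::shared_ptr<const std::string> buffer_;  // never mutated after creation
    std::size_t offset_;
    std::size_t length_;
};
```

Since the buffer is never written to after construction, several
threads can read the same immutable_string or its substrings without
making defensive copies. The trade-off is that any "modification" has
to build a new value.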

-- 
Dean Michael Berris
about.me/deanberris
