Subject: Re: [boost] [string] proposal
From: Patrick Horgan (phorgan1_at_[hidden])
Date: 2011-01-27 16:19:00


On 01/27/2011 02:52 AM, Dean Michael Berris wrote:
> ... elision by patrick ...
>
> Sorry, but for someone who's dealt with std::string for *a long time
> (close to 8 years)*, here are a few really painful problems with it:
>
> 1. Because of COW implementations you can't deal with it properly in
> multiple threads without explicitly creating a copy of the string. SSO
> makes it a lot more unpredictable. There are all sorts of concurrency
> problems with std::string that are not addressed by the interface.
>
> 2. Temporaries abound. Operations return temporaries and copies of
> data in the string. Substring operations create new strings to avoid
> the problem with concurrent mutation of the underlying data.
>
> 3. It has to be contiguous in memory because of the assumption that
> data() and c_str() should give a view of the string's data at any
> given time across mutations. That makes extending, shrinking, and
> generally manipulating it a pain, not only from an interface/performance
> perspective but also from the point of view of an implementor. Then
> you look at the resource issues this raises with potential
> fragmentation when you keep mutating strings and growing them.

It doesn't have to be contiguous, but rather act as if it were. Of
course everyone does it contiguously because the alternatives are all a
lot worse.
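
A rough sketch of what "act as if it were" looks like from the outside,
assuming nothing beyond the standard library. The helper name
looks_contiguous is made up for illustration. It compares the address of
each character with the array that data() hands back. C++11 and later
require the two to agree. Under the older wording an implementation
could, in principle, build the data() array separately, in which case
the check would fail.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: returns true when every character of `s` lives at
// the address that data() + index predicts, i.e. the storage behind
// operator[] and the array behind data() are one and the same block.
bool looks_contiguous(const std::string& s)
{
    const char* base = s.data();
    for (std::size_t i = 0; i < s.size(); ++i) {
        if (&s[i] != base + i) {
            return false;
        }
    }
    return true;
}
```

Whatever the internal layout, data() and c_str() still have to produce
one unbroken array on demand, which is why the contiguous layout is the
cheap one.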

> 4. Because of the mutability of std::string your iterators *may* be
> invalidated when the string changes. This is crucial for efficiency
> concerned code that deals with strings.

You _have_ to treat them as if they were invalidated.
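
A minimal sketch of that rule in practice, assuming only the standard
library. The helper name append_and_refind is hypothetical. The idea is
to remember a position as an index rather than as an iterator, and to
rebuild the iterator after any call that may mutate the string.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: appends `extra` to `s` and returns an iterator to
// the character that sat at offset `pos` before the append.
std::string::iterator append_and_refind(std::string& s,
                                        std::string::size_type pos,
                                        const std::string& extra)
{
    assert(pos <= s.size());
    // Any iterator, pointer, or reference obtained before this line must
    // be treated as invalid afterwards: the append may reallocate.
    s += extra;
    // An index stays meaningful across a reallocation, so rebuild from it.
    return s.begin() + static_cast<std::string::difference_type>(pos);
}
```

Whether a particular append really reallocates depends on the current
capacity, which is exactly why code cannot rely on old iterators
surviving.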

> 5. Because of the contiguous requirement, using it for any "text"
> that's larger than a memory page's worth of data will kill your cache
> coherency -- and then when you modify parts of it then you can thank
> your virtual memory manager when the modifications are done. Then you
> see that you would have to implement your own segmented data structure
> to act as a string and then you realize you're better off not using
> std::string for situations where the amount of data you're going to
> deal with is potentially larger than a cache line.
>
Thank you! This is a nice discussion of some of the advantages of an
immutable string vs a mutable string.
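
To make the contrast concrete, here is a small sketch of an immutable
string, assuming C++11's std::shared_ptr. The class name istring and its
members are invented for illustration and are not the interface proposed
in this thread. Because the characters never change after construction,
a substring can share the original buffer instead of copying it, and no
iterator or pointer into it is ever invalidated.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>

// Hypothetical immutable string: the characters are never modified after
// construction, so copies and substrings can all share one buffer.
class istring {
public:
    explicit istring(const std::string& s)
        : buf_(std::make_shared<const std::string>(s)),
          off_(0), len_(s.size()) {}

    // Returns a view of [pos, pos + n) that shares the buffer; no
    // characters are copied. `n` is clamped to the available length.
    istring substr(std::size_t pos, std::size_t n) const {
        assert(pos <= len_);
        if (n > len_ - pos) n = len_ - pos;
        return istring(buf_, off_ + pos, n);
    }

    std::size_t size() const { return len_; }

    // Pointer to the first character; NOT NUL-terminated at size().
    const char* data() const { return buf_->data() + off_; }

    // Explicit copy out into an ordinary std::string.
    std::string str() const { return std::string(data(), len_); }

    bool shares_buffer_with(const istring& other) const {
        return buf_ == other.buf_;
    }

private:
    istring(std::shared_ptr<const std::string> buf,
            std::size_t off, std::size_t len)
        : buf_(std::move(buf)), off_(off), len_(len) {}

    std::shared_ptr<const std::string> buf_;  // shared, read-only storage
    std::size_t off_;                         // start of this view
    std::size_t len_;                         // length of this view
};
```

Since the shared buffer is only ever read, handing copies of an istring
to several threads involves no copying of characters. The price is that
every "modification" has to build a new string.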

