

Subject: Re: [boost] [string] proposal
From: Dean Michael Berris (mikhailberis_at_[hidden])
Date: 2011-01-21 12:14:38


On Sat, Jan 22, 2011 at 12:55 AM, Dave Abrahams <dave_at_[hidden]> wrote:
> At Fri, 21 Jan 2011 20:07:51 +0800,
> Dean Michael Berris wrote:
>>
>> 1. Immutable. No if's or but's about it. I don't want a string to be
>> modifiable. Period. You can create it, and once it's created, that's
>> it.
>
> Do you want to prevent
> 1. wholesale mutation such as
>
>          x = y
>          x += y
>
> or just
>
> 2. per-char mutation such as
>
>          x[10] = 'a'
>
> ?
>
> eliminating #2 does a lot for implementation flexibility
> (e.g. allowing refcounts or GC to be used cleanly), and can be useful
> for thread safety if there's no "small string optimization," because
> the buffers holding the chars are truly immutable.
>
> However, preventing #1 is a more serious matter...
>

I want to prevent #2 but not #1. :)

And actually, I would have phrased the concept of #1 as:

  x = "Some string";
  x = x ^ "... and another string";

Because adding two strings isn't the same as concatenating them. ;)
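To make "prevent #2 but not #1" concrete, here is a minimal sketch. It assumes a hypothetical `istring` class backed by a `std::string` purely for simplicity. Characters can be read but never written, the whole value can still be replaced by assignment, and `^` builds a brand-new string instead of modifying either operand:

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical immutable string: individual characters can be read but never
// written, while the whole value can still be replaced by assignment.
class istring {
public:
    istring() = default;
    istring(const char* s) : data_(s) {}                 // implicit, so literals convert
    explicit istring(std::string s) : data_(std::move(s)) {}

    // Read-only element access returns a copy, so x[10] = 'a' does not compile.
    char operator[](std::size_t i) const { return data_[i]; }
    std::size_t size() const { return data_.size(); }
    const std::string& str() const { return data_; }

    // Concatenation produces a new string; neither operand is modified.
    friend istring operator^(const istring& a, const istring& b) {
        return istring(a.data_ + b.data_);
    }

private:
    std::string data_;  // std::string has value semantics, so copies are independent
};

// Wholesale replacement (#1) still works through ordinary assignment.
inline void demo() {
    istring x = "Some string";
    istring y = x;                           // a real copy
    x = x ^ "... and another string";
    assert(x.str() == "Some string... and another string");
    assert(y.str() == "Some string");        // the copy is unaffected
}
```

The only thing that makes it immutable is the absence of any non-const access to the characters. The implicitly generated copy and move assignment operators are what keep `x = y` legal.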

>> 2. Has real value semantics. This means, once you've copied it, that's
>> really copied. No funky copy-on-write reference-counting
>> mumbo-jumbo.
>
> I guess you're talking about just per-char mutation, then, because
> value semantics implies assignability.
>

Yep.

>> 3. Has all the algorithms that apply to it defined externally.
>>
>> 4. Looks like a real STL container except the iterator type is smarter
>> than your average iterator.
>>
>> Encoding is a matter of external interpretation and I think should not
>> be part of a string's interface. You can have wrappers that interpret
>> a string as a UTF-* string.
>
> What does it iterate over?  chars?  code points?  characters?
> Something else?
>

I can see a way of saying what you want when you ask for an iterator
from it. By default, though, a call to '.begin()' will return an
iterator over characters (just so you don't break compatibility with
std::string).
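A small sketch of the compatibility default, assuming a hypothetical `ro_string` that simply forwards `begin()`/`end()` to a `std::string`'s const iterators. Because the default range yields the same `char`s in the same order as `std::string`, standard algorithms and existing code keep working. Algorithms stay outside the class, as in point 3 (`count_char` is a hypothetical example):

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical read-only string whose default begin()/end() walk raw chars,
// exactly like std::string's const iterators.
class ro_string {
public:
    explicit ro_string(std::string s) : data_(std::move(s)) {}

    using const_iterator = std::string::const_iterator;
    const_iterator begin() const { return data_.begin(); }
    const_iterator end() const { return data_.end(); }

private:
    std::string data_;
};

// Algorithms are free functions over the iterator range, not members.
inline std::size_t count_char(const ro_string& s, char c) {
    return static_cast<std::size_t>(std::count(s.begin(), s.end(), c));
}

inline void demo() {
    const std::string t = "hello";
    const ro_string s(t);
    // Same elements in the same order as the equivalent std::string.
    assert(std::equal(s.begin(), s.end(), t.begin()));
    assert(count_char(s, 'l') == 2);
}
```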

The iterator can store a reference to the original string and, when
advanced, can interpret the string appropriately in context. If you
wanted a code point iterator, you'd get a code point iterator. If you
wanted characters based on a certain encoding, you could have a
special iterator for that. An iterator would also know whether it was
out of bounds.

This allows people to write code that deals with code points,
characters (based on the encoding), and, if absolutely necessary,
raw data.

-- 
Dean Michael Berris
about.me/deanberris
