Boost logo

Boost :

Subject: Re: [boost] [string] proposal
From: Dean Michael Berris (mikhailberis_at_[hidden])
Date: 2011-01-21 08:37:31


On Fri, Jan 21, 2011 at 9:23 PM, Eric Niebler <eric_at_[hidden]> wrote:
> On 1/21/2011 7:45 PM, Dean Michael Berris wrote:
>> On Fri, Jan 21, 2011 at 8:37 PM, Eric Niebler <eric_at_[hidden]> wrote:
>>>
>>> Uh, if the string is immutable, then two strings can transparently share
>>> the same data. There is no "write" in "copy-on-write". That's the
>>> definition of immutable. :-)
>>
>> Ha! You're right. :D Now if you can make this work without reference
>> counting, that'd be perfect. :D
>
> Why? Immutable data is exactly the situation (and the only situation
> IMO) when you'd want to share data. The potential savings, both in time
> and space, are huge. What's wrong with a ref-counted string impl?
>

Good question. Here are some reasons off the top of my head for why a
ref-counted string implementation might not be desirable:

1. The overhead of maintaining that table of reference counts may be
significant if you have a lot of strings created in your application.
The only way I know how this would be implemented is with maybe static
intrusive container of these string chunks. Notice I say "created" not
copied around because the cost of copying whole strings around in
memory would dwarf the cost of having that static intrusive container
for string chunks.

2. In the case that you do have strings copied around, consider the
case in multi-threaded applications where you have different threads
running on different cores. Although there won't be any mutations,
updating a reference count that's shared across cores will cause some
"unnecessary" cache ping-pong across cores. You won't have this if
each thread was just operating on a copy of the data and not needing
to update reference counts.

3. The reference count almost always will be implemented with atomic
counters to be efficient. In cases where the hardware doesn't support
these (in embedded environments) the reference count would be
implemented either with a spin-lock if the platform supports
threading, OS-level semaphores, etc. -- none of which would be
necessary if you just copied the string to begin with.

For the majority of cases an interning strategy for strings to be able
to share the same storage would be a good thing. Maybe turning off the
reference counting "default" might be an option as well.

Of course I'm just listing the reasons why a ref-counted
implementation wouldn't be desirable -- in the majority of the cases,
I'd say reference counting may be a good thing. ;)

-- 
Dean Michael Berris
about.me/deanberris

Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk