
Subject: Re: [boost] [smart ptr] Any interest in copy-on-write pointer for C++11?
From: Ralph Tandetzky (ralph.tandetzky_at_[hidden])
Date: 2013-02-09 03:49:13


On 02/09/2013 07:01 AM, Mathias Gaunard wrote:
> On 08/02/13 18:41, Ralph Tandetzky wrote:
>
>> Copy-on-write is the most important use case of the class design. Even
>> larger libraries like Qt use copy-on-write for some types (like QImage
>> or QPixmap) and make the user interface much easier to use.
>
> That's because Qt is broken, and is written to support broken C++
> programming practices.
> If you only copy when you need to copy, then there is no need for the
> COW mechanism. COW was invented to workaround issues with code that
> copied but didn't actually need to.
> COW also has the nasty effect that your real object is gonna change
> its address just because you modified it. For this reason the C++11
> standard prevents implementation of standard components like
> std::string from using that mechanism.
>

Obviously, COW was invented for code that copied but didn't need to. And
that's still a valid reason to use COW today in C++11. Yes, you can move
objects and avoid a copy. Or you can swap cheaply, if you need to. But
sometimes you need to copy, if you don't know for sure whether the
reference count is 1.

Under your assertions the STL is just as broken. For example
std::vector<T>::push_back() might change the address of the contained
data and therefore invalidate all pointers and iterators to it. That's
still a source of many bugs unfortunately, especially for newbies in
C++. The COW implementation of std::string had the following weird effect:
         std::string a("Hello world!");
         char & p = a[11];
         std::string b( a ); // makes a cheap copy
         p = '.'; // modifies a and b
The reason is the escaping reference. The same effect can be achieved
with QImage because the interface allows escaping pointers to the
contained image data. The cow_ptr<T> design I'm proposing tries to avoid
this pitfall. There's a const version of the member function get(), but
no mutable one, since that would let a pointer escape instantly. The
only function that lets a non-const pointer escape directly is
operator->(), which is reasonable, since it is unlikely for a user of
the class to write
         T * raw = ptr.operator->();
To still make it easy to modify the pointed-to object, you can use the
cow_ptr member function apply(), which takes a functor that receives a
raw pointer to T, or the macro COW_APPLY, as follows:
         auto ptr = make_cow<Type>( /* constructor arguments */ );
         // equivalent to ptr->modify();
         ptr.apply( [&]( Type * p ) { p->modify(); } );
         COW_APPLY(ptr) { ptr->modify(); }; // does the same
If you really want to, you can still let a pointer or a reference to the
pointed-to object escape. But it's harder. This way the interface is
easy to use correctly and hard to use incorrectly. If you use it
correctly (i.e. don't let references escape), then it does not suffer
from the above problem with the COW implementation of std::string, and I
believe it's safe.
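
To make the interface described above concrete, here is a minimal sketch of the idea. It is not the proposed cow_ptr implementation: the name cow_ptr_sketch, the helper shares_with(), and the use of std::shared_ptr for reference counting are all assumptions made for illustration, and operator->() and COW_APPLY are left out. It also assumes single-threaded use, since use_count() is only approximate when other threads touch the same object.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Hypothetical illustration only; not the class proposed in this thread.
template <typename T>
class cow_ptr_sketch {
public:
    explicit cow_ptr_sketch(T value)
        : p_(std::make_shared<T>(std::move(value))) {}

    // Read-only access never copies. There is deliberately no
    // non-const get(), so a mutable pointer cannot escape this way.
    const T* get() const { return p_.get(); }

    // Mutation goes through apply(): detach from other owners first,
    // then hand the functor a raw pointer for the duration of the call.
    template <typename F>
    void apply(F&& f) {
        detach();
        std::forward<F>(f)(p_.get());
    }

    // Hypothetical helper, here only to make the sharing observable.
    bool shares_with(const cow_ptr_sketch& other) const {
        return p_ == other.p_;
    }

private:
    // Make a private deep copy if the pointee is currently shared.
    void detach() {
        if (p_.use_count() > 1) {
            p_ = std::make_shared<T>(*p_);
        }
    }

    std::shared_ptr<T> p_;
};
```

Copying a cow_ptr_sketch only copies the shared_ptr, so it is cheap; the deep copy happens inside apply(), and only when the pointee is actually shared. A functor can still store the pointer it receives, so escapes remain possible, just harder.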

>> A major
>> reason is that value semantics are easier to reason about than reference
>> semantics. For this purpose cow_ptr<T> is an enabler. Moving isn't
>> always sufficient.
>
> Value semantics do not require COW. There is no logic between your
> statements. Value semantics mean that when you modify a copy of an
> object, then the original is left unmodified. To achieve this you can
> either copy when requested or delay it until you're actually modifying.

Sorry, my statement was incomplete. The goal is to avoid unnecessary
copies, because copying can be really expensive (think of the data
inside an image or matrix class). I would prefer a matrix class with
value semantics instead of reference semantics. It's easier to reason
about. And I would like to be able to write
         Matrix a, b(1000,1000);
         a = b;
without having to fear that the whole 1000x1000 matrix is deeply copied.
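
As a rough sketch of how such a Matrix could behave, the class below keeps its elements behind a shared buffer and copies that buffer only on the first write after an assignment. The member names at(), set() and shares_storage_with() are made up for this example, std::shared_ptr stands in for cow_ptr, and single-threaded use is assumed.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical illustration of a value-semantic matrix with lazy copying.
class Matrix {
public:
    Matrix()
        : rows_(0), cols_(0),
          data_(std::make_shared<std::vector<double>>()) {}

    Matrix(std::size_t rows, std::size_t cols)
        : rows_(rows), cols_(cols),
          data_(std::make_shared<std::vector<double>>(rows * cols, 0.0)) {}

    // Reading never triggers a copy.
    double at(std::size_t r, std::size_t c) const {
        return (*data_)[r * cols_ + c];
    }

    // Writing copies the buffer first if another Matrix still shares it.
    void set(std::size_t r, std::size_t c, double v) {
        if (data_.use_count() > 1) {
            data_ = std::make_shared<std::vector<double>>(*data_);
        }
        (*data_)[r * cols_ + c] = v;
    }

    // Hypothetical helper, here only to make the sharing observable.
    bool shares_storage_with(const Matrix& other) const {
        return data_ == other.data_;
    }

private:
    std::size_t rows_;
    std::size_t cols_;
    std::shared_ptr<std::vector<double>> data_;
};
```

With this sketch, a = b copies two sizes and one shared_ptr; the 1000x1000 buffer is duplicated only if a or b is modified afterwards, and the other object is left unchanged.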

> The simplest way to deal with this is to simply copy when asked to
> copy by the user, which is not only straightforward and keeping a good
> separation of concerns, but it also means you don't need the delaying
> mechanism and the overhead attached to it.
> KISS.

For client code of a class that uses cow_ptrs for its data members
internally, it is even easier: the client does not have to worry about
when copies are made, because the class makes them automatically when
needed. cow_ptr helps to implement that behaviour.

>> The class design allows you to copy objects polymorphically. I tried it.
>> It works. It's useful.
>
> It would work just as well without COW. COW does not affect observable
> behaviour, it's purely an implementation-specific detail.
> It should not leak in the interface.

It can be an implementation-specific detail and can make code faster
without the client code knowing about it. But it can also be a thing the
client code relies upon as with the Matrix class above. Sometimes client
code might want to know whether copies can be made cheaply.

>> I used it in production code.
>
> That just means the code does what you need and is stable enough to
> work reliably in your use cases.
> This doesn't say anything about the quality of the design.

At least I could convince you that there are legit use cases. That was
harder than I expected.

