Subject: Re: [boost] [smart ptr] Any interest in copy-on-write pointer for C++11?
From: Ralph Tandetzky (ralph.tandetzky_at_[hidden])
Date: 2013-02-13 10:46:37


SUMMARY
=======

I would like to summarize the discussion about copy-on-write so far:

* I wrote a copy-on-write-pointer implementation
<https://github.com/ralphtandetzky/cow_ptr.git> with the following use
cases:

    1. A const-correct pimpl pointer.
    2. Helper class for implementing copy-on-write for higher level
    structures, where copying is expensive. (E.g. matrix or image classes)
    3. cow_ptr<Base> wraps polymorphic classes giving them genuine value
    semantics, so they can be put into standard containers, even if Base
    is abstract.
    4. It can be used to add cloning to a class hierarchy non-intrusively.

* Thread-safety was discussed. (brought up by Alexey)

    - The reference counter is atomic.
    - All constant operations on cow_ptr and its pointee are thread-safe
    as long as const operations on the pointee are thread-safe.
    - If constant operations on the pointee are thread-safe and it is
    safe to write to a pointee from one thread as long as no one else
    is reading or writing, then the same holds for each individual
    cow_ptr pointing to that object and for all access through these
    cow_ptrs.

* Is cow still necessary? (brought up by Mathias)

    - Since in C++11 you can move objects cheaply instead of copying
    them, an important use case of copy-on-write is gone. Before
    move semantics, returning objects by value was sometimes a serious
    performance issue. Copy-on-write solved that.
    - Cow is still useful today for matrix classes or image classes or
    even trees that share state under the hood, but should not influence
    each other when writing.
    - Example: If you want to implement a property tree, you can use the
    approach
             class PropertyTree : public AbstractProperty
             {
             public:
                 /* implementation of public interface. */
             private:
                 std::list<cow_ptr<AbstractProperty>> properties;
             };
    - Having this you can keep a history of a big property tree in
    memory easily.
             std::vector<PropertyTree> history;
             auto current = history.back();
             current.modify();
             history.push_back( current );

* Is COW unsafe? (brought up by Mathias)

    - COW is sometimes considered unsafe. That's why the C++ standard
    (since C++11) no longer permits COW implementations of std::string.
    - The code
             std::string a("Hello world!");
             char * p = &a[11];
             std::string b( a );
             *p = '.'; // modifies a and b, if std::string was implemented using COW.
    does not work correctly for COW implementations of std::string.
    - The reason this does not work is the escaped pointer. When
    escaping pointers are strictly avoided, this effect cannot happen.
    Therefore cow_ptr does not provide a non-const version of the get()
    member function, but a member function modify() (formerly known as
    apply()) which can be used in the following way:
             cow_ptr<MyType> p( new MyType );
             auto q = p;
             p.modify( [&]( MyType * p ){ p->doSomething(); p->doSomethingElse(); } );
             // equivalent to the line above:
             COW_MODIFY(p) { p->doSomething(); p->doSomethingElse(); };
    - It is still possible for a pointer to escape, but the interface is
    such that it is easy to use it correctly and hard to use incorrectly.
    - The interface design of std::string makes it impossible to
    implement COW correctly for it. Hence COW must be considered during
    the interface design phase of a class.
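
    The idea of routing all writes through modify() can be sketched as
    follows. This is a minimal illustration only, not the cow_ptr from the
    repository: the names cow_sketch, shares_with, Counter and
    modify_detaches are hypothetical, a std::shared_ptr serves as the
    reference counter, and the use_count() test is only meaningful when no
    other thread copies the handle concurrently.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Hypothetical minimal COW handle (assumption: NOT the real cow_ptr).
template <typename T>
class cow_sketch {
public:
    explicit cow_sketch(T value)
        : data_(std::make_shared<T>(std::move(value))) {}

    // Read access is const-only, so no mutable pointer is handed out here.
    const T* get() const { return data_.get(); }
    const T& operator*() const { return *data_; }
    const T* operator->() const { return data_.get(); }

    // All mutation goes through modify(): detach first, then run the callable.
    template <typename F>
    void modify(F&& f) {
        if (data_.use_count() > 1)                 // pointee is shared: copy it
            data_ = std::make_shared<T>(*data_);
        std::forward<F>(f)(data_.get());
    }

    // Helper for the demonstration: do both handles refer to one pointee?
    bool shares_with(const cow_sketch& other) const {
        return data_ == other.data_;
    }

private:
    std::shared_ptr<T> data_;
};

struct Counter { int value = 0; };

// Returns true if writing through p left the copy q untouched.
inline bool modify_detaches() {
    cow_sketch<Counter> p{Counter{}};
    auto q = p;                                    // cheap: shares the pointee
    const bool shared_before = p.shares_with(q);
    p.modify([](Counter* c) { c->value = 42; });   // copies, then writes
    return shared_before && !p.shares_with(q)
        && p->value == 42 && q->value == 0;
}
```

    A pointer can still escape if the callable stores its argument
    somewhere, which is the residual risk mentioned above.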

* Alternatives to COW (brought up by Mathias)

    - C++11 move and cloning.

        - Most often, unnecessary copies can be avoided using C++11
        move semantics and cloning where necessary.

    - Flyweight factory.

        - Objects are accessed by a hash value. There's always only one
        copy of identical objects. For complex objects that are modified
        often, recalculating the hash and synchronizing the hash table
        thread-safely can be a serious performance bottleneck.

    - shared_ptr<T const>

        - Even with shared_ptr<T const> you never know whether there's a
        shared_ptr<T> object (non-const) through which the pointee is
        modified. shared_ptrs are really shared. It is likely more error
        prone to use shared_ptr to implement COW. If T is a polymorphic
        class but does not have a clone() member function, then cloning
        will not work properly because of slicing. shared_ptr is useful
        for many things, but it's probably not the best tool to
        implement COW.
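
    The aliasing problem with shared_ptr<T const> can be shown in a few
    lines. The names Image and value_seen_by_reader are hypothetical and
    exist only for this illustration.

```cpp
#include <cassert>
#include <memory>

struct Image { int pixel = 0; };

// Returns the value seen through the "read-only" handle after a write
// through a non-const handle to the same object.
inline int value_seen_by_reader() {
    std::shared_ptr<Image> writer = std::make_shared<Image>();
    std::shared_ptr<const Image> reader = writer;  // same object, const view
    writer->pixel = 7;                             // the reader observes this
    return reader->pixel;
}
```

    The const in shared_ptr<T const> only restricts access through that
    particular handle. It says nothing about other handles to the same
    object.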

* The name (brought up by Peter)

    - cow is an acronym and lower case. It's a farm animal ... enticing
    me to write member function names like "moo". The name does not
    reflect the ability to contain polymorphic value pointers. (Peter)
    - clone_on_write<T> would be a suggestion of mine. It might be
    useful to drop the _ptr suffix completely, since the class has value
    semantics.
    - Others have suggested splitting cow_ptr<T> into read_ptr<T> and
    write_ptr<T> classes.

* Slicing problems (brought up by Vincente)

    - The constructor taking a Y * pointer might lead to slicing
    problems if the pointee is not a Y object, but something derived
    from Y.
    - The default_copier will make a runtime-check assert( typeid(*p) ==
    typeid(Y) ).
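
    A sketch of such a runtime check, assuming a copier that copies with
    Y's copy constructor. The names copy_would_be_complete and
    checked_copy are hypothetical and are not taken from the cow_ptr
    sources.

```cpp
#include <cassert>
#include <memory>
#include <typeinfo>

struct Base {
    virtual ~Base() = default;
    virtual int id() const { return 1; }
};
struct Derived : Base {
    int id() const override { return 2; }
};

// True if copy-constructing a Y from *p copies the complete object,
// i.e. the dynamic type of *p is exactly Y (requires a polymorphic Y).
template <typename Y>
bool copy_would_be_complete(const Y* p) {
    return typeid(*p) == typeid(Y);
}

// Hypothetical copier: asserts that no slicing can occur, then copies.
template <typename Y>
std::unique_ptr<Y> checked_copy(const Y* p) {
    assert(typeid(*p) == typeid(Y));
    return std::unique_ptr<Y>(new Y(*p));
}
```

    Passing a Derived object through a Base * makes the check fail,
    whereas passing it through a Derived * passes. Note that the check
    disappears when NDEBUG is defined.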

* Comparison to adobe::copy_on_write<T>
<http://cppnow.org/session/value-semantics-and-concepts-based-polymorphism/>
(brought up by Andreas)

    - This class is constructed by moving a T object into itself.
    Copying is implemented as cheap copy of a pointer with reference
    counting.
    - Other than constructors, destructors and assignment operators
    there are only the public member functions read() and write().
    read() returns a const reference to the contained object, write()
    makes an internal copy, if the reference count is greater than 1,
    and then returns a non-const reference to the contained object.
    - The class does not support cloning for polymorphic T, but always
    uses the copy-constructor of T in order to copy.
    - Hence the class interface is extremely simple.
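
    The read()/write() interface described above can be sketched like
    this. It is an illustration under assumptions, not Adobe's
    implementation: the class name copy_on_write_sketch and the helper
    same_object are hypothetical, and std::shared_ptr stands in for the
    reference counting.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

template <typename T>
class copy_on_write_sketch {
public:
    // Constructed by moving a T into the shared storage.
    explicit copy_on_write_sketch(T value)
        : data_(std::make_shared<T>(std::move(value))) {}

    // read() never copies.
    const T& read() const { return *data_; }

    // write() copies with T's copy constructor if the object is shared.
    T& write() {
        if (data_.use_count() > 1)
            data_ = std::make_shared<T>(*data_);
        return *data_;
    }

    // Helper for the demonstration only.
    bool same_object(const copy_on_write_sketch& other) const {
        return data_ == other.data_;
    }

private:
    std::shared_ptr<T> data_;
};
```

    Since write() returns a non-const reference, a caller who stores that
    reference and copies the object afterwards runs into the same escaped
    reference problem as the std::string example.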

* Comparison to value_ptr<T>
<http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2012/n3339.pdf>
in N3339 (open-std) (brought up by Vincente)

    - Basic properties: A value_ptr<T> mimics the value semantics of its
    pointee. Hence the pointee lifetime is the pointer lifetime, and the
    pointee is copied whenever the pointer is copied. Internally the
    pointee can be of a derived class of T. In this case the object is
    cloned properly.
    - Hence value_ptr<T> has the use-cases 1, 3 and 4 of cow_ptr<T>, but
    does not implement copy-on-write (use case 2).
    - value_ptr has the cloner and the deleter as template arguments of
    the class. The current implementation of cow_ptr only has the
    pointee type as template parameter. The cloner and deleter are
    stored dynamically.
    - value_ptr does not have a reference counter.
    - Other than that, value_ptr<T> and cow_ptr<T> have very similar
    public interfaces.
    - In conjunction with copy_on_write<T> this can be used to do the
    same stuff as cow_ptr<T> does. The way to use it would be
    copy_on_write<value_ptr<T>>.
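
    To illustrate the deep-copy behaviour with proper cloning of derived
    objects, here is a minimal sketch. It is not the N3339 interface: the
    name value_ptr_sketch is hypothetical, the cloner is stored
    dynamically in a std::function (as described for cow_ptr) instead of
    being a template argument, and copying a moved-from object is not
    supported.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <utility>

struct Shape {
    virtual ~Shape() = default;
    virtual int sides() const = 0;
};
struct Triangle : Shape {
    int sides() const override { return 3; }
};

template <typename T>
class value_ptr_sketch {
public:
    // Remembers the dynamic type Y at construction, so no clone() member
    // is needed in the class hierarchy.
    template <typename Y>
    explicit value_ptr_sketch(std::unique_ptr<Y> p)
        : ptr_(std::move(p)),
          clone_([](const T& t) {
              return std::unique_ptr<T>(new Y(static_cast<const Y&>(t)));
          }) {}

    // Copying the pointer copies the pointee.
    value_ptr_sketch(const value_ptr_sketch& other)
        : ptr_(other.clone_(*other.ptr_)), clone_(other.clone_) {}
    value_ptr_sketch(value_ptr_sketch&&) = default;

    // Takes its argument by value, so it serves copy and move assignment.
    value_ptr_sketch& operator=(value_ptr_sketch other) {
        ptr_ = std::move(other.ptr_);
        clone_ = std::move(other.clone_);
        return *this;
    }

    T* get() const { return ptr_.get(); }
    T* operator->() const { return ptr_.get(); }
    T& operator*() const { return *ptr_; }

private:
    std::unique_ptr<T> ptr_;
    std::function<std::unique_ptr<T>(const T&)> clone_;
};
```

    A value_ptr_sketch<Shape> constructed from a std::unique_ptr<Triangle>
    can be copied even though Shape is abstract, and each copy owns its
    own Triangle.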

* pointer-semantics or value-semantics and nullptr (brought up by Vincente)

    - Should the COW-class be nullable? If not, then it should probably
    not be called cow_ptr.
    - This question has not been discussed to the end yet. Personally, I
    don't think that null-cow_ptrs are very useful.

* Different member and non-member functions (brought up by Vincente)

    - relational operators (brought up by Vincente)

        - It is not clear whether operator==() on cow_ptrs should only
        compare pointers or also pointees. This would depend on whether
        the COW-class is considered a pointer or a genuine value.

    - release()

        - Should not exist, because the caller would not know which
        deleter to call. (similar to shared_ptr)

    - reset()

        - Will be implemented in order to provide the performance benefits.

* The write_ptr<T> and read_ptr<T> solution (brought up by Peter)

    - read_ptr<T> would be similar to shared_ptr<T const> and
    write_ptr<T> would be a unique_ptr<T> equivalent. read_ptr<T> has a
    member function which returns a write_ptr<T> through which the
    pointee can be modified. Afterwards the write_ptr<T> can be moved
    back into the read_ptr<T>:
             read_ptr<T> pr;
             if ( write_ptr<T> pw = pr.write() )
             {
                 pw->modify();
                 pr = std::move( pw );
             }
    - This possibly provides a better separation of concerns (safer,
    clearer, more flexible).
    - However, the above code is not exception-safe if pr becomes a
    nullptr when the write() function is called: if pw->modify() throws,
    the pointee is destroyed along with pw and pr stays empty. This
    makes exception-safe code harder to write.
    - In case the use_count is greater than 1: Should pr.write() make
    the copy? Or should pw.operator->() make the copy? This is not
    sufficiently discussed yet.
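
    One of the possible answers can be sketched as follows, purely as an
    assumption for illustration: write() always makes the copy and leaves
    the read pointer untouched. That keeps the snippet above
    exception-safe, but it also copies when the pointee is not shared,
    which gives up part of the benefit of COW. The names read_ptr_sketch
    and write_ptr_sketch are hypothetical.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Assumption: the write pointer is simply a unique_ptr.
template <typename T>
using write_ptr_sketch = std::unique_ptr<T>;

template <typename T>
class read_ptr_sketch {
public:
    explicit read_ptr_sketch(T value)
        : data_(std::make_shared<T>(std::move(value))) {}

    // Always copies; *this keeps its pointee, so nothing is lost if the
    // caller throws before moving the write pointer back.
    write_ptr_sketch<T> write() const {
        return write_ptr_sketch<T>(new T(*data_));
    }

    // Moving a write pointer back in replaces the shared pointee.
    read_ptr_sketch& operator=(write_ptr_sketch<T> w) {
        data_ = std::move(w);
        return *this;
    }

    const T& operator*() const { return *data_; }
    const T* operator->() const { return data_.get(); }

private:
    std::shared_ptr<const T> data_;
};
```

    Other read pointers that shared the old pointee keep seeing the old
    value after the move-back.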

Thank you for all your constructive feedback!
Ralph

