Subject: Re: [boost] [lockfree] review
From: Tim Blechmann (tim_at_[hidden])
Date: 2011-07-29 15:00:56
thanks for the review!
> IMO the big question with a library like this is not so much the design as
> the correctness. What can we do as reviewers to verify the correctness,
> assuming we are not lockfree gurus ourselves? Go back to the original
> sources. So I did; see below.
well, there is one problem about the publications done by the lock-free gurus:
they don't care about the implementation, but assume a sequencially consistent
memory model, so they don't need to care about the required memory barriers or
the like ...
> Could the freelist be moved into the public interface? This is a useful
> data structure and a simple way to avoid calls to the memory allocator.
i need to think about this. i have changed the freelist implementation a bit in
order to use array indices as alternative to pointers (for ll/sc platforms
without support for double-word CAS).
the new interface is rather complex because i need to resolve from a `handle'
(pointer or 16bit index) to a node pointer (and reverse). maybe tagged_pointer
and freelist can be exposed to the public interface at a later point, but for
now i hesitate to expose the full freelist interface.
maybe but maybe a simplified version could be exposed?
> - What is your evaluation of the implementation?
> I verified the queue implementation against the Michael/Scott pseudo-code.
> It matches pretty much line-for-line. Since mathematical proofs are
> difficult to grasp and these algorithms are usually proved by the test of
> time, being able to compare against known-good code is essential.
> There's one exception: an extra check on next_ptr is commented:
> * however we reuse the tagged_ptr part for the and clear the next part
> during node * allocation. we can observe a null-pointer here.
> There's a word missing there. Are you saying that next_ptr can be null
> unlike in the original algo, because null is used to mark a new node?
> Does this affect any other part of the algo? This should also get a
> paragraph in the Rationale.
good eye, the missing word is `freelist'. the michael/scott paper only mentions
that they use a freelist, but they simply omit the that part from the
pseudocode. in my implementation, the same memory location is used for linking
the nodes of the queue and the nodes of the freelist.
> - What is your evaluation of the documentation?
> It seems rather light.
> I'd like to see references for the other two algorithms, which the doc
> currently ascribes to "folklore" - if someone would like to verify them in
> the way I did for the queue, where would they go? What could they read to
> understand how it works?
i should probably add a recommendation to `the art of multiprocessor
programming'. iirc it also covers the michael/scott queue ...
> I'd like to see some performance comparisons on a couple of architectures.
> It is okay to have a big disclaimer saying these results may not be
> typical and the user should do their own benchmarks.
i've started to implement a set of simple benchmarks, for including results, i
will need some help (my access to multiprocessor hardware is limited to my
workstation and my laptop)
> I am not experienced enough to evaluate whether the use of barriers and the
> c++ memory model is correct. I would like to see something in the
> rationale describing how these choices are made. I also thought I
> recalled Tim saying that one of the algorithms might require more barriers
> for untested architectures, but I couldn't find anything about that in the
hopefully this is not the case anymore: all memory barriers are abstracted in
the memory model of atomic<>. no memory barrier needs to be issued manually, but
all barriers are issued via atomic<> members.
> As Phil mentioned, the documentation should have a section about when/when
> not to use lock-free. It doesn't have to be authoritative and can refer
> to other web resources. This is really just an expansion of the
> Performance section of the intro.
> - What is your evaluation of the potential usefulness of the library?
> Lock-free can perform better than locked data structures, and what's more
> important, it can have more predictable performance. I am surprised by
> Julien's finding that the fifo actually slows down with more processors;
> in my limited experience I would think this would happen with locked but
> not lockfree data structures.
lock-free data structures cannot scale linear: when multiple threads try to
perform the same operations, one compare_exchange will fail and one of the
threads has to retry. the more threads the more retries.
> - Are you knowledgeable about the problem domain?
> As well as having done a good amount of debugging concurrent programs, I
> took an excellent Multicore class with Alberto Lerner at NYU.
> For the final project, my team successfully implemented a lock-free hash
> table, the Split-Ordered-List of Shalev/Shavit, which uses the lock-free
> list-based set of Michael. It proved to have far superior performance to
> the naive implementation of simply putting a read-write lock on an
> unordered_set. (Near-linear speedup versus horrible slowdowns for the
> locked version. We did not compare against TBB.)
> However, without a double-width compare-and-swap on all the compilers we
> were using (i.e. no Boost.Atomic), we were forced to use Michael's Safe
> Memory Reclamation instead of freelists, and as the paper says, "memory
> barriers may be needed" - but where? We ran into horrible ABA problems.
> I guess that algo is patented anyway.
hazard pointers are patented, so they are not an option for boost. finding the
right positions for memory barriers can be quite hard and usually requires a
full understanding of the data structure. atomic<> will make it way easier to
write a correct algorithm, since their member functions by default to a
sequencially consistent memory model.
> We could clean up the split-ordered list code and adapt it to freelists for
> donation to Lockfree, if the library is accepted.
as a workaround for the dcas requirement one can use array indices (the reason
for my current freelist refactoring). if i feel comfortable with your
implementation, i am happy to adopt it!