Subject: Re: [boost] [lockfree] review
From: Gordon Woodhull (gordon_at_[hidden])
Date: 2011-07-29 13:00:15
Hi Tim, Hartmut,
I apologize that this is late and not an in-depth review.
I think the library should be accepted.
IMO the big question with a library like this is not so much the design as the correctness. What can we do as reviewers to verify the correctness, assuming we are not lockfree gurus ourselves? Go back to the original sources. So I did; see below.
The other important point is to verify that the author is going to maintain the library in case problems do crop up. Tim seems to have a good attitude about suggestions, so I have no doubts here.
- What is your evaluation of the design?
I am not hugely concerned about naming but it's important that the naming scheme scales as more algorithms are added. What if there are multiple good algorithms for the same functionality, with different tradeoffs?
I think the most accurate thing would be to catalogue the algorithms instead of the "data structures", which I find to be a confusing term in this domain. However, it would painful to users to have to refer to the michael_and_scott_queue_algo or whatever, so I'm not sure what the best solution is. I like Tony's suggestion, but I'm not clear if the attributes are just going to keep multiplying.
Could the freelist be moved into the public interface? This is a useful data structure and a simple way to avoid calls to the memory allocator.
- What is your evaluation of the implementation?
I verified the queue implementation against the Michael/Scott pseudo-code. It matches pretty much line-for-line. Since mathematical proofs are difficult to grasp and these algorithms are usually proved by the test of time, being able to compare against known-good code is essential.
There's one exception: an extra check on next_ptr is commented:
* however we reuse the tagged_ptr part for the and clear the next part during node
* allocation. we can observe a null-pointer here.
There's a word missing there. Are you saying that next_ptr can be null unlike in the original algo, because null is used to mark a new node? Does this affect any other part of the algo? This should also get a paragraph in the Rationale.
- What is your evaluation of the documentation?
It seems rather light.
I'd like to see references for the other two algorithms, which the doc currently ascribes to "folklore" - if someone would like to verify them in the way I did for the queue, where would they go? What could they read to understand how it works?
I'd like to see some performance comparisons on a couple of architectures. It is okay to have a big disclaimer saying these results may not be typical and the user should do their own benchmarks.
I am not experienced enough to evaluate whether the use of barriers and the c++ memory model is correct. I would like to see something in the rationale describing how these choices are made. I also thought I recalled Tim saying that one of the algorithms might require more barriers for untested architectures, but I couldn't find anything about that in the documentation.
As Phil mentioned, the documentation should have a section about when/when not to use lock-free. It doesn't have to be authoritative and can refer to other web resources. This is really just an expansion of the Performance section of the intro.
- What is your evaluation of the potential usefulness of the library?
Lock-free can perform better than locked data structures, and what's more important, it can have more predictable performance. I am surprised by Julien's finding that the fifo actually slows down with more processors; in my limited experience I would think this would happen with locked but not lockfree data structures.
Anyway, it is always good to have more choices to compare, and in realtime applications, these structures can be essential.
- Did you try to use the library? With what compiler? Did you have any
Unfortunately, I did not.
- How much effort did you put into your evaluation? A glance? A quick
reading? In-depth study?
I spent about 5-6 hours reading the docs, comparing code against pseudo-code, and writing this.
- Are you knowledgeable about the problem domain?
As well as having done a good amount of debugging concurrent programs, I took an excellent Multicore class with Alberto Lerner at NYU.
For the final project, my team successfully implemented a lock-free hash table, the Split-Ordered-List of Shalev/Shavit, which uses the lock-free list-based set of Michael. It proved to have far superior performance to the naive implementation of simply putting a read-write lock on an unordered_set. (Near-linear speedup versus horrible slowdowns for the locked version. We did not compare against TBB.)
However, without a double-width compare-and-swap on all the compilers we were using (i.e. no Boost.Atomic), we were forced to use Michael's Safe Memory Reclamation instead of freelists, and as the paper says, "memory barriers may be needed" - but where? We ran into horrible ABA problems. I guess that algo is patented anyway.
We could clean up the split-ordered list code and adapt it to freelists for donation to Lockfree, if the library is accepted.
Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk