Boost logo

Boost :

Subject: [boost] Lockfree review
From: Phil Endecott (spam_from_boost_dev_at_[hidden])
Date: 2011-07-28 15:14:41

Dear All,

Here is a quick review of the proposed Boost.Lockfree library.


For some proposed libraries their utility is beyond doubt because they
are already being used in some significant application. In contrast
most GSoC projects are developed more in the "hope that they will be
useful" without a motivating application. So we have to ask ourselves,
does this library meet a need? To me, lock free algorithms and data
structures have always seemed appealing at first sight but, apart from
some trivial cases like counters, on close inspection they turn out to
be too complicated to bother with. (They also lack required features,
i.e. waiting for non-empty; I don't think I've ever written a
producer-consumer application that didn't need the consumer to block on empty.)

The author acknowledges that these structures will typically be slower
than conventional implementations using mutexes, but claims that their
benefit comes in terms of better worst-case latency. Specifically, if
a thread is pre-empted while it holds a lock, another thread will have
to wait until the first thread is re-scheduled and can unlock before it
can itself lock. But this case is exactly the one that priority
inversion is supposed to address: when the second thread tries to lock,
the kernel immediately re-schedules the first thread. Yes, because of
the kernel involvement this will probably still be slower than the
lock-free approach, but perhaps not by much. It seems to me that the
author should present benchmarks to demonstrate the actual worst-case
latency difference between this code and a lock-based implementation.


I am happy with the design of the interface, as far as it goes; there
are a few things that could be added though:

- Debug modes that check the thread requirements.
- Other combinations of single/multiple producer/consumer.


I have looked only at the implementation of the ringbuffer class
template. It lacks comments. There are some oddities like this:

         size_t ret = read_index - write_index - 1;
         if (write_index >= read_index)
           ret += max_size;
         return ret;

Here, a negative value may be assigned to the unsigned size_t. I think
this is actually safe, but it looks odd. I have not found any bugs.

It looks like every call to enqueue() and dequeue() will access both
the read and write indexes. Won't this cause poor performance due to
cache issues? I would have been tempted to avoid this by keeping local
'available' counts: (PSEUDO_CODE)

   atomic<size_t> read_index_;
   atomic<size_t> write_index_;
   size_t read_available_;
   size_t write_available_;

   bool enqueue(...) {
     if (write_available_==0) {
       // In this case, we access read_index_ which is probably cached
by the consumer thread,
       // which is expensive. But this is rare.
       write_available_ = (read_index_ + max_size - write_index_ - 1) % max_size;
     if (!write_available_) return false; // full
     buffer[write_index_.load(memory_order_relaxed)] = data;;

   // dequeue similarly...


I'm not impressed by the documentation. It lacks rationale and
information about the implementation. It has some formatting issues
and typos (e.g. lose/loose). Platform requirements (i.e. the
Boost.Atomic and/or std::atomic requirements and supported
architectures) should be explained.

The reference documentation involves more clicks than I would expect to
reach the actual content.

Some functions are marked as "for debugging only", e.g. the empty() and
reset() [which should probably be called clear()] methods. It's not
clear to me why these methods must be unsafe; for example, for the
ringbuffer, I would expect the consumer thread to safely be able to
call empty() and the producer to safely be able to call full() (which
is not provided). Rationale for these restrictions (or relaxations of
them) would be useful.

- Did you try to use the library? With what compiler? Did you have
any problems?

I considered exercising the library on my ARM systems, but I realised
that it requires 64-bit atomics; currently, Boost.Atomic only provides
32-bit atomics on ARM. (This raises the whole issue of the
Boost.Atomic dependency. I'm presuming that Tim intends not to bundle
a Boost.Atomic snapshot with the library if approved. If that changes,
we really need another review.)

- How much effort did you put into your evaluation? A glance? A quick
reading? In-depth study?

A few hours reading the docs and code.

- Are you knowledgeable about the problem domain?

I know a certain amount about atomics and lock-based concurrent
programming, but not much about lock-free data structures.

- Please always state in your review, whether you think the library
should be accepted as a Boost library!

I think it should be accepted provided that:
* The author demonstrates some measurable benefit over lock-based data structures.
* The documentation is improved to describe the implementation and to
provide rationale for design decisions.
* The library is not added to Boost until Boost.Atomic has been
reviewed and approved (or, available std::atomic implementations work
with it).

Regards, Phil.

Boost list run by bdawes at, gregod at, cpdaniel at, john at