Subject: Re: [boost] [Review] Lockfree review: two more days left
From: Tim Blechmann (tim_at_[hidden])
Date: 2011-07-26 10:55:24
> First, I have re read the doc and started to look at the implementation and
> I found something that is apparently a contradiction:
> - In the doc you say "Spinlocks do not directly interact with the operating
> system either. However it is possible that the owning thread is preempted
> by the operating system, which violates the lock-free property"
> - In the code however (for fifo<T>), the code seems to be "spinning" all
> the time using "for (;;)" loops.
> Could you give me some precisions on why this is ok ?
spinning is not the problem, locking is. the spinning in fifo (aka queue) and
stack basically has the semantics `try an operation, if it fails because another
thread interferes with it, try again'. if the os preempts a thread while
spinning, it does not block the other threads.
> Then, I started to wrote a test program to compare performance with Intel's
> Thread Building Block library (3.0 update 7, the latest stable). I am far
> from finished, but my first results are that boost::lockfree is slower on
> my machine by about 40%.
> I use the TBB concurrent_queue<T> class that is according to the doc
> "non-blocking". I had a quick look at the implementation it seems to be
> using a similar technique as fifo<T> (that is spinning, no system locks).
> Do you know whether this is actually a lock free structure or not ?
i checked tbb's concurrent_queue several year ago, but at that time, it seemed
to be using spinlocks. maybe this has changed. non-blocking does not necessarily
mean lock-free, but it could be an obstruction-free queue ...
one point, where i really question the non-blocking claim is the signature of
the push() function:
void push( const T& source );
it it either needs to allocate memory dynamically from the OS, where the memory
allocator may block, or it throws an exception (and afaict this will also cause
some memory allocations).
> My question is not demanding that you match Intel's perf ;) but rather if
> you have any ideas on what could be done to improve the performance further
> (Intel's using assembly for instance, no idea if it has a big impact,
> another suspect may be a better atomic implementation).
some people suggest that implementing a backoff could improve the performance.
otoh, herlihy and shavit suggest that the performance of a backoff is heavily
depends on the backoff parameters and these parameters differ among different
cpu type and even clock speeds.
because of this, and because implementing a backoff will require some assembly
code, i decided not to include a backoff in boost.lockfree.
> I have attached the code FYI (please keep in mind that this is a quick
> test). The interesting part is the "SimpleTest" functions.
> I compiled on Ubuntu 11.04 / gcc 4.5 in C++0x mode.
> I measured the time using the "time" command. I ran the sample on an AMD
> cpu which cpuinfo are attached.
one point: the enqueue/push functions of boost.lockfree do not necessarily
succeed. in your code, it should be fine, because the freelist that is used will
allocate new internal nodes, if the freelist is empty. however this means that
the push operation may block, if the allocator blocks.
Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk