Boost logo

Boost :

Subject: Re: [boost] Lockfree review
From: Tim Blechmann (tim_at_[hidden])
Date: 2011-07-28 17:05:36


hi phil,

> For some proposed libraries their utility is beyond doubt because they
> are already being used in some significant application. In contrast
> most GSoC projects are developed more in the "hope that they will be
> useful" without a motivating application. So we have to ask ourselves,
> does this library meet a need?

first: the GSoC is about reviewing this library. it has been developed long
before because i needed some lock-free data structures for a soft real-time
system, which eventually became the project of my master thesis.

so (a) i have been using this code for quite some time (the oldest codepieces
are few years old). (b) over the last one or two years several projects have
been using boost.lockfree. sometimes i got emails from name_at_[hidden]
asking for something or reporting a problem (especially maintaining the asm code
was quite a hell before i migrated to boost.atomic)

> To me, lock free algorithms and data
> structures have always seemed appealing at first sight but, apart from
> some trivial cases like counters, on close inspection they turn out to
> be too complicated to bother with.

they are not the best choice for all applications. but i have never mentioned
that :)
having a background in low-latency real-time audio synthesis (deadlines of about
1 ms), i cannot rely on high-performance concurrent data structures, if they
would eventually block the real-time thread.

> (They also lack required features,
> i.e. waiting for non-empty; I don't think I've ever written a
> producer-consumer application that didn't need the consumer to block on
> empty.)

my real-world example has a rate-monotonic scheduler. whenever the scheduler is
called, i need to poll a queue to see of there are new jobs.

> The author acknowledges that these structures will typically be slower
> than conventional implementations using mutexes, but claims that their
> benefit comes in terms of better worst-case latency. Specifically, if
> a thread is pre-empted while it holds a lock, another thread will have
> to wait until the first thread is re-scheduled and can unlock before it
> can itself lock. But this case is exactly the one that priority
> inversion is supposed to address: when the second thread tries to lock,
> the kernel immediately re-schedules the first thread.

in theory, yes, but it assumes the following points:
* the context switch is cheap
* the scheduler itself is does not need to wait for a lock that is held by a
  different process for too long
* the mutex is not implemented as combination of spinlock and waiting lock. iirc
  this is done sometimes in order to avoid the cost of a system call.

actually, there is an interesting blog entry `time waits for nothing' [1],
containing sections from the audio system documentation of microsoft, apple and
linus, which all clearly state that it is not acceptable to use any blocking
primitives from their real-time callbacks.

> I am happy with the design of the interface, as far as it goes; there
> are a few things that could be added though:
>
> - Debug modes that check the thread requirements.

hm, i agree this might be useful, but i am not sure, if it will really be
possible to catch all thread requirements accurately. but i will have a look.

> - Other combinations of single/multiple producer/consumer.

other people have asked for this, too ... it is probably a good future
extension.

> Implementation
> --------------
>
> I have looked only at the implementation of the ringbuffer class
> template. It lacks comments. There are some oddities like this:

hm, yes ... the index computation can probably be cleaned up.

> It looks like every call to enqueue() and dequeue() will access both
> the read and write indexes. Won't this cause poor performance due to
> cache issues?

i don't think so: considering enqueue, the enqueue thread will both read and
write write_index, but the dequeue thread will only read it. when the dequeue
thread reads the value, it can use the one in the local cache, but if the
enqueue thread writes the new index atomically, i will need to trigger the cache
coherency protocol (e.g. broadcast the new value to other cores or invalidate
the cache lines of other values).

so afaik, accessing write_index is cheap, because reading does not need to deal
with cache coherency.

> information about the implementation. It has some formatting issues
> and typos (e.g. lose/loose). Platform requirements (i.e. the
> Boost.Atomic and/or std::atomic requirements and supported
> architectures) should be explained.

i'll probably write some more about the internals and the design.

> The reference documentation involves more clicks than I would expect to
> reach the actual content.

already fixed.

> Some functions are marked as "for debugging only", e.g. the empty() and
> reset() [which should probably be called clear()] methods. It's not
> clear to me why these methods must be unsafe; for example, for the
> ringbuffer, I would expect the consumer thread to safely be able to
> call empty() and the producer to safely be able to call full() (which
> is not provided). Rationale for these restrictions (or relaxations of
> them) would be useful.

good points: clear is more consistent to other stl/boost data structures.
providing full() and empty() with a defined rationale sounds good, too ...

> - Did you try to use the library? With what compiler? Did you have
> any problems?
>
> I considered exercising the library on my ARM systems, but I realised
> that it requires 64-bit atomics; currently, Boost.Atomic only provides
> 32-bit atomics on ARM.

actually, the requirement for 64bit atomics will be slightly relaxed for bounded
and fixed-sized data strucures by using 16 bit array indices instead of
pointers.

> (This raises the whole issue of the
> Boost.Atomic dependency. I'm presuming that Tim intends not to bundle
> a Boost.Atomic snapshot with the library if approved. If that changes,
> we really need another review.)

i have a local branch, where i am bundling boost.atomic. but there are some
problems:
* as the review has been done without taking boost.atomic into account, there
  would have to be a second review (implementation only)
* it would be a fork of the boost.atomic library, so maintaining both
  side-by-side could become quite painful
* if the implementation will be reviewed, then it is already half the work for a
  full review.

> I think it should be accepted provided that:
> * The author demonstrates some measurable benefit over lock-based data
> structures.

as stated above (and cited from [1]), there are situations, where lock-based
programming is not an option.
i have already stated in the docs that lock-free programming is not the best
approach for all applications and lock-based data structures can be more
efficient in some (many) cases. probably the first line of the doc should clear
up the misconception that lock-free data structures are the high-performance
data structures ... like it is a misconception that `real-time' means `really
fast' ;)

> * The documentation is improved to describe the implementation
> and to provide rationale for design decisions.

agreed

> * The library is not added to Boost until Boost.Atomic has been
> reviewed and approved (or, available std::atomic implementations work
> with it).

this has already been stated in the announcement of the review: boost.lockfree
can be accepted, but it won't be merged before boost.atomic is accepter.

cheers, tim

[1] http://www.rossbencina.com/code/real-time-audio-programming-101-time-waits-
for-nothing




Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk