Boost logo

Boost :

Subject: Re: [boost] [lockfree] review
From: Hans Boehm (Hans.Boehm_at_[hidden])
Date: 2011-08-31 14:05:45


Alexander Terekhov <terekhov <at> web.de> writes:

>
>
> Hans Boehm wrote:
> [...]
> > > I'm really interested to know the exact number/names of algorithms.
> > I asked Doug Lea this question 3 or 4 years ago when this was still being
> > actively discussed. He's more of an expert on practical concurrent
> > algorithms. I got back a bunch of answers. One simple and common example
is
> > the algorithm for releasing a queued lock. I need to set the lock state to
> > unlocked and then check the queue for waiters. If those are reordered, I
get
> > lost wakeup problems. ...
>
> Dejavu...
>
> http://www.mailarchive.ca/lists/comp.unix.solaris/2003-12/0937.html#start937
>
> The relevant part is:
>
> ------
> > > > and subsequent store-load barrier + load to check whether you have a
> > > > blocked waiter to unblock? Store-load barrier is known to be "the most
> > > > expensive"...
> > >
> > > The store-load barrier is avoided by carefully setting up the sleeping
and
> > > blocking interactions so that the above code + the slow paths can
> > > never loose a waiter.
> >
> > And you can do it without atomic read-{modify-}write on the fast
> > path in mutex_exit()? Hmm. Please elaborate.
>
> Casper already got this, but here:
>
> 1. As long as the thread is on-CPU, you spin. Blocking is a four-stage
> process:
> 2. atomically set the waiters bit,
> 3. re-scan all of the other CPUs on the system, checking to make sure
> the owner is still not running on them.
> 4. check that the lock hasn't changed while you were looking.
> 5. Finally, block. If any of the above steps fail, you go back to
> spinning.
> 6. if a thread is in the "critical section" (i.e. after the load and
> before the clear) when we are resuming it (either from an interrupt or
> in a context switch), we back up the program counter to the beginning of
> mutex_exit.
>
> There are further complications, but that's the crucial bit.
>
> • jonathan
>
Although the existence of such an algorithm is interesting, the point is that
avoiding the equivalent of the sequentuially consistent store here is
incredibly complicated, appears to require scheduler help and, at least for
me, would require careful study of the proof before I were willing to
implement it. On the other hand, the algorithm using the equivalent of
sequentially consistent atomics (which I think is still far more widely used)
is orders of magnitude simpler. If you understand the complex algorithm, we
assume you can add the ordering constraints corectly.

Interestingly, there is a 2011 POPL paper by Attiya et al that proves, IIRC,
that store-load ordering or atomic RMW operations are unavoidable in some
cases. I haven't looked into this enough to understand how to reconcile that
with the above result. The kind of scheduler hack needed in step 6 above is
probably outside their model.

Hans

Hans


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk