Boost logo

Boost :

Subject: Re: [boost] [lockfree] review
From: Hans Boehm (Hans.Boehm_at_[hidden])
Date: 2011-10-02 14:09:22


Alexander Terekhov <terekhov <at> web.de> writes:

>
>
> Hans Boehm wrote:
> >
> > Alexander Terekhov <terekhov <at> web.de> writes:
> >
> > >
> > >
> > > Hans Boehm wrote:
> > > [...]
> > > > Although the existence of such an algorithm is interesting, the point
is
> > that
> > > > avoiding the equivalent of the sequentuially consistent store here is
> > > > incredibly complicated, appears to require scheduler help and, at
least for
> > > > me, would require careful study of the proof before I were willing to
> > > > implement it. On the other hand, the algorithm using the equivalent of
> > > > sequentially consistent atomics (which I think is still far more widely
> > used)
> > > > is orders of magnitude simpler. If you understand the complex
algorithm,
> > we
> > > > assume you can add the ordering constraints corectly.
> > >
> > > With waiters bit (0x80000000) and atomic RMW, mutex is simply:
> > > (pseudo-code below)
> > >
> > > WHILE atomic_bit_test_set_hlb(&lock, 1) // efficient acquire MO
> > > lock_queue_wait(&lock, 1) // wait if locked bit is set
> > >
> > > for lock() and
> > >
> > > uintptr_t lock_queue
> > > IF atomic_decrement_rel(lock_queue = &lock) // release MO
> > > THEN lock_queue_wake(lock_queue, 1)
> > I'm not sure I understand your syntax here.
> >
> > I don't see the correctness argument. What prevents the sequence:
> >
> > 1. In thread T0, performing unlock, atomic_decr_rel update completes, sees
no
> > waiters, puts updated (released) lock word in store buffer. (I'm
informally
> > viewing memory_order_release as just putting a value in a store buffer,
which
> > I think is reasonable here.)
> > 2. Thread T1 tries to acquire the lock, sees it still held, since store
buffer
> > hasn't drained. Enqueues itself.
> > 3. Thread T0s store buffer drains. Thread T1 is now waiting on an unlocked
> > lock.
>
> lock_queue_wait() is meant to block the calling thread and set waiters
> bit only if locked bit is actually set (an RMW operation).
>
> The races for setting/clearing bits are resolved by RMWs on the same
> (&lock) memory location.
>
> The situation is no different when (&lock) memory location would be
> protected by a lock except that with RMWs the program is lock-free on
> fast paths.
>
I think I see what you're doing. By embedding the state of the queue into the
same atomic object as the lock set bit, you can rely on single-location cache
coherence to get the right ordering. That can be made to work for the low-
level bit-twiddling implementation, but not for a more generic approach that
keeps the queue and the lock-bit itself separate. I would still much rather
see a default that works for the full solution space. I prefer to have people
like us who are sometimes willing to go through subtle reasoning about memory
ordering add the annotations, rather than insisting that the average
programmer who has never heard of these issues add them to avoid subtle bugs.

Hans


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk