
Subject: Re: [boost] [lockfree] review
From: Gottlob Frege (gottlobfrege_at_[hidden])
Date: 2011-08-06 01:14:50


On Mon, Aug 1, 2011 at 10:49 AM, Dave Abrahams <dave_at_[hidden]> wrote:
>
> on Mon Aug 01 2011, Gordon Woodhull <gordon-AT-woodhull.com> wrote:
>
>> On Jul 31, 2011, at 1:05 PM, Dave Abrahams wrote:
>>
>>>
>>> I don't get it.  Please, show me an example.
>>
>> I am definitely not an expert.  I'll quote a diagram from this
>> fascinating paper by Adve and Boehm [1], using Dekker's mutual
>> exclusion algorithm as an example
>>
>> Initially X = Y = 0
>> Thread 1
>> X = 1;
>> r1 = Y;
>>
>> Thread 2
>> Y = 1;
>> r2 = X;
>>
>> Can r1 = r2 = 0?
>
> Sure they can, that's a race condition: Thread1 modifies X while thread
> 2 reads it.  Anything could happen.  Why would anyone expect otherwise.
>

Most people are surprised that *both* r1 and r2 can be 0 at the same time.

We have been trained that our code (single-threaded) happens sequentially. So

Y = 1;
r2 = X;

means "set Y **THEN** read X" - *in that order*.

Now, really, we don't know or care about the true order (from the
point of view of this CPU) until we later read it back (from this
CPU), but that's how we imagine it, in the single-threaded world we
grew up in.

So imagine the race you mention - thread1 is writing X while thread2
is reading it. In the traditional view - ie assuming a thread of code
happens in the same order that it was written - this means:

A. thread1 has not yet read Y (thread1 is currently writing X,
and reading Y happens next)
B. thread2 has finished setting Y (because it is currently reading
X, which happens after setting Y)

----
C. therefore thread1 *must* read Y after it was set, therefore it will
see Y == 1, therefore r1 == 1.

Then do the same logic for the opposite case (exchanging which thread
goes first), and for any other interleaving of the threads - as long
as you assume a single thread happens in the order the code is listed
in, you will find that one of the r's always comes out == 1 (if not
both).  But never both == 0.

So our tradition of "code-line-N happens before code-line-N+1", plus
the logic that all programmers apply all day, concludes that r1 == 0
and r2 == 0 can't happen.
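
To make that concrete, here is a minimal C++11 sketch of the example
(my own addition, not from the paper or the original mail).  With
std::atomic and the default sequentially consistent ordering, the
assert at the end cannot fire; make X and Y plain ints (a data race)
or use relaxed operations, and both r's really can come out 0 on real
hardware.

#include <atomic>
#include <cassert>
#include <thread>

std::atomic<int> X{0}, Y{0};
int r1 = 0, r2 = 0;

void thread1() {
    X.store(1);      // X = 1;  (seq_cst by default)
    r1 = Y.load();   // r1 = Y;
}

void thread2() {
    Y.store(1);      // Y = 1;
    r2 = X.load();   // r2 = X;
}

int main() {
    std::thread t1(thread1), t2(thread2);
    t1.join();
    t2.join();
    assert(r1 == 1 || r2 == 1);   // never both 0 with seq_cst atomics
}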
If you grew up coding in different circumstances (you once mentioned
doing lots of music/sound stuff early on - which tends to be very
lock-free-ish), then maybe you don't make the traditional "N happens
before N+1" assumptions.

Having said all that, it doesn't mean thinking about re-orderings
helps anything.  That I agree with.  The key isn't "here's more to
think about" (reordering), it is "here's less to think about" - ie
less to assume - stop thinking a single thread happens exactly one
line after the other.  But this seems to be hard to remember NOT to
think about, since we've been thinking that way for years.

It is better to think about how writing shared variables exposes
state, and whether that correctly exposes only invariants.  You may
already be thinking this way.
eg:

Imagine the invariant is assert(y >= x).  Imagine p is local, and the
invariant only needs to hold once p is exposed globally.

p->x = p->y = 0;    // invariant OK
...
p->x = 10;          // invariant temporarily broken, but local, so OK
p->y = 20;          // invariant restored
// can now expose:
globalP = p;        // globally expose state here, after the invariant is ensured
Thinking of "invariant exposure" makes it simple to realize that
globalP = p is the point of exposure (or "publication").  For
threading and our memory model, this is also exactly where the barrier
is needed (if you want to think about barriers) and exactly where the
atomic operation is needed (or where the mutex can be unlocked).

So just make globalP atomic.  No need to think about reordering.  But
if people ask why the point of exposure is so important, or why
globalP needs to be atomic, or how it could possibly fail if globalP
were just a plain pointer, you need to explain that p->y = 20 can
happen (or be seen) after globalP = p.  Which is like the
r1 == r2 == 0 example - it goes against tradition.

Most people would think that once globalP has been set, then
globalP->y == 20, since that's the order the lines of code are written
in.  But that's only true if globalP is an atomic.
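
To show what that buys you, here is a self-contained C++11 sketch of
the publication example (my own, not from the original mail; the
struct name Data is made up).  globalP is an atomic, so if the reader
sees the pointer at all, it is guaranteed to also see the restored
invariant.

#include <atomic>
#include <cassert>
#include <thread>

struct Data { int x; int y; };          // invariant: y >= x once published

std::atomic<Data*> globalP{nullptr};    // the shared variable is the atomic

void writer() {
    Data* p = new Data{0, 0};           // invariant OK, still local
    p->x = 10;                          // invariant temporarily broken, but local
    p->y = 20;                          // invariant restored
    globalP = p;                        // publication point (seq_cst by default)
}

void reader() {
    Data* p = globalP;                  // if we see the pointer at all...
    if (p)
        assert(p->y >= p->x);           // ...we also see the restored invariant
}

int main() {
    std::thread t1(writer), t2(reader);
    t1.join();
    t2.join();
    delete globalP.load();
}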
Once they believe and/or understand that "invariant exposure happens
on shared variables, therefore shared variables must be atomic", then
reordering can be forgotten.  You can then think of your code as
blocks that happen between reads/writes of atomics (ie the atomics are
the sequence points).  Just like critical sections are blocks of code.
You have blocks defined by published-invariant sequence points.  Or
something like that.

So I agree "sequential consistency" doesn't help much.  Nor does
reordering.  Think about publishing invariants.

(Actually what "sequential consistency" of the atomics means is that
you can now reason about the large code blocks defined by those
sequence points, ignoring their details.  So that is useful.)
Tony
(For the pedantic: in this example, sequential consistency was
overkill.  We only needed "release" semantics on globalP = p.)
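
A rough sketch of that weaker version, again my own and reusing the
Data/globalP types from the sketch above: only the publishing store
and the matching load change, to an explicit release/acquire pair.

// Writer: a release store is enough to publish the invariant.
globalP.store(p, std::memory_order_release);

// Reader: the matching acquire load pairs with that release store, so
// the writes to p->x and p->y are visible once the pointer is seen.
Data* q = globalP.load(std::memory_order_acquire);
if (q)
    assert(q->y >= q->x);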
