|
Boost : |
Subject: Re: [boost] [lockfree] review
From: Gottlob Frege (gottlobfrege_at_[hidden])
Date: 2011-08-06 01:14:50
On Mon, Aug 1, 2011 at 10:49 AM, Dave Abrahams <dave_at_[hidden]> wrote:
>
> on Mon Aug 01 2011, Gordon Woodhull <gordon-AT-woodhull.com> wrote:
>
>> On Jul 31, 2011, at 1:05 PM, Dave Abrahams wrote:
>>
>>>
>>> I don't get it. Please, show me an example.
>>
>> I am definitely not an expert. I'll quote a diagram from this
>> fascinating paper by Adve and Boehm [1], using Dekker's mutual
>> exclusion algorithm as an example
>>
>> Initially X = Y = 0
>> Thread 1
>> X = 1;
>> r1 = Y;
>>
>> Thread 2
>> Y = 1;
>> r2 = X;
>>
>> Can r1 = r2 = 0?
>
> Sure they can, that's a race condition: Thread1 modifies X while thread
> 2 reads it. Anything could happen. Why would anyone expect otherwise.
>
Most people are surprised that *both* r1 and r2 can be 0 at the same time.
We have been trained that our code (single threaded) happens sequentially. So
Y = 1;
r2 = X;
means "set Y **THEN** read X" - *in that order*
now, really, we don't know or care about the true order (from the
point of view of this CPU) until we later read it (from this CPU), but
that's how we imagine it. In the single-threaded world we grew up in.
So imagine the race you mention - thread1 is writing X while thread2
is reading it. In the traditional view - ie assuming a thread of code
happens in the same order that it was written - this means:
A. thread1 has not yet read Y (thread1 is currently writing X,
and reading Y happens next)
B. thread2 has finished setting Y (because it is currently reading
X, which happens after setting Y)
---- C. therefore thread1 *must* read Y after it was set, therefore it will see Y==1, therefore r1 == 1. And then do the same logic for the opposite case (exchanging which thread goes first), and do the logic for any other order of the theads - as long as you assume a single thread happens in the order the code is listed in, you will find that one of the r's always comes out == 1 (if not both). But never both == 0. So our tradition of "code-line-N happens before code-line-N+1" plus our logic (that all programmers apply all day) concludes that r1 == 0 and r2 == 0 can't happen. If you grew up coding in different circumstances (you once mentioned doing lots of music/sound stuff early on - which tends to be very lock-free-ish), then maybe you don't make the traditional "N happens before N+1" assumptions. Having said all that, it doesn't mean thinking about re-orderings helps anything. That I agree with. The key isn't "here's more to think about" (reordering), it is "here's less to think about" - ie less to assume - stop thinking a single thread happens exactly one line after the other. But this seems to be hard to remember NOT to think about, since we've been thinking that way for years. It is better to think about how writing shared variables exposes state, and whether that correctly exposes only invariants. You may already be thinking this way. eg: Imagine the invariant is assert(y >= x). Imagine p is local, and the invariant only need to hold once p is exposed globally. p->x = p->y = 0; // invariant OK ... p->x = 10; // invariant temporarily broken, but local, so OK p->y = 20; // invariant restored // can now expose: globalP = p; // globally expose state here, after invariant is ensured. Thinking of "invariant exposure" makes it simple to realize that the globalP = p is the point of exposure (or "publication"). For threading and our memory model, this is also exactly where the barrier is needed (if you want to think about barriers) and exactly where the atomic operation is needed (or where the mutex can be unlocked). So just make globalP atomic. No need to think about reordering. But if people ask why the point of exposure is so important, or why globalP needs to be atomic or how it could possibly fail if globalP was just a pointer, you need to explain that p->y = 20 can happen (or be seen) after globalP = p. Which is like the r1 == r2 == 0 example - it goes against tradition. Most people would think that once globalP has been set, then globalP->y = 20. Since that's the order the lines of code are written in. But that's only true if globalP is an atomic. Once they believe and/or understand that "invariant exposure happens on shared variables therefore shared variables must be atomic", then reordering can be forgotten. You can then think of your code as blocks that happen between read/writes of atomics (ie atomics are the sequence points). Just like critical sections are blocks of code. You have blocks defined by published-invariant sequence points. Or something like that. So I agree "sequential consistency" doesn't help much. Nor does reordering. Think about publishing invariants. (Actually what "sequential consistency" of the atomics means is that you can now reason about the large code blocks defined by those sequence points, ignoring their details. So that is useful.) Tony (For the pedantic: in this example, sequential consistency was overkill. we only needed "release" semantics on globalP = p)
Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk