Subject: Re: [boost] [lockfree] review
From: Dave Abrahams (dave_at_[hidden])
Date: 2011-08-08 11:03:41


on Sat Aug 06 2011, Gottlob Frege <gottlobfrege-AT-gmail.com> wrote:

> On Mon, Aug 1, 2011 at 10:49 AM, Dave Abrahams <dave_at_[hidden]> wrote:
>>
>> on Mon Aug 01 2011, Gordon Woodhull <gordon-AT-woodhull.com> wrote:
>>
>>> On Jul 31, 2011, at 1:05 PM, Dave Abrahams wrote:
>>>
>>>>
>>>> I don't get it.  Please, show me an example.
>>>
>>> I am definitely not an expert.  I'll quote a diagram from this
>>> fascinating paper by Adve and Boehm [1], using Dekker's mutual
>>> exclusion algorithm as an example:
>>>
>>> Initially X = Y = 0
>>> Thread 1
>>> X = 1;
>>> r1 = Y;
>>>
>>> Thread 2
>>> Y = 1;
>>> r2 = X;
>>>
>>> Can r1 = r2 = 0?
>>
>> Sure they can, that's a race condition: thread 1 modifies X while
>> thread 2 reads it.  Anything could happen.  Why would anyone expect
>> otherwise?
>>
>
> Most people are surprised that *both* r1 and r2 can be 0 at the same time.

"At the same time" is ill-defined until you put in some kind of
synchronization. If you synchronize those two threads at the end I
expect all their effects to have taken place. But they would still have
a race condition so I wouldn't expect any particular result.
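
For concreteness, here's a minimal C++11 sketch of that fragment (my
code, not from the paper), using relaxed atomics so the program is
race-free and yet r1 == r2 == 0 remains a permitted outcome:

  #include <atomic>
  #include <thread>

  std::atomic<int> X{0}, Y{0};
  int r1, r2;

  int main() {
      std::thread t1([] {
          X.store(1, std::memory_order_relaxed);
          r1 = Y.load(std::memory_order_relaxed);  // may read 0
      });
      std::thread t2([] {
          Y.store(1, std::memory_order_relaxed);
          r2 = X.load(std::memory_order_relaxed);  // may read 0
      });
      t1.join();
      t2.join();
      // Neither store is required to be visible before the other
      // thread's load, so both loads can observe the initial zeros.
      // With plain (non-atomic) ints this would be a data race, i.e.
      // undefined behavior: literally anything could happen.
  }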

> We have been trained that our code (single threaded) happens
> sequentially. So
> Y = 1;
> r2 = X;
>
> means "set Y **THEN** read X" - *in that order*

But it doesn't mean that!

> now, really, we don't know or care about the true order (from the
> point of view of this CPU) until we later read it (from this CPU), but
> that's how we imagine it. In the single-threaded world we grew up in.

Who is "we," Kimosabe?

I *really* don't think about code that way. I generally try to avoid
writing code that makes me think about the order in which things happen.

> So imagine the race you mention - thread1 is writing X while thread2
> is reading it. In the traditional view - ie assuming a thread of code
> happens in the same order that it was written - this means:
>
> A. thread1 has not yet read Y (thread1 is currently writing X,
> and reading Y happens next)
> B. thread2 has finished setting Y (because it is currently reading
> X, which happens after setting Y)
> ----
> C. therefore thread1 *must* read Y after it was set, therefore it will
> see Y==1, therefore r1 == 1.

But my point is that even if this reasoning held, it is a hopelessly
un-scalable way to think about parallel execution. I can't reason that
way about the behavior of a complex system without having a nervous
breakdown, and I don't believe it's tractable for most other people
either. That's why I'm saying that sequential consistency is not a
game-changer in terms of reasoning about parallelism.

> If you grew up coding in different circumstances (you once mentioned
> doing lots of music/sound stuff early on - which tends to be very
> lock-free-ish), then maybe you don't make the traditional "N happens
> before N+1" assumptions.

I don't think I ever had to confront the issue.

> Having said all that, it doesn't mean thinking about re-orderings
> helps anything. That I agree with. The key isn't "here's more to
> think about" (reordering), it is "here's less to think about" - ie
> less to assume - stop thinking a single thread happens exactly one
> line after the other. But this seems to be hard to remember NOT to
> think about, since we've been thinking that way for years.
>
> It is better to think about how writing shared variables exposes
> state, and whether that correctly exposes only invariants. You may
> already be thinking this way.

Yes, that's exactly what I'm saying.

> eg:
> Imagine the invariant is assert(y >= x). Imagine p is local, and the
> invariant only needs to hold once p is exposed globally.
>
> p->x = p->y = 0; // invariant OK
> ...
> p->x = 10; // invariant temporarily broken, but local, so OK
> p->y = 20; // invariant restored
> // can now expose:
> globalP = p; // globally expose state here, after invariant is ensured.
>
> Thinking of "invariant exposure" makes it simple to realize that the
> globalP = p is the point of exposure (or "publication"). For
> threading and our memory model, this is also exactly where the barrier
> is needed (if you want to think about barriers) and exactly where the
> atomic operation is needed (or where the mutex can be unlocked).

Yes.

> So just make globalP atomic. No need to think about reordering.

Only if the atomic comes with the barriers that ensure the writes to *p
have happened before globalP is modified... which is what sequential
consistency (the default for atomics in C++11) provides.
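
To make that concrete, here's a minimal C++11 sketch of the pattern
(my code, not Tony's; the struct and names are only illustration).
The default seq_cst store and load on globalP guarantee that the
writes to *p are visible to any thread that sees the new pointer:

  #include <atomic>
  #include <cassert>

  struct Data { int x; int y; };   // invariant: y >= x

  std::atomic<Data*> globalP{nullptr};

  void writer() {
      Data* p = new Data{0, 0};    // invariant OK, p still local
      p->x = 10;                   // invariant temporarily broken
      p->y = 20;                   // invariant restored
      globalP = p;                 // publication point (seq_cst store)
  }

  void reader() {
      Data* p = globalP;           // seq_cst load
      if (p)
          assert(p->y >= p->x);    // holds: the writes to *p happen
                                   // before the store that published p
  }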

> But if people ask why the point of exposure is so important, or why
> globalP needs to be atomic or how it could possibly fail if globalP
> was just a pointer, you need to explain that p->y = 20 can happen (or
> be seen) after globalP = p. Which is like the r1 == r2 == 0 example -
> it goes against tradition.

No, you don't. All you need to say is that *every* data structure
(even a pointer like globalP) has an invariant that is broken during
modification. The implication is that unless globalP is atomic, the
other thread might see a broken value of it, and then all bets are off.

> Most people would think that once globalP has been set, then
> globalP->y == 20. Since that's the order the lines of code are written
> in. But that's only true if globalP is an atomic.

You don't even need to think as far as globalP->y if you assume globalP
can be totally broken unless access is synchronized.

> Once they believe and/or understand that "invariant exposure happens
> on shared variables therefore shared variables must be atomic", then
> reordering can be forgotten. You can then think of your code as
> blocks that happen between read/writes of atomics (ie atomics are the
> sequence points). Just like critical sections are blocks of code.
> You have blocks defined by published-invariant sequence points. Or
> something like that.

Something like that. *If* programming with atomics is for mortals.

> So I agree "sequential consistency" doesn't help much. Nor does
> reordering. Think about publishing invariants.

I think about "not exposing broken invariants."

> (Actually what "sequential consistency" of the atomics means is that
> you can now reason about the large code blocks defined by those
> sequence points, ignoring their details. So that is useful.)
>
> Tony
>
> (For the pedantic: in this example, sequential consistency was
> overkill.  We only needed "release" semantics on globalP = p.)

Right.
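
A sketch of that weaker variant (again my code, not Tony's): release
semantics on the publishing store, paired with an acquire load on the
reader side, is all this pattern needs:

  // Same Data and globalP as in the sketch above.
  void writer_release(Data* p) {
      p->x = 10;
      p->y = 20;    // invariant restored before publication
      globalP.store(p, std::memory_order_release);
  }

  void reader_acquire() {
      Data* p = globalP.load(std::memory_order_acquire);
      if (p)
          assert(p->y >= p->x);   // the acquire load synchronizes-with
                                  // the release store, so the writes
                                  // to *p are visible here
  }

On weakly ordered hardware a release store can be cheaper than a
seq_cst one, which is why the distinction is worth making.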

-- 
Dave Abrahams
BoostPro Computing
http://www.boostpro.com
