Boost :

gottlobfrege_at_[hidden]

----
C. therefore thread1 *must* read Y after it was set, therefore it will
see Y==1, therefore r1 == 1.
And then do the same logic for the opposite case (exchanging which
thread goes first), and do the logic for any other order of the theads
- as long as you assume a single thread happens in the order the code
is listed in, you will find that one of the r's always comes out == 1
(if not both).  But never both == 0.
So our tradition of "code-line-N happens before code-line-N+1" plus
our logic (that all programmers apply all day) concludes that r1 == 0
and r2 == 0 can't happen.
If you grew up coding in different circumstances (you once mentioned
doing lots of music/sound stuff early on - which tends to be very
lock-free-ish), then maybe you don't make the traditional "N happens
before N+1" assumptions.
Having said all that, it doesn't mean thinking about re-orderings
helps anything.  That I agree with.  The key isn't "here's more to
think about" (reordering), it is "here's less to think about" - ie
less to assume - stop thinking a single thread happens exactly one
line after the other.  But this seems to be hard to remember NOT to
think about, since we've been thinking that way for years.
It is better to think about how writing shared variables exposes
state, and whether that correctly exposes only invariants. You may
already be thinking this way.
eg:
Imagine the invariant is assert(y >= x).  Imagine p is local, and the
invariant only need to hold once p is exposed globally.
p->x = p->y = 0;  // invariant OK
...
p->x = 10;          // invariant temporarily broken, but local, so OK
p->y = 20;          // invariant restored
// can now expose:
globalP = p; // globally expose state here, after invariant is ensured.
Thinking of "invariant exposure" makes it simple to realize that the
globalP = p is the point of exposure (or "publication").  For
threading and our memory model, this is also exactly where the barrier
is needed (if you want to think about barriers) and exactly where the
atomic operation is needed (or where the mutex can be unlocked).
So just make globalP atomic.  No need to think about reordering.  But
if people ask why the point of exposure is so important, or why
globalP needs to be atomic or how it could possibly fail if globalP
was just a pointer, you need to explain that p->y = 20 can happen (or
be seen) after globalP = p.  Which is like the r1 == r2 == 0 example -
it goes against tradition.
Most people would think that once globalP has been set, then
globalP->y = 20. Since that's the order the lines of code are written
in. But that's only true if globalP is an atomic.
Once they believe and/or understand that "invariant exposure happens
on shared variables therefore shared variables must be atomic", then
reordering can be forgotten.  You can then think of your code as
blocks that happen between read/writes of atomics (ie atomics are the
sequence points).  Just like critical sections are blocks of code.
You have blocks defined by published-invariant sequence points.  Or
something like that.
So I agree "sequential consistency" doesn't help much.  Nor does
reordering.  Think about publishing invariants.
(Actually what "sequential consistency" of the atomics means is that
you can now reason about the large code blocks defined by those
sequence points, ignoring their details.  So that is useful.)
Tony
(For the pedantic: in this example, sequential consistency was
overkill.  we only needed "release" semantics on globalP = p)