Boost logo

Boost :

Subject: Re: [boost] [lockfree] review
From: Boehm, Hans (hans.boehm_at_[hidden])
Date: 2011-08-26 20:34:31


[Dave Abrahams pointed me at this thread.]

Alexander Terekhov <terekhov <at> web.de> writes:
 
>
> Dave Abrahams wrote:
> >
> > on Thu Aug 25 2011, Alexander Terekhov <terekhov-AT-web.de> wrote:
> >
> > > Ilya Sokolov wrote:
> > >>
> > >>
> > >> AFAIK, in the example above another thread can't observe reordering
> > >> without introducing UB.
> > >
> > > I meant "observed" as in "check the sequence of writes in the generated
> > > code".
> >
> > Do we care about what the generated code looks like? IIUC what counts
> > is the sequence of memory states that another another thread can see.
> >
> > > But int can be simply changed to relaxed atomic<int> so that another
> > > thread can observe reordering without UB.
> >
> > Sure.
>
> I meant that for the example upthread, with an int instead of relaxed
> atomic<int>, another thread's code triggering UB does not prevent us
> from observing reordered writes in the writing thread's code (in
> generated code), by looking at it.
>
> Changing int to relaxed atomic<int> simply removes UB but does not
> change the relevant code in the writing thread.
>
> regards,
> alexander.
>

I agree with Alexander and most of the recent postings. If I write

mx.lock();
x = 1;
mx.unlock();
my.lock();
y = 1;
my.unlock();

There is nothing to prevent the stores to x and y from becoming visible out of
order at the hardware level. Enforcing ordering here would add needless and
substantial cost. But a data-race-free program cannot observe such
reordering. (One reference for that is Sarita Adve's and my PLDI 08 paper.
Unofficial TR version at http://www.hpl.hp.com/techreports/2008/HPL-2008-
56.html. I believe Mark Batty et al have a much more complete and formal
proof.)

If we change x and y to be atomic variables, and the stores to be
memory_order_relaxed, then an observer can tell the difference. I.e. if x and
y are initially zero, and another thread does

r1 = y;
r2 = x;

(or the acquire versions), it can see r1 = 1 and r2 = 0, since there are no
synchronizes-with relationships between threads, and thus nothing to enforce
ordering. If you use memory_order_relaxed, or even the acquire/release forms,
things do get MUCH more complicated.

The equivalent rules for other languages are an interesting story. I believe
Posix implentors assume the above rules. I have a PPoPP 2007 paper that
explains why that's not exactly what Posix actually states. OpenMP 3.0 has
stricter rules which are also not consistently obeyed, and are somewhat fixed
in OpenMP 3.1. (See my MSPC 11 paper for some details.) Java has arguably
the same rules as C++ in this respect.

In my mind, the main way in which sequential consistency differs from the
cheaper acquire/release semantics that Alexander wants is that

a) Dekker's algorithm (which by itself isn't that practically interesting),
plus a number of real lock-free algorithms (which do matter) don't work.

b) In my mind, acquire/release is a far more difficult model to explain to non-
experts than the model for C++ with only sequentially consistent atomics. You
can no longer reason about interleaving thread actions, either for purposes of
determining the existence of a data-race, nor when defining what data-race-
free programs mean.

This was extensively discussed in C++ meetings starting in about 2005. The
current solution with default-sequentially-consistent atomics, but explicit
support for weaker ordering, was the resulting committee compromise.

Hans


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk