

Subject: Re: [boost] [atomic] Help understanding consume order
From: Giovanni Piero Deretta (gpderetta_at_[hidden])
Date: 2014-06-02 07:42:04


On Sun, Jun 1, 2014 at 11:58 AM, Andrey Semashev <andrey.semashev_at_[hidden]>
wrote:

> Hi,
>
> I'm reviewing (again) Boost.Atomic code and struggling to understand the
> consume order, and in particular what it should mean on architectures
> other than DEC Alpha.
>
> I read the explanation here:
>
> http://en.cppreference.com/w/cpp/atomic/memory_order
>
> but the point eludes me. Take ARM for example and the explanation in the
> "Release-Consume ordering" section. The producer thread allocates the
> string and stores the pointer with a release operation, so that the
> pointer, the string contents and the 'data' integer are visible to other
> threads.
>
> Now the consumer thread reads the pointer with a consume operation.
> According to the explanation in the article, on ARM the consume operation
> need not issue any specific fences to be able to use the pointer and the
> string body. In that case, the consume operation becomes equivalent to
> relaxed (plus prohibiting compiler optimizations). But is there a
> guarantee that the string body will be visible to the consumer? Shouldn't
> the consume operation be promoted to acquire instead?
>
>
ARM and many other RMO architectures (like PPC, and unlike Alpha) guarantee
that a load is not reordered with an earlier load it depends on. So,
together with the release operation on the writer side, the load_consume
guarantees the visibility of the string body.
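A minimal sketch of the cppreference scenario, assuming only C++11 <atomic>
and <thread>. The names ptr, data, producer and consumer follow that
example; run_release_consume is a hypothetical driver added here. Only reads
whose address is computed from the consumed pointer (here *p2) are ordered;
'data' is deliberately not read by the consumer.

```cpp
#include <atomic>
#include <cassert>
#include <string>
#include <thread>

// Shared state, following the cppreference "Release-Consume ordering" example.
std::atomic<std::string*> ptr{nullptr};
int data = 0;

void producer() {
    std::string* p = new std::string("Hello");
    data = 42;  // plain write: NOT ordered for a consume reader
    ptr.store(p, std::memory_order_release);  // publish the pointer
}

std::string consumer() {
    std::string* p2;
    // Spin until the pointer has been published.
    while (!(p2 = ptr.load(std::memory_order_consume))) {
    }
    // *p2 carries a dependency from the consume load, so the string body
    // written before the release store is guaranteed to be visible.
    assert(*p2 == "Hello");
    std::string result = *p2;
    // Reading 'data' here would be a data race: its address does not depend
    // on p2, so consume gives it no ordering (acquire would).
    delete p2;
    return result;
}

// Hypothetical driver: runs one producer and one consumer and returns
// the string the consumer read.
std::string run_release_consume() {
    ptr.store(nullptr, std::memory_order_relaxed);
    std::string seen;
    std::thread c([&seen] { seen = consumer(); });
    std::thread p(producer);
    p.join();
    c.join();
    return seen;
}
```

On ARM or PPC a compiler that really implements consume can emit a plain
load for it, relying on the hardware address-dependency rule described
above.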

Load dependency is defined precisely only at the instruction level
(basically, the address of the dependent load is computed as a function of
the value returned by the load it depends on), and that definition is quite
tricky to recover in high-level C++. C++11 tried to do it, but according to
a few people the current wording is very hard to implement and, depending on
the case, either not strong enough or too strict.
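To show what "recovering dependencies at the C++ level" looks like, here is a
small sketch of the C++11 "carries a dependency" rule and of
std::kill_dependency, which explicitly ends a chain. The types and function
names (Node, table, head, publish, read_through_dependency,
index_without_dependency) are hypothetical, chosen only for illustration.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical types and names, chosen only to illustrate dependency chains.
struct Node {
    int value;
    int index;
};

int table[4] = {0, 0, 0, 0};
std::atomic<Node*> head{nullptr};

// Writer: fill in a table slot, then publish the node with a release store.
void publish(Node* n) {
    assert(n != nullptr && n->index >= 0 && n->index < 4);
    table[n->index] = 100;
    head.store(n, std::memory_order_release);
}

// Each load below gets its address from the value of the consume load,
// so each one carries a dependency in the C++11 sense and is ordered.
int read_through_dependency() {
    Node* n = head.load(std::memory_order_consume);
    if (n == nullptr) return -1;
    int v = n->value;         // address computed from n
    int a = table[n->index];  // subscript read through n: the chain continues
    return v + a;
}

// std::kill_dependency ends the chain: the returned value no longer carries
// a dependency, so later loads indexed by it are NOT ordered by the consume.
int index_without_dependency() {
    Node* n = head.load(std::memory_order_consume);
    if (n == nullptr) return -1;
    return std::kill_dependency(n->index);
}
```

Tracking such chains through arbitrary expressions, function calls and
translation units is the part that makes the standard wording hard to
implement.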

In the meantime GCC (and a few other compilers) punts on load_consume and
simply promotes it to load_acquire.
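For comparison, a sketch of the same message passing with an explicit
acquire load, which is effectively what a compiler that promotes consume to
acquire emits. The names msg, payload, send, receive and run_release_acquire
are hypothetical. Unlike consume, acquire also orders reads that carry no
dependency from the pointer, so reading 'payload' is well defined.

```cpp
#include <atomic>
#include <cassert>
#include <string>
#include <thread>

// Hypothetical names; same pattern as the release/consume example.
std::atomic<std::string*> msg{nullptr};
int payload = 0;

void send() {
    std::string* p = new std::string("Hello");
    payload = 42;
    msg.store(p, std::memory_order_release);
}

// With acquire, every write the sender made before the release store is
// visible, including 'payload', which has no dependency on the pointer.
int receive() {
    std::string* p;
    while (!(p = msg.load(std::memory_order_acquire))) {
    }
    assert(*p == "Hello");
    int seen = payload;  // guaranteed 42: the acquire synchronizes-with the release
    delete p;
    return seen;
}

// Hypothetical driver: runs one sender and one receiver.
int run_release_acquire() {
    msg.store(nullptr, std::memory_order_relaxed);
    payload = 0;
    int result = 0;
    std::thread r([&result] { result = receive(); });
    std::thread s(send);
    s.join();
    r.join();
    return result;
}
```

The cost of the promotion is a barrier on weakly ordered machines such as
ARM and PPC, which is exactly what consume was meant to avoid.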

Note that x86, a TSO machine, has even stronger guarantees: any load is a
load_acquire.

> I guess that's the ultimate question: how should consume ordering be
> handled on conventional architectures?

That's hard to do without compiler help, unfortunately. Compilers have
started doing some quite aggressive optimisations (like value speculation
and PGO) that can break load dependencies. The Linux kernel, for example,
copes by explicitly disabling those optimisations, not using PGO and
targeting a specific compiler.
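A hand-written illustration of how value speculation can break a load
dependency. It is not real compiler output; it assumes a compiler that
treats consume as relaxed and has a profile saying the pointer is usually
&slot. Config, slot, current and the functions are hypothetical names.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical names, chosen only for this illustration.
struct Config {
    int timeout;
};

Config slot{0};                         // buffer filled in before publication
std::atomic<Config*> current{nullptr};

void publish_config(int timeout) {
    assert(timeout >= 0);
    slot.timeout = timeout;                           // plain write...
    current.store(&slot, std::memory_order_release);  // ...then publish
}

// As written: the load of p->timeout takes its address from the consume
// load, so ARM/POWER keep the two loads in order.
int timeout_as_written() {
    Config* p = current.load(std::memory_order_consume);
    return p ? p->timeout : -1;
}

// After speculating p == &slot: single-threaded results are identical, but
// on the fast path slot.timeout is read from a constant address. The data
// dependency has become a control dependency, which does not order two
// loads on ARM/POWER, so the CPU may read slot.timeout before 'current'
// and observe the old value.
int timeout_after_speculation() {
    Config* p = current.load(std::memory_order_relaxed);
    if (p == &slot) {
        return slot.timeout;  // no address dependency on p any more
    }
    return p ? p->timeout : -1;
}
```

Because both versions behave the same in a single thread, the optimiser has
no local reason to keep the dependency unless it tracks consume semantics.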

See n2664 and the recent epic thread on gcc-dev.

HTH,

-- gpd


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk