From: Anthony Williams (anthony_w.geo_at_[hidden])
Date: 2006-12-01 06:27:55
Roland Schwarz <roland.schwarz_at_[hidden]> writes:
> I would be glad if we could (re)start a discussion about the topic.
> Perhaps I am not the only one to benefit from this.
> Following are some things I learned, but this might be wrong, and I
> would appreciate clarification. Also some questions:
> 1) atomicity (in this specialized context) is about optimizing the
> pattern: enter_critical_section; do_something; leave_critical_section;
> by making use of processor/platform specific means.
Essentially, yes. Other CPUs/threads will either see the state before the
atomic op, or after, but never a "partial" effect.
On x86, normal reads and writes to suitably-aligned 32-bit values are atomic
in this sense.
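
As a rough illustration of point 1, here is a minimal sketch that performs the same increment both ways: once as enter/do/leave on a critical section, and once as a single atomic operation. It assumes a compiler with C++11 `<atomic>`, `<mutex>` and `<thread>`; the names `Counters` and `count_both_ways` are made up for this example.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical result type: the final value of each counter.
struct Counters {
    long locked;     // incremented inside a critical section
    long lock_free;  // incremented with one atomic read-modify-write
};

// Hypothetical helper: `threads` workers each increment both counters `iters` times.
Counters count_both_ways(int threads, int iters) {
    assert(threads > 0 && iters >= 0);
    std::mutex m;
    long locked = 0;
    std::atomic<long> lock_free{0};

    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t) {
        pool.emplace_back([&] {
            for (int i = 0; i < iters; ++i) {
                {
                    // enter_critical_section; do_something; leave_critical_section
                    std::lock_guard<std::mutex> guard(m);
                    ++locked;
                }
                // The same effect as one indivisible step; other threads never
                // observe a half-finished increment.
                lock_free.fetch_add(1, std::memory_order_relaxed);
            }
        });
    }
    for (auto& th : pool) th.join();
    return {locked, lock_free.load()};
}
```

Both counters end up at `threads * iters`; the atomic version just gets there without taking a lock.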
> In particular in
> presence of multiple processors. I.e. an atomic lib is primarily about
Not just about performance. It also enables the construction of the
Atomic instructions also affect visibility, which is addressed below.
> 2) atomicity better would be addressed by the compiler, given a
> suitable memory model, than as a library.
> 3) Despite 2) it would be possible to write a library, but it will
> be hard to get processor independent semantics. E.g. there is one
> concept of read/write/full memory barriers or another of acquire/release
> semantics for SMP.
I think that the memory barrier and acquire/release semantics are just two
ways of talking about the same thing.
As I understand it, on x86, the SFENCE instruction is a "Store Fence", which
is a "Write Barrier", and has "Release Semantics". Any store instructions
which happen before it on this CPU are made globally visible afterwards. No
store instructions which occur afterwards on this CPU are permitted to be
globally visible beforehand.
Again on x86, the LFENCE instruction is a "Load Fence", which is a "Read
Barrier", and has "Acquire Semantics". Any read instructions which happen
before it on this CPU must have already completed afterwards. No load
instructions which occur afterwards on this CPU are permitted to be executed
beforehand.
A full memory barrier, the MFENCE instruction on x86, does both.
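
To make the release/acquire pairing concrete, here is a hedged sketch using the C++11 `std::atomic_thread_fence`. These are language-level fences, and the sketch makes no claim about which x86 instruction (if any) a given compiler emits for them. The function name `publish_and_read` is hypothetical.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical demo: one thread publishes a plain int, another reads it.
int publish_and_read() {
    int payload = 0;                  // ordinary, non-atomic data
    std::atomic<bool> ready{false};   // flag used to signal publication
    int seen = -1;

    std::thread producer([&] {
        payload = 42;
        // Release ("write barrier") side: the store to payload above may not
        // be reordered after the flag store below.
        std::atomic_thread_fence(std::memory_order_release);
        ready.store(true, std::memory_order_relaxed);
    });

    std::thread consumer([&] {
        while (!ready.load(std::memory_order_relaxed)) {
            std::this_thread::yield();
        }
        // Acquire ("read barrier") side: the read of payload below may not
        // be reordered before the flag load above.
        std::atomic_thread_fence(std::memory_order_acquire);
        seen = payload;
    });

    producer.join();
    consumer.join();
    assert(seen == 42);  // guaranteed by the fence pairing
    return seen;
}
```

Without the two fences, the relaxed flag alone would not guarantee that the consumer sees the write to `payload`.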
There is also the concept of a "raw" atomic operation, which does not have any
impact on memory visibility, except it is either done or not done. As
described above, on x86 this applies to all suitably-aligned 32-bit reads and
writes.
Some atomic operations also incorporate a full memory barrier. On x86, these
are those ops that assert the LOCK# signal, which include XCHG (with or
without the LOCK prefix), LOCK CMPXCHG, LOCK INC and LOCK ADD, amongst others.
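
As one example of an atomic operation that also acts as a full barrier, here is a sketch of a spinlock built on an atomic exchange. It assumes C++11 `std::atomic` with the default sequentially-consistent ordering; compilers targeting x86 typically implement such an exchange with XCHG, but that is an implementation detail, not something the code relies on. `SpinLock` and `guarded_sum` are hypothetical names.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical spinlock built on an atomic exchange.
class SpinLock {
    std::atomic<bool> locked_{false};
public:
    void lock() {
        // exchange() returns the previous value: true means another thread
        // already holds the lock, so keep trying.
        while (locked_.exchange(true, std::memory_order_seq_cst)) {
            std::this_thread::yield();
        }
    }
    void unlock() {
        locked_.store(false, std::memory_order_seq_cst);
    }
};

// Hypothetical helper: `threads` workers each add 1 to a plain long `iters` times.
long guarded_sum(int threads, int iters) {
    assert(threads > 0 && iters >= 0);
    SpinLock lock;
    long total = 0;  // plain variable, protected only by the spinlock
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t) {
        pool.emplace_back([&] {
            for (int i = 0; i < iters; ++i) {
                lock.lock();
                ++total;
                lock.unlock();
            }
        });
    }
    for (auto& th : pool) th.join();
    return total;
}
```

Because the exchange is both atomic and a barrier, the plain increments of `total` inside the lock are safely ordered between threads.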
> 4) Does there exist a canonical set of atomic primitives, from which
> others can be built?
Yes, I'm sure there is, but I'd have to think hard to work out what the
minimal set is. I expect that there are several possible such sets.
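
As a sketch of how one primitive can serve as a building block, compare-and-swap is enough to build both fetch-and-add and exchange. This is not a claim that CAS is the minimal set, only an illustration under the assumption of C++11 `std::atomic`; the helper names are hypothetical.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical helper: fetch-and-add built only from compare-and-swap.
int fetch_add_via_cas(std::atomic<int>& target, int delta) {
    int expected = target.load(std::memory_order_relaxed);
    // On failure compare_exchange_weak reloads `expected` with the current
    // value, so the loop retries with fresh data.
    while (!target.compare_exchange_weak(expected, expected + delta)) {
    }
    return expected;  // value before the addition
}

// Hypothetical helper: unconditional exchange built from the same primitive.
int exchange_via_cas(std::atomic<int>& target, int desired) {
    int expected = target.load(std::memory_order_relaxed);
    while (!target.compare_exchange_weak(expected, desired)) {
    }
    return expected;  // value before the exchange
}

// Small single-threaded walk-through of both helpers.
int cas_demo() {
    std::atomic<int> x{5};
    int before_add = fetch_add_via_cas(x, 3);   // x: 5 -> 8
    assert(before_add == 5);
    int before_swap = exchange_via_cas(x, 1);   // x: 8 -> 1
    assert(before_swap == 8);
    return x.load();
}
```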
> 5) Is it worth the effort to create a library with processor
> independent semantics, at the price of not being optimal? E.g. by doing
> away with the various kinds of barriers, instead simply requiring
> atomicity and full memory barrier semantics for the operation? Which
> operations, besides load and store would be essential?
I think it's worth the effort. For processor independence, you could just
specify that the barriers are "at least" what is specified --- if you specify
a read barrier, you might get a store barrier too, and vice versa.
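
A minimal sketch of what "at least" semantics could look like in an interface. Everything here is hypothetical: the `Barrier` enum, the `has_separate_barriers` flag standing in for a platform query, and the mapping of read/write barriers onto C++11 acquire/release orderings (which follows the equivalence suggested above rather than any existing library).

```cpp
#include <atomic>
#include <cassert>

// Hypothetical portable barrier kinds.
enum class Barrier { read, write, full };

// Hypothetical mapping: a platform without separate read/write barriers
// silently upgrades every request to a full barrier, so the caller always
// gets at least what was asked for.
std::memory_order strongest_available(Barrier requested,
                                      bool has_separate_barriers) {
    if (!has_separate_barriers) {
        return std::memory_order_seq_cst;
    }
    switch (requested) {
        case Barrier::read:  return std::memory_order_acquire;
        case Barrier::write: return std::memory_order_release;
        default:             return std::memory_order_seq_cst;
    }
}

// Hypothetical entry point: issue a fence that is at least as strong as requested.
inline void portable_fence(Barrier requested, bool has_separate_barriers) {
    assert(requested == Barrier::read || requested == Barrier::write ||
           requested == Barrier::full);
    std::atomic_thread_fence(
        strongest_available(requested, has_separate_barriers));
}
```

Code written against such an interface stays correct everywhere, and only loses some performance on platforms where the upgrade happens.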
> Sorry if this is not the perfect list to discuss the topic, but I think
> boost could possibly benefit from such a library, as previous
> discussions let me believe.
The details of the memory model, atomics, and visibility, and how they apply
to C++, are under discussion amongst C++ standards committee members. I would
imagine that you'd be welcome to join such discussions.
Anyway, this is important to boost, if we're going to provide a library that
-- 
Anthony Williams
Software Developer
Just Software Solutions Ltd
http://www.justsoftwaresolutions.co.uk