Subject: Re: [boost] Notice: Boost.Atomic (atomic operations library)
From: Helge Bahmann (hcb_at_[hidden])
Date: 2009-12-04 06:02:37
On Thu, 3 Dec 2009, Phil Endecott wrote:
>>> 1. Linux kernel provided memory-barrier and CAS operations (only);
>> Do any of these ARM platforms (this is pre-v6 probably?) actually support
>> SMP? If not, then the barriers will probably be NOPs.
> I think the barrier is a DMB instruction, but in principle the kernel could
> put nothing there on uniprocessors. There's still the small overhead of the
> call, which we could consider omitting if we were certain that it was a
Out of curiosity -- does DMB also enforce ordering of MMIO accesses? That
would be stronger than required.
If this is always an "emulated" CAS then I don't think DMB would be
required under any circumstances -- if the system is uni-processor, then
obviously no barrier is required. If it is multi-processor, then the
emulation requires an internal spin-lock in the kernel, which must itself
already include sufficient memory barriers.
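
A minimal user-space sketch of that argument, assuming the multi-processor
emulation is a plain compare-and-store guarded by a spin lock. The names
emulated_cas and cas_lock are hypothetical and this is not the kernel's
actual code; it only illustrates that the acquire on lock and the release
on unlock already order the guarded accesses, so no separate barrier is
issued:

```cpp
#include <atomic>

// Hypothetical stand-in for the kernel's internal spin lock.
static std::atomic_flag cas_lock = ATOMIC_FLAG_INIT;

// Emulated CAS on a plain int (assumption: every access to *ptr goes
// through this function). Returns true if *ptr was equal to expected
// and has been replaced by desired.
inline bool emulated_cas(int* ptr, int expected, int desired)
{
    // Acquire: accesses below cannot be reordered above taking the lock.
    while (cas_lock.test_and_set(std::memory_order_acquire)) {
        // spin until the current holder releases the lock
    }
    bool ok = (*ptr == expected);
    if (ok)
        *ptr = desired;
    // Release: accesses above cannot be reordered below dropping the lock.
    cas_lock.clear(std::memory_order_release);
    return ok;
}
```

On a uniprocessor build the lock itself could presumably be dropped as
well, which is the other half of the argument.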
> Anyway, in this case I think I need to implement load, store and
> compare_exchange_weak using the kernel-provided functions and add your
> __build_atomic_from_minimal and __build_atomic_from_larger_type on top.
I'm not sure whether the kernel-provided CAS is restarted or aborted on
interruption. If it is restarted, then it will not fail spuriously and
qualifies for compare_exchange_strong -- in that case I would recommend
additionally implementing "exchange" by hand, having c_ex_weak call
c_ex_strong, and using __build_atomic_from_exchange (yes, it's not that
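
A rough sketch of that layering, assuming the kernel helper really does
behave as a strong CAS that only reports success or failure. kernel_cas is
a hypothetical stand-in (implemented with std::atomic here so the sketch
compiles anywhere), and kernel_atomic_int is not Boost.Atomic's actual
code:

```cpp
#include <atomic>

// Hypothetical stand-in for the kernel-provided CAS; assumed never to
// fail spuriously. Returns true if *ptr was changed to desired.
inline bool kernel_cas(std::atomic<int>* ptr, int expected, int desired)
{
    return ptr->compare_exchange_strong(expected, desired);
}

class kernel_atomic_int {
public:
    explicit kernel_atomic_int(int v) : value_(v) {}

    int load() const { return value_.load(); }

    // Strong CAS: fails only if a different value was observed, and
    // reports that value back through expected.
    bool compare_exchange_strong(int& expected, int desired)
    {
        for (;;) {
            if (kernel_cas(&value_, expected, desired))
                return true;
            int seen = load();
            if (seen != expected) {
                expected = seen;
                return false;
            }
            // The value changed back to expected in between; retry so
            // the caller never sees a spurious failure.
        }
    }

    // The weak form simply forwards to the strong one.
    bool compare_exchange_weak(int& expected, int desired)
    {
        return compare_exchange_strong(expected, desired);
    }

    // Hand-written exchange as a CAS loop; returns the previous value.
    int exchange(int desired)
    {
        int old = load();
        while (!compare_exchange_weak(old, desired)) {
            // old now holds the freshly observed value; try again
        }
        return old;
    }

private:
    std::atomic<int> value_;
};
```

With load, exchange and both CAS forms in place, the remaining operations
could then be derived by a builder layer.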
> (BTW, why do you use leading __s ? I was under the impression that such
> identifiers were reserved.)
It's a habit of mine to name really internal stuff that way; I can change
it if it conflicts with the Boost coding style.
>>> 2. Asm load-locked/store-conditional for words (only);
>>> 3. As 2 but also for smaller types.
>> sounds like this is going to be one of the most complicated platforms, so I
>> really appreciate your experience here...
> Would it be possible to add another set of builders that could use
> load-locked and store-conditional functions from a lower layer? This could
> reduce the amount of assembler needed.
The problem is that ll/sc are quite constrained on the architectures that
I know of: most processors clear the reservation established by ll
when there is a memory reference to the same cacheline before the sc, and
some do this for _any_ memory reference, so the ll/sc loop could
effectively live-lock. I don't think it is possible to constrain the
compiler sufficiently to prevent it from accidentally inserting such
memory references if you allow C++ code between these instructions (either
-O0 builds not inlining the wrapper functions, or -O2 with very aggressive
inlining moving code in between), so I fear that exposing ll/sc will be