Subject: Re: [boost] Notice: Boost.Atomic (atomic operations library)
From: Helge Bahmann (hcb_at_[hidden])
Date: 2009-11-30 12:08:50
Hi Phil!
Thanks for your interest. I would appreciate any help with ARM, as I don't
have access to this architecture.
On Monday 30 November 2009 17:02:14, Phil Endecott wrote:
[snip]
> Architecture v6 introduced 32-bit load-locked/store-conditional
> instructions. Architecture v7 introduced 16- and 8-bit versions.
The library already has infrastructure in place to emulate 8- and 16-bit
atomics by "embedding" them into a properly aligned 32-bit atomic
(created "on the fly" through appropriate pointer casts). FWIW ppc and Alpha
require this already, as they do not have 8/16-bit ll/sc. This is of course
slower than native 8-/16-bit versions, but is workable.
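A minimal sketch of the embedding idea, assuming a compare-and-swap loop on the containing 32-bit word. The name fetch_add_u8 and the explicit lane parameter are hypothetical and are not taken from the library; std::atomic stands in for the library's own 32-bit atomic. A real implementation would derive the lane from the low address bits and the platform's byte order.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical helper (not Boost.Atomic's actual code): adds `delta` to one
// 8-bit lane of a 32-bit atomic word and returns the lane's previous value.
// `lane` (0..3) selects bits [8*lane, 8*lane+7] of the word's value.
inline std::uint8_t fetch_add_u8(std::atomic<std::uint32_t>& word,
                                 unsigned lane, std::uint8_t delta,
                                 std::memory_order order)
{
    const unsigned shift = 8 * lane;
    const std::uint32_t mask = std::uint32_t(0xFF) << shift;
    std::uint32_t old_word = word.load(std::memory_order_relaxed);
    for (;;) {
        const std::uint8_t old_byte =
            static_cast<std::uint8_t>((old_word & mask) >> shift);
        const std::uint8_t new_byte =
            static_cast<std::uint8_t>(old_byte + delta);
        // Splice the updated byte back in; the other three lanes are copied.
        const std::uint32_t new_word =
            (old_word & ~mask) | (std::uint32_t(new_byte) << shift);
        // On failure old_word is refreshed with the current value and we retry.
        if (word.compare_exchange_weak(old_word, new_word, order,
                                       std::memory_order_relaxed))
            return old_byte;
    }
}

// Small self-check: lane 2 wraps from 0xFF to 0x00, neighbours are untouched.
inline std::uint32_t demo_embedded_u8()
{
    std::atomic<std::uint32_t> word(0x11FF3344u);
    const std::uint8_t previous =
        fetch_add_u8(word, 2, 1, std::memory_order_relaxed);
    assert(previous == 0xFF);
    return word.load();
}
```

The retry loop is where the extra cost comes from: an update to a neighbouring byte in the same word forces a retry even though the target byte did not change.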
I will shortly be adding a small howto on adding platform support to the
library.
> ARM Linux has kernel support that provides compare-and-swap even on
> processors that don't support it by guaranteeing to not interrupt code
> in certain address ranges. This has the cost of a function call, i.e.
> it's slower than inline assembler but a lot faster than a system call.
> Kernels that don't support this are now sufficiently old that I think
> they can be ignored. Newer versions of gcc may use this mechanism when
> the atomic builtins are used, but versions of gcc that don't do this
> are sufficiently widespread that they should still be supported
> efficiently.
Are these functions part of libc/glibc, or of the vdso?
> I believe that OS X on ARM (i.e. the iPhone) always runs on
> architecture v6 or newer. However Apple supply a version of gcc that
> is too old to support ARM atomics via the builtins. The "recommended"
> way to do atomics is via a set of function calls described here:
> http://developer.apple.com/mac/library/documentation/Darwin/Reference/ManPages/man3/atomic.3.html
> I have not looked at what these functions do or tried to benchmark them.
> They are also available on other OS X platforms.
These should be easy to use, but:
- the *Barrier versions are still stronger than what is required (see below)
- there are no "Load with Barrier" and "Store with Barrier" operations; these
would have to be emulated with compare_exchange
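A rough sketch of that emulation, assuming the platform offers only a compare-and-swap that implies a full barrier and reports success or failure. cas_barrier, load_with_barrier and store_with_barrier are hypothetical names; std::atomic is used purely as a portable stand-in for the platform call.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Stand-in (an assumption) for a platform CAS that implies a full barrier
// and returns only whether the swap happened.
inline bool cas_barrier(std::int32_t old_value, std::int32_t new_value,
                        std::atomic<std::int32_t>* p)
{
    return p->compare_exchange_strong(old_value, new_value,
                                      std::memory_order_seq_cst);
}

// "Load with barrier": swap the observed value for itself until it sticks.
inline std::int32_t load_with_barrier(std::atomic<std::int32_t>* p)
{
    for (;;) {
        const std::int32_t seen = p->load(std::memory_order_relaxed);
        if (cas_barrier(seen, seen, p))
            return seen;
    }
}

// "Store with barrier": replace whatever is there, retrying on interference.
inline void store_with_barrier(std::atomic<std::int32_t>* p,
                               std::int32_t value)
{
    for (;;) {
        const std::int32_t seen = p->load(std::memory_order_relaxed);
        if (cas_barrier(seen, value, p))
            return;
    }
}

// Small self-check of both helpers.
inline std::int32_t demo_cas_emulation()
{
    std::atomic<std::int32_t> x(7);
    assert(load_with_barrier(&x) == 7);
    store_with_barrier(&x, 42);
    return load_with_barrier(&x);
}
```

Note that the emulated load is really a write: it needs the cache line in exclusive state, so it is considerably more expensive than a plain load followed by the appropriate fence.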
> I note that you don't seem to use the gcc atomic builtins even on
> platforms where they have worked for a while e.g. x86. Any reason for
> that?
On x86 it would not matter; on all other platforms, the intrinsics have the
unfortunate side-effect of always acting as (usually bi-directional) memory
barriers. There are however legitimate use cases for weaker ordering. For
example, the following operation (equivalent to __sync_fetch_and_add):
atomic<int>::fetch_add(1, memory_order_acq_rel)
is 2 to 3 times slower on ppc than the version not enforcing memory ordering:
atomic<int>::fetch_add(1, memory_order_relaxed)
If you always use fully-fenced versions, then any lock-free algorithm will
usually be noticeably *slower* than the platform's native mutex lock/unlock
operations (which use only the weakest barriers necessary), making the whole
exercise rather pointless.
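One common case where per-operation ordering pays off is reference counting, sketched below under stated assumptions. ref_counted is a hypothetical type, and std::atomic stands in for the proposed atomic<int> interface. The increment needs atomicity only, while the decrement needs acquire/release so that the thread dropping the last reference sees all earlier writes to the object.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical intrusive reference count, for illustration only.
struct ref_counted {
    std::atomic<int> refs{1};

    // Taking another reference publishes nothing, so no barrier is needed.
    void add_ref() { refs.fetch_add(1, std::memory_order_relaxed); }

    // Returns true when the caller dropped the last reference. acq_rel
    // orders earlier accesses to the object before its destruction.
    bool release()
    {
        return refs.fetch_sub(1, std::memory_order_acq_rel) == 1;
    }
};

// Small self-check: two owners, only the second release reports "last".
inline bool demo_refcount()
{
    ref_counted obj;
    obj.add_ref();
    const bool first = obj.release();
    const bool second = obj.release();
    return !first && second;
}
```

With fully-fenced intrinsics both operations would pay for a barrier, although only release() needs one.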
Cheers,
Helge