Subject: Re: [boost] Notice: Boost.Atomic (atomic operations library)
From: Phil Endecott (spam_from_boost_dev_at_[hidden])
Date: 2009-11-30 17:55:10


Helge Bahmann wrote:
> Hi Phil!
>
> Thanks for your interest, and I appreciate any help for Arm, as I don't have
> this architecture available.

Currently my ARM v4 (XScale) dev system is a bit broken, but I might be
able to fix it. I have working v6/v7 systems.

> On Monday 30 November 2009 17:02:14, Phil Endecott wrote:
> [snip]
>> Architecture v6 introduced 32-bit load-locked/store-conditional
>> instructions. Architecture v7 introduced 16- and 8-bit versions.
>
> The library already has infrastructure in place to emulate 8- and 16-bit
> atomics by "embedding" them into a properly aligned 32-bit atomic
> (created "on the fly" through appropriate pointer casts). FWIW ppc and Alpha
> require this already, as they do not have 8/16-bit ll/sc. This is of course
> slower than native 8-/16-bit versions, but is workable.
>
> I will shortly be adding a small howto on adding platform support to the
> library.

That will be useful.
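
For readers unfamiliar with the "embedding" technique described above, here is a minimal sketch of a byte-wide compare-and-swap built on a 32-bit one. It is not the library's code: it uses std::atomic from C++11 as a stand-in for the 32-bit primitive, and the name byte_compare_exchange and the explicit byte index are hypothetical. A real implementation would derive the containing word and the shift from the byte's address and the target's endianness.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical helper: compare-and-swap one byte that lives inside an
// aligned 32-bit atomic word. index 0 is the least significant byte.
inline bool byte_compare_exchange(std::atomic<std::uint32_t>& word,
                                  unsigned index,
                                  std::uint8_t& expected,
                                  std::uint8_t desired)
{
    assert(index < 4);
    const unsigned shift = index * 8;
    const std::uint32_t mask = std::uint32_t(0xff) << shift;

    std::uint32_t old_word = word.load(std::memory_order_relaxed);
    for (;;) {
        const std::uint8_t current =
            static_cast<std::uint8_t>((old_word & mask) >> shift);
        if (current != expected) {
            expected = current;   // report the value actually seen
            return false;
        }
        const std::uint32_t new_word =
            (old_word & ~mask) | (std::uint32_t(desired) << shift);
        // A failed CAS reloads old_word. That can happen because a
        // neighbouring byte changed, so the target byte is re-checked.
        if (word.compare_exchange_weak(old_word, new_word,
                                       std::memory_order_acq_rel,
                                       std::memory_order_relaxed))
            return true;
    }
}
```

The retry loop is where the extra cost comes from: an unrelated update to a neighbouring byte in the same word forces another iteration, which a native 8-bit ll/sc would not need.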

>> ARM Linux has kernel support that provides compare-and-swap even on
>> processors that don't support it by guaranteeing to not interrupt code
>> in certain address ranges. This has the cost of a function call, i.e.
>> it's slower than inline assembler but a lot faster than a system call.
>> Kernels that don't support this are now sufficiently old that I think
>> they can be ignored. Newer versions of gcc may use this mechanism when
>> the atomic builtins are used, but versions of gcc that don't do this
>> are sufficiently widespread that they should still be supported
>> efficiently.
>
> Are these functions part of libc, glibc or the vdso?

It's something provided by the kernel in a vdso-like way; I'm not sure
if it's actually vdso. For the details google for __kernel_cmpxchg
and/or look at entry-armv.S in the kernel source.
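
A rough sketch of how user code can reach this helper follows. The details are assumptions taken from the kernel's "kuser helpers" documentation rather than from this thread: on 32-bit ARM Linux, __kernel_cmpxchg is described there as living at the fixed address 0xffff0fc0 and returning zero when the swap happened. The wrapper names cas_int and fetch_add_int are hypothetical, and on any other platform the sketch falls back to the gcc builtin so that it still compiles. Kernels built without the helpers would not work with the ARM branch.

```cpp
#include <cassert>

#if defined(__linux__) && defined(__arm__)
// Assumed interface of the kernel-provided helper (see lead-in).
typedef int (kernel_cmpxchg_t)(int oldval, int newval, volatile int* ptr);

static inline bool cas_int(volatile int* ptr, int oldval, int newval)
{
    kernel_cmpxchg_t* helper =
        reinterpret_cast<kernel_cmpxchg_t*>(0xffff0fc0);
    return helper(oldval, newval, ptr) == 0;   // zero means "swapped"
}
#else
// Fallback for other platforms: the fully-fenced gcc builtin.
static inline bool cas_int(volatile int* ptr, int oldval, int newval)
{
    return __sync_bool_compare_and_swap(ptr, oldval, newval);
}
#endif

// Hypothetical example of building another operation on top of the CAS.
static inline int fetch_add_int(volatile int* ptr, int delta)
{
    int old;
    do {
        old = *ptr;
    } while (!cas_int(ptr, old, old + delta));
    return old;   // value before the addition
}
```

Calling through a fixed address is what makes this a plain function call rather than a system call.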

>> I believe that OS X on ARM (i.e. the iPhone) always runs on
>> architecture v6 or newer. However Apple supply a version of gcc that
>> is too old to support ARM atomics via the builtins. The "recommended"
>> way to do atomics is via a set of function calls described here:
>> http://developer.apple.com/mac/library/documentation/Darwin/Reference/ManPages/man3/atomic.3.html
>> I have not looked at what these functions do or tried to benchmark them.
>> They are also available on other OS X platforms.
>
> these should easily be usable, but
> - the *Barrier versions are still stronger than what is required (see below)
> - there are no "Load with Barrier" and "Store with Barrier" operations, these
> would have to be emulated with compare_exchange

Since these devices are (currently) all uniprocessor, many of these
issues are (currently) unimportant.
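
To illustrate the emulation mentioned above, here is a minimal sketch of "load with barrier" and "store with barrier" built only from a fully-fenced compare-and-swap. The function cas32_barrier is a hypothetical stand-in whose argument order is modelled on OSAtomicCompareAndSwap32Barrier; it is implemented here with the gcc builtin so the sketch compiles anywhere gcc or clang does. The names load_barrier and store_barrier are also hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Stand-in for a barrier-style CAS that only reports success or failure.
static inline bool cas32_barrier(std::int32_t oldval, std::int32_t newval,
                                 volatile std::int32_t* addr)
{
    return __sync_bool_compare_and_swap(addr, oldval, newval);
}

// Emulated load: swap the value with itself. A successful CAS proves the
// value read was current and supplies the barrier.
static inline std::int32_t load_barrier(volatile std::int32_t* addr)
{
    for (;;) {
        const std::int32_t seen = *addr;
        if (cas32_barrier(seen, seen, addr))
            return seen;
    }
}

// Emulated store: retry until the CAS from the last seen value succeeds.
static inline void store_barrier(volatile std::int32_t* addr,
                                 std::int32_t value)
{
    for (;;) {
        const std::int32_t seen = *addr;
        if (cas32_barrier(seen, value, addr))
            return;
    }
}
```

Both emulations turn what would be a plain load or store into a read-modify-write, so they are costlier than a native load or store with the appropriate barrier.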

>> I note that you don't seem to use the gcc atomic builtins even on
>> platforms where they have worked for a while e.g. x86. Any reason for
>> that?
>
> On x86 it would not matter; on all other platforms, the intrinsics have the
> unfortunate side-effect of always acting as (usually bi-directional) memory
> barriers. There are however legitimate use cases for weaker ordering. For
> example, the following operation (equivalent to __sync_fetch_and_add):
>
> atomic<int>::fetch_add(1, memory_order_acq_rel)
>
> is 2 to 3 times slower on ppc than the version not enforcing memory ordering:
>
> atomic<int>::fetch_add(1, memory_order_relaxed)
>
> If you always use fully-fenced versions, then any lock-free algorithm will
> usually be noticeably *slower* than the platform's native mutex lock/unlock
> operations (which use only the weakest barriers necessary), making the whole
> exercise rather pointless.

Right.
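
As a small illustration of choosing the ordering per operation, the sketch below uses std::atomic from C++11 (an assumption; the thread is about Boost.Atomic, whose interface follows the same draft). The type counted_object and its members are hypothetical. A statistics counter and a reference-count increment need no ordering, so they use memory_order_relaxed; only the reference-count decrement, which may precede destruction, uses memory_order_acq_rel.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical object with a hit counter and an intrusive reference count.
struct counted_object {
    std::atomic<int> hits{0};
    std::atomic<int> refs{1};

    // Pure statistics: no other memory depends on the counter's value.
    int record_hit() { return hits.fetch_add(1, std::memory_order_relaxed); }

    // Taking a reference publishes nothing, so relaxed is enough.
    void add_ref() { refs.fetch_add(1, std::memory_order_relaxed); }

    // Dropping a reference must order earlier writes to the object before
    // its destruction by whichever thread sees the count reach zero.
    bool release()
    {
        return refs.fetch_sub(1, std::memory_order_acq_rel) == 1;
    }
};
```

With a fully-fenced builtin, all three operations would pay for the strongest barrier even though only release() needs any ordering.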

Cheers, Phil.

