From: Kim Barrett (kab_at_[hidden])
Date: 2007-09-09 13:52:17


At 11:11 AM +0200 9/9/07, Corrado Zoccolo wrote:
> when performance tuning a really tight loop in a simple program, I
> found that reading the value of an atomic count is unexpectedly slow
> on a gcc platform.
>
> [...]
>
> Is there a compelling reason to use the locked operation with gcc, or
> a simpler volatile access can serve the same purpose?
>
> [...]
>
> Do you see any drawback in changing the access to the counter to a
> simple volatile access, at least when the platform is known to be an
> IA32?

Don't do that. It won't work properly on a multi-processor
system. Memory barriers are needed to ensure correct operation on such
systems, and gcc (x86) does not generate a memory barrier for a
volatile load.
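
To illustrate why the barrier matters, here is a minimal sketch of one thread publishing data to another. It assumes a compiler with std::atomic (C++11 or later, which postdates this message); the names payload, ready, producer, consumer and publish_and_consume are hypothetical and do not come from Boost. The acquire load in the reader pairs with the release store in the writer, which is the ordering guarantee a plain volatile read does not promise.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical names for illustration; none of them come from Boost.
static int payload = 0;                 // ordinary (non-atomic) data
static std::atomic<bool> ready(false);  // flag that publishes the data

// Writer: fill in the data, then publish it with a release store.
void producer()
{
    payload = 42;
    ready.store(true, std::memory_order_release);
}

// Reader: spin until the flag is seen, using an acquire load.
// The acquire load pairs with the release store above, so the write
// to payload is guaranteed to be visible once the flag reads true.
int consumer()
{
    while (!ready.load(std::memory_order_acquire)) {
        std::this_thread::yield();
    }
    return payload;
}

// Runs one writer and one reader; returns what the reader observed.
int publish_and_consume()
{
    payload = 0;
    ready.store(false, std::memory_order_relaxed);
    int seen = -1;
    std::thread reader([&seen] { seen = consumer(); });
    std::thread writer(producer);
    writer.join();
    reader.join();
    assert(seen == 42);
    return seen;
}
```

If the flag were read without acquire semantics, the language would no longer guarantee that the reader sees payload == 42 after seeing the flag set.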

> [... quoting existing implementation for gcc ...]
>   operator long() const
>   {
>     return __exchange_and_add(&value_, 0);
>   }

The use of __exchange_and_add here is a way to perform a load-acquire
operation. It is a somewhat clumsy way, presumably necessary in the
absence of a more direct (and possibly better performing) mechanism.
The "acquire" qualifier indicates the kind of memory barrier needed:
one that keeps later memory accesses from being reordered ahead of
the load.
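
As a rough sketch of the difference, the following puts the "add zero" read next to a direct load-acquire. It assumes std::atomic from C++11 or later, which was not available when this was written; sketch_atomic_count and its member names are hypothetical and this is not Boost's actual implementation.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical stand-in for an atomic counter; not Boost's actual class.
class sketch_atomic_count
{
public:
    explicit sketch_atomic_count(long v) : value_(v) {}

    long operator++() { return value_.fetch_add(1, std::memory_order_acq_rel) + 1; }
    long operator--() { return value_.fetch_sub(1, std::memory_order_acq_rel) - 1; }

    // The "clumsy" read: an atomic read-modify-write that adds zero
    // and returns the previous value, used only for its ordering.
    long read_via_add_zero() const
    {
        return value_.fetch_add(0, std::memory_order_acq_rel);
    }

    // The direct read: a load with acquire ordering and no write.
    long read_via_load() const
    {
        return value_.load(std::memory_order_acquire);
    }

private:
    // mutable so that the add-zero read can live in a const member.
    mutable std::atomic<long> value_;
};

// Both reads return the same value; they differ only in how they get it.
void check_reads()
{
    sketch_atomic_count c(5);
    ++c;
    assert(c.read_via_add_zero() == 6);
    assert(c.read_via_load() == 6);
    --c;
    assert(c.read_via_load() == 5);
}
```

Both members provide at least acquire ordering; which instructions each one turns into depends on the compiler and the target.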

> I checked other implementations, and for example solaris has the
> much lighter:
>   operator uint32_t() const
>   {
>     return static_cast<uint32_t const volatile &>( value_ );
>   }

Because the (current) standard does not address threads at all,
different implementations have associated different semantics with
"volatile" in the presence of threads. I expect that *on Solaris*
one would find a memory barrier generated for this code sequence.

