From: Corrado Zoccolo (czoccolo_at_[hidden])
Date: 2007-09-09 05:11:08


Hi boosters,
when performance tuning a really tight loop in a simple program, I
found that reading the value of an atomic count is unexpectedly slow
on a gcc platform.
I checked the code, and the implementation used (I have a checkout of
Boost HEAD from some months ago, when CVS was still in place) is:

    operator long() const
    {
      return __exchange_and_add(&value_, 0);
    }

I checked other implementations; Solaris, for example, has the much lighter:

    operator uint32_t() const
    {
        return static_cast<uint32_t const volatile &>( value_ );
    }

Is there a compelling reason to use the locked operation with gcc, or
can a simpler volatile access serve the same purpose?
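
To make the two alternatives concrete, here is a minimal sketch of both read strategies side by side. It assumes GCC 4.1 or later and uses the `__sync_fetch_and_add` builtin as a stand-in for `__gnu_cxx::__exchange_and_add`; the function names `read_locked`, `read_volatile` and `reads_agree` are made up for illustration and are not part of Boost.

```cpp
#include <cassert>

// Read by performing an atomic read-modify-write that adds 0.
// On x86 this compiles to a locked instruction, which is the expensive part.
// __sync_fetch_and_add is a GCC builtin, used here as an assumed stand-in
// for __gnu_cxx::__exchange_and_add.
inline int read_locked(int & value)
{
    return __sync_fetch_and_add(&value, 0);
}

// Read with a plain load through a volatile-qualified reference,
// as in the Solaris implementation quoted above.
inline int read_volatile(int const & value)
{
    return static_cast<int const volatile &>(value);
}

// Hypothetical single-threaded sanity check: both reads return the
// stored value and neither one modifies it.
inline void reads_agree(int start)
{
    int v = start;
    assert(read_locked(v) == start);
    assert(read_volatile(v) == start);
    assert(v == start);
}
```

In a single thread the two functions return the same value; the sketch says nothing about what other threads may observe, which is the open question here.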

A simple test program shows a large overhead for the locked operation.

#include <boost/timer.hpp>
#include <boost/detail/atomic_count.hpp>
#include <iostream>

namespace cz {
  using __gnu_cxx::__atomic_add;
  using __gnu_cxx::__exchange_and_add;

class atomic_count
{
public:

    explicit atomic_count(long v) : value_(v) {}

    void operator++()
    {
        __atomic_add(&value_, 1);
    }

    long operator--()
    {
        return __exchange_and_add(&value_, -1) - 1;
    }

    operator long() const
    {
      return static_cast<_Atomic_word volatile const &>(value_);
    }

private:

    atomic_count(atomic_count const &);
    atomic_count & operator=(atomic_count const &);

    mutable _Atomic_word value_;
};
}

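int main(int argc, char **argv) {
  {
    // Loop driven by the stock Boost counter (locked read).
    boost::timer t;
    for (boost::detail::atomic_count i(0); i < 10000000; ++i);
    std::cout << t.elapsed() << std::endl;
  }
  {
    // Same loop driven by the counter with the volatile read.
    boost::timer t;
    for (cz::atomic_count i(0); i < 10000000; ++i);
    std::cout << t.elapsed() << std::endl;
  }
}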

Compiling with g++ -O3 and running on a Linux box with a 2.8GHz
Pentium IV (HT disabled), I obtain the following result:
[corrado_at_et2 test]$ ./a.out
1.4
0.75

i.e. the "for" loop that uses the volatile access is almost twice as
fast as the other one.

Do you see any drawback in changing the access to the counter to a
simple volatile access, at least when the platform is known to be an
IA32?
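
As a rough sketch of what such a platform-conditional change could look like (not a proposed patch): the helper name `read_count` is hypothetical, `__i386__` is the macro GCC predefines for IA32 targets, and `__sync_fetch_and_add` (GCC 4.1+) again stands in for `__exchange_and_add`. Whether the volatile branch is actually safe is exactly the question above.

```cpp
#include <cassert>

// Hypothetical helper, not part of Boost: pick the read strategy at
// compile time based on the target platform.
#if defined(__i386__)

// IA32: plain load through a volatile-qualified reference (the proposed change).
inline long read_count(long & value)
{
    return static_cast<long const volatile &>(value);
}

#else

// Any other platform: keep the locked read-modify-write that adds 0.
// __sync_fetch_and_add is a GCC builtin assumed here in place of
// __gnu_cxx::__exchange_and_add.
inline long read_count(long & value)
{
    return __sync_fetch_and_add(&value, 0L);
}

#endif
```

Either branch returns the stored value without changing it, so callers would not need to know which one was selected.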

Corrado

-- 
__________________________________________________________________________
dott. Corrado Zoccolo                          mailto:zoccolo_at_[hidden]
PhD - Department of Computer Science - University of Pisa, Italy
--------------------------------------------------------------------------
