
Subject: Re: [boost] [lock-free] CDS -yet another lock-free library
From: Khiszinsky, Maxim (Maxim.Khiszinsky_at_[hidden])
Date: 2010-03-31 06:00:55


Helge Bahmann wrote:

>> Second solution: use a 64-bit CAS to load/store 64-bit values on x86. It
>> seems too heavy just for loading/storing, doesn't it?

> This is actually what I do in Boost.Atomic; I *think* it is cheaper than
> shuffling the values around between SSE and general-purpose registers (it
> sure is cheaper than MMX, considering you also have to issue EMMS)

It's easy to test!
Quick test: CDS's RecursiveSpinLock<atomic64_t> (its TATAS algorithm calls load64 heavily for busy-waiting whenever the CAS that acquires the lock fails)
Setup: Windows XP, Intel Core 2 (3 GHz, 2 cores, no HT), MSVC++ 2008, release build with full optimization
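For context, the TATAS pattern described above can be sketched in portable C++11 as follows. Waiters spin on plain 64-bit loads and only attempt the CAS once the owner word reads as zero, which is why the cost of load64 dominates the busy wait. This is an illustrative sketch, not CDS's implementation: the class name RecursiveSpinLock64, the per-thread id scheme, and the std::this_thread::yield() call in the spin are all assumptions. It also uses std::atomic rather than the MSVC intrinsics discussed here, so on a 32-bit target the compiler decides how the 64-bit load is implemented.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <thread>

// Hypothetical TATAS (test-and-test-and-set) recursive spin lock built on a
// 64-bit atomic owner word; 0 means "unlocked". A sketch, not CDS's code.
class RecursiveSpinLock64 {
    std::atomic<std::uint64_t> owner_{0};  // id of the owning thread, 0 if free
    unsigned depth_ = 0;                   // recursion depth, touched only by the owner

    // Unique non-zero id per thread, assigned on first use.
    static std::uint64_t self_id() {
        static std::atomic<std::uint64_t> next_id{1};
        thread_local const std::uint64_t id = next_id.fetch_add(1);
        return id;
    }

public:
    void lock() {
        const std::uint64_t me = self_id();
        // Re-entry: only this thread can have stored its own id.
        if (owner_.load(std::memory_order_relaxed) == me) {
            ++depth_;
            return;
        }
        for (;;) {
            // "Test": spin on plain 64-bit loads while the lock is held.
            // This is where load64 gets called over and over.
            while (owner_.load(std::memory_order_relaxed) != 0)
                std::this_thread::yield();
            // "Test-and-set": the lock looks free, so now try the CAS.
            std::uint64_t expected = 0;
            if (owner_.compare_exchange_weak(expected, me,
                                             std::memory_order_acquire,
                                             std::memory_order_relaxed)) {
                depth_ = 1;
                return;
            }
        }
    }

    void unlock() {
        assert(owner_.load(std::memory_order_relaxed) == self_id());
        if (--depth_ == 0)
            owner_.store(0, std::memory_order_release);
    }
};
```

The relaxed loads in the spin are enough because the successful CAS carries acquire ordering and the final store in unlock() carries release ordering.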

SSE2 load64:
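#include <emmintrin.h>  // SSE2: _mm_loadl_epi64
// MOVQ reads all 8 bytes in one access; x86 guarantees that is atomic when pMem is 8-byte aligned
static inline atomic64_t load64( atomic64_t volatile const * pMem ) {
  __m128i volatile v = _mm_loadl_epi64( (__m128i const *) pMem ) ;
  return v.m128i_i64[0] ;  // m128i_i64 is an MSVC-specific member of __m128i
}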

Result (one of several runs, close to the average):
Spinlock_MT::recursiveSpinLock64
           Lock test, thread count=8 loop per thread=1000000...
             Duration=2.21852

CAS64 load64 (no CAS loop):
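#include <intrin.h>  // _InterlockedCompareExchange64
// Comparand == exchange value (0), so memory never changes: if *pMem == 0 it is
// "replaced" by 0, otherwise the CAS fails. Either way the current value is returned.
static inline atomic64_t load64( atomic64_t volatile const * pMem ) {
  atomic64_t cur = 0 ;
  // On 32-bit x86 this is LOCK CMPXCHG8B: a locked read-modify-write even for a pure read
  return _InterlockedCompareExchange64( const_cast<atomic64_t volatile *>(pMem), cur, cur ) ;
}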
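The same single-CAS trick can be written portably with C++11 std::atomic, which makes the reasoning easier to see: because the comparand and the new value are equal, a successful CAS leaves memory unchanged, and a failed CAS reports the current value. The function name cas_load64 is hypothetical. On x86 the underlying LOCK CMPXCHG8B is still a locked read-modify-write that needs the cache line in exclusive state, so spinning readers contend with each other; that is one plausible reason it is slower than the plain SSE2 load in this benchmark.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical portable counterpart of the CAS64-based load64 above.
// Comparand and new value are both `guess`, so memory never changes:
//  - if *a == guess, the CAS succeeds, stores guess back, and guess is correct;
//  - otherwise the CAS fails and writes the current value of *a into guess.
// Either way, guess ends up holding one atomic snapshot of *a.
inline std::int64_t cas_load64(std::atomic<std::int64_t>& a) {
    std::int64_t guess = 0;
    a.compare_exchange_strong(guess, guess);
    return guess;
}
```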

Result (one of several runs, close to the average):
Spinlock_MT::recursiveSpinLock64
           Lock test, thread count=8 loop per thread=1000000...
             Duration=2.79662

About 20% less run time with the SSE2 load64 (2.22 vs 2.80). Not bad, although I expected more :)
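Spelled out with the two Duration values reported above (writing t_SSE2 = 2.21852 and t_CAS64 = 2.79662 as labels for the two runs), the relative difference is:

```latex
\frac{t_{\mathrm{CAS64}} - t_{\mathrm{SSE2}}}{t_{\mathrm{CAS64}}}
  = \frac{2.79662 - 2.21852}{2.79662} \approx 0.207,
\qquad
\frac{t_{\mathrm{CAS64}}}{t_{\mathrm{SSE2}}}
  = \frac{2.79662}{2.21852} \approx 1.26
```

So the SSE2 run takes about 20.7% less time, or equivalently the CAS64 run takes about 26% longer; both figures describe the same pair of measurements.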
Unfortunately, I don't currently have access to a multi-processor Win32 server for testing.
Note that Boost.Atomic uses a CAS-based loop for load64, so I think the gain over it could be even larger.

Regards, Max

