Subject: Re: [boost] [lock-free] CDS -yet another lock-free library
From: Khiszinsky, Maxim (Maxim.Khiszinsky_at_[hidden])
Date: 2010-03-31 06:00:55
Helge Bahmann wrote
>> Second solution: use 64bit CAS to load/store 64bit values on x86. It
>> seems too heavy for just loading/storing it isn't?
> this is actually what I do in Boost.Atomic; I *think* it is cheaper than
> shuffling around the values between SSE and general purpose registers (it
> sure is cheaper than MMX considering you also have to issue emms)
It's easy to test!
Quick test: CDS's RecursiveSpinLock<atomic64_t> (the TATAS algorithm uses load64 heavily for busy-waiting whenever the CAS that acquires the lock fails)
Equipment: WinXP, Intel Core2 (3GHz, 2 cores, no HT), MSVC++ 2008, release build with full optimization
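For readers who do not know the TATAS (test-and-test-and-set) pattern, here is a minimal sketch of where load64 sits in such a lock. It is not the CDS code: the class name TatasLock64 and its members are made up for illustration, it is not recursive, and it uses C++11 std::atomic instead of the compiler intrinsics shown below. The owner id is assumed to be non-zero.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical, non-recursive TATAS lock (illustration only, not CDS code).
class TatasLock64 {
    std::atomic<std::uint64_t> state_{0};   // 0 = free, otherwise the owner id

public:
    // One CAS attempt: 0 -> owner. The owner id must be non-zero.
    bool try_lock(std::uint64_t owner) {
        std::uint64_t expected = 0;
        return state_.compare_exchange_strong(expected, owner,
                                              std::memory_order_acquire,
                                              std::memory_order_relaxed);
    }

    void lock(std::uint64_t owner) {
        while (!try_lock(owner)) {
            // The CAS failed: spin on plain 64-bit loads until the lock
            // looks free. This inner loop is where load64 is hot.
            while (state_.load(std::memory_order_relaxed) != 0) {
            }
        }
    }

    void unlock() { state_.store(0, std::memory_order_release); }

    std::uint64_t owner() const {
        return state_.load(std::memory_order_relaxed);
    }
};
```

The inner loop only reads, so the cost of a single 64-bit load directly affects how the lock behaves under contention.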
SSE2 load64:
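    static inline atomic64_t load64( atomic64_t volatile const * pMem ) {
        // MOVQ: load the low 64 bits from memory into an SSE register
        __m128i volatile v = _mm_loadl_epi64( (__m128i const *) pMem ) ;
        return v.m128i_i64[0] ;    // m128i_i64 is an MSVC-specific member
    }
Result (one typical run):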
Spinlock_MT::recursiveSpinLock64
Lock test, thread count=8 loop per thread=1000000...
Duration=2.21852
CAS64 load64 (no CAS loop):
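    static inline atomic64_t load64( atomic64_t volatile const * pMem ) {
        atomic64_t cur = 0 ;
        // CAS with exchange == comparand == 0 never changes the stored value;
        // the intrinsic returns the value it found at pMem
        return _InterlockedCompareExchange64( const_cast<atomic64_t volatile *>(pMem), cur, cur ) ;
    }
Result (one typical run):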
Spinlock_MT::recursiveSpinLock64
Lock test, thread count=8 loop per thread=1000000...
Duration=2.79662
The SSE2 version takes about 20% less time (2.22 vs. 2.80). Not bad, though I expected more :)
Unfortunately, I have no access to a multi-processor Win32 server for testing right now.
Note that Boost.Atomic uses a CAS-based loop for load64, so I think the performance gain there could be larger.
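To show what I mean by a CAS-based loop, here is a rough sketch written with C++11 std::atomic. It is not taken from Boost.Atomic; the function name load64_cas_loop is hypothetical and only illustrates the general shape of such a load.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical helper: a 64-bit load built only from compare-exchange.
inline std::uint64_t load64_cas_loop(std::atomic<std::uint64_t>& mem) {
    std::uint64_t expected = 0;
    // On failure, compare_exchange_weak copies the current value into
    // `expected`, so the next attempt "replaces" the value with itself.
    // On success the stored value equals `expected` and is left unchanged.
    while (!mem.compare_exchange_weak(expected, expected)) {
    }
    return expected;
}
```

Unlike the single-CAS version above, the loop may run more than once under contention, and every iteration is a locked read-modify-write. It also needs write access to the memory, so it cannot be used on read-only data.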
Regards, Max