From: Phil Endecott (spam_from_boost_dev_at_[hidden])
Date: 2007-09-06 13:44:40


Hi Peter,

Peter Dimov wrote:
> Phil Endecott:
>> I note that shared_ptr uses architecture-specific assembler for the
>> atomic operations needed for thread safe operations, on x86, ia64 and
>> ppc; it falls back to pthreads for other architectures. Has anyone
>> quantified the performance benefit of the assembler?
>>
>> Assuming that the benefit is significant, I'd like to implement it for
>> ARM. Has anyone else looked at this?
>>
>> ARM has a swap instruction. I have a (very vague) recollection that
>> perhaps some of the newer chips have some other locked instructions
>> e.g. test-and-set, but I would want to code to the lowest common
>> denominator i.e. swap only. Is this sufficient for what shared_ptr wants?
>>
>> I note that since 4.1, gcc has provided built-in functions for atomic
>> operations. But it says that "Not all operations are supported by all
>> target processors", and the list doesn't include swap; so maybe this
>> isn't so useful after all.
>
> Can you try the SVN trunk version of shared_ptr and look at the assembly?
> detail/sp_counted_base.hpp should choose sp_counted_base_sync.hpp for g++
> 4.1 and higher and take advantage of the built-ins.

Well it's quicker for me to try this:

int x;

int main(int argc, char* argv[])
{
   __sync_fetch_and_add(&x,1);
}

$ arm-linux-gnu-g++ --version
arm-linux-gnu-g++ (GCC) 4.1.2 20061028 (prerelease) (Debian 4.1.1-19)

$ arm-linux-gnu-g++ -W -Wall check_sync_builtin.cc
check_sync_builtin.cc:3: warning: unused parameter ‘argc’
check_sync_builtin.cc:3: warning: unused parameter ‘argv’
/tmp/ccwWxfsT.o: In function `main':
check_sync_builtin.cc:(.text+0x20): undefined reference to `__sync_fetch_and_add_4'
collect2: ld returned 1 exit status

(It does compile on x86, and the disassembly includes a "lock addl" instruction.)

As I mentioned before, gcc doesn't implement these atomic builtins on
all platforms, i.e. it doesn't implement them on platforms where the
hardware doesn't provide them. I don't fully understand how this all
works in libstdc++ (there are too many levels of #include and #if for
me to follow) but there seems to be a __gnu_cxx::__mutex that they can
use in those cases.

> To answer your question: no, a mere swap instruction is not enough for
> shared_ptr, it needs atomic increment, decrement and compare and swap.

Well, I think you can implement a spin-lock mutex with swap:

int mutex=0; // 0 = unlocked, 1 = locked

void lock() {
   int n; // declared outside the loop so the while condition can see it
   do {
     n=1;
     swap(mutex,n); // atomic swap instruction
   } while (n==1); // if n is 1 after the swap, the mutex was already locked
}

void unlock() {
   mutex=0;
}
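
For concreteness, here is a compilable sketch of the same spin lock. It is only an illustration under stated assumptions: it uses std::atomic<int>::exchange from C++11 (which postdates this thread) as a stand-in for the ARM swap instruction, and the names SwapSpinLock and count_with_spin_lock are made up for the example.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical name. exchange() plays the role of the atomic swap instruction.
class SwapSpinLock {
    std::atomic<int> state_{0};  // 0 = unlocked, 1 = locked
public:
    void lock() {
        // Keep swapping in 1 until the old value comes back as 0,
        // i.e. until this thread is the one that flipped unlocked -> locked.
        while (state_.exchange(1, std::memory_order_acquire) == 1) {
            // busy-wait
        }
    }
    void unlock() {
        // Unlocking a lock that is not held would be a bug in the caller.
        assert(state_.load(std::memory_order_relaxed) == 1);
        state_.store(0, std::memory_order_release);
    }
};

// Increment a plain int from several threads, guarded only by the spin lock.
int count_with_spin_lock(int threads, int iterations) {
    SwapSpinLock lock;
    int counter = 0;
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t) {
        pool.emplace_back([&] {
            for (int i = 0; i < iterations; ++i) {
                lock.lock();
                ++counter;
                lock.unlock();
            }
        });
    }
    for (auto& th : pool) th.join();
    return counter;
}
```

If the lock works, count_with_spin_lock(4, 10000) should return exactly threads times iterations, since no increment can be lost while the lock is held.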

So you could use something like that to protect the reference counts,
rather than falling back to the pthread method. Or alternatively,
could you use a sentinel value (say -1) in the reference count itself
to indicate that it's locked:

int refcount;

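int read_refcount() {
   int r; // declared outside the loop so it is still in scope below
   do {
     r = refcount;
   } while (r==-1); // -1 means another thread is updating; retry
   return r;
}

int adj_refcount(int adj) {
   int r=-1;
   do {
     swap(refcount,r); // atomic swap: refcount becomes -1, r gets the old value
   } while (r==-1); // old value was -1, so someone else holds it; retry
   refcount = r+adj; // writing the new value also releases the "lock"
   return r+adj;
}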
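
A compilable sketch of the sentinel idea, under the same assumptions as any modern rendering of it: std::atomic<int>::exchange (C++11, which postdates this thread) stands in for the swap instruction, and SentinelRefCount and count_after_concurrent_increments are hypothetical names chosen for the example, not anything in Boost.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical wrapper. The value -1 marks the count as held by a writer.
class SentinelRefCount {
    std::atomic<int> count_;
public:
    explicit SentinelRefCount(int initial) : count_(initial) {}

    // Spin until the count is not in the "locked" state, then return it.
    int read() const {
        int r;
        do {
            r = count_.load(std::memory_order_acquire);
        } while (r == -1);
        return r;
    }

    // Swap in the sentinel to take ownership, then publish the new value.
    int adjust(int adj) {
        int r;
        do {
            r = count_.exchange(-1, std::memory_order_acquire);
        } while (r == -1);  // got -1 back: another thread owns it, retry
        assert(r + adj >= 0);  // the count must never collide with the sentinel
        count_.store(r + adj, std::memory_order_release);
        return r + adj;
    }
};

// Start at 1 and let several threads increment concurrently.
int count_after_concurrent_increments(int threads, int iterations) {
    SentinelRefCount rc(1);
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t) {
        pool.emplace_back([&] {
            for (int i = 0; i < iterations; ++i) rc.adjust(+1);
        });
    }
    for (auto& th : pool) th.join();
    return rc.read();
}
```

Because the thread that swapped in -1 has exclusive access until it stores the new value, a conditional update (for example, increment only if the count is nonzero) could be done in that same window without a hardware compare-and-swap.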

(BTW, for gcc>=4.1 on x86 would you plan to use the gcc builtins or the
existing Boost asm?)

Regards,

Phil.

