Subject: Re: [boost] [fiber] new version in vault
From: Helge Bahmann (hcb_at_[hidden])
Date: 2009-12-01 07:42:55
On Tue, 1 Dec 2009, Anthony Williams wrote:
> Helge Bahmann <hcb_at_[hidden]> writes:
>
>> On Mon, 30 Nov 2009, Phil Endecott wrote:
>>
>>> My work on this was backed up with extensive benchmarking,
>>> disassembly of the generated code, and other evaluation. You can
>>> find some of the results in the list archive from about two years
>>> ago. There are many different types of system with different
>>> characteristics (uniprocessor vs multiprocessor, two threads vs
>>> 10000 threads, etc etc). Two particular cases that I'll mention
>>> are:
>>
>> I guess this is the code you used for testing?
>>
>> https://svn.chezphil.org/mutex_perf/trunk
>>
>> I would say that your conclusions are valid for ARM only (I don't know
>> the architecture or libc peculiarities), for x86 there are some
>> subtleties which IMHO invalidate the comparison.
>>
>> Your spinlock implementation defers to __sync_lock_test_and_set, which
>> in turn generates an "xchgl" instruction, and NOT an "lock xchgl"
>> instruction (yes, these gcc primitives are tricky which is why I avoid
>> them).
>
> On x86 these are equivalent --- the LOCK prefix is automatically
> asserted for XCHG. See the XCHG instruction docs in the Intel manual
> volume 2B.
Yes, you're right, I forgot this odd one :/ That still leaves me wondering
what is going on -- it's the first time I have seen "lock xchgl" be
noticeably faster than "lock cmpxchgl".
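
For readers following along, here is a minimal sketch of the two lock-acquire styles being compared. It is not the code from the mutex_perf repository; the type and function names (spinlock_t, xchg_try_lock, cas_try_lock, spin_unlock, spinlock_selfcheck) are made up for illustration. It assumes GCC or a compatible compiler on x86, where __sync_lock_test_and_set typically compiles to "xchgl" (implicitly locked when it has a memory operand) and __sync_bool_compare_and_swap typically compiles to "lock cmpxchgl". Disassemble your own build to confirm.

```c
#include <assert.h>

/* Hypothetical spinlock type for illustration: 0 = free, 1 = held. */
typedef struct { volatile int locked; } spinlock_t;

/* Exchange-based acquire: unconditionally stores 1 and returns the old
 * value. The lock was obtained only if the old value was 0. */
static int xchg_try_lock(spinlock_t *l)
{
    return __sync_lock_test_and_set(&l->locked, 1) == 0;
}

/* CAS-based acquire: stores 1 only if the current value is 0. */
static int cas_try_lock(spinlock_t *l)
{
    return __sync_bool_compare_and_swap(&l->locked, 0, 1);
}

/* Release: stores 0 with release semantics. */
static void spin_unlock(spinlock_t *l)
{
    __sync_lock_release(&l->locked);
}

/* Single-threaded sanity check: both variants agree on lock state. */
static void spinlock_selfcheck(void)
{
    spinlock_t l = { 0 };
    assert(xchg_try_lock(&l));   /* free -> acquired */
    assert(!xchg_try_lock(&l));  /* already held */
    assert(!cas_try_lock(&l));   /* still held */
    spin_unlock(&l);
    assert(cas_try_lock(&l));    /* free again -> acquired */
    spin_unlock(&l);
}
```

A blocking lock would simply loop on either try function until it returns nonzero. Compiling with "gcc -O2 -S" and looking for "xchgl" versus "lock cmpxchgl" in the output is one way to see which instruction each variant produces on a given toolchain.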
Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk