From: Peter Dimov (pdimov_at_[hidden])
Date: 2005-04-04 14:41:01


Alexander Terekhov wrote:
> Peter Dimov wrote:
> [...]
>> If you are interested, please take a look at the file
>
> Looks correct, but still not quite optimal to my taste.
>
> A) There's no need to hinder compiler's ability to cache/reorder
> across increments. So you need neither __volatile__ nor "memory"
> clobber in increments case (lock prefix is still needed to ensure
> MP safety of competing read-modify-write operations).

Yep. This doesn't make a difference in my tests here (single Athlon).
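For readers following along, point A could be sketched roughly as below. This is not the code in CVS; the function body, the `int` counter type, and the non-x86 fallback are assumptions made for illustration. The asm statement carries neither `__volatile__` nor a "memory" clobber, so the compiler stays free to cache or reorder unrelated memory accesses around it, while the lock prefix still keeps concurrent read-modify-write operations MP safe.

```cpp
// Hypothetical sketch of an increment without __volatile__ or a "memory" clobber.
// Assumes g++ (or a compatible compiler); the counter type is an assumption.
inline void atomic_increment( int * pw )
{
#if defined(__GNUC__) && ( defined(__i386__) || defined(__x86_64__) )
    // "+m" marks *pw as read and written; "cc" because incl modifies the flags.
    // No "memory" clobber: other memory accesses may be moved across this statement.
    __asm__( "lock\n\tincl %0" : "+m"( *pw ) : : "cc" );
#else
    // Fallback for other targets (assumption): a g++ builtin, which is a
    // full barrier and therefore stronger than the increment needs.
    __sync_fetch_and_add( pw, 1 );
#endif
}
```

Since the result is not returned, this form only suits unconditional increments; decrements that test for zero still need the old value.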

> B) Something branchless is better for unconditional increments.

xadd is branchless; it just returns the old value, whereas inc doesn't. MSVC
always generates lock xadd, even for _InterlockedIncrement, BTW. So there's
probably no difference between the two. But I don't have a P4 or an Athlon
64 here to verify that. If someone wants to play, the version in the CVS now
has atomic_increment; uncomment the one-liner at the top and comment out the
__asm__ statement to compare the performance of the two versions.

> C) In the case of decrements on weak_count, there's no need to
> make all clients pay the price of rather expensive interlocked
> operation even if they don't use weak pointers. I'd use "may
> not store zero" decrement. You'll need __volatile__ and "memory"
> as compiler fence, and as for hardware, that initial load does
> have acquire semantics and lock cmpxchg does have "msync::hsb"
> which we need here.

I wanted to get it to work first ;-)

    void release() // nothrow
    {
        if( atomic_exchange_and_add( &use_count_, -1 ) == 1 )
        {
            dispose();

            if( (long volatile&)weak_count_ == 1 ) // no weak ptrs
            {
                destroy();
            }
            else
            {
                weak_release();
            }
        }
    }

?
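To make the effect of the `release()` above easier to see, here is a self-contained sketch using `std::atomic` in place of the hand-written primitives. The struct name, the instrumentation counters, and the chosen memory orders are assumptions for illustration, not the actual implementation; a real `destroy()` would `delete this`.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical stand-in for the counted base discussed above.
struct counted_base_sketch
{
    std::atomic<long> use_count_{ 1 };
    std::atomic<long> weak_count_{ 1 }; // the shared owners collectively hold one weak reference

    // Instrumentation only (not thread safe); these are not part of the technique.
    int disposed = 0;
    int destroyed = 0;
    int interlocked_weak_decrements = 0;

    void dispose() { ++disposed; }                             // would destroy the managed object
    void destroy() { assert( destroyed == 0 ); ++destroyed; }  // would be `delete this`

    void weak_add_ref() { weak_count_.fetch_add( 1, std::memory_order_relaxed ); }

    void weak_release()
    {
        ++interlocked_weak_decrements;
        if( weak_count_.fetch_sub( 1, std::memory_order_acq_rel ) == 1 )
        {
            destroy();
        }
    }

    void release()
    {
        if( use_count_.fetch_sub( 1, std::memory_order_acq_rel ) == 1 )
        {
            dispose();

            // A plain load instead of an interlocked decrement: if nobody else
            // holds a weak reference, no other thread can touch the count.
            if( weak_count_.load( std::memory_order_acquire ) == 1 )
            {
                destroy();
            }
            else
            {
                weak_release();
            }
        }
    }
};
```

With no weak pointers outstanding, `release()` reaches `destroy()` without ever calling `weak_release()`, which is the saving point C asks for; with a weak pointer outstanding, the interlocked path is taken as before.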

> P.S. When are you going to kick start an incarnation for Itanic
> with value dependent cmpxchg.rel-vs-cmpxchg.acq? ;-)

IA64 assembly by hand? No thanks. I'll probably use _Interlocked* on Intel
and __sync_* on g++. But x86 and PPC (CW and g++ versions) have priority
over IA64.
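A rough idea of what the `__sync_*` route might look like on g++ follows. The function names mirror the ones used in this thread, but the bodies and the `long` counter type are assumptions, and the builtins are full barriers, so this ignores the .rel/.acq distinction raised in the P.S.

```cpp
// Hypothetical g++ versions built on the __sync_* builtins.

// Adds dv to *pw atomically and returns the value *pw held before the addition.
inline long atomic_exchange_and_add( long * pw, long dv )
{
    return __sync_fetch_and_add( pw, dv );
}

// Increments *pw atomically; the old value is not needed here.
inline void atomic_increment( long * pw )
{
    __sync_fetch_and_add( pw, 1 );
}
```

Whether the compiler emits a plain locked increment or an exchange-and-add for the second function is up to the backend.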

