Subject: Re: [boost] [atomic] comments
From: Andrey Semashev (andrey.semashev_at_[hidden])
Date: 2011-10-31 14:29:35
> considering the cost of cmpxchg8b itself, the cost of a branch -- if done
> correctly [1] -- is most likely immeasurable
Probably. But I'm a perfectionist. :)
> > Unfortunately, cmpxchg16b is not as common as cmpxchg8b, so a dynamic
> > check would be desirable. However, I would prefer that there were no
> > if's like the one above. Perhaps, a global table of pointers to the
> > actual function implementations would be better. Initially pointers
> > should point to functions that perform cpuid and initialize this table
> > and then call the real functions for the detected hardware. This way we
> > eliminate almost all overhead in the long run, including call_once.
>
> the processor most likely has more difficulties correctly predicting the
> code flow through a register-indirect branch than a static one, so I am not
> really sure this is cheaper, but it is in any case worth trying out
Yes, this needs testing. However, I hope that an unconditional jump will be
quite predictable, since its target no longer changes once the table has been
initialized. My main concern is that without this trick you'll end up calling
pthread_once or something like it on every call, and that will be worse than
simply jumping to the final destination. Also, this jump is likely to be
inlined anyway (and transformed into a call).
> also, this would not be a "single" function pointer but a whole bunch of
> them to cover the different atomic operations (reducing everything to CAS
> generates more lock/unlock cycles in the fallback path otherwise)
Sure, like I said - a table of pointers.
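
To make the table idea concrete, here is a minimal sketch, not a proposed
implementation. It assumes C++11 <atomic> and <mutex>; every name in it (u128,
cpu_has_cmpxchg16b, cas_locked, cas_native, g_cas, g_load) is hypothetical. The
"native" functions are stand-ins that forward to the locked versions, since a
real one would need inline assembly for lock cmpxchg16b. The detection function
is a placeholder for a real cpuid query.

```cpp
#include <atomic>
#include <cstdint>
#include <mutex>

namespace dispatch_sketch {

struct u128 { std::uint64_t lo, hi; };

typedef bool (*cas_fn)(u128* target, u128* expected, u128 desired);
typedef u128 (*load_fn)(u128* target);

// Placeholder: a real version would execute cpuid and test
// CPUID.01H:ECX bit 13, which reports cmpxchg16b support.
inline bool cpu_has_cmpxchg16b() { return false; }

std::mutex g_fallback_lock;
std::atomic<int> g_init_count(0);  // only here to observe the sketch

// Lock-based fallback path.
bool cas_locked(u128* t, u128* e, u128 d) {
    std::lock_guard<std::mutex> guard(g_fallback_lock);
    if (t->lo == e->lo && t->hi == e->hi) { *t = d; return true; }
    *e = *t;  // report the value actually seen
    return false;
}
u128 load_locked(u128* t) {
    std::lock_guard<std::mutex> guard(g_fallback_lock);
    return *t;
}

// Stand-ins for the native path (assumption: real code would use
// lock cmpxchg16b here instead of forwarding to the locked versions).
bool cas_native(u128* t, u128* e, u128 d) { return cas_locked(t, e, d); }
u128 load_native(u128* t) { return load_locked(t); }

bool cas_init(u128* t, u128* e, u128 d);
u128 load_init(u128* t);

// The table: constant-initialized to the self-initializing stubs, so it
// is valid before any dynamic initialization runs.
std::atomic<cas_fn>  g_cas(&cas_init);
std::atomic<load_fn> g_load(&load_init);

// Racing threads may run this more than once, but they all store the
// same values, so no call_once is needed.
void init_table() {
    const bool native = cpu_has_cmpxchg16b();
    g_cas.store(native ? &cas_native : &cas_locked, std::memory_order_relaxed);
    g_load.store(native ? &load_native : &load_locked, std::memory_order_relaxed);
    g_init_count.fetch_add(1, std::memory_order_relaxed);
}

// Stubs: fill the table, then forward to the selected implementation.
bool cas_init(u128* t, u128* e, u128 d) {
    init_table();
    return g_cas.load(std::memory_order_relaxed)(t, e, d);
}
u128 load_init(u128* t) {
    init_table();
    return g_load.load(std::memory_order_relaxed)(t);
}

// Public entry points: one pointer load plus one indirect call.
inline bool compare_exchange(u128* t, u128* e, u128 d) {
    return g_cas.load(std::memory_order_relaxed)(t, e, d);
}
inline u128 load(u128* t) {
    return g_load.load(std::memory_order_relaxed)(t);
}

} // namespace dispatch_sketch
```

After the first call through any stub, every later call goes straight to the
selected implementation. Whether that indirect call beats a well-placed static
branch is exactly the part that still needs measuring.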
> [1] I'm thinking of forcing the fallback path out-of-line such that
> cmpxchg8b is fall-through
Hmm, yeah, that may help inlining the cmpxchg8b part into the calling code.
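
For comparison, the branch-based variant from [1] might look roughly like the
sketch below. It assumes GCC or Clang for __builtin_expect and the noinline and
cold attributes; on other compilers the macros expand to nothing. The flag
g_has_cmpxchg8b, the macros, and the function names are hypothetical.
std::atomic's compare_exchange_strong stands in for a hand-written cmpxchg8b
sequence.

```cpp
#include <atomic>
#include <cstdint>
#include <mutex>

#if defined(__GNUC__)
#define SKETCH_LIKELY(x) __builtin_expect(!!(x), 1)
#define SKETCH_NOINLINE  __attribute__((noinline, cold))
#else
#define SKETCH_LIKELY(x) (x)
#define SKETCH_NOINLINE
#endif

namespace branch_sketch {

// Hypothetical flag, assumed to be filled in once at startup from cpuid.
bool g_has_cmpxchg8b = true;
std::mutex g_lock;

// Fallback kept out of line so the fast path below stays small enough
// to inline and the common case falls through the branch.
SKETCH_NOINLINE bool cas64_fallback(std::atomic<std::uint64_t>& t,
                                    std::uint64_t& expected,
                                    std::uint64_t desired) {
    std::lock_guard<std::mutex> guard(g_lock);
    const std::uint64_t cur = t.load(std::memory_order_relaxed);
    if (cur == expected) {
        t.store(desired, std::memory_order_relaxed);
        return true;
    }
    expected = cur;  // report the value actually seen
    return false;
}

inline bool cas64(std::atomic<std::uint64_t>& t,
                  std::uint64_t& expected,
                  std::uint64_t desired) {
    if (SKETCH_LIKELY(g_has_cmpxchg8b))
        return t.compare_exchange_strong(expected, desired);  // fall-through
    return cas64_fallback(t, expected, desired);
}

} // namespace branch_sketch
```

The hint and the attributes only suggest a code layout to the compiler; the
generated assembly would have to be inspected to confirm that the cmpxchg8b
path really ends up as the fall-through.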