Subject: Re: [boost] [atomic] comments
From: Helge Bahmann (hcb_at_[hidden])
Date: 2011-10-31 12:58:41


On Friday 28 October 2011 17:43:43 Andrey Semashev wrote:
> On Friday, October 28, 2011 17:12:55 Domagoj Saric wrote:
> > On 21.10.2011. 13:06, Tim Blechmann wrote:
> > >>> compile-time vs run-time dispatching:
> > >>> some instructions are not available on every CPU of a specific
> > >>> architecture. e.g. cmpxchg8b or cmpxchg16b are not available on all
> > >>> ia32/x86_64 cpus. i would appreciate if these instructions would not
> > >>> be used before performing a CPUID check, whether these instructions
> > >>> are really available (at least in a legacy mode)
> > >>
> > >> the correct way to do that is to have different libraries for
> > >> sub-architectures and have the runtime linker decide... this requires
> > >> infrastructure not present in boost
> > >
> > > it would be equally correct to have something like:
> > >
> > > static bool has_cmpxchg16b = query_cpuid_for_cmpxchg16b();
> > >
> > > if (has_cmpxchg16b)
> > >     use_cmpxchg16b();
> > > else
> > >     use_fallback();
> > >
> > > less bloat and probably only a minor performance hit ;)
> >
> > cmpxchg8b is available since the original Pentium. Preferably dynamic
> > support for such ancient hardware, if supported at all, should not be on
> > by default (by forcing dynamic dispatching on everyone).

considering the cost of cmpxchg8b itself, the added cost of a branch -- if
done correctly [1] -- is most likely too small to measure
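
For reference, a minimal sketch of the kind of CPUID query the quoted snippet assumes (query_cpuid_for_cmpxchg16b is not defined anywhere in this thread). The names cpu_features, query_cpu_features and cached_cpu_features are made up for illustration, and the sketch assumes GCC or Clang with <cpuid.h>; on any other compiler or architecture it simply reports both instructions as unavailable. The bit positions are the documented ones for CPUID leaf 1 (EDX bit 8 for CMPXCHG8B, ECX bit 13 for CMPXCHG16B).

```cpp
#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__i386__) || defined(__x86_64__))
#include <cpuid.h>
#define SKETCH_HAVE_CPUID 1
#endif

// Hypothetical feature record, not part of any Boost interface.
struct cpu_features
{
    bool cmpxchg8b;
    bool cmpxchg16b;
};

// Executes CPUID leaf 1 and extracts the two feature bits.
inline cpu_features query_cpu_features()
{
    cpu_features f = { false, false };
#ifdef SKETCH_HAVE_CPUID
    unsigned int eax = 0, ebx = 0, ecx = 0, edx = 0;
    if (__get_cpuid(1, &eax, &ebx, &ecx, &edx))
    {
        f.cmpxchg8b  = (edx & (1u << 8))  != 0; // CPUID.01H:EDX bit 8
        f.cmpxchg16b = (ecx & (1u << 13)) != 0; // CPUID.01H:ECX bit 13
    }
#endif
    return f;
}

// Runs the query once and hands out the cached result afterwards.
// Thread-safe initialization of the local static is guaranteed from
// C++11 on; older compilers may need an explicit call_once.
inline const cpu_features& cached_cpu_features()
{
    static const cpu_features f = query_cpu_features();
    return f;
}
```

After the first call only a load of the cached flag and the branch remain on the hot path.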

> Unfortunately, cmpxchg16b is not as common as cmpxchg8b, so a dynamic check
> would be desirable. However, I would prefer that there were no if's like
> the one above. Perhaps, a global table of pointers to the actual function
> implementations would be better. Initially pointers should point to
> functions that perform cpuid and initialize this table and then call the
> real functions for the detected hardware. This way we eliminate almost all
> overhead in the long run, including call_once.

the processor most likely has more difficulty correctly predicting the code
flow through a register-indirect branch (a call through a pointer) than
through a direct one, so I am not really sure this is cheaper, but it is in
any case worth trying out

also, this would not be a "single" function pointer but a whole bunch of them
to cover the different atomic operations (otherwise everything has to be
reduced to CAS, which generates more lock/unlock cycles in the fallback path)
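
A rough sketch of what such a set of self-initializing pointers could look like, with one pointer per operation. Everything here is an assumption made for illustration: the names (cell, cas_ptr, add_ptr, resolve_table and so on) are invented, C++11 std::atomic stands in for the real inline-assembly implementations, and is_lock_free() stands in for the CPUID check. Each pointer initially refers to a stub that resolves the table and then forwards the call.

```cpp
#include <atomic>
#include <cstdint>
#include <mutex>

typedef std::uint64_t u64;
typedef std::atomic<u64> cell;

typedef bool (*cas_fn)(cell&, u64& expected, u64 desired);
typedef u64 (*add_fn)(cell&, u64 delta);

// "Native" implementations (stand-ins for cmpxchg8b-based code).
static bool cas_native(cell& c, u64& expected, u64 desired)
{
    return c.compare_exchange_strong(expected, desired);
}
static u64 add_native(cell& c, u64 delta) { return c.fetch_add(delta); }

// Lock-based fallback: each operation takes the lock exactly once.
static std::mutex fallback_mutex;
static bool cas_locked(cell& c, u64& expected, u64 desired)
{
    std::lock_guard<std::mutex> guard(fallback_mutex);
    const u64 current = c.load(std::memory_order_relaxed);
    if (current != expected) { expected = current; return false; }
    c.store(desired, std::memory_order_relaxed);
    return true;
}
static u64 add_locked(cell& c, u64 delta)
{
    std::lock_guard<std::mutex> guard(fallback_mutex);
    const u64 old = c.load(std::memory_order_relaxed);
    c.store(old + delta, std::memory_order_relaxed);
    return old;
}

// Stubs that the table points to until the first call.
static bool cas_resolve(cell& c, u64& expected, u64 desired);
static u64 add_resolve(cell& c, u64 delta);

static std::atomic<cas_fn> cas_ptr(&cas_resolve);
static std::atomic<add_fn> add_ptr(&add_resolve);

// Idempotent, so concurrent first calls are harmless: every thread
// computes and stores the same pointers.
static void resolve_table()
{
    const bool native = cell(0).is_lock_free(); // stand-in for CPUID
    cas_ptr.store(native ? &cas_native : &cas_locked, std::memory_order_release);
    add_ptr.store(native ? &add_native : &add_locked, std::memory_order_release);
}

static bool cas_resolve(cell& c, u64& expected, u64 desired)
{
    resolve_table();
    return cas_ptr.load(std::memory_order_acquire)(c, expected, desired);
}
static u64 add_resolve(cell& c, u64 delta)
{
    resolve_table();
    return add_ptr.load(std::memory_order_acquire)(c, delta);
}

// Entry points: one pointer load plus an indirect call per operation.
inline bool atomic_cas(cell& c, u64& expected, u64 desired)
{
    return cas_ptr.load(std::memory_order_acquire)(c, expected, desired);
}
inline u64 atomic_add(cell& c, u64 delta)
{
    return add_ptr.load(std::memory_order_acquire)(c, delta);
}
```

Giving fetch_add its own slot is what lets add_locked take the lock once instead of going through a load plus a CAS retry loop, each of which would lock separately.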

[1] I'm thinking of forcing the fallback path out-of-line such that cmpxchg8b
is fall-through
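
A small sketch of the layout idea in [1], assuming GCC or Clang: the fallback is marked noinline and cold, and the feature test is annotated as likely, which hints the compiler to place the native path as the fall-through of the branch. The macro and function names are invented for this example, std::atomic stands in for the cmpxchg8b code, and is_lock_free() stands in for the CPUID result. On other compilers the macros expand to nothing and only the structure remains.

```cpp
#include <atomic>
#include <cstdint>
#include <mutex>

#if defined(__GNUC__) || defined(__clang__)
#define SKETCH_LIKELY(x) __builtin_expect(!!(x), 1)
#define SKETCH_NOINLINE_COLD __attribute__((noinline, cold))
#else
#define SKETCH_LIKELY(x) (x)
#define SKETCH_NOINLINE_COLD
#endif

typedef std::uint64_t u64;

// Stand-in for the cached CPUID result (hypothetical).
static const bool has_native_cas64 = std::atomic<u64>(0).is_lock_free();

static std::mutex cas64_fallback_mutex;

// Kept out of line so that it does not sit in the middle of the
// instruction stream of every caller.
SKETCH_NOINLINE_COLD
static bool cas64_fallback(std::atomic<u64>& v, u64& expected, u64 desired)
{
    std::lock_guard<std::mutex> guard(cas64_fallback_mutex);
    const u64 current = v.load(std::memory_order_relaxed);
    if (current != expected) { expected = current; return false; }
    v.store(desired, std::memory_order_relaxed);
    return true;
}

inline bool cas64(std::atomic<u64>& v, u64& expected, u64 desired)
{
    if (SKETCH_LIKELY(has_native_cas64))
        return v.compare_exchange_strong(expected, desired); // hot path
    return cas64_fallback(v, expected, desired);             // cold path
}
```

Whether the compiler actually emits the native path as fall-through has to be checked in the generated assembly; the annotations are hints, not guarantees.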

Best regards
Helge

