Subject: Re: [boost] [atomic] comments
From: Helge Bahmann (hcb_at_[hidden])
Date: 2011-10-22 15:39:11


On Saturday 22 October 2011 20:32:44 Tim Blechmann wrote:
> > > then we need some kind of interprocess-specific atomic ... maybe as
> > > part of boost.interprocess ... iac, maybe we should provide an
> > > implementation which somehow matches the behavior of c++11 compilers
> > > ...
> >
> > well if the atomics are truly atomic, then BOOST_ATOMIC_*_LOCK_FREE == 2
> > and I find a platform where you cannot use them safely between processes
> > difficult to imagine (not that something like that could not exist)
>
> one would have to do the dispatching logic in the preprocessor, so one
> cannot dispatch depending on the typedef operator.

it's certainly possible to build a helper template to map types to these macro
values (map to the value of BOOST_ATOMIC_INT_LOCK_FREE for all types T with
sizeof(T) == sizeof(int) for example)
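
a rough sketch of such a helper, under stated assumptions: it uses the standard C++11 ATOMIC_*_LOCK_FREE macros as stand-ins for the BOOST_ATOMIC_*_LOCK_FREE ones, and the names lock_free_level and lock_free_level_for_size are made up for illustration, not part of any library

```cpp
#include <atomic>
#include <cstddef>

// Hypothetical helper: pick the lock-free macro value by object size.
// The first matching size wins, so on platforms where e.g.
// sizeof(long) == sizeof(int) the int macro is the one reported.
constexpr int lock_free_level_for_size(std::size_t size)
{
    return size == sizeof(char)      ? ATOMIC_CHAR_LOCK_FREE
         : size == sizeof(short)     ? ATOMIC_SHORT_LOCK_FREE
         : size == sizeof(int)       ? ATOMIC_INT_LOCK_FREE
         : size == sizeof(long)      ? ATOMIC_LONG_LOCK_FREE
         : size == sizeof(long long) ? ATOMIC_LLONG_LOCK_FREE
         : 0; // no macro covers this size: report "never lock-free"
}

// Maps a type T to 0 (never), 1 (sometimes) or 2 (always lock-free).
template <typename T>
struct lock_free_level
{
    static constexpr int value = lock_free_level_for_size(sizeof(T));
};

// The value is a compile-time constant, so it can drive template dispatch.
static_assert(lock_free_level<unsigned int>::value == ATOMIC_INT_LOCK_FREE,
              "types of the same size as int map to the int macro");
```

since the result is a constant expression, the dispatch can then happen through template specialization rather than in the preprocessor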

> > if they are not atomic, then you are going to hit a "fallback-via
> > locking" path in which case you are almost certainly better off picking
> > an interprocess communication mechanism that just uses locking directly
>
> true, but at the cost of increasing the program logic. however there are
> cases, when you are happy that you don't have to change the program at the
> cost of performance on legacy hardware.

okay that's a valid point -- not sure how common this use case is, but I do
not think it justifies penalizing the process-local path

doing it in Boost.Interprocess might be something to consider however

> > > it would be equally correct to have something like:
> > >
> > > static bool has_cmpxchg16b = query_cpuid_for_cmpxchg16b();
> > >
> > > if (has_cmpxchg16b)
> > >     use_cmpxchg16b();
> > > else
> > >     use_fallback();
> > >
> > > less bloat and probably only a minor performance hit ;)
> >
> > problematic because the compiler must insert a lock to ensure thread-safe
> > initialization of the "static bool" (thus it is by definition not
> > "lock-free" any more)
>
> well, one could also set a static variable with a function called before
> main (e.g. via __attribute__((constructor)))

might be possible, but this will then cost everyone the cpuid at load time
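
for reference, a minimal sketch of the load-time variant, assuming GCC or Clang (the constructor attribute is a compiler extension); query_cpuid_for_cmpxchg16b is only a stand-in here that counts its calls and does not execute cpuid

```cpp
// Plain flag, constant-initialized, so no guard or lock is needed to read it.
static bool has_cmpxchg16b_flag = false;

// Counts how often detection ran, for illustration only.
static int load_time_detection_runs = 0;

// Hypothetical stand-in for the real cpuid query; always reports "absent".
static bool query_cpuid_for_cmpxchg16b()
{
    ++load_time_detection_runs;
    return false;
}

// GCC/Clang extension: runs when the binary or shared object is loaded,
// before main(). Every process pays for the detection once at load time,
// whether or not it ever uses a double-word atomic.
__attribute__((constructor))
static void detect_cpu_features_at_load()
{
    has_cmpxchg16b_flag = query_cpuid_for_cmpxchg16b();
}
```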

I am currently trying out something different, namely a tristate variable
("unknown", "has_cmpxchg8b", "lacks_cmpxchg8b") with a benign race where (in
bad cases) multiple threads might end up doing "cpuid" concurrently until all
threads "see" that it has a state other than "unknown"
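
roughly along these lines, as a sketch only: the state names, the use of std::atomic<int> with relaxed ordering, and the stand-in query_cpuid_for_cmpxchg8b (which counts calls instead of executing cpuid) are assumptions made for illustration, not the code being tried out

```cpp
#include <atomic>

enum cpu_feature_state
{
    feature_unknown = 0,
    feature_present = 1,
    feature_absent  = 2
};

// Constant-initialized, so the compiler does not need a guard (and hence
// no lock) the way it would for a dynamically initialized static.
static std::atomic<int> cmpxchg8b_state(feature_unknown);

// Counts detection runs, for illustration only.
static std::atomic<int> detection_runs(0);

// Hypothetical stand-in for the real cpuid query; always reports "present".
static bool query_cpuid_for_cmpxchg8b()
{
    detection_runs.fetch_add(1, std::memory_order_relaxed);
    return true;
}

static bool has_cmpxchg8b()
{
    int state = cmpxchg8b_state.load(std::memory_order_relaxed);
    if (state == feature_unknown) {
        // Benign race: several threads may get here at once and each run
        // the detection, but all of them compute and store the same value.
        state = query_cpuid_for_cmpxchg8b() ? feature_present : feature_absent;
        cmpxchg8b_state.store(state, std::memory_order_relaxed);
    }
    return state == feature_present;
}
```

the detection result is the same no matter which thread computes it, which is what makes the race harmless; processes that never touch a 64-bit atomic never pay for the cpuid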

> > > in the average, but not in the worst case. for real-time systems it is
> > > not acceptable that the os preempts a real-time thread while it is
> > > holding a spinlock.
> >
> > prio-inheriting mutexes are usually much faster than cmpxchg16b -- use
> > these for hard real-time (changing the fallback path to use PI mutexes as
> > well might even be something to consider)
>
> do you have some numbers which latencies can be achieved with PI mutexes?

no I don't, but the literature measuring wakeup latencies in operating systems
is plentiful

I only have throughput numbers, and these peg a double-word-CAS operation as
slightly less than twice as expensive as single-word-CAS -- considering that
most protocols need one pair of (either single- or double-word) CAS, and
considering that PI mutex lock/unlock can essentially be just a CAS on the
lock variable (to store/clear the owner id) in the fast path, PI mutexes
usually end up faster
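
to illustrate the fast path only, here is a sketch with a made-up type pi_lock_fast_path; the contended slow path, where a real PI mutex hands over to the kernel for queuing and priority inheritance, is deliberately left out, so this is not a usable mutex on its own

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical illustration of the uncontended path of a PI-style lock:
// the lock word holds the owner id, 0 means "unlocked".
struct pi_lock_fast_path
{
    std::atomic<std::uint32_t> owner{0};

    // One CAS: 0 -> self_id. Fails if somebody else owns the lock; a real
    // implementation would then enter the kernel to block and boost the owner.
    bool try_lock(std::uint32_t self_id)
    {
        std::uint32_t expected = 0;
        return owner.compare_exchange_strong(expected, self_id,
                                             std::memory_order_acquire,
                                             std::memory_order_relaxed);
    }

    // One CAS: self_id -> 0. Fails if the word no longer holds exactly our
    // id (e.g. a real implementation has marked it as having waiters).
    bool try_unlock(std::uint32_t self_id)
    {
        std::uint32_t expected = self_id;
        return owner.compare_exchange_strong(expected, 0u,
                                             std::memory_order_release,
                                             std::memory_order_relaxed);
    }
};
```

so an uncontended lock/unlock pair costs two single-word CAS operations, which is the comparison being made against a pair of double-word CAS operations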

Nevertheless I will add cmpxchg16b for experimentation.

Best regards
Helge

