Subject: Re: [boost] [spinlock] Spin on volatile read, NUMA fairness?
From: Niall Douglas (s_sourceforge_at_[hidden])
Date: 2014-12-03 18:41:47
On 3 Dec 2014 at 20:48, Benedek Thaler wrote:
> I was reading this Intel paper, and this section grabbed my attention:
> "One common mistake made by developers developing their own spin-wait loops
> is attempting to spin on an atomic instruction instead of spinning on a
> volatile read. Spinning on a dirty read instead of attempting to acquire a
> lock consumes less time and resources. This allows an application to
> attempt to acquire a lock only when it is free."
> As far as I can tell by looking at the source code, spinlock spins on atomic
> operations.
Spinlock does a speculative consume load to check if the lock is
locked, and if it is, it spins on that. If the consume load says the
lock is unlocked, it then tries a compare exchange with acquire on
success and consume on failure.
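For illustration, here is a minimal sketch of that pattern, assuming a
bool-based lock; the names and layout are illustrative, not
Boost.Spinlock's actual implementation:

```cpp
#include <atomic>

// Sketch of a test-and-test-and-set spinlock: spin on a speculative
// load, and only attempt the compare exchange when the lock looks free.
struct spinlock
{
    std::atomic<bool> locked{false};

    void lock() noexcept
    {
        for (;;)
        {
            // Speculative precheck: spin here while the lock is held,
            // issuing no writes or read-modify-write instructions.
            while (locked.load(std::memory_order_consume))
                ; // a pause/yield instruction could go here

            // Lock looks free: try to claim it. Acquire on success,
            // consume on failure (we go back to spinning).
            bool expected = false;
            if (locked.compare_exchange_weak(expected, true,
                                             std::memory_order_acquire,
                                             std::memory_order_consume))
                return;
        }
    }

    void unlock() noexcept
    {
        locked.store(false, std::memory_order_release);
    }
};
```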
> I wonder if a volatile read would produce better performance
I think the Intel paper was referring to MSVC only, which is an
unusual compiler in that its atomics all turn into InterlockedXXX
functions irrespective of what you ask for. In other words, all
atomic operations become full interlocked instructions, even when you
ask for a relaxed load.
One way of working around that is to use the volatile read = acquire
and volatile write = release semantics MSVC added in I think VS2005.
Now, I did benchmark the difference originally, and found no benefit
to one or the other on VS2013, so I left in the well-defined variant,
avoiding the cast from an atomic to a volatile T * which the volatile
read approach requires. However I went ahead and put it back if the
BOOST_SPINLOCK_USE_VOLATILE_READ_FOR_AVOIDING_CMPXCHG macro is
defined, just in case you'd like to see for yourself.
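As a hedged sketch of what that macro path amounts to (the cast below
is exactly the formally questionable part mentioned above, and the
function name is my own, not Boost.Spinlock's):

```cpp
#include <atomic>

// Precheck via a plain volatile read of the atomic's storage. On MSVC
// a volatile read carries acquire semantics and avoids any interlocked
// instruction; the reinterpret_cast is not sanctioned by the standard,
// which is why the well-defined variant is the default.
inline bool looks_locked(const std::atomic<bool> &a) noexcept
{
    return *reinterpret_cast<const volatile bool *>(&a);
}
```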
> AFAIK spinlocking is not necessarily fair on a NUMA architecture. Is there
> something already implemented or planned in Boost.Spinlock to ensure
> fairness?
> I'm thinking of something like this: 
If you want fairness, use the forthcoming C11 permit object, which is
effectively a fair CAS lock, and which will be the base kernel wait
object in the forthcoming non-allocating constexpr basic_future. That
object has been tuned to back off and create fairness when heavily
contended. Such fairness tuning is very much not free, unfortunately.
On 3 Dec 2014 at 23:29, Andrey Semashev wrote:
> Generally speaking, things are more complicated than that. First, it
> would probably be better to spin with a relaxed read, not consume, which
> is promoted to acquire on most, if not all, platforms.
Currently all platforms I believe. Consume semantics have not proven
themselves worth compiler vendor effort in their present design.
Therefore a consume is currently equal to an acquire.
> Acquire memory
> ordering is not required for spinning, and on architectures that
> support it, it can be much more expensive than relaxed. Second, even a
> relaxed atomic read is formally not equivalent to a volatile read. The
> latter is not guaranteed to be atomic. Lastly, on x86 all this is
> mostly moot because compilers typically generate small volatile reads
> as a single instruction, which is equivalent to an acquire or relaxed
> atomic read on this architecture, as long as alignment is correct.
I'll be honest: benchmarking whether I can drop that precheck to
relaxed is on my todo list. As Intel can't do relaxed loads, I had
been waiting for my ARM board, which actually arrived some months
ago.
I'm also pretty conservative when it comes to memory ordering, and I
would default to stronger atomic semantics rather than weaker until I
see a compelling reason why not.
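For reference, the relaxed-precheck variant under discussion might
look like the sketch below (hypothetical names, not Boost.Spinlock
code); the reasoning is that the spin itself needs no ordering at all,
because the compare exchange that actually takes the lock supplies the
acquire:

```cpp
#include <atomic>

// Lock with a relaxed spin: only the successful compare exchange
// carries acquire ordering; the precheck loop is ordering-free.
inline void lock_relaxed_precheck(std::atomic<bool> &locked) noexcept
{
    bool expected = false;
    while (!locked.compare_exchange_weak(expected, true,
                                         std::memory_order_acquire,
                                         std::memory_order_relaxed))
    {
        // Relaxed spin: no fences, just wait for the lock to look free.
        while (locked.load(std::memory_order_relaxed))
            ;
        expected = false;
    }
}
```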
> > 2)
> > AFAIK spinlocking is not necessarily fair on a NUMA architecture. Is there
> > something already implemented or planned in Boost.Spinlock to ensure
> > fairness?
> > I'm thinking of something like this: 
> I can't tell for Boost.Spinlock (do we have that library?), but
> when you need fairness, spinlocks are not the best choice.
It's forthcoming. It contains proposed concurrent_unordered_map and
will contain the non-allocating constexpr basic_future.
--
ned Productions Limited Consulting
http://www.nedproductions.biz/
http://ie.linkedin.com/in/nialldouglas/
Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk