From: Anthony Williams (anthony_w.geo_at_[hidden])
Date: 2005-09-23 16:17:05
"Peter Dimov" <pdimov_at_[hidden]> writes:
> Anthony Williams wrote:
>> "Peter Dimov" <pdimov_at_[hidden]> writes:
>>> BOOST_INTERLOCKED_READ doesn't really belong in interlocked.hpp
>>> (macro vs inline aside). The aim of this header is only to provide
>>> the Interlocked* functions as specified and documented by Microsoft
>>> without including <windows.h>; it is not meant to introduce new
>>> unspecified and undocumented functionality.
>> Fair enough. I'll move them elsewhere. I used macros rather than
>> inline functions, for consistency with the rest of the INTERLOCKED
>> stuff. Maybe inline functions are more appropriate, since these are
>> users of the INTERLOCKED functions rather than direct mappings.
>> Moved to boost/thread/detail/interlocked_read_win32.hpp.
> Where is BOOST_INTERLOCKED_READ being used, by the way? I don't follow the
> thread_rewrite branch closely but a quick glance didn't reveal anything. The
> semantics of InterlockedRead are probably a fully-fenced read? Few lock-free
> algorithms need that.
It's used in thread/detail/lightweight_mutex_win32.hpp,
thread/detail/read_write_mutex_win32.hpp and thread/detail/condition_win32.hpp
I'm using it to ensure that a read from a variable is either before or after
an interlocked_exchange or interlocked_increment, not midway through. I
figured that if one use of a variable was interlocked, the others had better be too.
Maybe I'm wrong. I haven't thought about it *that* hard.
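The code below is a minimal sketch of the "read it with an interlocked operation too" idea. It uses C++11 `std::atomic`, which postdates this thread, in place of the Win32 Interlocked* API. The names `interlocked_read` and `count_concurrently` are hypothetical. Modelling the read as a compare-exchange of 0 with 0 is an assumption about how such a helper might be built, not a description of the actual header.

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Hypothetical helper: a read performed as a compare-exchange of 0 with 0.
// If the value is 0, the exchange succeeds and stores 0 back (no visible change).
// Otherwise it fails and `expected` is loaded with the current value.
// Either way the read is one sequentially consistent read-modify-write.
inline long interlocked_read(std::atomic<long>& x) {
    long expected = 0;
    x.compare_exchange_strong(expected, 0);
    return expected;
}

// Several threads bump a shared counter with atomic increments while the
// calling thread reads it. The read observes the counter either before or
// after any given increment, never a half-written value.
inline long count_concurrently(int threads, int per_thread, long* mid_run_read) {
    std::atomic<long> counter{0};
    std::vector<std::thread> workers;
    for (int t = 0; t < threads; ++t)
        workers.emplace_back([&counter, per_thread] {
            for (int i = 0; i < per_thread; ++i)
                counter.fetch_add(1);  // analogue of InterlockedIncrement
        });
    *mid_run_read = interlocked_read(counter);  // some value in [0, threads * per_thread]
    for (auto& w : workers) w.join();
    return interlocked_read(counter);
}
```

In standard C++ an ordinary `counter.load()` would already be atomic. The compare-exchange form only mirrors what is available through the Interlocked* functions.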
>>> Finally, I believe that for correct double-checked locking you only
>>> need a load with acquire barrier on the fast path - which maps to an
>>> ordinary load on x86(-64) and to ld.acq on IA-64 - and by using a
>>> fully locked cmpxchg you're introducing a performance penalty (the
>>> philosophical debate of whether InterlockedCompareExchange is
>>> guaranteed to enforce memory ordering when the comparison fails
>> Is there an intrinsic function for that? I couldn't find one, which
>> is why I left it at InterlockedCompareExchange. I guess it could use
>> InterlockedCompareExchangeAcquire, which reduces the locking penalty.
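The following sketch shows double-checked locking with only an acquire load on the fast path, the shape Peter describes. It again uses C++11 atomics as a stand-in for the Windows API. `Config` and `LazyConfig` are hypothetical names. The release store on the slow path pairs with the acquire load, so a thread that sees a non-null pointer also sees the fully constructed object.

```cpp
#include <atomic>
#include <mutex>

// Hypothetical lazily created object.
struct Config { int value; };

class LazyConfig {
public:
    ~LazyConfig() { delete ptr_.load(std::memory_order_relaxed); }

    const Config& get() {
        // Fast path: acquire load only, no locked instruction.
        Config* p = ptr_.load(std::memory_order_acquire);
        if (p == nullptr) {
            std::lock_guard<std::mutex> lock(mutex_);
            // Re-check under the lock; the mutex orders this against other
            // threads on the slow path, so a relaxed load suffices here.
            p = ptr_.load(std::memory_order_relaxed);
            if (p == nullptr) {
                p = new Config{42};
                ++constructions_;
                // Release store publishes the fully constructed object.
                ptr_.store(p, std::memory_order_release);
            }
        }
        return *p;
    }

    // Only meaningful once no other thread is calling get().
    int constructions() const { return constructions_; }

private:
    std::atomic<Config*> ptr_{nullptr};
    std::mutex mutex_;
    int constructions_ = 0;
};
```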
> No, there is no documented way to implement ld.acq using the Windows API. A
> volatile read appears to work properly on all Windows targets/compilers, and
> there are probably thousands of lines of existing code that depend on it,
> but this wasn't specified anywhere.
> The newer MSVC 8 documentation finally promises that a volatile read has
> acquire semantics and that a volatile store has release semantics, even on
> IA-64, and the compiler also seems to understand these reordering
> The Intel compiler seems to have an option, serialize-volatile, that appears
> to be on by default; so it seems to also enforce acq/rel volatiles.
> As I see it, the implementation options are (1) use a volatile read, live
> dangerously, be ridiculed by Alexander Terekhov, (2) use inline assembly
> (painful), (3) use a fully-locked implementation and suffer the performance
> consequences - my preference is InterlockedExchangeAdd with zero.
> Either way, the actual helper function should be named atomic_load_acq and
> specified to promise acquire semantics, in my opinion.
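Here is a sketch of the two non-assembly options expressed in C++11 terms, which were not available in 2005. `atomic_load_acq` uses the name suggested above and promises acquire semantics. `atomic_load_locked` is a hypothetical name for the analogue of `InterlockedExchangeAdd` with zero. `handoff` is a hypothetical usage example in which the acquire load pairs with a release store.

```cpp
#include <atomic>
#include <thread>

// Acquire load: no later read or write in this thread may be moved before it.
template <class T>
inline T atomic_load_acq(const std::atomic<T>& x) {
    return x.load(std::memory_order_acquire);
}

// Fully locked read, the analogue of InterlockedExchangeAdd(&x, 0).
// Adding zero leaves the value unchanged and returns it, but as a
// sequentially consistent read-modify-write it is at least as strong as an
// acquire load (and on x86 usually compiles to a locked instruction).
// Requires an integral or pointer T.
template <class T>
inline T atomic_load_locked(std::atomic<T>& x) {
    return x.fetch_add(0);
}

// Producer/consumer handoff: the release store publishes `payload`, and the
// acquire load that sees ready == 1 guarantees the consumer sees it too.
inline int handoff() {
    int payload = 0;
    std::atomic<int> ready{0};
    std::thread producer([&] {
        payload = 123;
        ready.store(1, std::memory_order_release);
    });
    while (atomic_load_acq(ready) == 0) {
        // spin until the producer has published
    }
    int seen = payload;
    producer.join();
    return seen;
}
```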
Thank you for the details. I don't fancy either of the first two options, as I
don't know IA-64 or AMD64 assembly, and I don't feel safe relying on volatile
semantics unless it's really guaranteed correct on all supported compilers.
Is InterlockedExchangeAdd faster/more reliable in some way than
--
Anthony Williams
Software Developer
Just Software Solutions Ltd
http://www.justsoftwaresolutions.co.uk