From: Peter Dimov (pdimov_at_[hidden])
Date: 2005-09-23 13:35:15
Anthony Williams wrote:
> "Peter Dimov" <pdimov_at_[hidden]> writes:
>
>> BOOST_INTERLOCKED_READ doesn't really belong in interlocked.hpp
>> (macro vs inline aside). The aim of this header is only to provide
>> the Interlocked* functions as specified and documented by Microsoft
>> without including <windows.h>; it is not meant to introduce new
>> unspecified and undocumented functionality.
>
> Fair enough. I'll move them elsewhere. I used macros rather than
> inline functions, for consistency with the rest of the INTERLOCKED
> stuff. Maybe inline functions are more appropriate, since these are
> users of the INTERLOCKED functions rather than direct mappings.
>
> Moved to boost/thread/detail/interlocked_read_win32.hpp.
Where is BOOST_INTERLOCKED_READ being used, by the way? I don't follow the
thread_rewrite branch closely, but a quick glance didn't reveal anything. I
assume InterlockedRead is meant to be a fully fenced read? Few lock-free
algorithms need one.
>> Finally, I believe that for correct double-checked locking you only
>> need a load with acquire barrier on the fast path - which maps to an
>> ordinary load on x86(-64) and to ld.acq on IA-64 - and by using a
>> fully locked cmpxchg you're introducing a performance penalty (the
>> philosophical debate of whether InterlockedCompareExchange is
>> guaranteed to enforce memory ordering when the comparison fails
>> aside.)
>
> Is there an intrinsic function for that? I couldn't find one, which
> is why I left it at InterlockedCompareExchange. I guess it could use
> InterlockedCompareExchangeAcquire, which reduces the locking penalty.
No, there is no documented way to implement ld.acq using the Windows API. A
volatile read appears to work properly on all Windows targets/compilers, and
there are probably thousands of lines of existing code that depend on it,
but until recently this behavior wasn't specified anywhere.
The newer MSVC 8 documentation finally promises that a volatile read has
acquire semantics and that a volatile store has release semantics, even on
IA-64, and the compiler also seems to understand these reordering
constraints.
http://msdn2.microsoft.com/en-us/library/12a04hfd
The Intel compiler seems to have an option, serialize-volatile, that appears
to be on by default, so it presumably enforces acq/rel volatiles as well.
As I see it, the implementation options are (1) use a volatile read, live
dangerously, be ridiculed by Alexander Terekhov, (2) use inline assembly
(painful), (3) use a fully-locked implementation and suffer the performance
consequences - my preference is InterlockedExchangeAdd with zero.
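
To make option (3) concrete, here is a rough sketch of a fully fenced read done as a read-modify-write that adds zero. It uses std::atomic from later C++ standards as a portable stand-in for InterlockedExchangeAdd (which was the only tool available when this was written), and the name fully_fenced_read is hypothetical, not anything in Boost:

```cpp
#include <atomic>

// Hypothetical helper, not part of Boost or the Windows API.
// Reads 'value' with a locked read-modify-write that adds zero,
// mirroring the idea of InterlockedExchangeAdd(&x, 0).
inline long fully_fenced_read(std::atomic<long>& value)
{
    // fetch_add returns the value held before the addition;
    // adding zero leaves the stored value unchanged.
    return value.fetch_add(0, std::memory_order_seq_cst);
}
```

The read costs a locked instruction on x86 even when nothing is modified, which is the performance consequence mentioned above.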
Either way, the actual helper function should be named atomic_load_acq and
specified to promise acquire semantics, in my opinion.
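
As an illustration only, this is roughly what such a helper and its use on the double-checked locking fast path could look like. The sketch assumes std::atomic and std::mutex from later C++ standards rather than anything in the Windows API, and widget, g_instance, g_init_mutex and get_instance are made-up names:

```cpp
#include <atomic>
#include <mutex>

// Helper named as proposed above; this body is an assumption built on
// std::atomic, not the implementation discussed in this thread.
template <class T>
inline T atomic_load_acq(const std::atomic<T>& value)
{
    // Later reads in this thread cannot be reordered before this load.
    return value.load(std::memory_order_acquire);
}

struct widget { int id; };              // hypothetical lazily created object

std::atomic<widget*> g_instance(nullptr);
std::mutex g_init_mutex;

inline widget* get_instance()
{
    // Fast path: a single acquire load, no locked instruction.
    widget* p = atomic_load_acq(g_instance);
    if (p == nullptr) {
        std::lock_guard<std::mutex> lock(g_init_mutex);
        // Second check under the lock; the mutex already orders this read.
        p = g_instance.load(std::memory_order_relaxed);
        if (p == nullptr) {
            p = new widget{7};
            // Release store publishes the fully constructed object.
            g_instance.store(p, std::memory_order_release);
        }
    }
    return p;
}
```

Only the first caller takes the lock and constructs the object; every later call returns through the acquire load alone.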