From: Anthony Williams (anthony_w.geo_at_[hidden])
Date: 2007-03-23 05:08:08
"Peter Dimov" <pdimov_at_[hidden]> writes:
> Anthony Williams wrote:
>> "Peter Dimov" <pdimov_at_[hidden]> writes:
>>> I don't see call_once in jss_thread.zip, by the way; maybe you
>>> forgot to put it into the archive?
>> Oops. Thanks for spotting that. I've added it to the archive, and
>> updated it to take multiple arguments in passing.
> Some comments on that:
Thanks for taking the time to look at this.
> template<typename Function>
> void call_once(once_flag& flag,Function f)
> // Try for a quick win: if the procedure has already been called
> // just skip through:
> long const function_complete_flag_value=0xc15730e2;
> char mutex_name[::jss::detail::once_mutex_name_length];
> void* const
> detail::win32::handle_holder const closer(mutex_handle);
> detail::win32_mutex_scoped_lock const lock(mutex_handle);
> The first load needs to be a load_acquire; the second can be ordinary since
> it's done under a lock. The store needs to be store_release.
I didn't want to think about acquire/release semantics when I wrote that, so I
just went for "ordered" ops.
Agreed that the second read can be ordinary. Actually I think the store can be
ordinary too since it's also done under a lock, and the unlock has (or should
have, anyway) release semantics.
I agree that the first read needs to be load_acquire, though: without the
acquire, there's no synchronization in the case that the flag has been set,
and there's nothing to "release".
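
For concreteness, the pattern under discussion can be sketched roughly as follows. This is only a sketch under stated assumptions: it uses std::atomic and std::mutex rather than the jss_thread primitives, the names once_flag_sketch, call_once_sketch and count_calls are made up for illustration, and a std::mutex member stands in for the named Win32 mutex. The store is written as a release store, as suggested in the quoted comment; whether an ordinary store under the lock would be enough is the open point above.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-in for once_flag; not the jss_thread type.
struct once_flag_sketch {
    std::atomic<bool> done{false};
    std::mutex guard;  // assumption: replaces the named Win32 mutex
};

template <typename Function>
void call_once_sketch(once_flag_sketch& flag, Function f) {
    // Fast path: the acquire load pairs with the release store below, so a
    // thread that reads 'true' also sees everything f() wrote.
    if (flag.done.load(std::memory_order_acquire)) return;

    std::lock_guard<std::mutex> lock(flag.guard);
    // Second check: done under the lock, so a relaxed load is used here.
    if (!flag.done.load(std::memory_order_relaxed)) {
        f();
        // Release store, following the quoted suggestion.
        flag.done.store(true, std::memory_order_release);
    }
}

// Calls call_once_sketch from several threads and returns how often f ran.
int count_calls(int thread_count) {
    once_flag_sketch flag;
    int calls = 0;  // modified only inside f(); read after all joins
    std::vector<std::thread> threads;
    for (int i = 0; i < thread_count; ++i)
        threads.emplace_back([&] { call_once_sketch(flag, [&] { ++calls; }); });
    for (auto& t : threads) t.join();
    assert(flag.done.load());  // the flag must be set once everyone is done
    return calls;
}
```

Calling count_calls with any positive thread count should report that the function ran exactly once, whichever thread happened to win the lock.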
> An interlocked_read is stronger ('ordered') and more expensive than needed
> on a hardware level, but is 'relaxed' on a compiler level under MSVC 7.1
> (the optimizer moves code around it). It's 'ordered' for the compiler as
> well under 8.0; the intrinsics have been changed to be compiler barriers as
> well. InterlockedExchange is similar.
Have you got a reference for that? I would be interested to read about the
details; MSDN is sketchy.
> A load_acquire can be implemented as a volatile read under 8.0, and a
> volatile read followed by _ReadWriteBarrier under 7.1.
Why don't you need the barrier on 8.0? You need something there in order to
prevent the CPU from doing out-of-order reads (and stores), even if the
compiler won't reorder things. In fact, looking at the assembly code
generated, I believe you need more than a _ReadWriteBarrier in both cases, as
it seems to be purely a compiler barrier, and not a CPU barrier.
On x86, I think a load_acquire needs to either be a simple load followed by an
MFENCE, or a fully ordered RMW operation. The compiler Interlocked intrinsics
will generate the latter, but I don't know how to do the former short of
writing inline assembly.
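
The two options mentioned above can be written without inline assembly on a compiler that provides std::atomic, which is an assumption here and not something the archive under review uses. The function names below are hypothetical. The sketch only shows the shape of each option next to a compiler-only barrier; it does not settle which of them a given CPU actually requires.

```cpp
#include <atomic>
#include <cassert>

// Option 1: a simple (relaxed) load followed by a full fence. On x86,
// compilers commonly implement this fence with MFENCE or a locked instruction.
long load_then_fence(const std::atomic<long>& flag) {
    long const value = flag.load(std::memory_order_relaxed);
    std::atomic_thread_fence(std::memory_order_seq_cst);
    return value;
}

// Option 2: a fully ordered read-modify-write that leaves the value
// unchanged, comparable in spirit to using an Interlocked intrinsic to read.
long load_via_rmw(std::atomic<long>& flag) {
    return flag.fetch_add(0, std::memory_order_seq_cst);
}

// For contrast: a compiler-only barrier. It restricts compiler reordering
// but emits no fence instruction for the CPU.
void compiler_only_barrier() {
    std::atomic_signal_fence(std::memory_order_seq_cst);
}

// Single-threaded sanity check: both options return the stored value.
void check_both_options() {
    std::atomic<long> flag(42);
    assert(load_then_fence(flag) == 42);
    compiler_only_barrier();
    assert(load_via_rmw(flag) == 42);
}
```

The check only confirms that both forms read the same value; the ordering guarantees themselves cannot be demonstrated by a single-threaded test.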
-- 
Anthony Williams
Just Software Solutions Ltd - http://www.justsoftwaresolutions.co.uk
Registered in England, Company Number 5478976.
Registered Office: 15 Carrallack Mews, St Just, Cornwall, TR19 7UL