Subject: Re: [boost] [lockfree] review
From: Hans Boehm (Hans.Boehm_at_[hidden])
Date: 2011-08-29 14:59:09
Peter Dimov <pdimov <at> pdimov.com> writes:
>
> Alexander Terekhov wrote:
>
> > Consider also that
> >
> > "Load Seq_Cst: MOV (from memory)
> > Store Seq_Cst: (LOCK) XCHG // alternative: MOV (into memory),MFENCE"
> >
> > is overkill for typical use cases...
>
> But that's not a problem because everyone who understands should use
> explicit constraints, even if they happen to be memory_order_seq_cst.
> Relying on the SC default is bad practice because it can (and, to be on the
> safe side, should) be interpreted to mean that the author just hasn't
> figured out the minimum requirements.
>
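To make the two spellings concrete: in C++11 the ordering argument of `std::atomic` operations defaults to `memory_order_seq_cst`, so the first two stores below are the same operation, and only the third one weakens it. The helper names and the global `flag` are hypothetical. The x86 notes in the comments follow the common mapping quoted above and are not a guarantee about any particular compiler.

```cpp
#include <atomic>

// Hypothetical shared variable, used only to show how each ordering is spelled.
std::atomic<int> flag{0};

// Relies on the default: identical to flag.store(v, std::memory_order_seq_cst).
void store_default(int v) { flag.store(v); }

// Same operation, with the ordering spelled out explicitly.
void store_explicit_sc(int v) { flag.store(v, std::memory_order_seq_cst); }

// Weaker ordering: on x86 a release store is typically a plain MOV,
// whereas a seq_cst store needs XCHG (or MOV followed by MFENCE).
void store_release(int v) { flag.store(v, std::memory_order_release); }

// A seq_cst load is typically a plain MOV on x86 under that mapping.
int load_default() { return flag.load(); }

// Acquire load, the usual partner of a release store.
int load_acquire() { return flag.load(std::memory_order_acquire); }
```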
I would have stated this differently, though probably with the same result.
At least when writing application-level code, I would always rely on the
default initially, and not worry about ordering. I would explicitly specify
the ordering only when it turns out that memory_order_seq_cst introduces a
performance problem.
If nothing else, this would allow me to separate out debugging of memory model
issues.
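A minimal sketch of that workflow, assuming a simple producer/consumer hand-off (the function `publish_and_read` is hypothetical). The code is first written with the SC defaults. The ordering parameters exist only so that a later, measured tuning step can weaken them to release/acquire, which is the minimum this pattern needs.

```cpp
#include <atomic>
#include <thread>

// Message passing written with the seq_cst defaults first.
// store_order must be seq_cst or release; load_order must be seq_cst or acquire.
int publish_and_read(std::memory_order store_order = std::memory_order_seq_cst,
                     std::memory_order load_order = std::memory_order_seq_cst) {
    int payload = 0;                 // ordinary, non-atomic data
    std::atomic<bool> ready{false};  // publication flag
    int observed = -1;

    std::thread producer([&] {
        payload = 42;                    // write the data first
        ready.store(true, store_order);  // then publish it
    });
    std::thread consumer([&] {
        while (!ready.load(load_order)) {  // spin until published
            std::this_thread::yield();
        }
        observed = payload;  // ordered after the producer's write to payload
    });
    producer.join();
    consumer.join();
    return observed;
}
```

Passing `memory_order_relaxed` for either argument would turn the read of `payload` into a data race. That kind of mistake is easier to isolate once the SC version is known to work.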
My experience is that very few people manage to get memory ordering right. My
PPoPP 07 and MSPC 11 papers both have examples of commonly used mutex
implementations getting it wrong in various interesting ways. We didn't
understand what the specs actually required, but on top of that some of the
implementations got it wrong in ways that were clearly independent of any
misunderstanding of the spec. Given that the experts can't figure it out for
what should be the easy cases, I'd much rather most people just stick to the
sequentially consistent default.
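As an illustration of where the ordering requirements sit in a lock, here is a minimal test-and-set spinlock sketch. The `SpinLock` class and the `locked_increments` driver are hypothetical and are not taken from the papers mentioned or from any library. In line with the advice above, it keeps the SC defaults. The comments note the weaker orderings usually considered sufficient.

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Hypothetical test-and-set spinlock that keeps the seq_cst defaults.
class SpinLock {
public:
    void lock() {
        // exchange() defaults to seq_cst; acquire is the usual minimum here.
        while (locked_.exchange(true)) {
            std::this_thread::yield();
        }
    }
    void unlock() {
        // store() defaults to seq_cst; release is the usual minimum here.
        locked_.store(false);
    }
private:
    std::atomic<bool> locked_{false};
};

// Increments a plain int under the lock from several threads.
int locked_increments(int threads, int iters) {
    SpinLock lock;
    int counter = 0;  // protected by lock
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t) {
        pool.emplace_back([&] {
            for (int i = 0; i < iters; ++i) {
                lock.lock();
                ++counter;
                lock.unlock();
            }
        });
    }
    for (auto& th : pool) th.join();
    return counter;
}
```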
This is entirely consistent with Peter's claim that using the sequentially
consistent default means I haven't thought about it. But in many cases I
really don't want to think about it, and that may be a fine state of affairs.
For example, if I use an atomic counter, it's very likely that either:
1. It's not performance critical; I'm using atomics because they're more
direct than mutexes in this case, or because I need the signal
handler/interrupt safety, and the SC version is fine, or
2. It is performance critical, and I probably want to think hard about
alternative solutions that keep thread-local counts.
In both cases, it's unlikely that memory ordering will significantly impact
application performance. Of course this doesn't apply to all use cases.
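The two counter cases above might look roughly like this. Both functions are hypothetical sketches. On x86 an atomic `fetch_add` compiles to a `LOCK`-prefixed instruction whatever ordering is requested, which is one reason the ordering choice tends to matter little here.

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Case 1: not performance critical, so keep the seq_cst default.
long shared_count(int threads, int iters) {
    std::atomic<long> count{0};
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t) {
        pool.emplace_back([&] {
            for (int i = 0; i < iters; ++i) {
                count.fetch_add(1);  // seq_cst by default
            }
        });
    }
    for (auto& th : pool) th.join();
    return count.load();
}

// Case 2: performance critical. Each thread counts privately and touches
// the shared atomic only once, so per-increment ordering cost disappears.
long thread_local_count(int threads, int iters) {
    std::atomic<long> count{0};
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t) {
        pool.emplace_back([&] {
            long local = 0;  // private to this thread
            for (int i = 0; i < iters; ++i) {
                ++local;
            }
            count.fetch_add(local);  // one shared update per thread
        });
    }
    for (auto& th : pool) th.join();
    return count.load();
}
```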
Hans