Subject: Re: [boost] [lockfree] review
From: Hans Boehm (Hans.Boehm_at_[hidden])
Date: 2011-08-31 13:49:38
Alexander Terekhov <terekhov <at> web.de> writes:
>
>
> Hans Boehm wrote:
> [...]
> > For what it's worth, Sarita Adve is both an author of the report you cite
> > and the original and perhaps strongest advocate for the "sequential
> > consistency for data-race-free programs" programming model.
>
> I'm not contra "sequential consistency for data-race-free programs"
> programming model for programs using locks. On PPC, for example, such
> programs don't even need hwsync. For programs with lock-free atomics
> OTOH, the races (concurrent accesses to the same locations with loads
> competing with concurrent stores) is a feature, not a bug, and SC is
> simply way too expensive (e.g. it needs hwsync on PPC) for use in
> default mode for lock-free atomics: C/C++ is "you don't pay for what you
> don't need".
>
> regards,
> alexander.
>
The question is whether "sequential consistency for data-race-free programs"
should extend to programs using atomic load, store, and RMW operations. The
C++ committee, including me, came to the conclusion that the answer needs to
be yes; there are many cases in which the use of atomics is fairly
straightforward and useful, and it should be possible to use them without
leaving this relatively simple programming model. By doing so, you get a safe
programming model by default. Since we do have explicit ordering primitives,
you have the option of only paying for what you need. But 90%, or probably
99%, of programmers will not know what they need here. And that's fine.
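
To make the distinction concrete, here is a minimal sketch, assuming a C++11
compiler with <atomic> and <thread>. All names (Outcome, run_default_once,
pass_message) are hypothetical and exist only for illustration. The first
function uses the default (sequentially consistent) operations; the second
opts in to explicit, weaker ordering where that is known to be enough.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical helper type for this sketch.
struct Outcome { int r1; int r2; };

// Store-buffering (Dekker-style) test using default atomic operations.
// Default loads and stores are memory_order_seq_cst, so the outcome
// r1 == 0 && r2 == 0 is not allowed: at least one thread must observe
// the other's store.
Outcome run_default_once() {
    std::atomic<int> x{0}, y{0};
    int r1 = -1, r2 = -1;
    std::thread a([&] { x.store(1); r1 = y.load(); });
    std::thread b([&] { y.store(1); r2 = x.load(); });
    a.join();
    b.join();
    return {r1, r2};
}

// Message passing with explicit ordering: the release store publishes
// the plain write to payload, and the acquire load that sees true is
// guaranteed to see that write as well.
int pass_message() {
    int payload = 0;
    std::atomic<bool> ready{false};
    int seen = -1;
    std::thread producer([&] {
        payload = 42;
        ready.store(true, std::memory_order_release);
    });
    std::thread consumer([&] {
        while (!ready.load(std::memory_order_acquire)) {
            std::this_thread::yield();
        }
        seen = payload;
    });
    producer.join();
    consumer.join();
    return seen;
}
```

If the stores and loads in run_default_once were weakened to
memory_order_relaxed (or release/acquire), the r1 == 0 && r2 == 0 outcome
would become permitted. That is the kind of subtlety the default is meant to
spare most programmers.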

This is entirely consistent with many other C++ design decisions. The default
operator new allocates memory that lives as long as the process, even though
that's more expensive than allocating memory local to the current stack frame
or thread, and often one of those latter two options would be sufficient. But
it would be nasty to make either of those the default behavior.

The overhead of enforcing sequential consistency is unfortunately currently
very platform-specific. On x86, it's increasingly minor, since it's possible
to confine the added cost to stores and, as far as I can tell, the added cost
is becoming much less than the cost of a coherence miss. And if your
performance is limited by the cost of stores to shared variables, you are
fairly likely to unavoidably see lots of coherence misses, so there is a
hand-wavy argument that this is likely to be a minor perturbation. On other
architectures, the costs are unfortunately larger. But my impression is that
they are decreasing everywhere, as architects pay more attention to
synchronization costs.
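
As a rough illustration of "confining the added cost to stores" on x86, the
sketch below puts the default and explicitly ordered operations side by side.
The variable and function names are hypothetical, and the instruction
sequences in the comments describe what mainstream compilers typically emit
for x86-64; they are not guaranteed by the language and should be checked
against actual compiler output.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical shared variable for this sketch.
std::atomic<int> shared_value{0};

// Default (seq_cst) store. On x86 this is typically compiled to XCHG,
// or to MOV followed by MFENCE; this is where the extra cost lands.
void store_sc(int v) { shared_value.store(v); }

// Release store. On x86 this is typically a plain MOV.
void store_release(int v) {
    shared_value.store(v, std::memory_order_release);
}

// Default (seq_cst) load. On x86 this is typically a plain MOV,
// the same instruction used for an acquire load.
int load_sc() { return shared_value.load(); }

// Acquire load. Also typically a plain MOV on x86.
int load_acquire() {
    return shared_value.load(std::memory_order_acquire);
}
```

On architectures with weaker hardware ordering, the default loads generally
need extra fences too, which is one reason the cost is more visible there.
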
Hans