Boost Users :
From: Matt Hurd (matt.hurd_at_[hidden])
Date: 2005-02-10 22:37:55
>Paul <elegant_dice_at_[hidden]> wrote:
> good lord, assembler? doesn't the 'volatile' keyword fix the problem
> you describe?
I'm afraid it doesn't. You might want to check the archives for
discussions of the double-checked locking problem, which highlight
the issue, or read the paper
http://www.aristeia.com/Papers/DDJ_Jul_Aug_2004_revised.pdf
> if the flag is going to be flipping back and forth, then sure, even with
> memory barriers you will have threads which think the value is
> different... eg, you have 2 long-running threads that check the value
> and do some calculation that take 5 seconds or so. lets assume they
> dont sync when reading the flag, so then thread A could check the flag
> and read false, while thread B checked the flag 4 seconds ago and
> currently thinks its true. so you have inconsistencies there anyway, right?
It depends; there are lock-free ways to do things.
The important thing is that after the memory barrier, everyone is
guaranteed to be on the same page w.r.t. the value when they next
check it.
> otherwise, the flag is set and its only a matter of time before the CPU
> sees the correct value ('die' in the previous email's case) and dies.
> if the flag is for protecting resources, then of course you need
> volatile/guards/etc, otherwise it can be just a 'lazy cancel', when the
> thread finally reads the right value, it quits. how many cycles-lag
> would that be anyway? is this only a problem on exotic platforms?
I have found it to be a big issue for my work on a simple win32
dual-processor box. It can be a surprisingly long time before a value
propagates. I'm not entirely certain the value is guaranteed ever to
propagate, which is an issue.
Another example: checking whether a message queue is empty can be
significantly faster (2 to 4 times on ia32) with the appropriate use
of memory barriers rather than a mutex / critical section.
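
A minimal sketch of the queue idea above, assuming a C++11 or later compiler (std::atomic postdates this message). The class message_queue and its members are hypothetical names, not an existing Boost interface. Pushes and pops still take a mutex; only the emptiness check avoids it by reading an atomic size counter. No timing is implied by the sketch itself.

```cpp
#include <atomic>
#include <cstddef>
#include <deque>
#include <mutex>
#include <utility>

// Hypothetical illustration: a mutex-protected queue whose empty() check
// reads an atomic counter instead of taking the lock.
template <typename T>
class message_queue {
public:
    void push(T value) {
        std::lock_guard<std::mutex> lock(mutex_);
        items_.push_back(std::move(value));
        // Publish the new size; pairs with the acquire load in empty().
        size_.store(items_.size(), std::memory_order_release);
    }

    bool try_pop(T& out) {
        std::lock_guard<std::mutex> lock(mutex_);
        if (items_.empty()) return false;
        out = std::move(items_.front());
        items_.pop_front();
        size_.store(items_.size(), std::memory_order_release);
        return true;
    }

    // Lock-free hint: the answer may already be stale when the caller
    // acts on it, so a consumer must still handle try_pop() failing.
    bool empty() const {
        return size_.load(std::memory_order_acquire) == 0;
    }

private:
    std::mutex mutex_;
    std::deque<T> items_;
    std::atomic<std::size_t> size_{0};
};
```

A consumer would typically poll empty() cheaply and only call try_pop() when it returns false.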
> is this cpu-cache problem actually a problem in this 'kill-flag' case,
> do you really think you need memory barriers on a flag like this?
On a single IA32 processor, no. On a dual processor, maybe, it
depends. Also, I'm not sure about the guarantee of the value
propagating eventually; I have encountered cases where it does not seem
to propagate at all, probably due to compiler optimizations.
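
A minimal sketch of such a kill flag, assuming a C++11 or later compiler (std::atomic postdates this message). The names die, worker and run_kill_flag_demo are hypothetical. Making the flag atomic stops the compiler from hoisting the read out of the loop, which is the "never propagates" failure described above.

```cpp
#include <atomic>
#include <thread>

// Hypothetical kill flag shared between threads.
std::atomic<bool> die{false};

// Worker loop: runs until it observes the flag, counting iterations.
long worker() {
    long iterations = 0;
    // The atomic load must be re-executed on every pass; acquire ordering
    // also makes writes done before the matching release store visible.
    while (!die.load(std::memory_order_acquire)) {
        ++iterations;
        std::this_thread::yield();
    }
    return iterations;
}

// Starts one worker, requests cancellation, and waits for it to stop.
bool run_kill_flag_demo() {
    die.store(false, std::memory_order_relaxed);
    long seen = -1;
    std::thread t([&seen] { seen = worker(); });
    die.store(true, std::memory_order_release);  // request shutdown
    t.join();  // returns only after the worker has seen the flag
    return seen >= 0;
}
```

This is the "lazy cancel" case: the worker may run a few more iterations, but it is guaranteed to stop.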
I've been meaning to submit an interface for memory fences based on
the work of others I've seen poking around. It seems that to cope
with most architectures you need barriers like:
load_load, load_store, store_store, store_load
and the like to support memory architectures with more relaxed
semantics, some of which are no-ops on IA32. This should be a
fundamental building block on which boost can build. It would also
mean having a different category of builds for such a library, as a
compilation would be based not only on the OS, compiler and STL, but
on the hardware architecture as well.
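
A rough sketch of what such an interface could look like, assuming a C++11 or later compiler and mapping each barrier onto std::atomic_thread_fence (which postdates this message). The namespace fence_sketch and the helpers publish and try_consume are hypothetical. The mapping is a conservative approximation: each standard fence is at least as strong as the barrier it stands in for, so it may do more work than a hand-tuned per-architecture version.

```cpp
#include <atomic>

// Hypothetical interface named after the four barrier kinds listed above.
namespace fence_sketch {
// Earlier loads are ordered before later loads.
inline void load_load()   { std::atomic_thread_fence(std::memory_order_acquire); }
// Earlier loads are ordered before later stores.
inline void load_store()  { std::atomic_thread_fence(std::memory_order_acquire); }
// Earlier stores are ordered before later stores.
inline void store_store() { std::atomic_thread_fence(std::memory_order_release); }
// Earlier stores are ordered before later loads; needs a full fence.
inline void store_load()  { std::atomic_thread_fence(std::memory_order_seq_cst); }
}  // namespace fence_sketch

int payload = 0;                  // plain data handed between threads
std::atomic<bool> ready{false};   // flag announcing that payload is set

// Writer side: store the data, then the flag, with a barrier in between.
void publish(int value) {
    payload = value;
    fence_sketch::store_store();
    ready.store(true, std::memory_order_relaxed);
}

// Reader side: check the flag, then read the data after a barrier.
bool try_consume(int& out) {
    if (!ready.load(std::memory_order_relaxed)) return false;
    fence_sketch::load_load();
    out = payload;
    return true;
}
```

The publish / try_consume pair shows the usual use: the writer orders its two stores, and the reader orders its two loads.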
For example, before lfence, sfence and mfence existed on ia32 you had to
use cpuid as a memory fence, though a locked bus instruction might have
been enough, I think. So an implementation even just on plain ia32
would need different architecture #defines for different
generations of ia32. The later fences came with SSE2, for example.
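
A small sketch of that kind of per-architecture selection, under stated assumptions: the macro FENCE_SKETCH_HAS_MFENCE and the functions full_fence and full_fence_kind are hypothetical, the detection relies on the compiler-defined macros __SSE2__, _M_X64 and _M_IX86_FP, and the fallback uses std::atomic_thread_fence from C++11 rather than the cpuid or locked-instruction tricks mentioned above.

```cpp
#include <atomic>

// Pick the SSE2 mfence intrinsic when the compiler says SSE2 is available.
#if defined(__SSE2__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 2)
#include <emmintrin.h>  // declares _mm_mfence
#define FENCE_SKETCH_HAS_MFENCE 1
#else
#define FENCE_SKETCH_HAS_MFENCE 0
#endif

// Hypothetical full memory fence chosen at compile time.
inline void full_fence() {
#if FENCE_SKETCH_HAS_MFENCE
    _mm_mfence();  // hardware mfence instruction (SSE2 and later)
#else
    // Portable fallback on targets without SSE2.
    std::atomic_thread_fence(std::memory_order_seq_cst);
#endif
}

// Reports which implementation was compiled in.
inline const char* full_fence_kind() {
#if FENCE_SKETCH_HAS_MFENCE
    return "mfence";
#else
    return "std::atomic_thread_fence";
#endif
}
```

The choice is made entirely at build time, which is why the hardware target would have to become part of the build configuration.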
Check out http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2004/n1680.pdf
for some more insight from cleverer people than I.
Regards,
Matt.
matthurd_at_[hidden]