From: Phil Endecott (spam_from_boost_dev_at_[hidden])
Date: 2007-09-21 13:12:34
Anthony Williams wrote:
>>> it looks like you've opted for a
>>> check/sleep/check/sleep loop for threads that are waiting for another thread
>>> to finish running the routine. This is a bad idea. Blocking of this nature
>>> should be done by waiting on an OS primitive rather than with a wait loop.
>>
>> Why is it that bad? This is safer since there is no opportunity to get an
>> error on the threading primitive construction, it doesn't use system
>> resources like kernel objects and it solves the fundamental problems of
>> creating and destroying those threading primitives at run time. And it will
>> be run only once after all, so performance is not an issue.
>
> I think that performance *is* an issue, even though this will only be run once
> per thread.
>
> A check/sleep polling loop is a bad idea, as it consumes CPU time that could
> be spent actually running the once routine (or another thread that doesn't
> need to wait). By waiting on an OS primitive, the OS can take the thread out
> of the schedule until the primitive is ready to be acquired.
>
> Not only that, but a check/sleep loop forces a latency of at least the
> specified sleep time on the waiting thread. If the initialization being waited
> for only takes a few microseconds (or less --- if it's just a simple
> initialization it might take only a few nanoseconds), then waiting a whole
> millisecond is an unnecessary delay.
>
> POSIX provides pthread_once. We should use it.

Do have a look at the analysis that I did for my ARM atomic shared_ptr code:
http://thread.gmane.org/gmane.comp.lib.boost.devel/164564/focus=164893

If the probability of contention is very low, then on average adding
even one instruction to the non-contended case, or occupying more
icache space with yield() calls, may slow the program down more than
yielding on contention would speed it up.

The probability of contention depends crucially on the duration of the
critical section, and I imagine that this could vary enormously for
"once" functions, i.e. anything from a couple of instructions to
seconds. So it might be worthwhile having different types of "once"
for these different cases - and the same could also be said of mutexes.

Take care with the pthreads option. I spent a while trying to
understand what the Linux pthreads implementation (in glibc) does (for
ARM), and it eventually boils down to much the same as I had written.
However, the glibc version is almost an order of magnitude slower, and
I believe that's because it involves a couple of function calls while
mine is inline. Since pthreads is a C API, I think that the function
call overhead is inevitable. So I have put "investigate replacing the
pthreads mutexes used by Boost.Threads with asm" on my to-do list
(though it may never reach the top).

Having said all that, does anyone really worry much about "once"
performance? It's not like shared_ptr, where code that uses it may be
doing atomic reference count changes fairly continuously.

Regards,
Phil.