From: Aaron W. LaFramboise (aaronrabiddog51_at_[hidden])
Date: 2004-07-05 20:46:16
> It sounds reasonable that the kind of lock described will perform badly in
> high-contention situations. But what evidence do you have that there is any
> probability of high contention, especially when, as you say, the locks are
> short-lived? These locks aren't taken every time the smart pointer is accessed;
> they're only taken when a copy is made. So to get the high contention you speak
> of, you would have to have two threads repeatedly making copies of the "same"
> smart pointer. From where I stand, this seems sufficiently rare that the extra
> performance gain in the common cases is worth it.
[these comments are specific to the win32 spinlock discussed]
I have two specific concerns.
1) Personally, I value predictable performance. I would prefer slight
performance degradation (Note, however, that I am not examining the
actual numbers.) to the case where I may have an occasional very long
wait. (Specific example: If a graphically intensive program is aiming
for 30 frames per second, that is only about 30ms per frame. It would
be very unfortunate if, even occasionally, a single smart pointer copy
accounted for 1/3 of the time it took to render a frame.)
2) In "thundering herd" situations, contention is unavoidable, and
performance may be paramount. The possibility of a single thread
halting many other threads for 10ms on a machine with many processors is
disturbing. (Specific example: In a case where thread is producing work
for many consumer threads on a multiprocessor machine, and obtaining the
work involves copying a smart pointer, it is possible that some amount
of threads will forced to sleep every single time, which might be a
substantial percentage of total job wall clock time.)
In any case, I feel fairly strongly that sleeping is not the right thing
to do. Despite the weakness in appeal to authority, I will note that I
have seen Butenhof make the same assertion on Usenet, surely in a manner
more eloquent than mine.
I previously neglected to suggest alternatives.
1) A generally sound 8-byte mutex could be used. I am not sure how bad
this would be in terms of storage efficiency. It may be possible to
implement it using less storage, but I have no specific information on this.
2) I've seen a shared_ptr count implementation that used
InterlockedIncrement instead of the lwm code. I have not examined this
in detail; however it seems that an approach like this is better in all
respects than trying to manage locks if the only need is to maintain a
count (which is what many mutex primatives are doing anyway). Is there
some reason this cannot be used?
What do you think?
Aaron W. LaFramboise
Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk