
From: Mattias Flodin (flodin_at_[hidden])
Date: 2004-07-06 00:13:23


Quoting "Aaron W. LaFramboise" <aaronrabiddog51_at_[hidden]>:
> 1) Personally, I value predictable performance. I would prefer slight
> performance degradation (Note, however, that I am not examining the
> actual numbers.) to the case where I may have an occasional very long
> wait. (Specific example: If a graphically intensive program is aiming
> for 30 frames per second, that is only about 30ms per frame. It would
> be very unfortunate if, even occasionally, a single smart pointer copy
> accounted for 1/3 of the time it took to render a frame.)
>
> 2) In "thundering herd" situations, contention is unavoidable, and
> performance may be paramount. The possibility of a single thread
> halting many other threads for 10ms on a machine with many processors is
> disturbing. (Specific example: In a case where a thread is producing work
> for many consumer threads on a multiprocessor machine, and obtaining the
> work involves copying a smart pointer, it is possible that some number of
> threads will be forced to sleep every single time, which might be a
> substantial percentage of total job wall clock time.)

Thanks.

> In any case, I feel fairly strongly that sleeping is not the right thing
> to do. Despite the weakness in appeal to authority, I will note that I
> have seen Butenhof make the same assertion on Usenet, surely in a manner
> more eloquent than mine.

What I'm thinking here is that smart pointers are a special case, and what
applies in the general case may not apply under that constraint. The more
specific information you have about the task, the more you can optimize.

However, despite your examples being slightly far-fetched (Sleep(1) would wait
for ~1ms, not 10ms, and a producer/consumer system would most likely have a
synchronized queue, which would avoid any contention around the smart
pointers), they are a good enough hint to convince me that there are
real-world applications that would suffer from the problem. As you say, stable
performance is more important for a generic implementation.
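
(To be concrete about the pattern in question: a lock that spins on an
interlocked flag and falls back to Sleep() under contention looks roughly like
the sketch below. This is just an illustration of the shape, not the actual
Boost code.)

#include <windows.h>

// Rough sketch of a spin-then-sleep lock. The point is that a waiter's
// delay is tied to Sleep()'s scheduling granularity, not to how long the
// holder actually keeps the lock.
class spin_sleep_lock
{
public:
    spin_sleep_lock() : locked_(0) {}

    void lock()
    {
        // InterlockedExchange returns the previous value; nonzero means
        // another thread holds the lock, so give up the timeslice.
        while (InterlockedExchange(&locked_, 1) != 0)
        {
            Sleep(1); // worst-case wait depends on the scheduler, not on
                      // the length of the critical section
        }
    }

    void unlock()
    {
        InterlockedExchange(&locked_, 0);
    }

private:
    volatile long locked_;
};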

> I previously neglected to suggest alternatives.
>
> 1) A generally sound 8-byte mutex could be used. I am not sure how bad
> this would be in terms of storage efficiency. It may be possible to
> implement it using less storage, but I have no specific information on this.
>
> 2) I've seen a shared_ptr count implementation that used
> InterlockedIncrement instead of the lwm code. I have not examined this
> in detail; however it seems that an approach like this is better in all
> respects than trying to manage locks if the only need is to maintain a
> count (which is what many mutex primitives are doing anyway). Is there
> some reason this cannot be used?

I was a bit surprised by the use of a lock as well, and given that
InterlockedIncrement is an obvious solution for this kind of thing, I assumed
there were non-obvious reasons it couldn't be used. My guess is exception
safety, but I would like to hear from the original authors (or anybody else in
the know) about this. Perhaps explaining the rationale in the documentation
would be in order.
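
(To make the comparison concrete, a count maintained purely with the
interlocked primitives would look something like the sketch below; the names
are mine, not the actual shared_ptr internals.)

#include <windows.h>

// Sketch of alternative 2: the count itself is the only shared state, so
// atomic increment/decrement are enough and no thread ever blocks or sleeps.
class atomic_count
{
public:
    explicit atomic_count(long initial) : value_(initial) {}

    void increment()
    {
        InterlockedIncrement(&value_);
    }

    // Returns the new count; when it reaches zero the caller is the last
    // owner and is responsible for destroying the managed object.
    long decrement()
    {
        return InterlockedDecrement(&value_);
    }

private:
    volatile long value_; // must be aligned for the Interlocked* functions
};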

> What do you think?

Alternative 2 would be a superior solution if it's feasible. Alternative 1 is
not bad performance-wise if implemented using CRITICAL_SECTION. My only worry
is about resource usage, since mutexes are kernel objects. I can imagine that
situations where hundreds of thousands of smart pointers are in use could see
an impact on overall performance. Some kinds of kernel memory usage are also
restricted; I'm not sure whether mutexes belong to that category. The number
of outstanding overlapped file operations is an example that does (on the
order of 1000 outstanding operations in my measurements, on a machine with
512 MB of RAM).
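
(For comparison, alternative 1 amounts to putting a lock next to every count,
roughly as in the sketch below; again the names are mine. The per-count
CRITICAL_SECTION storage, plus the kernel event one may acquire under
contention, is where the resource concern comes in.)

#include <windows.h>

// Sketch of alternative 1: each count carries its own CRITICAL_SECTION.
// Uncontended Enter/Leave stays in user mode, but every count pays for the
// CRITICAL_SECTION structure itself.
class locked_count
{
public:
    explicit locked_count(long initial) : value_(initial)
    {
        InitializeCriticalSection(&cs_);
    }

    ~locked_count()
    {
        DeleteCriticalSection(&cs_);
    }

    long increment()
    {
        EnterCriticalSection(&cs_);
        long result = ++value_;
        LeaveCriticalSection(&cs_);
        return result;
    }

    long decrement()
    {
        EnterCriticalSection(&cs_);
        long result = --value_;
        LeaveCriticalSection(&cs_);
        return result;
    }

private:
    CRITICAL_SECTION cs_;
    long value_;
};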

I believe the majority of threaded applications do not need to share smart
pointers between threads to any great extent. Unfortunately, the choice to
avoid a policy-based design implies that optional thread safety might add
something like three extra smart pointer classes to the six that already
exist.

/Mattias

