From: Aaron W. LaFramboise (aaronrabiddog51_at_[hidden])
Date: 2004-07-06 00:00:14


Mattias Flodin wrote:

> Quoting "Aaron W. LaFramboise" <aaronrabiddog51_at_[hidden]>:

> However, despite your examples being slightly far-fetched (Sleep(1) would wait
> for ~1ms, not 10ms, and a consumer/producer system would most likely have a
> synchronized queue which would avoid any congestion around the smart pointers),
> they are a good enough hint to convince me that there are real-world
> applications that would suffer from the problem. As you say, stable performance
> is more important for a generic implementation.

Sorry. I was mistakenly thinking that it was not possible to sleep for
less than the system clock granularity (10ms), but now that I have tested
it, that does not appear to be the case. Maybe that was only true on
older systems.
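
For what it is worth, a throwaway test along these lines (not part of any
library; just something to paste into a main()) shows what Sleep(1) actually
does on a given machine:

    #include <windows.h>
    #include <cstdio>

    int main()
    {
        LARGE_INTEGER freq, start, end;
        QueryPerformanceFrequency(&freq);

        QueryPerformanceCounter(&start);
        Sleep(1);                        // ask for a 1ms sleep
        QueryPerformanceCounter(&end);

        double ms = (end.QuadPart - start.QuadPart) * 1000.0 / freq.QuadPart;
        std::printf("Sleep(1) actually slept for %.3f ms\n", ms);
    }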

I will investigate further how bad the degenerate performance might
actually be with smart_ptr.

>>2) I've seen a shared_ptr count implementation that used
>>InterlockedIncrement instead of the lwm code. I have not examined this
>>in detail; however it seems that an approach like this is better in all
>>respects than trying to manage locks if the only need is to maintain a
>>count (which is what many mutex primitives are doing anyway). Is there
>>some reason this cannot be used?
>
> I was a bit surprised by the use of a lock as well, and given that
> InterlockedIncrement is an obvious solution for this kind of thing, I assumed
> there were non-obvious reasons that it couldn't be used. My guess is exception
> safety, but I would like to hear from the original authors (or anybody else in
> the know) about this. Perhaps explaining rationale in the documentation would
> be in order.

Well, here is the implementation that I have seen:
http://www.pdimov.com/cpp/shared_count_x86_exp2.hpp
This implementation seems like it would be faster, in all cases, than what
is presently being used.

I do not know why it is not used presently. Hopefully Peter Dimov will
comment.
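
For illustration only, the general idea is roughly the following; this is a
simplified sketch, not the code at that URL, and the name interlocked_count
is made up:

    #include <windows.h>

    class interlocked_count {
        LONG use_count_;                       // shared reference count
    public:
        interlocked_count() : use_count_(1) {}

        void add_ref()
        {
            InterlockedIncrement(&use_count_); // atomic ++, no lock taken
        }

        bool release()                         // true => destroy the object
        {
            return InterlockedDecrement(&use_count_) == 0;
        }

        long use_count() const
        {
            return use_count_;                 // only approximate under contention
        }
    };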

> Alternative 2 would be a superior solution if it's feasible. Alternative 1 is
> not bad performance-wise if implemented using CRITICAL_SECTION. My only worry
> is about resource usage, since mutexes are kernel objects. I can imagine that
> situations where hundreds of thousands of smart pointers are used may end up
> having an impact on overall performance. In some cases, kernel memory usage is
> restricted. I'm not sure whether mutexes belong to that category. The number of
> outstanding overlapped file operations is an example that does (on the order of
> 1000 outstanding operations from my measurements, on a machine with 512 MB RAM).

Well, even critical sections, Windows' fastest mutex primitive, are
much slower in the uncontended case than a spinlock. A two-stage
method is needed to match the performance of the present spinlock: a
lightweight atomic operation, followed by a heavyweight mutex if the
lock is contended. This is why I mentioned that 8 bytes (one word for
the critical section, one word for the atomic operation) would be necessary.
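
Roughly, I mean something along these lines; this is only a hypothetical
sketch of the two-stage idea, using an auto-reset event rather than a
CRITICAL_SECTION for the slow path, and tuned for clarity rather than
efficiency:

    #include <windows.h>

    class two_stage_lock {
        LONG   locked_;  // 0 = free, 1 = held; the lightweight word
        HANDLE event_;   // kernel object, only touched under contention
    public:
        two_stage_lock() : locked_(0)
        {
            event_ = CreateEvent(0, FALSE, FALSE, 0); // auto-reset, nonsignaled
        }
        ~two_stage_lock() { CloseHandle(event_); }

        void lock()
        {
            // Fast path: a single interlocked operation when uncontended.
            while (InterlockedExchange(&locked_, 1) != 0) {
                // Slow path: block on the kernel event; the short timeout
                // guards against a wakeup lost between the exchange and the wait.
                WaitForSingleObject(event_, 1);
            }
        }

        void unlock()
        {
            InterlockedExchange(&locked_, 0);
            SetEvent(event_);                 // wake one waiter, if any
        }
    };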

> I believe the majority of threaded applications do not need to share smart
> pointers between threads to any great extent. Unfortunately the choice to avoid
> a policy-based design implies that optional thread safety might add something
> like three extra smart pointer classes to the six already existing ones.

Ideally, the smart pointer classes would work for more than the
majority; they would work for everyone. I also agree that it is
unfortunate that users who do not need threads might have to pay for
threads anyway, or for a more complicated smart pointer library.

Aaron W. LaFramboise

