
From: Don G (dongryphon_at_[hidden])
Date: 2005-02-15 04:24:59

Hi Peter,

Again, thanks for taking all the time to patiently reply. This will
in all likelihood be my final post on this topic, though I will read
thoroughly any replies from you or others.

>> I guess this is the crux of my contention. I think all must
>> agree there must be: automatic counting; basic pointer syntax
>> (including any implicit conversion from derived to base, and
>> various operators). The features beyond those are of the
>> debatable variety.
> There are two features beyond what you listed:
> 1. "Do the right thing" destruction;
> 2. Weak pointer support.

I meant that I view these as debatable, not "absolutely essential".

> Note that (1) does not equal "custom deleter support". Custom
> deleters do not add any further overhead (when not used) over "do
> the right thing" destruction.
> By that I mean that shared_ptr<X> can always be destroyed when it
> has been constructed properly, no matter whether at the point of
> destruction X is a complete type with an accessible (and virtual,
> if the actual object is not of type X) destructor

But if I have been using T* all these years, "do the right thing" is
really just adding overhead for no benefit to me. I have to rethink
all my options in light of this feature, because it is not how C++
pointers behave. Some may argue it is "better"; others will argue
otherwise.

> or whether 'operator delete' at the point of destruction can
> deallocate from the 'operator new' heap at the point of
> allocation.
Quite an esoteric condition. I cannot imagine that shared_ptr would
fit such an ABI even given this ability. In my experience, when this
kind of isolation is in play, self destruction (via vtable as in COM)
or similar techniques are also used.

> The theoretical minimum cost of a non-intrusive pointer is:
> - per-instance pointer to the count;
> - one word for the count.

I agree, but would say it this way: non-intrusion costs one pointer's
worth of storage, for the pointer to the count. The count itself doesn't seem fair to
even call overhead<g> - I mean, you asked for this part by entering
the world of reference counting! :)

> Feature (1) above adds:
> - vtable
> - pointer
> The functor need not be present when not used, although the current
> code base does not do that. It can also be optimized out when it's
> an empty class with the help of boost::compressed_pair. Again, the
> current code base doesn't do it, but we're talking about the
> design-imposed overhead here.

I will look at compressed_pair, but I don't see how the functor could
not exist or take no space. Perhaps I am too focused on the details
of the current implementation.

> Feature (2) adds
> - an extra count.
> For platforms lacking CAS (are there any?) Tyson Whitehead has
> posted an implementation that locks a mutex only when
> weak_ptr::lock is used. If weak
> pointers aren't used, the implementation has essentially the same
> performance as an ordinary non-intrusive smart pointer. The mutex
> pool trick can be used to not include a per-count mutex.

I think most platforms would have atomic inc/dec, but the more exotic
compareExchange (or equivalently powerful primitive) is what I was
concerned with being less available.

> One situation where custom deleters come in handy is when you want
> to use your own smart pointer in an application, but a third-party
> library takes a shared_ptr. With a shared_ptr not supporting
> deleters, you'd be forced to use shared_ptr in your application,
> too.

Welcome to the happy world of 3rd party libraries :) Seriously, it is
very common to encounter this type of thing, and it again comes back
to the problem that custom deleters add most (all?) of their weight
even if you don't use them. If it were otherwise, I would argue for
their inclusion. I can imagine cases where they would be useful.

> You can always switch to active notification and avoid the need for
> a passive observer such as weak_ptr, but this often makes your
> design worse, because now the observed objects need to know about
> the observers in order to notify them.

I agree that active notification tends to be too complicated and,
therefore, not desirable. What I meant was that eliminating all need
to observe is better than either. In many (not all) cases, if A and
B are cyclic such that A owns B, but B needs access to its owner,
one can refactor things such that A & B jointly own a C. The stuff
B needed from A moves to C and no more cycle. I had a major redesign
recently where I did exactly this and things got much better as a
result. Mileage will vary, of course.

> There's also the 'shared from this' problem.

Not a good thing in a constructor - been there; wish I hadn't done
that! :)

>>> The good thing is that once you have a stable design, efficiency
>>> usually follows, as happened here.
>> True enough, but optimizations leveraging advanced, platform-
>> specific capabilities do not offer much benefit to those writing
>> portable code. For them, the cost remains the same. In this case,
>> this will always be so.
> Portable code? shared_ptr doesn't need to be portable, if it comes
> with your standard library.

By "portable code" I mean code that uses shared_ptr on multiple OSes
or compilers. The cost may be acceptable on some, but not all of the
compilers in use.

>> Now we've come back to my original point: the very design of
>> shared_ptr requires a certain overhead that is _irreducible_ in
>> the general case. And that overhead is:
>> A. Not clearly specified or articulated (for those who need to
>> know)
>> B. _Much_ more than one would naively expect
> All designs require a certain irreducible overhead, do they not?
> Think of it as a std::map. The overhead of a map is not clearly
> specified or articulated. (B) probably applies as well. But it
> works.

Absolutely, all designs have certain irreducible overhead - I didn't
mean to imply otherwise. In the case of std::map, one should expect
an instance to be on the large-ish side. Also, one should expect the
per node overhead to be two pointers and probably another word for
the tree balancing algorithm. If an implementation does worse than
this, it is the fault of the implementation. The design of std::map
allows for a realization that has that level of overhead, but no
more.

What made me focus so on shared_ptr is that the number of instances I
would expect to see is quite large: all dynamically allocated objects
in the ideal case. I could probably enumerate the number of places in
our 1M+ lines of code where we used std::map.

Not that you have to appease me <g>, but since its costs aren't
additive, I would love to at least have some way to "opt out" of the
majority of shared_ptr overhead. For example, a way to indicate that
shared_ptr<T> should use an intrusive count, disallow weak_ptr and
put the kibosh on all the other magic. Why not use "intrusive_ptr"?
It's not the same name (and hence templates using shared_ptr would
not accept it).
As it sits, all optimization opportunity is in the hands of the RTL
implementor. As a user I will have to either accept all the overhead
(in the worst case on all _my_ target platforms), or go elsewhere.

Since the outcome of all this discussion will likely have no effect
on the design of shared_ptr (not that I expected it would nor do I
mean to slight anyone), my final plea is that the TR1 implementation
include any and all optimizations so as to increase the likelihood
that other implementations will follow (like the "return object by
value" optimization). Leaving these as "exercises" would be a great
disservice since I expect that most implementors would only slowly
rediscover all the optimizations that you and others have already
imagined. I don't know that language could be put in place to either
encourage or require some minimum set of optimizations in an
implementation (like "std::map::find is O(lg(n))" or such).

Best regards,



Boost list run by bdawes at, gregod at, cpdaniel at, john at