Subject: Re: [boost] [Smart Ptr] make_shared slower than shared_ptr(new) on VC++9 (and 10) with fix
From: Jeffrey Lee Hellrung, Jr. (jeffrey.hellrung_at_[hidden])
Date: 2012-04-25 17:37:52


On Wed, Apr 25, 2012 at 2:02 PM, Yuriy Zubritsky <mt.wizard_at_[hidden]> wrote:

> I discovered the same problem a year ago and wrote my own fix for it. It
> outperforms both the old boost::make_shared and std::make_shared (the
> latter only by a little).
> I used VC's implementation of make_shared as the base.
> If anyone is interested I can send a patch, but I don't know the correct
> process for this.
>
> Thanks
>
> On 25 April 2012 at 12:48, Ivan Erceg <ierceg_at_[hidden]> wrote:
>
> > Hi all,
> >
> > Before I switch to using boost::make_shared<> I wanted to test its
> > purported performance advantage. I created a simple benchmark for
> > measuring raw allocation throughput for 3 classes of different sizes
> > with a common base class (constructors and destructors trivial). The
> > number of allocations was set to 40,000,000 as it was roughly giving me
> > 10 seconds running time per test.
> >
> > Unfortunately it turns out that on VC++9 (release target with default
> > optimizations) boost::make_shared is significantly slower than simply
> > doing boost::shared_ptr(new). Here's the benchmark output:
> >
> > TestBoostMakeShared 10.577s 3.78179e+006 allocs/s
> > TestBoostSharedPtrNew 8.907s 4.49085e+006 allocs/s
> >
> > As you can see, boost::make_shared is over 15% slower than the
> > boost::shared_ptr(new) idiom.
> >
> > Having the VC++10 compiler available as well, I then compared these
> > results with the std::shared_ptr and std::make_shared implementations
> > that come with that compiler (but not with VC++9). Here are the results:
> >
> > TestBoostMakeShared 9.688s 4.12882e+006 allocs/s
> > TestBoostSharedPtrNew 8.252s 4.84731e+006 allocs/s
> > TestStdMakeShared 5.07s 7.88955e+006 allocs/s
> > TestStdSharedPtrNew 8.159s 4.90256e+006 allocs/s
> >
> > While std::shared_ptr(new) performs about the same as
> > boost::shared_ptr(new), std::make_shared really blows away
> > boost::make_shared and both shared_ptr(new) tests, being almost twice as
> > fast as boost::make_shared.
> >
> > I then profiled the boost::make_shared test to find its biggest
> > performance bottleneck compared to the boost::shared_ptr(new) profiler
> > run. The culprit was immediately obvious: the boost::make_shared test
> > was spending above 25% of its time in the "type_info::operator==(class
> > type_info const &) const" function. This function was being called
> > indirectly from boost::make_shared through boost::get_deleter. After
> > digging some more through the implementation I came to the conclusion
> > that, in this particular case, we are guaranteed to always be requesting
> > the deleter for the right class (namely T from boost::make_shared<T>).
> > Since boost::shared_ptr doesn't have a way to retrieve the deleter
> > without using RTTI, I decided to add one and use it from an alternative
> > boost::make_shared. So I did the following:
> >
> > 1. I added a virtual function to detail::sp_counted_base
> > (detail\sp_counted_base_w32.hpp):
> >
> >     virtual void * get_raw_deleter( ) = 0;
> >
> > 2. I implemented get_raw_deleter() function in sp_counted_impl_p
> > (detail\sp_counted_impl.hpp):
> >
> >     virtual void * get_raw_deleter( )
> >     {
> >         return 0;
> >     }
> >
> > 3. I implemented get_raw_deleter() function in sp_counted_impl_pd
> > (detail\sp_counted_impl.hpp):
> >
> >     virtual void * get_raw_deleter( )
> >     {
> >         return &reinterpret_cast<char&>( del );
> >     }
> >
> > 4. I implemented get_raw_deleter() function in sp_counted_impl_pda
> > (detail\sp_counted_impl.hpp):
> >
> >     virtual void * get_raw_deleter( )
> >     {
> >         return &reinterpret_cast<char&>( d_ );
> >     }
> >
> > 5. I added the following function to detail::shared_count:
> >
> >     void * get_raw_deleter( ) const
> >     {
> >         return pi_? pi_->get_raw_deleter( ): 0;
> >     }
> >
> > 6. I added the following function to shared_ptr<>:
> >
> >     void * _internal_get_raw_deleter( ) const
> >     {
> >         return pn.get_raw_deleter( );
> >     }
> >
> > 7. I made a separate copy of boost::make_shared function and replaced a
> > single line from:
> >
> >     boost::detail::sp_ms_deleter< T > * pd =
> >         boost::get_deleter< boost::detail::sp_ms_deleter< T > >( pt );
> >
> > to:
> >
> >     boost::detail::sp_ms_deleter< T > * pd =
> >         static_cast< boost::detail::sp_ms_deleter< T > * >(
> >             pt._internal_get_raw_deleter() );
> >
> > Benchmarking afterwards gave me the following results on VC++9:
> >
> > TestBoostSharedPtrNew 9.204s 4.34594e+006 allocs/s
> > TestBoostMakeShared 10.499s 3.80989e+006 allocs/s
> > TestBoostMakeSharedAlt 7.831s 5.1079e+006 allocs/s
> >
> > My changes translated into an almost 35% improvement in allocation
> > speed over the current implementation of boost::make_shared. Or, to put
> > it differently, they amount to a 25+% decrease in running time, as we
> > could have supposed from the profiling results.
> >
> > Results on VC++10 are similar:
> >
> > TestBoostSharedPtrNew 8.487s 4.71309e+006 allocs/s
> > TestBoostMakeShared 9.609s 4.16276e+006 allocs/s
> > TestStdSharedPtrNew 8.283s 4.82917e+006 allocs/s
> > TestStdMakeShared 5.039s 7.93808e+006 allocs/s
> > TestBoostMakeSharedAlt 6.802s 5.88062e+006 allocs/s
> >
> > VC++10's std::make_shared is still much faster (almost 35% faster than
> > the alternative boost::make_shared) and we will be switching to it once
> > we switch to VC++10. But in the meantime it seems to me that
> > boost::make_shared should be fixed to improve its performance. Again,
> > this is only one compiler, and other compilers might not have such a
> > severe RTTI performance issue, but I still think it would be well worth
> > avoiding unnecessary calls to RTTI during performance-relevant
> > operations such as heap allocations.
> >
> > The testing and changes were done on Boost 1.48.0, but I compared the
> > Smart Ptr library sources with Boost 1.49.0 and the above changes should
> > work there equally well.
> >
> > Thanks,
> > Ivan
>
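
For readers who don't have the Smart Ptr sources open, the two lookup paths described in steps 1-7 above can be sketched in isolation. This is only a simplified illustration, not Boost's code: counted_base, counted_with_deleter, noop_deleter, checked_deleter and unchecked_deleter are all hypothetical names, and the real classes do a good deal more (reference counting, disposal, and the reinterpret_cast<char&> guard against deleters that overload operator&).

```cpp
#include <cassert>
#include <typeinfo>

// Hypothetical, simplified stand-in for a shared_ptr control block.
// None of these names are Boost's; they only mirror the shape of the idea.
struct counted_base {
    virtual ~counted_base() {}
    // RTTI-checked lookup: returns the deleter only if its type matches.
    virtual void* get_deleter(const std::type_info& ti) = 0;
    // Unchecked lookup: the caller must already know the deleter's type.
    virtual void* get_raw_deleter() = 0;
};

template <class D>
struct counted_with_deleter : counted_base {
    D del;
    explicit counted_with_deleter(const D& d) : del(d) {}

    virtual void* get_deleter(const std::type_info& ti) {
        // This comparison corresponds to the type_info::operator== call
        // that showed up in the profile.
        return ti == typeid(D) ? static_cast<void*>(&del) : 0;
    }

    virtual void* get_raw_deleter() {
        // Assumes D does not overload operator&.
        return &del;
    }
};

// Hypothetical deleter type used only for illustration.
struct noop_deleter {
    int tag;
};

// Checked path: safe for any D, but pays for a type_info comparison.
template <class D>
D* checked_deleter(counted_base& cb) {
    return static_cast<D*>(cb.get_deleter(typeid(D)));
}

// Unchecked path: valid only when the caller itself created the block
// with a deleter of type D, as make_shared does.
template <class D>
D* unchecked_deleter(counted_base& cb) {
    return static_cast<D*>(cb.get_raw_deleter());
}
```

Both helpers return the same pointer when the type is right. The unchecked one is only safe because the caller, like make_shared, has just created the control block with a deleter of that exact type.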

I don't see any mention of this issue in the trac database, so be sure one
of you adds it, preferably with a patch! Even two competing patches are
better than none (and maybe better than one). This sounds like a worthy
improvement, but I'm not familiar enough with the internals of
boost::shared_ptr to know whether the present implementation uses RTTI for
reasons other than to retrieve the deleter...
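
A small timing harness along the lines of the benchmark quoted above might also be worth attaching to the ticket, so that others can reproduce the numbers on their own compilers. The following is only a rough sketch under several assumptions: it uses the C++11 std:: smart pointers and <chrono> rather than Boost so that it is self-contained, and Payload, Throughput, measure, via_new and via_make_shared are hypothetical names, since the original test classes were not posted.

```cpp
#include <chrono>
#include <cstddef>
#include <memory>

// Hypothetical payload type; the original benchmark's classes are not shown.
struct Payload {
    int values[4];
};

// Result of one timed run.
struct Throughput {
    std::size_t completed;     // number of non-null pointers produced
    double seconds;            // elapsed wall-clock time
    double allocs_per_second;  // completed / seconds, or 0 if too fast to time
};

// Calls make_one() n times and reports elapsed time and allocations/second.
template <class MakeOne>
Throughput measure(std::size_t n, MakeOne make_one) {
    typedef std::chrono::steady_clock clock;
    std::size_t live = 0;
    clock::time_point start = clock::now();
    for (std::size_t i = 0; i < n; ++i) {
        std::shared_ptr<Payload> p = make_one();
        // Touch the pointer so the loop has an observable result; an
        // aggressive optimizer may still simplify allocations.
        live += p ? 1 : 0;
    }
    std::chrono::duration<double> elapsed = clock::now() - start;

    Throughput r;
    r.completed = live;
    r.seconds = elapsed.count();
    r.allocs_per_second = r.seconds > 0.0 ? live / r.seconds : 0.0;
    return r;
}

// Two allocations per object: one for Payload, one for the control block.
inline std::shared_ptr<Payload> via_new() {
    return std::shared_ptr<Payload>(new Payload());
}

// Typically a single allocation holding both object and control block.
inline std::shared_ptr<Payload> via_make_shared() {
    return std::make_shared<Payload>();
}
```

Calling measure(40000000, via_new) and measure(40000000, via_make_shared) in a release build would give figures comparable in kind to the ones quoted, though absolute numbers will depend on the compiler, allocator and machine.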

- Jeff

