[Boost-bugs] [Boost C++ Libraries] #6830: make_shared slower than shared_ptr(new) on VC++9 and 10

From: Boost C++ Libraries (noreply_at_[hidden])
Date: 2012-04-25 22:04:49


#6830: make_shared slower than shared_ptr(new) on VC++9 and 10
------------------------------+---------------------------------------------
 Reporter: ierceg@… | Owner: pdimov
     Type: Patches | Status: new
Milestone: To Be Determined | Component: smart_ptr
  Version: Boost 1.48.0 | Severity: Optimization
 Keywords: make_shared |
------------------------------+---------------------------------------------
 I created a simple benchmark measuring raw allocation throughput for 3
 classes of different sizes with a common base class (constructors and
 destructors trivial). The number of allocations was set to 40,000,000
 because that gave me roughly 10 seconds of running time per test.
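
 The benchmark source is not part of this ticket, so the following is only
 a minimal sketch of what such a harness might look like. All names (Base,
 Small, Medium, Large, ViaNew, ViaMakeShared, run_benchmark) and the class
 sizes are hypothetical, and std::shared_ptr / std::make_shared are used as
 stand-ins so the sketch compiles without Boost; the actual measurements
 were taken with the boost:: equivalents.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <memory>

// Hypothetical hierarchy: a common base and three derived classes of
// different sizes, all with trivial constructors and destructors.
struct Base { int id; };
struct Small: Base { char pad[ 16 ]; };
struct Medium: Base { char pad[ 64 ]; };
struct Large: Base { char pad[ 256 ]; };

// Factory for the shared_ptr( new T ) idiom (std:: stand-in for boost::).
struct ViaNew
{
    template< class T > std::shared_ptr< Base > make() const
    {
        return std::shared_ptr< T >( new T() );
    }
};

// Factory for the make_shared idiom (std:: stand-in for boost::).
struct ViaMakeShared
{
    template< class T > std::shared_ptr< Base > make() const
    {
        return std::make_shared< T >();
    }
};

struct Result
{
    std::size_t allocations;
    double seconds;
    double allocs_per_second;
};

// Each round creates and immediately releases one object of each size.
template< class Factory > Result run_benchmark( std::size_t rounds )
{
    Factory factory;
    std::size_t made = 0;

    std::chrono::steady_clock::time_point start = std::chrono::steady_clock::now();

    for( std::size_t i = 0; i < rounds; ++i )
    {
        std::shared_ptr< Base > a = factory.template make< Small >();
        std::shared_ptr< Base > b = factory.template make< Medium >();
        std::shared_ptr< Base > c = factory.template make< Large >();

        // Count non-null results so the loop has an observable effect.
        made += ( a ? 1 : 0 ) + ( b ? 1 : 0 ) + ( c ? 1 : 0 );
    }

    std::chrono::duration< double > elapsed = std::chrono::steady_clock::now() - start;

    assert( made == 3 * rounds ); // sanity check, outside the timed region

    Result r;
    r.allocations = made;
    r.seconds = elapsed.count();
    r.allocs_per_second = r.seconds > 0.0 ? made / r.seconds : 0.0;
    return r;
}
```

 Calling run_benchmark< ViaNew >( n ) and run_benchmark< ViaMakeShared >( n )
 with the same n yields two Result values whose allocs_per_second fields
 can be compared directly.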

 It turns out that on VC++9 (release target with default optimizations)
 boost::make_shared is significantly slower than simply doing
 boost::shared_ptr(new). Here's the benchmark output:

 TestBoostMakeShared 10.577s 3.78179e+006 allocs/s

 TestBoostSharedPtrNew 8.907s 4.49085e+006 allocs/s

 As you can see, boost::make_shared is over 15% slower (in allocations per
 second) than the boost::shared_ptr(new) idiom.

 One suggested solution:

 boost::shared_ptr doesn't have a way to retrieve the deleter without using
 RTTI, and that RTTI lookup is what slows down execution on VC++9/10. I
 decided to add one and use it from an alternative boost::make_shared. So I
 did the following:

 1. I added a virtual function to detail::sp_counted_base
 (detail\sp_counted_base_w32.hpp):

   virtual void * get_raw_deleter( ) = 0;

 2. I implemented the get_raw_deleter() function in sp_counted_impl_p
 (detail\sp_counted_impl.hpp):

   virtual void * get_raw_deleter( )
   {
     return 0;
   }

 3. I implemented the get_raw_deleter() function in sp_counted_impl_pd
 (detail\sp_counted_impl.hpp):

   virtual void * get_raw_deleter( )
   {
     return &reinterpret_cast<char&>( del );
   }

 4. I implemented the get_raw_deleter() function in sp_counted_impl_pda
 (detail\sp_counted_impl.hpp):

   virtual void * get_raw_deleter( )
   {
     return &reinterpret_cast<char&>( d_ );
   }

 5. I added the following function to detail::shared_count:

   void * get_raw_deleter( ) const
   {
     return pi_? pi_->get_raw_deleter( ): 0;
   }

 6. I added the following function to shared_ptr<>:

   void * _internal_get_raw_deleter( ) const
   {
     return pn.get_raw_deleter( );
   }

 7. I made a separate copy of the boost::make_shared function and changed
 a single line from:

   boost::detail::sp_ms_deleter< T > * pd =
     boost::get_deleter< boost::detail::sp_ms_deleter< T > >( pt );

 to:

   boost::detail::sp_ms_deleter< T > * pd =
     static_cast< boost::detail::sp_ms_deleter< T > * >(
       pt._internal_get_raw_deleter() );
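
 To illustrate the difference between the two lookups in steps 1-7, here
 is a stripped-down, standalone model. It is not the Boost code: the names
 counted_base, counted_impl_pd, demo_deleter, checked_deleter and
 raw_deleter are hypothetical, reference counting is omitted, and the
 debug-only assert in raw_deleter is my own addition rather than part of
 the patch.

```cpp
#include <cassert>
#include <typeinfo>

// Simplified stand-in for detail::sp_counted_base.
struct counted_base
{
    virtual ~counted_base() {}

    // Existing route: the caller names the deleter type and the control
    // block compares type_info objects (RTTI) before handing it out.
    virtual void * get_deleter( std::type_info const & ti ) = 0;

    // Proposed route: return the deleter's address with no type check.
    virtual void * get_raw_deleter() = 0;
};

// Simplified stand-in for detail::sp_counted_impl_pd.
template< class P, class D > struct counted_impl_pd: counted_base
{
    P ptr;
    D del;

    counted_impl_pd( P p, D d ): ptr( p ), del( d ) {}

    virtual void * get_deleter( std::type_info const & ti )
    {
        return ti == typeid( D )? &reinterpret_cast< char & >( del ): 0;
    }

    virtual void * get_raw_deleter()
    {
        return &reinterpret_cast< char & >( del );
    }
};

// Hypothetical deleter standing in for detail::sp_ms_deleter< T >.
struct demo_deleter
{
    int tag;
    void operator()( int * ) const {}
};

// Checked lookup, analogous to boost::get_deleter< D >( pt ).
// Returns 0 when the stored deleter is not a D.
template< class D > D * checked_deleter( counted_base & cb )
{
    return static_cast< D * >( cb.get_deleter( typeid( D ) ) );
}

// Unchecked lookup, analogous to the static_cast in step 7. Only valid
// when the caller already knows the deleter type, as make_shared does.
template< class D > D * raw_deleter( counted_base & cb )
{
    // Debug builds cross-check against the RTTI route; release builds
    // (NDEBUG) pay for a single virtual call only.
    assert( cb.get_deleter( typeid( D ) ) == cb.get_raw_deleter() );
    return static_cast< D * >( cb.get_raw_deleter() );
}
```

 Both lookups return the same address when the requested type matches the
 stored deleter; only the checked one can report a mismatch, which is the
 safety that the raw variant trades for speed.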


 Rerunning the benchmark afterwards gave me the following results on
 VC++9:

 TestBoostSharedPtrNew 9.204s 4.34594e+006 allocs/s

 TestBoostMakeShared 10.499s 3.80989e+006 allocs/s

 TestBoostMakeSharedAlt 7.831s 5.1079e+006 allocs/s

 These changes translate into an almost 35% improvement in allocation
 throughput over the current implementation of boost::make_shared. Put
 differently, they amount to a 25+% decrease in running time, which is in
 line with what the profiling results suggested.
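
 For anyone who wants to check how the two percentages relate to the
 TestBoostMakeShared and TestBoostMakeSharedAlt figures, this is the
 arithmetic as a small sketch. The helper names throughput_gain and
 time_reduction are hypothetical.

```cpp
#include <cassert>

// Relative gain in throughput; 0.25 means 25% more allocations per second.
inline double throughput_gain( double old_rate, double new_rate )
{
    assert( old_rate > 0.0 );
    return new_rate / old_rate - 1.0;
}

// Relative reduction in running time; 0.25 means 25% less time.
inline double time_reduction( double old_seconds, double new_seconds )
{
    assert( old_seconds > 0.0 );
    return 1.0 - new_seconds / old_seconds;
}
```

 With the numbers above, throughput_gain( 3.80989e6, 5.1079e6 ) comes out
 at roughly 0.34 and time_reduction( 10.499, 7.831 ) at roughly 0.25. The
 two figures describe the same speedup from different angles, which is why
 they differ.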

-- 
Ticket URL: <https://svn.boost.org/trac/boost/ticket/6830>
Boost C++ Libraries <http://www.boost.org/>
Boost provides free peer-reviewed portable C++ source libraries.
