Subject: [boost] [pool][Ticket #2359]Performance impact of bug fix
From: Peter Hurley (phurley_at_[hidden])
Date: 2009-07-25 11:17:08
The bug fix implemented for this ticket has a significant (approx 25~35%)
performance impact when using fast_pool_allocator for the shared_ptr<>
allocator, as in,
    static fast_pool_allocator<T> pool;
    shared_ptr<T> a(T,Tdestroyer,pool);
Background:
Although http://svn.boost.org/trac/boost/ticket/2359 has an extensive
analysis and discussion of the problem, the key issue was that static
non-local fast_pool_allocators did *not* force the prior construction of
the underlying singleton_pool instance (class template static data members
have unordered initialization).
The fix implemented was to call
singleton_pool<...>::is_from(0);
in the ctors of fast_pool_allocator to enforce the proper construction
order (for global ctors). Of course, this fix affects fast_pool_allocators
of every scope and lifetime, not just static non-local ones.
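
The ordering guarantee that the fix relies on can be sketched in isolation. This is only an illustration, not Boost's code: toy_pool, toy_instance(), toy_allocator and construction_log() are hypothetical names, and a function-local static is assumed as a stand-in for the singleton machinery in Boost.Pool. Because the allocator's ctor touches the instance, the pool is always constructed before the allocator finishes constructing.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical log, used only to observe construction order.
inline std::vector<std::string>& construction_log() {
    static std::vector<std::string> log;
    return log;
}

// Stand-in for the underlying pool (NOT Boost's singleton_pool).
struct toy_pool {
    toy_pool() { construction_log().push_back("pool"); }
};

// Assumption: a function-local static plays the role of the singleton;
// it is constructed the first time this function is called.
inline toy_pool& toy_instance() {
    static toy_pool p;
    return p;
}

// Stand-in for the allocator: its ctor touches the singleton first,
// so the pool always exists before the allocator is fully constructed.
struct toy_allocator {
    toy_allocator() {
        toy_instance();
        construction_log().push_back("allocator");
    }
};

// Constructs one allocator and returns a copy of the log so far.
inline std::vector<std::string> build_allocator_and_report() {
    toy_allocator a;
    (void)a;
    return construction_log();
}
```

Calling build_allocator_and_report() once from main() should show the pool entry ahead of the allocator entry. Objects with static storage duration are destroyed in reverse order of construction, so in this sketch the pool would also outlive the allocator.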
Problem:
However, is_from() performs other, non-trivial work as well. From
singleton_pool.hpp,
    template <...>
    struct singleton_pool {
      .
      .
      static bool is_from(void * const ptr)
      {
        pool_type & p = singleton::instance();
        details::pool::guard<Mutex> g(p);
        return p.p.is_from(ptr);
      }
Although the impact at startup is negligible, the copy ctor of
fast_pool_allocator is called during the construction -and- the
destruction of *every* shared_ptr<T> a(P,D,A), because the shared_ptr
needs to rebind the allocator from <T> to sp_counted_impl_pda<P,D,A>.
The net effect is that the pool is locked and accessed *twice* every time
a shared_ptr is constructed or destructed. Because of the locking
overhead, even in a single-threaded environment, shared_ptr<T>
a(P,D,fast_pool_allocator<T>()) does not out-perform shared_ptr<T> a(P,D)!
This situation is especially painful in a multiprocessor environment with
heavy lock contention.
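
The rebinding behaviour can be observed with a small counting allocator. This is a sketch under stated assumptions: it uses std::shared_ptr as a stand-in for boost::shared_ptr, and counting_allocator, rebind_copies(), int_deleter and rebinds_for_one_shared_ptr() are hypothetical names introduced for illustration. The exact number of rebinding copies depends on the library implementation, so the sketch only reports the count.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <new>

// Hypothetical counter: how many times the allocator was rebound
// (converted from counting_allocator<U> to counting_allocator<T>).
inline int& rebind_copies() {
    static int n = 0;
    return n;
}

// Minimal C++11 allocator, for illustration only.
template <typename T>
struct counting_allocator {
    typedef T value_type;

    counting_allocator() {}

    // Converting ("rebind") ctor. In the fixed fast_pool_allocator this
    // is one of the places where is_from(0) runs and the pool is locked.
    template <typename U>
    counting_allocator(const counting_allocator<U>&) { ++rebind_copies(); }

    T* allocate(std::size_t n) {
        return static_cast<T*>(::operator new(n * sizeof(T)));
    }
    void deallocate(T* p, std::size_t) { ::operator delete(p); }
};

template <typename T, typename U>
bool operator==(const counting_allocator<T>&, const counting_allocator<U>&) {
    return true;
}
template <typename T, typename U>
bool operator!=(const counting_allocator<T>&, const counting_allocator<U>&) {
    return false;
}

struct int_deleter {
    void operator()(int* p) const { delete p; }
};

// Number of rebinding copies seen over one shared_ptr lifetime
// (construction plus destruction).
inline int rebinds_for_one_shared_ptr() {
    const int before = rebind_copies();
    {
        std::shared_ptr<int> sp(new int(42), int_deleter(),
                                counting_allocator<int>());
    }
    return rebind_copies() - before;
}
```

If each of those rebinding copies took the pool mutex, as the is_from(0) call does, the value returned by rebinds_for_one_shared_ptr() would be the number of lock acquisitions paid per shared_ptr.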
One solution (although by no means exhaustive) would be to only perform
the bare minimum necessary to force prior construction of the
singleton_pool instance. For example, in boost/pool/singleton_pool.hpp,
    template <...>
    struct singleton_pool {
      .
      .
      static void force_construction()
      {
        singleton::instance();
      }
and in boost/pool/pool_alloc.hpp
    fast_pool_allocator()
    {
      singleton_pool<fast_pool_allocator_tag, sizeof(T),
          UserAllocator, Mutex, NextSize>::force_construction();
    }
    template <typename U>
    fast_pool_allocator(
        const fast_pool_allocator<U, UserAllocator, Mutex, NextSize> &)
    {
      singleton_pool<fast_pool_allocator_tag, sizeof(T),
          UserAllocator, Mutex, NextSize>::force_construction();
    }
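
The difference between the two approaches can be sketched with a mutex that merely counts acquisitions. Everything here is a toy model under stated assumptions, not the Boost.Pool implementation: counting_mutex, toy_singleton_pool and the two helper functions are hypothetical, and a function-local static is assumed in place of the real singleton.

```cpp
#include <cassert>

// Hypothetical mutex that only counts how often it is acquired.
struct counting_mutex {
    int acquisitions;
    counting_mutex() : acquisitions(0) {}
    void lock() { ++acquisitions; }
    void unlock() {}
};

// Toy model of singleton_pool: one shared state guarded by a mutex.
struct toy_singleton_pool {
    struct state {
        counting_mutex mtx;
    };

    // Assumption: function-local static in place of the real singleton.
    static state& instance() {
        static state s;
        return s;
    }

    // Same shape as is_from(): fetch the instance, lock, query.
    static bool is_from(void* const) {
        state& s = instance();
        s.mtx.lock();
        const bool result = false;  // the toy pool owns no memory
        s.mtx.unlock();
        return result;
    }

    // Proposed alternative: touch the instance and nothing else.
    static void force_construction() { instance(); }
};

// Lock acquisitions caused by `calls` invocations of is_from(0).
inline int locks_taken_by_is_from(int calls) {
    const int before = toy_singleton_pool::instance().mtx.acquisitions;
    for (int i = 0; i < calls; ++i) toy_singleton_pool::is_from(0);
    return toy_singleton_pool::instance().mtx.acquisitions - before;
}

// Lock acquisitions caused by `calls` invocations of force_construction().
inline int locks_taken_by_force_construction(int calls) {
    const int before = toy_singleton_pool::instance().mtx.acquisitions;
    for (int i = 0; i < calls; ++i) toy_singleton_pool::force_construction();
    return toy_singleton_pool::instance().mtx.acquisitions - before;
}
```

In this model every is_from(0) call costs one lock acquisition, while force_construction() costs none, which is the saving the proposal is after.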
Regards,
Peter Hurley
PS - Also, casual profiling (gcc, x86, Windows) seems to indicate that using
boost::detail::spinlock from <boost/smart_ptr/detail/spinlock.hpp> as the
default lock would yield additional performance benefits over
details::pool::default_mutex. Faster yet would be a native
locked_compare_exchange spinlock similar to the
atomic_conditional_increment() implemented in the
<sp_counted_base_***.hpp> headers...
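
For illustration, a compare-exchange spinlock of the general kind mentioned above might look like the following. This is a sketch only: cas_spinlock and spinlock_round_trip() are hypothetical names, and std::atomic (C++11) is assumed as a portable stand-in for a native locked compare-exchange instruction. It is not the boost::detail::spinlock implementation.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical spinlock built on an atomic compare-exchange.
class cas_spinlock {
public:
    cas_spinlock() : locked_(false) {}

    // One attempt: atomically flip false -> true; fails if already held.
    bool try_lock() {
        bool expected = false;
        return locked_.compare_exchange_strong(
            expected, true,
            std::memory_order_acquire,   // ordering on success
            std::memory_order_relaxed);  // ordering on failure
    }

    // Busy-wait until the flag is acquired.
    void lock() {
        while (!try_lock()) {
            // spin
        }
    }

    void unlock() { locked_.store(false, std::memory_order_release); }

private:
    std::atomic<bool> locked_;

    cas_spinlock(const cas_spinlock&) = delete;
    cas_spinlock& operator=(const cas_spinlock&) = delete;
};

// Single-threaded check of the lock/unlock protocol.
inline bool spinlock_round_trip() {
    cas_spinlock s;
    s.lock();
    const bool while_held = s.try_lock();     // must fail: already locked
    s.unlock();
    const bool after_unlock = s.try_lock();   // must succeed again
    s.unlock();
    return !while_held && after_unlock;
}
```

Whether such a lock actually beats a mutex depends on contention and hold times; a spinlock tends to pay off only when critical sections are very short, as they are for a pool's free-list operations.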