From: Douglas Gregor (doug.gregor_at_[hidden])
Date: 2005-12-29 12:49:40

On Dec 28, 2005, at 4:15 AM, Alex Besogonov wrote:

> Vladimir Prus wrote:
>>> I don't have any problem with boost::function speed of invocations
>>> (though FastDelegate is two times faster).
>> I see. Still, would be nice to see specific numbers.
> I've attached a test program (you need FastDelegate from http://
> to compile it).
> Results:
> =========================
> C:\temp\delegates>gcc -O3 -funroll-loops -fomit-frame-pointer
> test.cpp -Ic:/tools/boost -lstdc++
> C:\temp\delegates>a.exe
> Time elapsed for FastDelegate: 1.191000 (sec)
> Time elapsed for simple bind: 0.010000 (sec)
> Time elapsed for bind+function: 33.118000 (sec)
> Time elapsed for pure function invocation: 3.705000 (sec)
> =========================
> (GCC 4.1.0 was used)
> You can see that boost::function + boost::bind is an order of
> magnitude slower than FastDelegate. Even a mere invocation of a
> boost::function is slower than complete bind+invoke for FastDelegate.

The major performance problem in this example is the memory
allocation required to construct boost::function objects. We could
implement the small buffer optimization (SBO) directly in
boost::function, but then we're trading space for performance. Is it
worth it? It depends on how often you copy boost::function objects
vs. how many of them you store in memory.

Would a pooling allocator solve the problem? I tried switching the
boost::function<> allocator to boost::pool_allocator and
boost::fast_pool_allocator (from the Boost.Pool library), but
performance actually got quite a bit worse with this change:

Time elapsed for simple bind: 2.050000 (sec)
Time elapsed for bind+function: 43.120000 (sec)
Time elapsed for pure function invocation: 2.020000 (sec)
Time elapsed for bind+function+pool: 130.750000 (sec)
Time elapsed for bind+function+fastpool: 108.590000 (sec)

Pooling is not feasible, so we need the SBO for performance, but not
all users can take the increase in boost::function size. On
non-broken compilers, we could use the Allocator parameter to implement
the SBO. At first I was hoping we could just make boost::function
smart enough to handle stateful allocators, then write an SBO
allocator. Unfortunately, this doesn't play well with rebinding:

template<typename Signature, typename Allocator>
class function : Allocator
{
public:
   template<typename F>
   function(const F& f)
   {
     // Rebind the user's allocator so that it allocates objects of type F.
     typedef typename Allocator::template rebind<F>::other my_allocator;
     my_allocator alloc(*this);    // a separate allocator object
     F* new_F = alloc.allocate(1); // where does this point to?
     // ...
   }
};

Presumably, an SBO allocator's allocate() member would return a
pointer into its own buffer, but what happens when you rebind for
the new type F and then allocate() using that rebound allocator? You
get a pointer to the wrong buffer.
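
To make the rebinding problem concrete, here is a minimal sketch (C++11) that mimics the constructor above. The names sbo_allocator, holder and owns() are hypothetical and exist only for this illustration; nothing here is part of boost::function or any real allocator. It assumes the simplest possible SBO allocator, one that always hands out the start of a buffer stored inside itself.

```cpp
#include <cstddef>
#include <new>

// Hypothetical allocator that serves every request from a buffer it
// carries inside itself (illustration only, not a conforming allocator).
template<typename T>
struct sbo_allocator {
    typedef T value_type;
    template<typename U> struct rebind { typedef sbo_allocator<U> other; };

    alignas(std::max_align_t) unsigned char buffer[16];

    sbo_allocator() {}
    // Converting constructor used after rebinding. The new allocator gets
    // its own buffer; nothing links it back to the source allocator.
    template<typename U> sbo_allocator(const sbo_allocator<U>&) {}

    T* allocate(std::size_t n) {
        if (n * sizeof(T) > sizeof(buffer)) throw std::bad_alloc();
        return reinterpret_cast<T*>(buffer);
    }
    void deallocate(T*, std::size_t) {}

    // True if p is the start of this allocator's own buffer.
    bool owns(const void* p) const {
        return p == static_cast<const void*>(buffer);
    }
};

// Hypothetical stand-in for the function<> constructor shown above.
template<typename Allocator>
struct holder : Allocator {
    bool stored_in_own_buffer;  // did the storage land in our buffer?
    bool stored_in_temporary;   // or in the rebound local allocator?

    template<typename F>
    explicit holder(const F&) {
        typedef typename Allocator::template rebind<F>::other my_allocator;
        my_allocator alloc(static_cast<const Allocator&>(*this));
        F* p = alloc.allocate(1);
        stored_in_own_buffer = this->owns(p);
        stored_in_temporary = alloc.owns(p);
        alloc.deallocate(p, 1);
    }   // alloc, and the buffer p pointed into, are destroyed here
};
```

Constructing holder<sbo_allocator<char> > from an int leaves stored_in_temporary true and stored_in_own_buffer false: the storage belongs to the local rebound allocator, which no longer exists once the constructor returns.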

So the SBO needs to be more deeply ingrained in boost::function. The
common case on most 32-bit architectures is an 8-byte member function
pointer and a 4-byte object pointer, so we need 12 bytes of storage
to start with for the buffer; boost::function is currently only 12
bytes (4 bytes of that is the buffer). boost::function adds to this
the "manager" and "invoker" pointers, which would bring us to 20
bytes in the SBO case. But we can collapse the manager and invoker
into a single vtable pointer, so we'd get back down to 16 bytes.
Still larger than before, but that 4-byte overhead could drastically
improve performance for many common cases. I'm okay with that. We'll
probably have to give up the no-throw swap guarantee, and perhaps
also the strong exception safety of copying boost::function objects,
but I don't think anyone will care about those. The basic guarantee
is good enough.
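
As a rough illustration of that layout, here is a sketch (C++11) of a wrapper with one vtable pointer plus a buffer of three pointers' worth of storage. It is not the boost::function implementation: small_fn, its fixed int(int) signature, stored_inline() and the two example functors are all hypothetical, and assignment and swap are left out entirely. Callables that fit the buffer are stored in place; anything larger falls back to the heap.

```cpp
#include <cstddef>
#include <new>
#include <type_traits>

class small_fn {
    // One table per stored type replaces separate manager/invoker pointers.
    struct vtable {
        int  (*invoke)(const small_fn& self, int arg);
        void (*copy)(small_fn& dst, const small_fn& src);
        void (*destroy)(small_fn& self);
        bool in_buffer;
    };

    static const std::size_t buffer_size = 3 * sizeof(void*);

    const vtable* vt_;
    union {                      // aligned like void* because of heap_
        void* heap_;
        unsigned char buffer_[buffer_size];
    };

    // Operations for a callable constructed directly in the buffer.
    template<typename F> struct small_ops {
        static const bool in_buffer = true;
        static F* get(const small_fn& s) {
            return reinterpret_cast<F*>(const_cast<unsigned char*>(s.buffer_));
        }
        static int invoke(const small_fn& s, int a) { return (*get(s))(a); }
        static void copy(small_fn& d, const small_fn& s) { new (d.buffer_) F(*get(s)); }
        static void destroy(small_fn& s) { get(s)->~F(); }
        static void create(small_fn& d, const F& f) { new (d.buffer_) F(f); }
    };

    // Operations for a callable that is too large and lives on the heap.
    template<typename F> struct heap_ops {
        static const bool in_buffer = false;
        static F* get(const small_fn& s) { return static_cast<F*>(s.heap_); }
        static int invoke(const small_fn& s, int a) { return (*get(s))(a); }
        static void copy(small_fn& d, const small_fn& s) { d.heap_ = new F(*get(s)); }
        static void destroy(small_fn& s) { delete get(s); }
        static void create(small_fn& d, const F& f) { d.heap_ = new F(f); }
    };

    template<typename F> struct ops_for {
        typedef typename std::conditional<
            (sizeof(F) <= buffer_size && alignof(F) <= alignof(void*)),
            small_ops<F>, heap_ops<F> >::type type;
    };

    template<typename F> static const vtable* table_for() {
        typedef typename ops_for<F>::type ops;
        static const vtable t = { &ops::invoke, &ops::copy, &ops::destroy,
                                  ops::in_buffer };
        return &t;
    }

public:
    template<typename F>
    small_fn(F f) : vt_(table_for<F>()) {
        typedef typename ops_for<F>::type ops;
        ops::create(*this, f);
    }
    small_fn(const small_fn& o) : vt_(o.vt_) { vt_->copy(*this, o); }
    small_fn& operator=(const small_fn&) = delete;  // omitted in this sketch
    ~small_fn() { vt_->destroy(*this); }

    int operator()(int a) const { return vt_->invoke(*this, a); }
    bool stored_inline() const { return vt_->in_buffer; }
};

// Hypothetical callables: one small enough for the buffer, one that is not.
struct add_n      { int n; int operator()(int x) const { return x + n; } };
struct padded_add { int n; char pad[64]; int operator()(int x) const { return x + n; } };
```

With this layout sizeof(small_fn) is four pointers, i.e. 16 bytes when pointers are 4 bytes wide. Copying a small_fn that holds an add_n costs one placement-new into the buffer and no heap allocation; a padded_add still takes the allocating path. Copying goes through the stored type's copy constructor, which may throw, and that is the part that puts a no-throw swap in question.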

