From: Douglas Gregor (doug.gregor_at_[hidden])
Date: 2006-01-07 14:11:00
I've now implemented the small buffer optimization (SBO) for
Boost.Function. The patch is attached, but I haven't checked it in yet.
Here's the executive summary:
- Performance difference: up to 6x faster when the SBO applies.
- Space difference: boost::function takes an extra 4 bytes (it's now 16 bytes).
- Semantics: The assignment operators now provide only the basic guarantee (previously the strong guarantee), and swap() can now throw. (We're now less TR1-conforming, but we could argue that the TR is wrong to be so strict.)
- Usability: The optimization won't help much in practice unless Boost.Bind objects become smaller :(
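A minimal sketch of why the guarantees weaken, under simplifying assumptions: a hypothetical holder (`sbo_holder`) keeps one hypothetical functor type (`fragile`) in an in-object buffer instead of on the heap. With heap storage, swap can just exchange pointers; with in-place storage it has to copy the functors. Assignment destroys the old target before copying the new one, so a throwing copy leaves the target empty (basic guarantee) rather than unchanged (strong guarantee). This is an illustration, not the code in the patch; it needs C++17 for `std::launder`.

```cpp
#include <cassert>
#include <new>
#include <stdexcept>

// Hypothetical functor whose copy constructor can be made to throw.
struct fragile {
  static bool fail_copy;
  int value;
  explicit fragile(int v) : value(v) {}
  fragile(const fragile& other) : value(other.value) {
    if (fail_copy) throw std::runtime_error("copy failed");
  }
  int operator()() const { return value; }
};
bool fragile::fail_copy = false;

// Hypothetical holder that keeps the functor in its own buffer (no heap).
class sbo_holder {
  alignas(fragile) unsigned char buf_[sizeof(fragile)];
  bool engaged_ = false;

  fragile* ptr() { return std::launder(reinterpret_cast<fragile*>(buf_)); }
  const fragile* ptr() const { return std::launder(reinterpret_cast<const fragile*>(buf_)); }
  void clear() {
    if (engaged_) { ptr()->~fragile(); engaged_ = false; }
  }
  void copy_from(const sbo_holder& other) {  // may throw; *this must be empty
    if (other.engaged_) {
      ::new (static_cast<void*>(buf_)) fragile(*other.ptr());
      engaged_ = true;
    }
  }

 public:
  explicit sbo_holder(int v) {
    ::new (static_cast<void*>(buf_)) fragile(v);
    engaged_ = true;
  }
  sbo_holder(const sbo_holder& other) { copy_from(other); }
  ~sbo_holder() { clear(); }

  // Destroy, then copy: if the copy throws, *this is left empty but valid
  // (basic guarantee) instead of keeping its old value (strong guarantee).
  sbo_holder& operator=(const sbo_holder& other) {
    if (this != &other) {
      clear();
      copy_from(other);
    }
    return *this;
  }

  // The functors live inside the objects, so swapping must copy them and can
  // throw; a failure part-way through can lose one of the two values.
  void swap(sbo_holder& other) {
    sbo_holder tmp(*this);
    *this = other;
    other = tmp;
  }

  bool empty() const { return !engaged_; }
  int operator()() const { return (*ptr())(); }
};
```

Swapping two holders normally exchanges their values, but once `fragile::fail_copy` is set, both assignment and swap throw, and the assigned-to holder is left empty.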
I've extended the performance test (attached) with a "smallbind"
function object and corresponding tests. The tests below use
"smallbind" and "bind" separately, because the former fits in the
12-byte SBO buffer whereas the latter does not.
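The attached test isn't reproduced here, but the shape of such a measurement can be sketched. The code below repeatedly wraps a functor in a polymorphic function wrapper and calls it, once with a small functor and once with one padded past any small buffer. It uses the standard `std::function` as a stand-in for boost::function so it compiles without Boost; `small_call`, `large_call` and `time_wrap_and_call` are hypothetical names, not the test's. Whether `small_call` actually lands in `std::function`'s internal buffer is implementation-specific.

```cpp
#include <cassert>
#include <chrono>
#include <functional>

struct Test {
  int func(int x) { return x + 1; }
};

// Hypothetical small functor: just an object pointer, small enough for the
// internal buffer of common std::function implementations.
struct small_call {
  Test* obj;
  int operator()(int x) const { return obj->func(x); }
};

// Hypothetical large functor: the padding pushes it past any small buffer,
// so wrapping it typically costs a heap allocation.
struct large_call {
  Test* obj;
  char padding[64] = {};
  int operator()(int x) const { return obj->func(x); }
};

// Wraps fn in a fresh std::function on every iteration and calls it once, so
// the cost of storing the target (in place or on the heap) is part of each step.
// Returns elapsed seconds; checksum keeps the calls from being optimized away.
template <class F>
double time_wrap_and_call(const F& fn, int iterations, long long& checksum) {
  const auto start = std::chrono::steady_clock::now();
  for (int i = 0; i < iterations; ++i) {
    std::function<int(int)> wrapped = fn;
    checksum += wrapped(i);
  }
  const std::chrono::duration<double> elapsed = std::chrono::steady_clock::now() - start;
  return elapsed.count();
}
```

Wrapping and calling inside the same loop is the point of the design: a benchmark that built the wrapper once and only called it would hide the allocation that the SBO avoids.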
I tested on GCC 4.2.0 (bleeding edge, straight from CVS) and GCC 3.3
(both Apple and FSF) on a Dual G5 running Mac OS X Panther and on an
Athlon XP system running Linux. The newer compiler gave us the
performance boost we wanted, with about a 6x improvement when the SBO
is used. We get more like 2x with GCC 3.3, although I have a trick or
two left that may improve things.
The data below compares three versions of Boost.Function:
- 1.33.1: Boost.Function as released in Boost 1.33.1. No SBO, of course.
- 1.34.0 w/ vtables: Boost.Function as it currently stands in Boost CVS. It uses vtables as a space optimization (a boost::function object requires only 8 bytes of storage), but does not implement the SBO.
- 1.34.0 w/ vtables and SBO: Boost.Function in Boost CVS with the attached patch applied. It uses vtables and contains a 12-byte buffer (actually, the size of a member pointer plus the size of a void*) for the SBO.
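A minimal sketch of the layout described for the last version, assuming a single `int(int)` signature: each wrapper holds one pointer to a per-type table of function pointers (the "vtable") plus an in-object buffer sized as a member function pointer plus a void*. Functors that fit are constructed in the buffer; larger ones go to the heap. All names here (`dummy_class`, `function_buffer`, `vtable`, `ops`, `int_function`) are hypothetical, and this is not the Boost.Function source; it needs C++17 (`if constexpr`, `std::launder`).

```cpp
#include <cassert>
#include <new>

struct dummy_class {};  // hypothetical: only used to size a member function pointer

// In-object storage: room for a member function pointer plus an object pointer
// (12 bytes on the 32-bit targets above), or a pointer to a heap-allocated functor.
union function_buffer {
  struct {
    void (dummy_class::*mfp)();
    void* obj;
  } bound_member;
  void* heap;
  unsigned char data[sizeof(bound_member)];
};

// A functor is stored in place only if it fits and the buffer is suitably aligned.
template <class F>
constexpr bool fits_in_buffer =
    sizeof(F) <= sizeof(function_buffer) && alignof(function_buffer) % alignof(F) == 0;

// One table per functor type, shared by every wrapper holding that type.
struct vtable {
  int (*invoke)(const function_buffer&, int);
  void (*copy)(const function_buffer& src, function_buffer& dst);
  void (*destroy)(function_buffer&);
};

template <class F>
struct ops {
  static const F& get(const function_buffer& b) {
    if constexpr (fits_in_buffer<F>)
      return *std::launder(reinterpret_cast<const F*>(b.data));
    else
      return *static_cast<const F*>(b.heap);
  }
  static void create(const F& f, function_buffer& b) {
    if constexpr (fits_in_buffer<F>)
      ::new (static_cast<void*>(b.data)) F(f);  // small: construct in place, no allocation
    else
      b.heap = new F(f);  // large: fall back to the heap
  }
  static int invoke(const function_buffer& b, int x) { return get(b)(x); }
  static void copy(const function_buffer& src, function_buffer& dst) { create(get(src), dst); }
  static void destroy(function_buffer& b) {
    if constexpr (fits_in_buffer<F>)
      get(b).~F();
    else
      delete static_cast<F*>(b.heap);
  }
};

template <class F>
const vtable* vtable_for() {
  static const vtable table{&ops<F>::invoke, &ops<F>::copy, &ops<F>::destroy};
  return &table;
}

// Hypothetical wrapper for int(int): a vtable pointer plus the buffer.
class int_function {
  const vtable* vt_;
  function_buffer buf_;

 public:
  template <class F>
  int_function(const F& f) : vt_(vtable_for<F>()) { ops<F>::create(f, buf_); }
  int_function(const int_function& other) : vt_(other.vt_) { vt_->copy(other.buf_, buf_); }
  int_function& operator=(const int_function&) = delete;  // assignment omitted for brevity
  ~int_function() { vt_->destroy(buf_); }
  int operator()(int x) const { return vt_->invoke(buf_, x); }
};
```

A functor holding a member pointer and an object pointer fits exactly, while a 64-byte functor takes the heap path; calling and copying behave the same either way, only the storage differs.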
Even with the SBO in Boost.Function, users won't immediately see
the benefits. The problem is that Boost.Bind produces function
objects that are larger than necessary. For instance,
boost::bind(&Test::func, &func, _1) returns a 16-byte function object.
Those 4 wasted bytes don't matter most of the time, but here they
make the difference between using the SBO and not using it :(
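One plausible source of the extra 4 bytes (an assumption; this message doesn't say where they come from) is a bound object that stores the empty `_1` placeholder as a data member. An empty member still occupies at least one byte, and padding then rounds the object up. The sketch below uses hypothetical types (`placeholder_1`, `bound_with_member`, `bound_with_base`), not Boost.Bind's real ones, to compare that layout with one that inherits from the placeholder and relies on the empty base optimization.

```cpp
#include <cassert>

struct Test {
  int func(int x) { return x * 2; }
};

struct placeholder_1 {};  // hypothetical stand-in for the empty _1 placeholder

// Keeps the placeholder as a data member: even an empty member occupies at
// least one byte, and alignment padding then rounds the whole object up.
struct bound_with_member {
  int (Test::*f)(int);
  Test* obj;
  placeholder_1 arg;
  bound_with_member(int (Test::*fn)(int), Test* o) : f(fn), obj(o), arg() {}
  int operator()(int x) const { return (obj->*f)(x); }
};

// Inherits from the placeholder instead: the empty base optimization lets the
// base subobject share an address with f, so it costs no storage.
struct bound_with_base : private placeholder_1 {
  int (Test::*f)(int);
  Test* obj;
  bound_with_base(int (Test::*fn)(int), Test* o) : f(fn), obj(o) {}
  int operator()(int x) const { return (obj->*f)(x); }
};
```

With GCC on a 32-bit target (8-byte member function pointers, 4-byte object pointers), the first layout comes to 16 bytes and the second to 12, which matches the sizes discussed above. That match is suggestive, not proof that this is Boost.Bind's actual layout.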
So, Peter, any chance of getting a slightly more optimized Boost.Bind
that can fit boost::bind(&Test::func, &func, _1) into 12 bytes?
Doug
On my Athlon XP box
-------------------------
OS: Gentoo Linux ("old")
Compiler: GCC 3.3.6
Flags: -O3 -funroll-loops -fomit-frame-pointer
[1.33.1]
Time elapsed for simple bind: 1.360000 (sec)
Time elapsed for smallbind+function (size=12): 10.870000 (sec)
Time elapsed for bind+function (size=16): 11.770000 (sec)
Time elapsed for pure function invocation: 1.590000 (sec)
Time elapsed for bind+function+pool: 29.690000 (sec)
Time elapsed for bind+function+fastpool: 6.560000 (sec)
[1.34.0 w/ vtables]
Time elapsed for simple bind: 1.410000 (sec)
Time elapsed for smallbind+function (size=12): 11.260000 (sec)
Time elapsed for bind+function (size=16): 12.500000 (sec)
Time elapsed for pure function invocation: 1.530000 (sec)
Time elapsed for bind+function+pool: 30.360000 (sec)
Time elapsed for bind+function+fastpool: 7.370000 (sec)
[1.34.0 w/ vtables and SBO]
Time elapsed for simple bind: 1.360000 (sec)
Time elapsed for smallbind+function (size=12): 5.190000 (sec)
Time elapsed for bind+function (size=16): 13.150000 (sec)
Time elapsed for pure function invocation: 1.660000 (sec)
Time elapsed for bind+function+pool: 29.730000 (sec)
Time elapsed for bind+function+fastpool: 7.060000 (sec)
On my Athlon XP Linux box
-------------------------
OS: Gentoo Linux ("old")
Compiler: GCC 4.2.0 (20051122, experimental)
Flags: -O3 -funroll-loops -fomit-frame-pointer
[1.33.1]
Time elapsed for simple bind: 0.850000 (sec)
Time elapsed for smallbind+function (size=12): 14.230000 (sec)
Time elapsed for bind+function (size=16): 15.100000 (sec)
Time elapsed for pure function invocation: 1.430000 (sec)
Time elapsed for bind+function+pool: 26.930000 (sec)
Time elapsed for bind+function+fastpool: 9.060000 (sec)
[1.34.0 w/ vtables]
Time elapsed for simple bind: 0.020000 (sec)
Time elapsed for smallbind+function (size=12): 13.410000 (sec)
Time elapsed for bind+function (size=16): 13.360000 (sec)
Time elapsed for pure function invocation: 1.350000 (sec)
Time elapsed for bind+function+pool: 25.570000 (sec)
Time elapsed for bind+function+fastpool: 7.590000 (sec)
[1.34.0 w/ vtables and SBO]
Time elapsed for simple bind: 0.020000 (sec)
Time elapsed for smallbind+function (size=12): 2.640000 (sec)
Time elapsed for bind+function (size=16): 12.940000 (sec)
Time elapsed for pure function invocation: 1.460000 (sec)
Time elapsed for bind+function+pool: 25.260000 (sec)
Time elapsed for bind+function+fastpool: 7.430000 (sec)
On my dual G5 PowerMac
----------------------
OS: Panther (10.3.9)
Compiler: Apple GCC 3.3
Flags: -O3
[1.33.1]
Time elapsed for simple bind: 2.330000 (sec)
Time elapsed for smallbind+function (size=12): 22.080000 (sec)
Time elapsed for bind+function (size=16): 29.370000 (sec)
Time elapsed for pure function invocation: 2.590000 (sec)
Time elapsed for bind+function+pool: 38.810000 (sec)
Time elapsed for bind+function+fastpool: 21.460000 (sec)
[1.34.0 w/ vtables]
Time elapsed for simple bind: 1.180000 (sec)
Time elapsed for smallbind+function (size=12): 24.050000 (sec)
Time elapsed for bind+function (size=16): 25.860000 (sec)
Time elapsed for pure function invocation: 2.590000 (sec)
Time elapsed for bind+function+pool: 42.000000 (sec)
Time elapsed for bind+function+fastpool: 23.590000 (sec)
[1.34.0 w/ vtables and SBO]
Time elapsed for simple bind: 1.200000 (sec)
Time elapsed for smallbind+function (size=12): 8.140000 (sec)
Time elapsed for bind+function (size=16): 24.180000 (sec)
Time elapsed for pure function invocation: 2.590000 (sec)
Time elapsed for bind+function+pool: 39.210000 (sec)
Time elapsed for bind+function+fastpool: 21.280000 (sec)
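The speedups can be recomputed from the "smallbind+function (size=12)" lines above by dividing each baseline time by the SBO time. Relative to 1.34.0 w/ vtables, that gives about 2.2x with GCC 3.3.6, about 5.1x with GCC 4.2.0, and about 3.0x with Apple GCC 3.3 on the G5. Relative to 1.33.1, the figures are about 2.1x, 5.4x and 2.7x. The small sketch below records those timings; the struct and function names are hypothetical.

```cpp
#include <cassert>
#include <string>
#include <vector>

// "smallbind+function (size=12)" timings in seconds, copied from the results above.
struct smallbind_timing {
  std::string platform;
  double boost_1_33_1;
  double with_vtables;          // 1.34.0 w/ vtables
  double with_vtables_and_sbo;  // 1.34.0 w/ vtables and SBO
};

const std::vector<smallbind_timing> smallbind_results = {
    {"Athlon XP, GCC 3.3.6", 10.87, 11.26, 5.19},
    {"Athlon XP, GCC 4.2.0", 14.23, 13.41, 2.64},
    {"Dual G5, Apple GCC 3.3", 22.08, 24.05, 8.14},
};

// Speedup of the SBO build over a baseline: baseline time divided by SBO time.
double sbo_speedup(double baseline_seconds, double sbo_seconds) {
  return baseline_seconds / sbo_seconds;
}
```

By contrast, the "bind+function (size=16)" times change little between the two 1.34.0 builds, consistent with that 16-byte object not fitting in the 12-byte buffer.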
Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk