From: Robert Zeh (razeh_at_[hidden])
Date: 2004-12-02 09:32:21
I profiled speed_test.C with Quantify to determine where the signals
library was spending its time. On my SPARC/Solaris system with gcc
3.3.2 it was spending about 1/3 of its time in malloc, mostly as a
result of the cache construction inside slot_call_iterator.
At home I decided to try replacing the shared_ptr in
slot_call_iterator with an instance variable of the result_type.
Since slot_call_iterator uses the shared_ptr to determine if the cache
is valid, I had to add a bool indicating if the cached value is valid
or not. These changes made the benchmark run about two to four times
faster (on my home system, a 650 MHz Duron with gcc 3.3.2 running
Debian). I expect better results on my SPARC machine, because its
malloc implementation seems to be slower.
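The difference between the two caching strategies can be sketched as
follows. This is a minimal illustration, not the code from the attached
diff: heap_cache, inline_cache, call_slot and count_slot_calls are
hypothetical names, result_type is assumed to be int, and
std::shared_ptr stands in for the shared_ptr used by
slot_call_iterator.

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-in for the value a slot call produces.
typedef int result_type;

// Counts how often the (hypothetical) slot is actually invoked.
int g_slot_calls = 0;
inline result_type call_slot() { ++g_slot_calls; return 42; }

// Heap-backed cache: an empty pointer means "nothing cached".
// Every fill performs a heap allocation.
class heap_cache {
public:
    const result_type& get() {
        if (!value_) value_ = std::make_shared<result_type>(call_slot());
        return *value_;
    }
    void reset() { value_.reset(); }
private:
    std::shared_ptr<result_type> value_;
};

// In-place cache: the value lives inside the object and a bool records
// whether it is valid. No allocation, but result_type must be default
// constructible and is copied whenever the cache is copied.
class inline_cache {
public:
    inline_cache() : value_(), valid_(false) {}
    const result_type& get() {
        if (!valid_) { value_ = call_slot(); valid_ = true; }
        return value_;
    }
    void reset() { valid_ = false; }
private:
    result_type value_;
    bool valid_;
};

// Number of slot invocations for: read, read, reset, read.
template <class Cache>
int count_slot_calls() {
    g_slot_calls = 0;
    Cache c;
    c.get();
    c.get();   // served from the cache
    c.reset();
    c.get();   // cache was invalidated, so the slot runs again
    return g_slot_calls;
}
```

Both caches invoke the slot the same number of times; the only
difference is that heap_cache allocates on each fill while inline_cache
does not.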
There are several drawbacks to my modifications. It's harder to
maintain because of the added bool. The result_type used by
slot_call_iterator must now have a default constructor. If it is
expensive to copy the result_type, and slot_call_iterator is copied a
lot, replacing the shared_ptr with an instance variable will actually
make things slower.
I don't know enough about the internals to weigh how important these
issues are.
I believe the correct way to do things is to create a cache interface
that encapsulates the behavior the slot_call_iterator needs, and to
then choose the appropriate cache implementation at compile time using
the MPL.
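One way such a selection could look is sketched below. Everything here
is an assumption made for illustration: the cache interface (valid,
store, get, reset), the select_cache policy and its size threshold are
hypothetical, and std::conditional is used as a standard-library
stand-in for boost::mpl::if_ so the example has no Boost dependency.

```cpp
#include <cassert>
#include <memory>
#include <type_traits>

// Heap-backed cache: works for any copyable T, allocates on store().
template <class T>
class heap_cache {
public:
    bool valid() const { return static_cast<bool>(value_); }
    void store(const T& v) { value_ = std::make_shared<T>(v); }
    const T& get() const { return *value_; }
    void reset() { value_.reset(); }
private:
    std::shared_ptr<T> value_;
};

// In-place cache: no allocation, but T must be default constructible.
template <class T>
class inline_cache {
public:
    inline_cache() : value_(), valid_(false) {}
    bool valid() const { return valid_; }
    void store(const T& v) { value_ = v; valid_ = true; }
    const T& get() const { return value_; }
    void reset() { valid_ = false; }
private:
    T value_;
    bool valid_;
};

// Hypothetical policy: store in place only when T is default
// constructible and small enough to be cheap to copy; otherwise fall
// back to the heap-backed cache. The threshold is arbitrary.
template <class T>
struct select_cache {
    static const bool use_inline =
        std::is_default_constructible<T>::value &&
        sizeof(T) <= 2 * sizeof(void*);
    typedef typename std::conditional<
        use_inline, inline_cache<T>, heap_cache<T> >::type type;
};

// A result type without a default constructor, to exercise the fallback.
struct no_default {
    explicit no_default(int v) : value(v) {}
    int value;
};
```

With this policy select_cache<int>::type is inline_cache<int>, while
select_cache<no_default>::type is heap_cache<no_default>, so the
default-constructor requirement only applies where the in-place cache
is actually chosen.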
I've attached both a diff for my changes to slot_call_iterator.hpp and
the modified file. I'd be interested in knowing how it changes the
performance on other platforms.
Current performance (unmodified slot_call_iterator) on my 650 MHz Duron:
===== 1000 Total Calls =====
Num Slots Calls/Slot   Boost    Lite
--------- ---------- ------- -------
        1       1000  0.0073  0.0002
       10        100  0.0025  0.0001
       50         20  0.0023  0.0001
      100         10  0.0021  0.0001
      250          4  0.0021  0.0001
      500          2  0.0024  0.0001
     1000          1  0.0026  0.0002
===== 10000 Total Calls =====
Num Slots Calls/Slot   Boost    Lite
--------- ---------- ------- -------
        1      10000  0.0680  0.0022
       10       1000  0.0258  0.0007
       50        200  0.0214  0.0006
      100        100  0.0209  0.0006
      250         40  0.0209  0.0006
      500         20  0.0218  0.0008
     1000         10  0.0246  0.0008
     5000          2  0.0254  0.0025
    10000          1  0.0257  0.0027
===== 100000 Total Calls =====
Num Slots Calls/Slot   Boost    Lite
--------- ---------- ------- -------
        1     100000  0.7370  0.0224
       10      10000  0.2573  0.0090
       50       2000  0.2284  0.0067
      100       1000  0.2691  0.0072
      250        400  0.2111  0.0069
      500        200  0.2202  0.0094
     1000        100  0.2635  0.0259
     5000         20  0.2684  0.0318
    10000         10  0.2749  0.0330
    50000          2  0.2629  0.0266
   100000          1  0.2638  0.0290
===== 1000000 Total Calls =====
Num Slots Calls/Slot   Boost    Lite
--------- ---------- ------- -------
        1    1000000  6.9143  0.2232
       10     100000  2.5620  0.0752
       50      20000  2.2086  0.0669
      100      10000  2.1817  0.0724
      250       4000  2.1839  0.0894
      500       2000  2.1643  0.1066
     1000       1000  2.6981  0.3238
     5000        200  2.7720  0.3870
    10000        100  2.7220  0.3980
    50000         20  2.7763  0.3479
   100000         10  2.8006  0.3774
   500000          2  2.6703  0.2991
Performance on my 650 MHz Duron with slot_call_iterator using an
instance variable instead of a shared_ptr:
===== 1000 Total Calls =====
Num Slots Calls/Slot   Boost    Lite
--------- ---------- ------- -------
        1       1000  0.0014  0.0002
       10        100  0.0004  0.0001
       50         20  0.0003  0.0001
      100         10  0.0406  0.0001
      250          4  0.0003  0.0001
      500          2  0.0004  0.0001
     1000          1  0.0007  0.0002
===== 10000 Total Calls =====
Num Slots Calls/Slot   Boost    Lite
--------- ---------- ------- -------
        1      10000  0.0145  0.0022
       10       1000  0.0039  0.0007
       50        200  0.0030  0.0006
      100        100  0.0029  0.0006
      250         40  0.0029  0.0006
      500         20  0.0033  0.0008
     1000         10  0.0066  0.0009
     5000          2  0.0073  0.0025
    10000          1  0.0076  0.0029
===== 100000 Total Calls =====
Num Slots Calls/Slot   Boost    Lite
--------- ---------- ------- -------
        1     100000  0.1446  0.0219
       10      10000  0.0385  0.0075
       50       2000  0.0302  0.0066
      100       1000  0.0288  0.0065
      250        400  0.0296  0.0068
      500        200  0.0344  0.0083
     1000        100  0.0949  0.0257
     5000         20  0.0844  0.0318
    10000         10  0.0819  0.0329
    50000          2  0.0825  0.0294
   100000          1  0.0741  0.0282
===== 1000000 Total Calls =====
Num Slots Calls/Slot   Boost    Lite
--------- ---------- ------- -------
        1    1000000  1.5345  0.2376
       10     100000  0.6345  0.0758
       50      20000  0.6562  0.2402
      100      10000  0.2960  0.0686
      250       4000  0.3477  0.0932
      500       2000  0.5850  0.1311
     1000       1000  1.7935  0.3861
     5000        200  1.0274  0.4182
    10000        100  1.1677  0.4073
    50000         20  1.1002  0.7818
   100000         10  1.5208  0.4197
   500000          2  0.7921  0.2938
http://home.earthlink.net/~rzeh
Robert Zeh