
Subject: Re: [boost] [function] function wrapping with no exception safety guarantee
From: Domagoj Saric (dsaritz_at_[hidden])
Date: 2010-11-08 19:19:30


"Daniel Walker" <daniel.j.walker_at_[hidden]> wrote in message
news:AANLkTinT6ofcXAi3TsBCDoDqLVgLn-sK4g0pV9pPOGu7_at_mail.gmail.com...

> On Sat, Oct 30, 2010 at 1:25 PM, Domagoj Saric <dsaritz_at_[hidden]> wrote:
>> e.g. http://lists.boost.org/Archives/boost/2010/01/160908.php
>
> Thanks. I added a tarball, signal_benchmark.tar.bz2, with a jamfile
> and source code so that anyone who's interested can easily reproduce
> this benchmark. The benchmark measures the impact of the static empty
> scheme on the time per call of boost::signals2::signal using the code
> Domagoj linked to. Thanks to Christophe Prud'homme for the original
> benchmark!
>
> Here are the results I got, again, using the build of g++ 4.2 provided
> by my manufacturer.
>
> Data (Release):
>            | function  | function (static empty)
> time/call  | 3.54e-07s | 3.51e-07s
> space/type | 64B       | 80B
>
> Data (Debug):
>            | function  | function (static empty)
> time/call  | 2.05e-06s | 2.04e-06s
> space/type | 64B       | 80B
>
> You can see that removing the empty check from boost::function yields
> about a 1% improvement in time per call to boost::signal. The
> increased space per type overhead is the same as before: 16B.

You just missed one important detail mentioned in the original post, which
is to use a dummy mutex...
The fact that you were able to consistently measure _any_ difference (even
with your own simple modification) for something that should ideally be a
'simple' indirect call, while it is 'surrounded' by all the dynamic memory
allocations, mutex locking, local shared_ptr/guard objects and other
complex internal signals2 logic, speaks volumes about the actual overhead
at hand (which you now seem to want to claim is insignificant)...
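
For reference, a minimal sketch (my illustration here, not the benchmark
patch itself) of how a signals2 signal can be instantiated with
boost::signals2::dummy_mutex so that the internal locking drops out of the
per-call cost being measured; the handler and loop count are placeholders:

    #include <boost/signals2/signal.hpp>
    #include <boost/signals2/signal_type.hpp>
    #include <boost/signals2/dummy_mutex.hpp>

    namespace bs2 = boost::signals2;

    // signal type whose internal locking is compiled down to no-ops
    typedef bs2::signal_type<
        void ( int ),
        bs2::keywords::mutex_type< bs2::dummy_mutex >
    >::type unlocked_signal_t;

    void handler( int ) { /* ...work... */ }

    int main()
    {
        unlocked_signal_t sig;
        sig.connect( &handler );
        for ( int i( 0 ); i != 10000000; ++i )
            sig( i );   // invocation cost without mutex locking overhead
    }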

You also misinterpreted the benchmark itself and used an incorrect
'formula'/logic to count the number of boost::function invocations. Note
that this is/was a boost::signals(2) benchmark, and the number of
boost::function invocations is not the same as the number of
boost::signals(2) invocations...for example, 25% of the time the benchmark
code you posted is invoking a signal with no handler/boost::function
assigned at all (yet you still count those as boost::function
invocations)...Even when the counting part of the 'formula' is corrected,
the name of the end result, 'average time per call', is still a misnomer,
as the calculation also includes signal creation, 'resizing' etc. (which,
OTOH, then also implicitly benchmarks boost::function copy construction
and assignment, another sub-optimal area of the current implementation)...
The correct way to use and interpret the benchmark is exactly the way its
original author did: simply compare total times (for intermediate sizes or
for the whole benchmark)...
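
To illustrate the counting point with a trivial example (mine, not part of
the benchmark source): invoking a signal that has no slot connected
performs zero invocations of the wrapped callable:

    #include <boost/signals2/signal.hpp>
    #include <cassert>

    static int slot_calls = 0;
    void slot() { ++slot_calls; }

    int main()
    {
        boost::signals2::signal< void () > sig;

        sig();                    // one signal invocation, zero slot calls
        assert( slot_calls == 0 );

        sig.connect( &slot );
        sig();                    // only now is a stored callable invoked
        assert( slot_calls == 1 );
    }
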
Additionally, the N chosen is IMO not large enough for the latest
architectures (e.g. an i5 @ 4+ GHz that also constantly adjusts its
frequency dynamically) to achieve stable enough results...

The patch provided switches to a dummy mutex, adds two zeros to N, adjusts
the benchmark's priority, corrects the number-of-calls calculation and skips
the invocation of empty signals...

The differences in the
'average-time-per-call-that-is-actually-something-else' number that the
benchmark, patched and compiled with MSVC++ 10 (/Oxs), shows between
https://svn.boost.org/svn/boost/sandbox/function/boost/function
and
https://svn.boost.org/svn/boost/trunk/boost/function
are
Via C7-M ~6%
Intel i5 ~8%
AMD Athlon64 ~24%

(Yes, the Athlon number is correct, I measured it several times...if there
is an AMD architecture expert lurking around here I'd love to hear their
thoughts about the result ;)

> So, basically, in the use-case measured by this benchmark the time
> overhead of boost::function is dwarfed by the combined costs of
> boost::signal and the target function, and so using the static empty
> scheme does not yield much benefit.

Even if it were 'dwarfed' (which it isn't), that result would still not
imply that the change is somehow irrelevant or not worth doing (if there
are no drawbacks to it, and there aren't, referring specifically to your
claims/'concerns' about static space size)...
Justifying inefficient code by pointing out that there exists even slower
code wrapping it is just downright wrong (even though it is done so
frequently), as the same logic could then be used to justify just about
any bad thing in the Universe, since you can always find something worse...
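
For anyone who has not followed the earlier threads, here is a bare-bones
sketch (my own simplification, not the sandbox or trunk code; it uses plain
free-function invokers and std::runtime_error instead of bad_function_call)
of the two empty-handling strategies being compared: the current scheme
branches on every call before throwing, while the 'static empty' scheme
installs a throwing stub up front so that invocation is always a plain,
unconditional indirect call:

    #include <stdexcept>

    typedef void (*invoker_t)();

    void empty_stub() { throw std::runtime_error( "empty function wrapper" ); }

    struct checked_wrapper        // current scheme: test-and-throw on every call
    {
        checked_wrapper() : f( 0 ) {}
        void operator()() const
        {
            if ( !f )                                          // per-call branch
                throw std::runtime_error( "empty function wrapper" );
            f();
        }
        invoker_t f;
    };

    struct static_empty_wrapper   // 'static empty' scheme: stub preinstalled
    {
        static_empty_wrapper() : f( &empty_stub ) {}
        void operator()() const { f(); }       // unconditional indirect call
        invoker_t f;
    };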

-- 
"What Huxley teaches is that in the age of advanced technology, spiritual
devastation is more likely to come from an enemy with a smiling face than
from one whose countenance exudes suspicion and hate."
Neil Postman 


