From: Matthew Hurd (matt_at_[hidden])
Date: 2004-02-12 04:56:31


> On Behalf Of Matthew Hurd
> Subject: RE: [boost] [function] timing trivia
>
>
>
> > On Behalf Of Douglas Paul Gregor
> > Subject: Re: [boost] [function] timing trivia
> >
> > On Thu, 12 Feb 2004, Matthew Hurd wrote:
> > > Just completed some timings of the cost of a boost function call using a
> > > boost::function<void (void)> and a raw function pointer versus inlining
> > > the function's code directly into the source.
> > >
> > > For my circumstances (your mileage may vary):
> > > Calling via raw function pointer = 314 microseconds +/- 1
> > > Calling via boost function variable = 328 microseconds +/- 1
> > >
> > > Only 14 microseconds of difference. Nice work, I was expecting much more.
> > > Note: Optimised VC7.1 output, winXP 1a, Athlon 2800+ (2.09GHz)
> >
> > Looks like VC 7.1 is doing a good job optimizing this. I get asked about
> > the performance of function<> every once in a while; mind if I stick this
> > data in the FAQ?
> >
> > Doug
>
> Doug,
>
> These results are invalid.
>
> I think I am embarrassing myself here... my benchmarking code is reporting
> the function call overhead in proportion to the size of the loop internal to
> a function :-( I'm doing something wrong here, it might be the optimizer
> eliding some code due to it detecting an unused variable.
>
> I'll investigate and report some results I'm more confident in.
>
> Regards,
>
> Matt Hurd.

I've been staring at measurements and have double-checked many things, but
I'm at a loss to explain what I see.

The good news is that, for optimised code, I'm confident the overhead of
calling a static method or a plain function through boost::function is within
60 nanoseconds of embedding the code itself. That holds for code that is a
tight loop, irrespective of the size of the loop, for my circumstances.
Importantly, boost::function sometimes measures faster, to the resolution of
my high resolution timer.

That is, for the optimised case, I measure no consistent discernible
abstraction penalty at all.

Lemma: boost::function is beautiful

What am I measuring? This method, or the equivalent function...

                static double not_empty()
                {
                        static double sum;
                        static double i;

                        sum = 0.0;
                        for (i = 0.0; i < MAX_FN_LOOP ; ++i)
                        {
                                sum += i * i;
                        }

                        return sum;
                }
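
For anyone who wants to try the same comparison, here is a minimal, self-contained sketch of the three call paths. It is not the code that was measured. Assumptions: `kMaxFnLoop` is a hypothetical stand-in for `MAX_FN_LOOP` (its value is not given in this post), `std::function` stands in for boost::function so the sketch needs no Boost headers, and the loop counter is an integer rather than a static double. All function names are made up for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>

// Hypothetical loop bound; the post does not state the value of MAX_FN_LOOP.
constexpr std::size_t kMaxFnLoop = 1000;

// The same workload as not_empty(), written as a free function.
double sum_of_squares()
{
    double sum = 0.0;
    for (std::size_t i = 0; i < kMaxFnLoop; ++i)
    {
        sum += static_cast<double>(i) * static_cast<double>(i);
    }
    return sum;
}

// Call the workload through a raw function pointer.
double call_via_pointer()
{
    double (*fp)() = &sum_of_squares;
    return fp();
}

// Call the workload through a type-erased wrapper
// (std::function here, as an assumed stand-in for boost::function).
double call_via_wrapper()
{
    std::function<double()> fn = &sum_of_squares;
    return fn();
}

// Sanity check: every call path must produce the same answer,
// otherwise a timing comparison between them means nothing.
void check_call_paths_agree()
{
    assert(call_via_pointer() == sum_of_squares());
    assert(call_via_wrapper() == sum_of_squares());
}
```

Checking that the paths agree before timing them is cheap insurance against comparing two different amounts of work.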

I see some strange behaviour for unoptimised debug code (median of 1000
measurements):

Iterations in the           Cost in microseconds of
internal loop of the        boost::function<double (void)>
function
--------------------------------------------------------------
         1                        0.182
        10                        0.196
       100                        0.183
     1,000                        0.059
    10,000                       -1.176
   100,000                      -14.767
 1,000,000                      +67.110
10,000,000                   -1,131.9

Can't explain it. boost::function can't be faster than the embedded code for
the 10 million iteration loop by more than a millisecond. I see none of this
wacky behaviour with release optimised code.
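
The figures above are medians of 1000 per-trial timings. A minimal sketch of one way to compute such a median follows. The helper name `median` and its signature are assumptions for illustration, not the code used for these measurements.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Median of a non-empty sample. The vector is taken by value because
// std::nth_element reorders its input.
double median(std::vector<double> samples)
{
    assert(!samples.empty());

    const std::size_t mid = samples.size() / 2;

    // Partially sort so that samples[mid] holds the value it would have
    // in a fully sorted sequence; everything before it is <= that value.
    std::nth_element(samples.begin(), samples.begin() + mid, samples.end());
    const double upper = samples[mid];

    if (samples.size() % 2 == 1)
    {
        return upper;
    }

    // Even count: average the two middle values. The lower one is the
    // largest element of the first half after the partial sort.
    const double lower = *std::max_element(samples.begin(), samples.begin() + mid);
    return (lower + upper) / 2.0;
}
```

A median is less sensitive than a mean to the occasional trial that is interrupted by the scheduler, which is presumably why it is a common choice for this kind of timing.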

My benchmarking code is pretty straightforward at the core.

                        double answer1 = 0.0;
                        for (size_t j = 0; j< num_trials; ++j )
                        {
                                trial.restart();
                                answer1 += fn();
                                now = trial.elasped();

                                timing1[j] = now;
                        }

                        double answer2 = 0.0;
                        for (size_t j = 0; j< num_trials; ++j )
                        {
                                trial.restart();

                                static double sum;
                                static double i;
                                sum = 0;
                                for (i = 0; i < MAX_FN_LOOP ; ++i)
                                {
                                        sum += i * i;
                                }

                                answer2 += sum;

                                now = trial.elasped();

                                timing2[j] = now;
                        }
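
The per-trial timing pattern in the first loop can be sketched in a self-contained form as below. This is an illustration under assumptions, not the harness that produced the numbers in this post: `std::chrono::steady_clock` replaces the high resolution timer object `trial`, `std::function` stands in for boost::function, and `time_trials` and `sink` are hypothetical names.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <functional>
#include <vector>

// Times each call of fn separately and returns the per-trial durations
// in microseconds. Each result is added to `sink` so that the value
// returned by fn is actually used by the caller.
std::vector<double> time_trials(const std::function<double()>& fn,
                                std::size_t num_trials,
                                double& sink)
{
    assert(static_cast<bool>(fn)); // an empty wrapper cannot be called

    using clock = std::chrono::steady_clock;

    std::vector<double> timings;
    timings.reserve(num_trials);

    for (std::size_t j = 0; j < num_trials; ++j)
    {
        const clock::time_point start = clock::now();
        sink += fn();
        const clock::time_point stop = clock::now();

        // Convert the elapsed interval to fractional microseconds.
        timings.push_back(
            std::chrono::duration<double, std::micro>(stop - start).count());
    }
    return timings;
}
```

Using a monotonic clock means a single trial can never report a negative duration, so any negative figure can only come from subtracting one set of timings from another.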

Doug, I hope this says something, but the strangeness leaves me uneasy.

Regards,

Matt Hurd.

