Boost logo

Boost :

Subject: Re: [boost] proposal - Statistically robust function timing for performance decisions.
From: Edward Grace (ej.grace_at_[hidden])
Date: 2009-07-18 11:04:14


Dear All,

As suggested I have packaged up a rough-n-ready version and stuffed
it in the vault.

It includes Doxygen generated documentation (HTML/PDF) and an example
(example_timer.cpp) which itself has copious comments.

Platform specifics (as far as I know) will be related to the
chronometer. You may have to wrap your own chronometer (e.g. using
something like gettimeofday() or getticks() from FFTW). I do not
recommend using std::clock() for the included example as it is simply
not precise enough - consequently you will grow bored waiting for it
to finish!

http://www.boostpro.com/vault/index.php?
action=downloadfile&filename=ejg_timer.zip&directory=Tools

http://tinyurl.com/nejw9o

On my Mac, the example can be built and run by unzipping the file and
doing the following:

$ cd ejg_timer/
$ g++ -DNDEBUG -O3 -I. -o example_timer ejg/example_timer.cpp
$ ./example_timer
Calibrating the clock, for a slow timer this may take a while!
Point estimate of clock overhead (t_c): 0.0986196
Potential jitter on t_c (sigma_c) : 0.633371
Increasing precision until I can discriminate between function a and b.
The nominal confidence interval (minimum, [median], maximum)
represents the region: 1-alpha=0.95
Timing at a nominal precision of 50%: We cannot distinguish between A
and B: (-1.62543 [-0.339245] 1.78913)%
Timing at a nominal precision of 25%: We cannot distinguish between A
and B: (0 [0] 2.38656)%
Timing at a nominal precision of 12.5%: We cannot distinguish between
A and B: (0 [0] 0.69492)%
Timing at a nominal precision of 6.25%: We cannot distinguish between
A and B: (0 [1.84168] 3.71728)%
Timing at a nominal precision of 3.125%: We cannot distinguish
between A and B: (-0.173508 [0.757778] 0.924288)%
Timing at a nominal precision of 1.5625%: A is faster than B by:
(0.563103 [0.690461] 0.940433)%

As you will note, there's a fair bit of extra cruft that has come
along for the ride, in particular I think it is worth being aware of:

striding_iterator -- Perhaps boost could do with this as a separate
thing - it's pretty handy.
robust_linear_fit -- A least absolute deviation robust linear fit,
used to calibrate the clock overhead. This is preferable to least
squares as it is insensitive to outliers.
statistics::ks_test -- This implements a Kolmogorov-Smirnov two-
sample test and is critical to answering the question 'have we taken
enough samples'.
wilcoxon_ci -- Carry out a Wilcoxon signed-rank test. Used to
determine confidence intervals on paired samples for the relative
timing.

I will take some time to carefully review my thought processes to
check that what I did is what I intended to do. There are some subtle
points which stretch my statistics understanding regarding the
repeated application of the ks_test - I suspect this is best
approached with some Baysian prior technique - that said what I have
seems to work well! Perhaps more by luck than judgement!

Comments and questions are welcome.

-ed

------------------------------------------------
"No more boom and bust." -- Dr. J. G. Brown, 1997


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk