
Boost : 
Subject: Re: [boost] Boost.Plot?  Scalable Vector Graphics (was [expressive] performance tuning
From: Simonson, Lucanus J (lucanus.j.simonson_at_[hidden])
Date: 20090730 18:21:26
Edward Grace wrote:
>> Personally I perform several runs of the same benchmark and then
>> take the minimum time as the time I report.
>
> Be very cautious about that for a number of reasons.
>
> a) By reporting the minimum, without qualifying your statement, you
> are lying. For example if you say "My code takes 10.112 seconds to
> run." people are unlikely to ever observe that time  it is by
> construction atypical.
I try to be explicit about explaining what metric I'm reporting. If I'm reporting the minimum of ten trials I report it as the minimum of ten trials. Taking the minimum of ten trials and comparing it to the minimum of twenty trials and not qaulifying it would be even more dishonest because it isn't fair. I compare same number of trails, report that it is the minimum and the number of trails.
> b) If ever you do this (use the minimum or maximum) then you are
> unwittingly sampling an extreme value distribution:
>
> http://en.wikipedia.org/wiki/Extreme_value_distribution
>
> For example  say you 'know' that the measured times are normally
> distributed with mean (and median) zero and a variance of 10. If you
> take, say 100 measurements and then take the minimum  how is that
> minimum distributed? In other words if your repeat this  again and
> again  what does the histogram of the minimum look like?
>
> What you will find is that the dispersion (variance or otherwise) on
> the median is small say \sigma^2=0.15 whereas the dispersion on the
> minimum can be much larger \sigma^2=1.8  *and* the distribution is
> highly asymmetrical.
>
> If your two times were nominally separated by around 0.5 you would
> not necessarily reliably determine which was fastest from the minima
> since the distributions of the minima could overlap.
>
> Attached are two examples, histograms of the observed minimum for
> many trials and the observed median for many trials of the above
> scenario. Notice that the histogram of the observed minimum is *far*
> broader than the observed median.
In this particular case of runtime the distribution is usually not normal. There is some theoritical fastest the program could go if nothing like the OS or execution of other programs on the same machine cause it to run slower. This is the number we actually want. We can't measure it, but we observe that there is a "floor" under our data and the more times we run the more clear it becomes where that floor is. It just won't run faster than that no matter how often we run. The minimum value tends to be closer to the mode of the distribution than to the median because runtimes cluster around the minimum value and tail off above. The idea is that using the minimum as the metric eliminates most of the "noise" introduced into the data by stuff unrelated to what you are trying to measure. Taking the median is just taking the median of the noise. In an ideal compute environment your runtimes would be so similar from run to run you wouldn't bother to run more than once. If there is noise in your data set your statistic should be designed to eliminate the noise, not characterize it.
In my recent boostcon presentation my benchmarks were all for a single trial of each algorithm for each benchmark because noise would be eliminated during the curve fitting of the data for analysis of how well the algorithms scaled (if not by the conversion to logarithmic axis....) If I had taken the minimum of some number of trials I would have reported that in the paper.
Regards,
Luke
Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk