Subject: Re: Streamlining benchmarking process
From: stefan (stefan_at_[hidden])
Date: 2019-05-09 12:45:41
Hi Olzhas, all,
On 2019-05-09 5:09 a.m., Olzhas Zhumabek wrote:
> I thought about introducing a benchmarks folder and some build scripts to
> streamline benchmarking process.
Wonderful. I was trying to collect my thoughts to send out a similar
mail, but you beat me to it! :-)
> First, let me list the problems that this will hopefully solve, in
> decreasing order of importance:
> 1. Simplify performance issue submission
> 2. Make it easy to get rough approximation of original environment (those
> that can be made using code only)
> 3. Quickly accept or reject issue (e.g. if it is caused by GIL itself or
> some environment issue)
> 4. Check if performance degraded significantly
> Now let me list the cons of the idea that I have thought about:
> 1. There is not much to benchmark yet
> 2. Arrival frequency of performance related issues is very low
I agree with most of the above, though I'm not sure about the last point. It's
true that over recent months we have been concerned with coding issues
and compile-time performance, but the fact that we didn't have a
benchmarking infrastructure to measure the impact any of this had on
(runtime) performance doesn't mean there wasn't any.
I fully expect us to run the benchmark suite (once there is one) on
par with the test suite for any (significant) code change that may
affect performance, and in particular I expect it to be of great use
over the next couple of months as we focus on image processing.
So, this is excellent timing!
> I've wandered around the Boost libraries, and it seems ublas has a
> benchmarks folder but uses a homegrown benchmarking facility, which might
> slightly complicate reproduction.
Not really, given that I wrote it. :-)
But that point aside, I'm normally quite averse to the NotInventedHere
syndrome, i.e. I'd rather avoid reinventing wheels, if possible.
But before diving into a tools discussion, let's quickly collect
requirements that any tool(s) we agree on needs to meet. Here is my list
of use-cases. Feel free to augment and complement:
* We should define one benchmark per algorithm, parametrized around
various axes, such as value- and layout types (pixels, channels, etc.)
to make it easy to compare different instances of the same algorithm.
* Likewise, we should define benchmarks so they can run over a range
of (image-) sizes, as performance will vary greatly with size (and will
depend on the hardware we run on, including but not limited to cache sizes).
* It should be possible to run a single benchmark instance, and produce
a benchmark result file, containing a table (list of (size,time) pairs).
* It should then be possible to take multiple such files as input, and
produce a comparative chart.
* It should be possible to implement benchmarks for a given algorithm,
using external (non Boost.GIL) implementations, to compare Boost.GIL to
other libraries (OpenCV, for example).
* It should also be possible to later add additional implementations to
Boost.GIL, and thus augment the parameter space. (Last summer I mentored
a Boost.uBLAS project that added GPU support using OpenCL backends, and
the ability to compare a GPU-based implementation with a host-only
implementation, especially over a range of problem sizes, was extremely
useful.)
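Independent of which tool we end up choosing, the harness behind the size-sweep and results-table bullets above can be sketched with nothing but the standard library. This is a minimal sketch, not existing Boost.GIL or google-benchmark API; `run_size_sweep` and the kernel callback are hypothetical names for illustration:

```cpp
#include <chrono>
#include <cstddef>
#include <utility>
#include <vector>

// Time a kernel over a doubling range of image dimensions and return the
// (size, seconds) table that a benchmark result file would contain.
template <typename Kernel>
std::vector<std::pair<std::size_t, double>>
run_size_sweep(Kernel kernel, std::size_t min_dim, std::size_t max_dim)
{
    std::vector<std::pair<std::size_t, double>> table;
    for (std::size_t dim = min_dim; dim <= max_dim; dim *= 2) {
        auto const start = std::chrono::steady_clock::now();
        kernel(dim); // e.g. run a GIL algorithm over a dim x dim image
        auto const stop = std::chrono::steady_clock::now();
        table.emplace_back(
            dim, std::chrono::duration<double>(stop - start).count());
    }
    return table;
}
```

A result file would then just serialize such a table; for comparison, google-benchmark can emit a machine-readable equivalent via its `--benchmark_out=<file>` and `--benchmark_format=json` flags, which a charting script could consume.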
We have already started discussing some of this with Samuel Debionne in
https://github.com/boostorg/gil/issues/234. Hopefully he is reading this
mail and can jump in to participate.
> I propose the following changes:
> 1. Create benchmarks folder in root of GIL.
> 2. (Optional) write some simple benchmark to check if google-benchmark is
> installed properly
Yes. I have quickly looked at google-benchmark, and while I'm not yet
convinced this meets my expectations, I'm certainly willing to try it
out and experiment.
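For such an installation check, the library's own standard registration pattern (this is the canonical google-benchmark "hello world", not GIL-specific code) would do; the build command is an assumption for a system-wide install:

```cpp
// Minimal sanity check that google-benchmark is installed and linkable.
// Assumed build: g++ check.cpp -lbenchmark -lpthread
#include <benchmark/benchmark.h>
#include <string>

static void BM_StringCopy(benchmark::State& state)
{
    std::string src = "hello, benchmark";
    for (auto _ : state) {
        std::string copy(src);          // the operation under measurement
        benchmark::DoNotOptimize(copy); // keep the compiler from eliding it
    }
}
BENCHMARK(BM_StringCopy);

BENCHMARK_MAIN(); // expands to a main() that runs all registered benchmarks
```

If this compiles, links, and prints a timing line for BM_StringCopy, the dependency is set up correctly.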
> 3. Write build scripts (jamfile, cmake+conan) to provide an option to build
> benchmarks and optionally install google-benchmark using conan
Right, we need to hook the benchmarking up to whatever build system
people use, to get visibility, and ultimately feedback.
On Boost.uBLAS we decided to have the CI builds build the benchmarks (to
make sure they at least compile) but not run them: serious benchmark
work requires controlling the hardware you run on, i.e. controlling
what else is running at the same time, etc., so I'd expect it to be a
poor use of a containerized environment.
Similarly, I think we should let travis-ci, appveyor, etc. build the
benchmarks, but we should also give users clear instructions for
running benchmarks manually, including analyzing the results.
> 4. Import all existing performance issues into that folder
I'm not quite sure what you mean by that.
> 5. Mention in contributing.md that performance issues should preferably be
> reproduced in that folder as a google-benchmark case, with the results
> embedded into the issue.
I expect that once we have a benchmarking suite, we can start collecting
issues that focus on performance (with a well-defined process to
reproduce problems locally). In fact, at that point we could add a new
issue category called "performance".
> What do you think?
> Is this idea even worth it?
> Or should it be put a bit further down the to-do list?
> If worth it, what changes exactly should I introduce?
I very much agree that this is worth pursuing. And while I'm not yet
convinced that google-benchmark is the right tool for the job, I'm open
to giving it a try. We can iterate on this for a while, as long as we
focus on usability, so that we have enough infrastructure ready when we
start seriously working on new image-processing algorithms in a few
weeks.
-- ...ich hab' noch einen Koffer in Berlin...
Boost list run by Boost-Gil-Owners