Subject: Re: Streamlining benchmarking process
From: Olzhas Zhumabek (anonymous.from.applecity_at_[hidden])
Date: 2019-05-09 21:40:03


I'll reply to Stefan's email first, as most of the points overlap with
Mateusz's email.

From Stefan's email:

> But before diving into a tools discussion, let's quickly collect
> requirements that any tool(s) we agree on needs to meet. Here is my list
> of use-cases. Feel free to augment and complement:
> * We should define one benchmark per algorithm, parametrized around
> various axes, such as value- and layout types (pixels, channels, etc.)
> to make it easy to compare different instances of the same algorithm.
> * Likewise, we should define benchmarks to be able to run over a range
> of (image-) sizes, as performance will vary greatly on that (and depend
> on the hardware we run on, including but not limited to cache sizes).
> * It should be possible to run a single benchmark instance, and produce
> a benchmark result file, containing a table (list of (size,time) pairs).
> * It should then be possible to take multiple such files as input, and
> produce a comparative chart.
> * It should be possible to implement benchmarks for a given algorithm,
> using external (non Boost.GIL) implementations, to compare Boost.GIL to
> other libraries (OpenCV, for example).
> * It should also be possible to later add additional implementations to
> Boost.GIL, and thus augment the parameter space (last summer I mentored
> a Boost.uBLAS project that added GPU support using OpenCL backends, so
> the ability to compare a GPU-based implementation with a host-only
> implementation, especially over a range of problem sizes, was extremely
> useful.)

I don't have anything to add at the moment; writing more benchmarks may
reveal further feature requirements. I believe this is a great starter set.
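
To make the parametrization requirements a bit more concrete, here is a
rough sketch of the shape I have in mind, using Google Benchmark with
fill_pixels as a placeholder algorithm and placeholder pixel types and
size ranges (not the final benchmark set):

#include <benchmark/benchmark.h>
#include <boost/gil.hpp>
#include <cstddef>

namespace gil = boost::gil;

// One benchmark body, parametrized over the image type (value/layout)
// via the template parameter and over the image size via state.range(0).
template <typename Image>
static void BM_fill_pixels(benchmark::State& state)
{
    auto const side = static_cast<std::ptrdiff_t>(state.range(0));
    Image img(side, side);
    typename Image::value_type const pixel(128); // all channels set to 128
    for (auto _ : state)
    {
        gil::fill_pixels(gil::view(img), pixel);
        benchmark::ClobberMemory(); // keep the stores from being optimized away
    }
    state.SetItemsProcessed(state.iterations() * side * side);
}

// Same body instantiated for different pixel types over a range of sizes.
BENCHMARK_TEMPLATE(BM_fill_pixels, gil::gray8_image_t)
    ->RangeMultiplier(2)->Range(256, 4096);
BENCHMARK_TEMPLATE(BM_fill_pixels, gil::rgb8_image_t)
    ->RangeMultiplier(2)->Range(256, 4096);

BENCHMARK_MAIN();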

Here is the list of tools I'm going to implement the benchmarks with for
now:

1. Code by Stefan:
https://github.com/boostorg/ublas/blob/develop/benchmarks/benchmark.hpp
2. Google Benchmark: https://github.com/google/benchmark
3. Celero: https://github.com/DigitalInBlue/Celero

The code for 2 (Google Benchmark) is already written:
https://github.com/boostorg/gil/issues/234#issuecomment-489749263 . I am
unsure whether reimplementing all of the benchmarks there is feasible, but
I will at least try the ones that contribute most to understanding the
issue.

> Right, we need to hook the benchmarking up to whatever build system
> people use, to get visibility, and ultimately feedback.

It seems CMake already has similar functionality for the extensions and
examples (downloading via Conan or requiring a find module), so the CMake
code will mostly be copy-pasting and tweaking. I have one concern about the
Conan option, though: there is no way to download only some of the
dependencies from conanfile.txt. I was thinking about each folder having
its own conanfile.txt, but then the dependency collection process (done by
appending to gil-dependencies) will become a bit obscured, in my opinion.
There is also the option to specify the repository to download from
directly, but I am not sure how maintainable that would be.

The Jamfile should be even easier, as it doesn't seem to support Conan.

> On Boost.uBLAS we decided to have the CI builds build the benchmarks (to
> make sure they at least compile), but not run them, since doing serious
> benchmark work requires controlling the hardware you are running on,
> i.e. controlling what else is running at the same time, etc., so not a
> good use of a containerized environment, I'd expect.
> Similarly, I'd think that we should allow travis-ci, appveyor, etc. to
> build benchmarks, but also should give clear instructions to users for
> running benchmarks manually, including analyzing the results

Completely agree with you here.

From Mateusz's email:

> Now, I'm wondering this:
> if we maintain benchmark structure similar to tests e.g.
> - test/algo1.cpp, benchmark/algo1.cpp
> - test/algo2.cpp, benchmark/algo2.cpp
> could we define build config that allows to build
> 1) exe per .cpp - useful to bisect, find regressions
> 2) exe from multiple algorithms, related to the same problem of course,
> with one used as a baseline - comparative benchmarking

I am in favor of 1). It seems both Celero and Google Benchmark support
running a subset of benchmarks, though Celero appears to offer only group
granularity, while Google Benchmark supports regex filtering. Using
uBLAS's facility will probably require implementing some command-line
driver, which could be shared between projects.
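
For illustration only, the shared driver I have in mind could be as simple
as a name-to-function registry filtered by a regex taken from the command
line (similar in spirit to Google Benchmark's --benchmark_filter flag).
Everything below is hypothetical, not existing uBLAS code:

#include <chrono>
#include <cstdio>
#include <functional>
#include <map>
#include <regex>
#include <string>
#include <utility>

using benchmark_fn = std::function<void()>;

// Global name -> benchmark registry; benchmarks register themselves
// at static-initialization time via auto_register below.
std::map<std::string, benchmark_fn>& registry()
{
    static std::map<std::string, benchmark_fn> r;
    return r;
}

struct auto_register
{
    auto_register(std::string name, benchmark_fn fn)
    {
        registry().emplace(std::move(name), std::move(fn));
    }
};

// Usage: ./driver 'resize.*rgb8'  (no argument runs everything)
int main(int argc, char* argv[])
{
    std::regex const filter(argc > 1 ? argv[1] : ".*");
    for (auto const& [name, fn] : registry())
    {
        if (!std::regex_search(name, filter))
            continue;
        auto const start = std::chrono::steady_clock::now();
        fn();
        auto const stop = std::chrono::steady_clock::now();
        auto const us = std::chrono::duration_cast<std::chrono::microseconds>(stop - start).count();
        std::printf("%-40s %10lld us\n", name.c_str(), static_cast<long long>(us));
    }
}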

> Celero has this nice notion/feature of `baseline` against which other
> algorithms are compared. Here you can see a sample with `strncmp` as baseline:
>
> https://github.com/mloskot/string_benchmark/blob/master/benchmark_ends_with.cpp
> and table with results here
>
> https://github.com/mloskot/string_benchmark/blob/master/results/gcc63_ends_with.csv
> and pretty graphs here
> http://mloskot.github.io/string_benchmark/results/

Thanks for mentioning this; I almost forgot about it. I really hope to get
the baseline concept in. There is no notion of being faster or slower if
there is nothing to compare against.
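
To make sure I understand the idea, here is a rough sketch of how a
baseline could look for a GIL benchmark with Celero. The sample/iteration
counts and the choice of a raw std::fill over bytes as the baseline are
just placeholders:

#include <celero/Celero.h>
#include <boost/gil.hpp>
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

namespace gil = boost::gil;

namespace
{
    constexpr std::ptrdiff_t side = 1024;
    std::vector<std::uint8_t> buffer(static_cast<std::size_t>(side * side), 0);
    gil::gray8_image_t image(side, side);
}

CELERO_MAIN

// Raw std::fill over an equally sized byte buffer serves as the baseline...
BASELINE(FillGray8, raw_std_fill, 30, 100)
{
    std::fill(buffer.begin(), buffer.end(), std::uint8_t{255});
    celero::DoNotOptimizeAway(buffer.back());
}

// ...and gil::fill_pixels is reported relative to it.
BENCHMARK(FillGray8, gil_fill_pixels, 30, 100)
{
    gil::fill_pixels(gil::view(image), gil::gray8_pixel_t(255));
    celero::DoNotOptimizeAway(gil::view(image)(0, 0));
}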

> >> I've wandered around Boost libraries and it seems like uBLAS has a
> >> benchmarks folder,
> I'd prefer to call it `benchmark/`.
> There are `test/` folders, not `tests/`.

Agreed.

> >> 4. Import all existing performance issues into that folder
> >
> > I'm not quite sure what you mean by that.
> Me neither.

I meant code to reproduce these issues:
https://github.com/boostorg/gil/issues/234 and
https://github.com/boostorg/gil/issues/204 .

I believe this is everything I wanted to mention.

I'll push the new branch once I get the Google Benchmark benchmarks to
compile and run correctly, as that is the easiest one to start with.

Best,
Olzhas

