
Subject: Re: Streamlining benchmarking process
From: Mateusz Loskot (mateusz_at_[hidden])
Date: 2019-05-09 22:11:08


On 19-05-09 17:32:53, stefan wrote:
>On 2019-05-09 5:11 p.m., Mateusz Loskot wrote:
>>Now, I'm wondering this:
>>if we maintain benchmark structure similar to tests e.g.
>>- test/algo1.cpp, benchmark/algo1.cpp
>>- test/algo2.cpp, benchmark/algo2.cpp
>>could we define build config that allows to build
>>1) exe per .cpp - useful to bisect, find regressions
>>2) exe from multiple algorithms, related to the same problem of course,
>>  with one used as a baseline - comparative benchmarking
>
>I'm in favor of 1). Whether that's truly one algorithm or not,
>however, depends on whether the same code is parametrized to yield
>multiple implementations.

My (loose) idea was more like this:
algo1.cpp benchmarks strstr(),
algo2.cpp benchmarks std::string::find().
Each algo[1-2].cpp solves the same problem, but with a different algorithm.

1) build to run just the benchmark of strstr() (parameterised for problem size)
2) build to run both together, with e.g. strstr() as the baseline - yes, I'm a bit
   biased by the Celero framework here (see the sketch below).

We want to compare apples to apples, obviously.
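
For illustration, a minimal Celero-style sketch of case 2); the group name,
test data, sample and iteration counts are just placeholders:

    #include <celero/Celero.h>
    #include <cstring>
    #include <string>

    CELERO_MAIN

    namespace
    {
        std::string const haystack(1 << 20, 'a');
        std::string const needle = "needle";
    }

    // Baseline: all other benchmarks in the group are reported relative to it.
    BASELINE(StringFind, StrStr, 30, 10000)
    {
        celero::DoNotOptimizeAway(std::strstr(haystack.c_str(), needle.c_str()));
    }

    // Candidate: same problem, different algorithm.
    BENCHMARK(StringFind, StdStringFind, 30, 10000)
    {
        celero::DoNotOptimizeAway(haystack.find(needle));
    }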

Again, it was a loose brainstorm. How we aggregate benchmark implementations
doesn't have to be decided right now.

>In
>https://github.com/boostorg/ublas/blob/develop/benchmarks/mm_prod.cpp
>you can see how a single compilation unit / executable can represent
>the same algorithm for multiple (template) parameters - value-types,
>or in our case, pixel types or other compile-time parameters.

I see. However, it's a bit different from what I had in mind.
I'm more fond of a benchmark case 'generator' similar to this:

    BOOST_AUTO_TEST_CASE_TEMPLATE(test_case_name,
        formal_type_parameter_name, collection_of_types);

https://www.boost.org/doc/libs/1_70_0/libs/test/doc/html/boost_test/tests_organization/test_cases/test_organization_templates.html#ref_BOOST_AUTO_TEST_CASE_TEMPLATE

Unfortunately, google/benchmark does not offer that.
It should not be a problem to copy the BOOST_AUTO_TEST_CASE_TEMPLATE technique :-)
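
For example, a minimal sketch of such a generator on top of google/benchmark,
using benchmark::RegisterBenchmark and a C++17 fold; fill_benchmark and the
type list are placeholders standing in for a real GIL algorithm and pixel types:

    #include <benchmark/benchmark.h>
    #include <algorithm>
    #include <cstdint>
    #include <string>
    #include <typeinfo>
    #include <vector>

    // Benchmark body templated on the value (e.g. pixel channel) type.
    template <typename T>
    void fill_benchmark(benchmark::State& state)
    {
        std::vector<T> buffer(static_cast<std::size_t>(state.range(0)));
        for (auto _ : state)
        {
            std::fill(buffer.begin(), buffer.end(), T{1});
            benchmark::DoNotOptimize(buffer.data());
        }
    }

    // Hand-rolled case generator: register the same body once per type in the
    // list, roughly what BOOST_AUTO_TEST_CASE_TEMPLATE does for test cases.
    // (typeid names may be mangled; a nicer name mapping is easy to add.)
    template <typename... Types>
    void register_fill_benchmarks()
    {
        (benchmark::RegisterBenchmark(
             (std::string("fill<") + typeid(Types).name() + ">").c_str(),
             fill_benchmark<Types>)->Arg(1 << 20),
         ...);
    }

    int main(int argc, char** argv)
    {
        register_fill_benchmarks<std::uint8_t, std::uint16_t, std::uint32_t, float>();
        benchmark::Initialize(&argc, argv);
        benchmark::RunSpecifiedBenchmarks();
    }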

>Even if we combine multiple algorithms into a single executable, I
>think an important requirement is to be able to perform benchmark runs
>on algorithms individually, as that allows to generate comparison
>charts freely. And it scales better as we add more variants.

I see the convenience of building benchmarks of all the threshold algorithms
we've got as a single executable, producing a CSV table I can commit to the
repo for reference and view like this:
https://github.com/mloskot/string_benchmark/blob/master/results/gcc63_starts_with.csv
Having such a dataset of results, producing pretty graphs is easy.
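
If we settle on google/benchmark, its built-in CSV reporter should be enough to
produce such a table; the executable and output file names below are hypothetical:

    $ benchmark_threshold --benchmark_out=results/threshold.csv --benchmark_out_format=csv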

Having to run the executable individually for each experiment parameter, like:

    $ benchmark_algo --pixel-type [uint8, uint16, ...]

is prone to human error and produces incomplete results.

If something is compile-time configurable, let's keep it in the code (and
version it!).
Then a built executable becomes a self-contained, pre-configured benchmark
that can be re-run many times without any run-time input.
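
For instance, with google/benchmark the whole parameter grid can be baked into
the source and versioned with it; BM_threshold and the size range below are
placeholders:

    #include <benchmark/benchmark.h>

    // Hypothetical threshold benchmark body; the real one would call the GIL algorithm.
    static void BM_threshold(benchmark::State& state)
    {
        for (auto _ : state)
        {
            benchmark::DoNotOptimize(state.range(0));
        }
    }

    // The full problem-size grid is fixed at compile time and versioned with the code.
    BENCHMARK(BM_threshold)->RangeMultiplier(4)->Range(1 << 10, 1 << 22);

    BENCHMARK_MAIN();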

Best regards,

-- 
Mateusz Loskot, http://mateusz.loskot.net
Fingerprint=C081 EA1B 4AFB 7C19 38BA  9C88 928D 7C2A BB2A C1F2
