
Subject: Re: Streamlining benchmarking process
From: Stefan Seefeld (stefan_at_[hidden])
Date: 2019-05-09 22:39:58


On 2019-05-09 6:11 p.m., Mateusz Loskot wrote:
> On 19-05-09 17:32:53, stefan wrote:
>> On 2019-05-09 5:11 p.m., Mateusz Loskot wrote:
>>> Now, I'm wondering this:
>>> if we maintain a benchmark structure similar to the tests, e.g.
>>> - test/algo1.cpp, benchmark/algo1.cpp
>>> - test/algo2.cpp, benchmark/algo2.cpp
>>> could we define a build config that allows us to build
>>> 1) exe per .cpp - useful to bisect, find regressions
>>> 2) exe from multiple algorithms, related to the same problem of course,
>>>   with one used as baseline - comparative benchmarking
>>
>> I'm in favor of 1). Whether that's truly one algorithm or not,
>> however, depends on whether the same code is parametrized to yield
>> multiple implementations.
>
> My (loose) idea was more like this:
> algo1.cpp is strstr()
> algo2.cpp is std::string::find()
> Each algo[1-2].cpp solves the same problem, but with a different algorithm.
>
> 1) build to run just the benchmark of strstr() (parameterised for problem
> size)
> 2) build to run both together with e.g. strstr() as baseline - yes, I'm a bit
>   biased by the Celero framework here.

My use-case / requirement was that it must be possible to run algorithms
independently. Of course, in the end, you may want to "aggregate"
results from multiple runs, to do some comparison or other analysis. But
that step is fully orthogonal to the run itself.

And even if you *want* both together, you can always write a little
script that performs multiple runs in a batch.
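
Something along these lines would do, as a rough sketch (the binary names
are made up; --benchmark_out / --benchmark_out_format are google/benchmark
flags): each run writes its own results file, which can be aggregated
afterwards:

    // Minimal batch driver (sketch): run each benchmark binary in turn
    // and let every run store its own results file for later aggregation.
    #include <cstdlib>
    #include <string>
    #include <vector>

    int main()
    {
        // Hypothetical benchmark executables, one per algorithm.
        std::vector<std::string> const benchmarks = {
            "./benchmark_algo1", "./benchmark_algo2"};
        for (std::size_t i = 0; i < benchmarks.size(); ++i)
        {
            std::string const cmd = benchmarks[i]
                + " --benchmark_out=run_" + std::to_string(i) + ".csv"
                + " --benchmark_out_format=csv";
            if (std::system(cmd.c_str()) != 0)
                return 1;
        }
        return 0;
    }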

> We want to compare apples to apples, obviously.
>
> Again, it was a loose brainstorm. How we aggregate benchmark
> implementations isn't that important for now.

True, as long as the requirements are met. (I'm mentioning it because,
before my re-implementation, the Boost.uBLAS benchmarks would perform
multiple algorithm executions from the same executable (process), which
prevented me from recording results separately. That was one reason for
me not to keep the code as it was.)

>
>> In
>> https://github.com/boostorg/ublas/blob/develop/benchmarks/mm_prod.cpp
>> you can see how a single compilation unit / executable can represent
>> the same algorithm for multiple (template) parameters - value-types,
>> or in our case, pixel types or other compile-time parameters.
>
> I see. However, it's a bit different from what I had in mind.
> I'm more fond of a benchmark case 'generator' similar to this:
>
>    BOOST_AUTO_TEST_CASE_TEMPLATE(test_case_name,
>        formal_type_parameter_name, collection_of_types);

That's an implementation detail, which just so happens to use some "data
driven test" idioms to express the parametrization. I'm not saying
that's necessarily a bad choice (in fact, it's quite similar to my code
above, I believe), as long as it lets you (I repeat myself, I know)
control via the CLI which parameter type you would like to execute the
benchmark with. (Running the whole set of types *might* work, but seems
to impose requirements on how to store benchmark results. That is, you
lose the ability to store them independently and aggregate them later
freely; the aggregation is already baked into the code structure.)
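
To make that concrete, here is a rough sketch (the benchmark body and the
value types are made up, not actual Boost.GIL code): the same templated
benchmark is registered once per value type, and google/benchmark's
--benchmark_filter and --benchmark_out flags then let you run and store
each type independently:

    #include <benchmark/benchmark.h>
    #include <cstdint>
    #include <vector>

    // Hypothetical benchmark body, templated on the value/pixel type.
    template <typename T>
    static void fill_copy(benchmark::State& state)
    {
        std::vector<T> const data(static_cast<std::size_t>(state.range(0)), T{1});
        for (auto _ : state)
        {
            std::vector<T> copy(data);           // dummy workload
            benchmark::DoNotOptimize(copy.data());
        }
    }

    // One registration per type; BENCHMARK_TEMPLATE is standard google/benchmark.
    BENCHMARK_TEMPLATE(fill_copy, std::uint8_t)->Arg(1 << 20);
    BENCHMARK_TEMPLATE(fill_copy, std::uint16_t)->Arg(1 << 20);

    BENCHMARK_MAIN();

A single run, restricted to one type and with its own results file, would
then look something like:

    $ ./benchmark_algo --benchmark_filter='uint8_t' \
          --benchmark_out=algo_uint8.json --benchmark_out_format=json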

>
> https://www.boost.org/doc/libs/1_70_0/libs/test/doc/html/boost_test/tests_organization/test_cases/test_organization_templates.html#ref_BOOST_AUTO_TEST_CASE_TEMPLATE
>
>
> Unfortunately, google/benchmark does not offer that.
> It should not be a problem to copy the BOOST_AUTO_TEST_CASE_TEMPLATE
> technique :-)
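
A rough sketch of such a generator on top of google/benchmark could be a
small variadic registration helper (the helper and the benchmark body
below are made up; benchmark::RegisterBenchmark itself is part of the
library):

    #include <benchmark/benchmark.h>
    #include <cstdint>
    #include <string>
    #include <typeinfo>

    // Hypothetical benchmark body, templated on the value type under test.
    template <typename T>
    void algo1_benchmark(benchmark::State& state)
    {
        for (auto _ : state)
        {
            T value{};
            benchmark::DoNotOptimize(value);
        }
    }

    // Register algo1_benchmark<T> once for every T in the list, similar in
    // spirit to BOOST_AUTO_TEST_CASE_TEMPLATE's collection_of_types.
    // (typeid names are implementation-defined; real code would map types
    // to readable labels.)
    template <typename... Types>
    void register_for_each_type(std::string const& base_name)
    {
        ((void)benchmark::RegisterBenchmark(
             (base_name + "<" + typeid(Types).name() + ">").c_str(),
             &algo1_benchmark<Types>),
         ...);
    }

    int main(int argc, char** argv)
    {
        register_for_each_type<std::uint8_t, std::uint16_t, std::uint32_t>("algo1");
        benchmark::Initialize(&argc, argv);
        benchmark::RunSpecifiedBenchmarks();
        return 0;
    }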
>
>> Even if we combine multiple algorithms into a single executable, I
>> think an important requirement is to be able to perform benchmark
>> runs on algorithms individually, as that allows us to generate
>> comparison charts freely. And it scales better as we add more variants.
>
> I see the convenience of building a benchmark of all the threshold
> algorithms we've got as a single executable, producing a CSV table that
> I can stick in the repo for reference and view like this:
> https://github.com/mloskot/string_benchmark/blob/master/results/gcc63_starts_with.csv
>
> Having such a dataset of results, producing pretty graphs can be done
> easily.
>
> Having to run individually for each experiment parameter, like:
>
>    $ benchmark_algo --pixel-type [uint8, uint16, ...]
>
> is prone to human error, producing incomplete results.

...and it complicates the user interface, if we want to store results
per benchmark, not per set of benchmarks.

> If something is compile-time configurable, let's keep it in the code (and
> version it!).
> Then, a built executable becomes a self-contained, pre-configured
> benchmark that can be re-run many times without any run-time input.

Sure.

Stefan

-- 
       ...ich hab' noch einen Koffer in Berlin...
