Subject: Re: [boost] [convert] Performance
From: Vladimir Batov (Vladimir.Batov_at_[hidden])
Date: 2014-06-10 19:22:52

Joel, thank you for the pointers. Much appreciated. Black art, you say?
Any blacker-than-black adjectives available that I might use?

On 06/11/2014 09:03 AM, Joel de Guzman wrote:
> On 6/11/14, 4:55 AM, Andrey Semashev wrote:
>> On Wednesday 11 June 2014 06:46:53 Vladimir Batov wrote:
>>> And indeed BOOST_ASSERT seems to be heavier than BOOST_TEST due to
>>> expression-validity check done with
>>> __builtin_expect(expr, 1)
>> It's not a validity check, it's a hint to the compiler to help branch
>> prediction. Assertion failures are assumed to be improbable.
>> In any case, when testing performance you should be building in
>> release mode,
>> where all asserts are removed.
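
To make Andrey's two points concrete, here is a minimal sketch. It is not the real BOOST_ASSERT. SKETCH_ASSERT, SKETCH_LIKELY, sketch_assert_failed and checked_div are names made up for illustration, and it assumes GCC or Clang for __builtin_expect. The built-in only tells the compiler which branch to lay out as the likely one, and with NDEBUG defined (the usual release setting) the whole check expands to nothing.

```cpp
#include <cassert>
#include <cstdio>
#include <cstdlib>

// Hypothetical failure handler for this sketch: report and abort.
[[noreturn]] inline void sketch_assert_failed(char const* expr,
                                              char const* file, int line)
{
    std::fprintf(stderr, "%s:%d: assertion failed: %s\n", file, line, expr);
    std::abort();
}

#if defined(NDEBUG)
// Release mode: the assertion disappears and costs nothing.
#  define SKETCH_ASSERT(expr) ((void)0)
#else
#  if defined(__GNUC__)
// A hint only: "expect this condition to be true".
// It does not validate the expression in any way.
#    define SKETCH_LIKELY(x) __builtin_expect(!!(x), 1)
#  else
#    define SKETCH_LIKELY(x) (x)
#  endif
#  define SKETCH_ASSERT(expr) \
      (SKETCH_LIKELY(expr) ? (void)0 \
                           : sketch_assert_failed(#expr, __FILE__, __LINE__))
#endif

// Illustrative function using the assertion on its precondition.
inline int checked_div(int a, int b)
{
    SKETCH_ASSERT(b != 0);
    return a / b;
}
```

Calling checked_div(10, 2) behaves the same with or without NDEBUG. Only the precondition check differs, which is why timing a debug build measures the asserts as well as the code under test.
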
> Benchmarks are a black art. See how we do our performance tests in
> You can use our benchmark facility where all the black art is contained:
> using this strategy:
> // Strategy: because the sum in an accumulator after each call
> // depends on the previous value of the sum, the CPU's pipeline
> // might be stalled while waiting for the previous addition to
> // complete. Therefore, we allocate an array of accumulators,
> // and update them in sequence, so that there's no dependency
> // between adjacent addition operations.
> // Additionally, if there were only one accumulator, the
> // compiler or CPU might decide to update the value in a
> // register rather than writing it back to memory. We want each
> // operation to at least update the L1 cache. *** Note: This
> // concern is specific to the particular application at which
> // we're targeting the test. ***
> // This has to be at least as large as the number of
> // simultaneous accumulations that can be executing in the
> // CPU pipeline. A safe number here is larger than the
> // machine's maximum pipeline depth. If you want to test the L2
> // or L3 cache, or main memory, you can increase the size of
> // this array. 1024 is an upper limit on the pipeline depth of
> // current vector machines.
> A naive test implementation will give you *funny* results, depending
> on the machine you are running on.
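
For the archive, the strategy in the quoted comment can be sketched roughly as below. This is a simplified illustration and not the actual benchmark facility. run_sketch, sketch_result and accumulator_count are hypothetical names, and 1024 is simply the figure taken from the comment. Each pass updates every accumulator in turn, so adjacent additions do not depend on each other.

```cpp
#include <array>
#include <cassert>
#include <chrono>
#include <cstddef>

// Size taken from the quoted comment; an assumption, not a tuned value.
constexpr std::size_t accumulator_count = 1024;

struct sketch_result
{
    unsigned long long total; // sum over all accumulators
    double seconds;           // wall-clock time of the timed loops
};

// Runs `step` accumulator_count times per pass, adding each result to
// its own accumulator so that no addition waits on the previous one.
template <typename Step>
sketch_result run_sketch(Step step, std::size_t passes)
{
    std::array<unsigned long long, accumulator_count> acc{};

    auto const start = std::chrono::steady_clock::now();
    for (std::size_t p = 0; p < passes; ++p)
        for (std::size_t i = 0; i < accumulator_count; ++i)
            acc[i] += step(i);
    auto const stop = std::chrono::steady_clock::now();

    // Fold the accumulators into one value that is returned to the
    // caller, so the work above has an observable result.
    unsigned long long total = 0;
    for (unsigned long long v : acc)
        total += v;

    return { total, std::chrono::duration<double>(stop - start).count() };
}
```

A caller would pass the operation under test as `step`, for example a lambda that runs one conversion and returns its result, and then divide `seconds` by `passes * accumulator_count` for a per-call figure. The returned total should be printed or otherwise used. A real harness needs more care than this sketch takes to stop the optimizer from removing or reordering the measured work.
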