Subject: Re: [boost] [boostcon12]Trouble with tuples very compiler dependent
From: Eric Niebler (eric_at_[hidden])
Date: 2012-11-19 13:11:26
On 11/19/2012 8:38 AM, Larry Evans wrote:
> On 11/19/12 10:19, Larry Evans wrote:
>> In contrast to the gcc4_8 compiler, with the clangxx compiler the
>> relative performance is just the opposite. IOW, the
>> bcon12_horizontal implementation is faster than the bcon12_vertical
>> implementation. In fact, the rate of change of the performance
>> difference accelerates as tree depth goes from 2 to 4. The rate of
>> change is so stark that it suggests, at least to me, there may be some
>> bug in clang. Of course that conclusion is based on almost no
>> knowledge, on my part, of the clang implementation.
>> The tuple_benchmark_filt.py can be modified to filter out other parts
>> of the benchmark run output, which is here:
> For example, when the filter criteria restrict
> TUPLE_UNROLL_MAX to 10 (the same as TUPLE_SIZE),
> then, with compiler=clangxx, bcon12_vertical
> performs relatively better than bcon12_horizontal
> as TREE_DEPTH increases, as shown in the attached.
All the measured times are below one second. Benchmarks become more
meaningful when the thing being measured takes more than a few seconds
to finish. If you don't mind, can you step up the limits? Once we have a
few data points in the tens of seconds and minutes, we'll have a better
idea of how the different compilers are performing.
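To make "step up the limits" concrete, here is a rough sketch of one way to do it: keep raising a size parameter (TREE_DEPTH, say) until a single run takes at least some minimum wall-clock time. The names time_once and step_up_limits, and the run callable, are made up for illustration and are not part of the benchmark scripts; in practice run would presumably invoke the compiler with the chosen depth.

```python
import time


def time_once(run, depth):
    """Return the wall-clock seconds taken by one call to run(depth)."""
    start = time.perf_counter()
    run(depth)
    return time.perf_counter() - start


def step_up_limits(run, start_depth, min_seconds, max_depth):
    """Raise depth one step at a time until a run takes >= min_seconds.

    `run` is a hypothetical callable standing in for one benchmark run
    (for example, compiling the test with a given TREE_DEPTH).
    Returns a list of (depth, seconds) pairs, one per depth tried.
    Stops at max_depth even if the threshold was never reached.
    """
    results = []
    for depth in range(start_depth, max_depth + 1):
        seconds = time_once(run, depth)
        results.append((depth, seconds))
        if seconds >= min_seconds:
            break
    return results


if __name__ == "__main__":
    # Stand-in workload: busy work that grows quickly with depth.
    def fake_run(depth):
        sum(i * i for i in range(10 ** depth))

    for depth, seconds in step_up_limits(fake_run, 1, 0.05, 7):
        print(f"depth={depth} seconds={seconds:.4f}")
```

With a real compile as the workload, min_seconds would be set to something like 10 or more, so that the last few data points land in the range where timer noise and process startup no longer dominate.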
Thanks for doing these. It's very interesting.
-- Eric Niebler BoostPro Computing http://www.boostpro.com