Subject: Re: [boost] [multiprecision]: 15% slowdown using -std=c++11 compiler flag
From: Kim Walisch (kim.walisch_at_[hidden])
Date: 2017-11-20 07:16:54


On Nov 19, 2017 14:54, "John Maddock via Boost" <boost_at_[hidden]> wrote:

> On 19/11/2017 10:32, Kim Walisch via Boost wrote:
>
> > I have been investigating a 15% performance regression in my C++
> > primesum program (https://github.com/kimwalisch/primesum/tree/256-bit)
> > over the last 2 days.
> >
> > By lots of benchmarking I was able to identify the boost
> > multiprecision library together with -std=c++11 (or -std=gnu++11)
> > as the culprit for the performance regression because I have also
> > a version of the primesum program which does not use the boost
> > multiprecision library and in this version there is no performance
> > regression when compiling using -std=c++11.
> >
> > I have tested using multiple versions of the boost multiprecision
> > library including the latest 1.65.1. The slowdown happens on
> > both GCC (versions: 5.4, 6.4, 7.2) and Clang (version 3.8) on
> > x86_64 Linux. I am only using the int256_t and uint256_t types
> > (hence cpp_int backend) and I am doing only simple integer
> > arithmetic: +, - and *.
> >
> > Is this a known issue and is there a known workaround e.g. special
> > compiler flag? I could revert to C++98 but I really don't want to do
> > that...
>
> No, not known, and if anything C++11 should speed things up by enabling
> rvalue-references etc. If you can narrow it down some more I'll certainly
> investigate.
>
> Thanks for the heads up, John.

OK, I'll try to narrow it down. The simplest algorithm using the
int256_t type in primesum is S2_trivial. You can have a look at
the algorithm here:

https://github.com/kimwalisch/primesum/blob/256-bit/src/deleglise-rivat/S2_trivial.cpp#L59

There are only 2 lines of code (62-63) using the int256_t type in
this algorithm:

  maxint_t diff = prime_sums[pi[y]] - prime_sums[pi[xn]];
  s2_trivial += prime * diff;

Note that maxint_t is a typedef for int256_t. The first line does an
__int128_t subtraction and converts the result (implicitly) to int256_t.
The second line does an int256_t multiplication and adds the
result to the int256_t s2_trivial variable.

As soon as I add -std=c++11 to the compiler flags the algorithm runs
15% slower (using Clang and GCC on Linux x86_64). Funnily, if I
change the code lines to:

  maxint_t diff = prime_sums[pi[y]] - prime_sums[pi[xn]];
  maxint_t prime2 = prime;
  diff *= prime2;
  s2_trivial += diff;

then the code already runs 11% faster using -std=c++11, even though it
does exactly the same thing (it is only 4% slower than without
-std=c++11). Without -std=c++11 the modified code does not run any
faster. My code mixes __int128_t with int256_t a lot, and one of my
guesses on what causes the slowdown is that the __int128_t to int256_t
conversion has become much slower (in some cases) using -std=c++11.
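
To compare the two formulations outside of primesum, something along
the following lines might serve as a starting point. This is only a
rough sketch, not the primesum code: the function names, the flat
input vectors and the timing helper are made up for illustration, and
it assumes the prime is a 64-bit integer. The wide and narrow integer
types are template parameters, so the block itself does not depend on
Boost.

```cpp
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <vector>

// Variant A (hypothetical name): mirrors the original two lines.
// The Narrow difference is implicitly converted to Wide, then the
// mixed multiply and the accumulation happen in one expression.
template <typename Wide, typename Narrow>
Wide sum_single_expression(const std::vector<Narrow>& sums,
                           const std::vector<std::int64_t>& primes)
{
    Wide total = 0;
    for (std::size_t i = 0; i + 1 < sums.size() && i < primes.size(); ++i) {
        Wide diff = sums[i + 1] - sums[i];  // Narrow subtraction, widened
        total += primes[i] * diff;          // int64 * Wide, then add
    }
    return total;
}

// Variant B (hypothetical name): mirrors the rewritten four lines.
// The prime is widened first, so only Wide-by-Wide operations remain.
template <typename Wide, typename Narrow>
Wide sum_split_steps(const std::vector<Narrow>& sums,
                     const std::vector<std::int64_t>& primes)
{
    Wide total = 0;
    for (std::size_t i = 0; i + 1 < sums.size() && i < primes.size(); ++i) {
        Wide diff = sums[i + 1] - sums[i];
        Wide prime2 = primes[i];
        diff *= prime2;
        total += diff;
    }
    return total;
}

// Small timing helper: runs f once and returns wall-clock milliseconds.
template <typename F>
double elapsed_ms(F&& f)
{
    auto start = std::chrono::steady_clock::now();
    f();
    auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

To exercise cpp_int, Wide would be boost::multiprecision::int256_t
(from <boost/multiprecision/cpp_int.hpp>) and Narrow would be
__int128_t. Building the same file once with and once without
-std=c++11, and timing each function with elapsed_ms over a large
input, should show whether the gap can be reproduced in isolation.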
Kim