Subject: Re: [boost] [multiprecision]: 15% slowdown using -std=c++11 compiler flag
From: Kim Walisch (kim.walisch_at_[hidden])
Date: 2017-11-20 07:16:54
On Nov 19, 2017 14:54, "John Maddock via Boost" <boost_at_[hidden]>
wrote:
On 19/11/2017 10:32, Kim Walisch via Boost wrote:
> I have been investigating a 15% performance regression in my C++
> primesum program (https://github.com/kimwalisch/primesum/tree/256bit)
> over the last 2 days.
>
> By lots of benchmarking I was able to identify the boost
> multiprecision library together with -std=c++11 (or -std=gnu++11)
> as the culprit for the performance regression because I have also
> a version of the primesum program which does not use the boost
> multiprecision library and in this version there is no performance
> regression when compiling using -std=c++11.
>
> I have tested using multiple versions of the boost multiprecision
> library including the latest 1.65.1. The slowdown happens on
> both GCC (versions: 5.4, 6.4, 7.2) and Clang (version 3.8) on
> x86_64 Linux. I am only using the int256_t and uint256_t types
> (hence cpp_int backend) and I am doing only simple integer
> arithmetic: +, - and *.
>
> Is this a known issue and is there a known workaround e.g. special
> compiler flag? I could revert to C++98 but I really don't want to do
> that...
>
No, not known, and if anything C++11 should speed things up by enabling
rvalue-references etc. If you can narrow it down some more I'll certainly
investigate.
Thanks for the heads up, John.
OK, I'll try to narrow it down.

The simplest algorithm using the int256_t type in primesum is
S2_trivial. You can have a look at the algorithm here:
https://github.com/kimwalisch/primesum/blob/256bit/src/deleglise-rivat/S2_trivial.cpp#L59

There are only 2 lines of code (62-63) using the int256_t type in this
algorithm:

    maxint_t diff = prime_sums[pi[y]] - prime_sums[pi[xn]];
    s2_trivial += prime * diff;

Note that maxint_t is a typedef for int256_t. The first line does an
__int128_t subtraction and converts the result (implicitly) to int256_t.
The second line does an int256_t multiplication and adds the result to
the int256_t s2_trivial variable. As soon as I add -std=c++11 to the
compiler flags the algorithm runs 15% slower (using Clang and GCC on
Linux x86_64).

Funnily enough, if I change the code lines to:

    maxint_t diff = prime_sums[pi[y]] - prime_sums[pi[xn]];
    maxint_t prime2 = prime;
    diff *= prime2;
    s2_trivial += diff;

this code already runs 11% faster using -std=c++11, even though it does
exactly the same thing (and it is only 4% slower than without
-std=c++11). Without -std=c++11 this code does not run faster.

My code mixes __int128_t with int256_t a lot, and one of my guesses as
to what causes the slowdown is that the __int128_t to int256_t
conversion has become much slower (in some cases) using -std=c++11.

Kim
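
To help narrow this down outside of primesum, the two formulations above can be isolated as a pair of small function templates. This is only a sketch. The names accumulate_mixed and accumulate_in_place are hypothetical, the type of prime is assumed to be int64_t, and the code avoids C++11 features on purpose so that it builds both with and without -std=c++11.

```cpp
#include <stdint.h>

// Variant A: the two original statements from S2_trivial.
// Narrow stands for the 128-bit element type (__int128_t in the post),
// Wide for the accumulator type (int256_t in the post).
template <typename Wide, typename Narrow>
void accumulate_mixed(Wide& sum, Narrow hi, Narrow lo, int64_t prime)
{
  Wide diff = hi - lo;   // Narrow subtraction, implicit conversion to Wide
  sum += prime * diff;   // mixed int64_t * Wide product, then Wide addition
}

// Variant B: the rewritten statements reported as faster with -std=c++11.
template <typename Wide, typename Narrow>
void accumulate_in_place(Wide& sum, Narrow hi, Narrow lo, int64_t prime)
{
  Wide diff = hi - lo;   // same subtraction and conversion as variant A
  Wide prime2 = prime;   // widen the prime explicitly before multiplying
  diff *= prime2;        // in-place Wide *= Wide
  sum += diff;           // Wide addition
}
```

Instantiating both templates with Wide = boost::multiprecision::int256_t (from <boost/multiprecision/cpp_int.hpp>) and Narrow = __int128_t, then timing each one in a loop over the same inputs, should show whether the difference comes from the conversion, the mixed multiplication, or the addition.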
