Subject: Re: [boost] [multiprecision]: 15% slowdown using -std=c++11 compiler flag
From: Kim Walisch (kim.walisch_at_[hidden])
Date: 2017-11-20 07:16:54
On Nov 19, 2017 14:54, "John Maddock via Boost" <boost_at_[hidden]> wrote:
On 19/11/2017 10:32, Kim Walisch via Boost wrote:
> I have been investigating a 15% performance regression in my C++
> primesum program (https://github.com/kimwalisch/primesum/tree/256-bit)
> over the last 2 days.
> By lots of benchmarking I was able to identify the boost
> multiprecision library together with -std=c++11 (or -std=gnu++11)
> as the culprit for the performance regression because I have also
> a version of the primesum program which does not use the boost
> multiprecision library and in this version there is no performance
> regression when compiling using -std=c++11.
> I have tested using multiple versions of the boost multiprecision
> library including the latest 1.65.1. The slowdown happens on
> both GCC (versions: 5.4, 6.4, 7.2) and Clang (version 3.8) on
> x86_64 Linux. I am only using the int256_t and uint256_t types
> (hence cpp_int backend) and I am doing only simple integer
> arithmetic: +, - and *.
> Is this a known issue and is there a known workaround e.g. special
> compiler flag? I could revert to C++98 but I really don't want to do
No, not known, and if anything C++11 should speed things up by enabling
rvalue-references etc. If you can narrow it down some more I'll certainly
Thanks for the heads up, John.
OK, I'll try to narrow it down.

The simplest algorithm using the int256_t type in primesum is
S2_trivial. You can have a look at the algorithm here:

https://github.com/kimwalisch/primesum/blob/256-bit/src/deleglise-rivat/S2_trivial.cpp#L59

There are only 2 lines of code (62-63) using the int256_t type in
this algorithm:

    maxint_t diff = prime_sums[pi[y]] - prime_sums[pi[xn]];
    s2_trivial += prime * diff;

Note that maxint_t is a typedef for int256_t. The first line does an
__int128_t subtraction and implicitly converts the result to int256_t.
The second line does an int256_t multiplication and adds the result to
the int256_t s2_trivial variable. As soon as I add -std=c++11 to the
compiler flags the algorithm runs 15% slower (using Clang and GCC on
Linux x86_64).

Funnily enough, if I change the code to:

    maxint_t diff = prime_sums[pi[y]] - prime_sums[pi[xn]];
    maxint_t prime2 = prime;
    diff *= prime2;
    s2_trivial += diff;

this code already runs 11% faster with -std=c++11 even though it does
exactly the same thing (and it is only 4% slower than without
-std=c++11). Without -std=c++11 this code does not run any faster.

My code mixes __int128_t with int256_t a lot, and one of my guesses
about what causes the slowdown is that the __int128_t to int256_t
conversion has become much slower (in some cases) with -std=c++11.

Kim