Boost logo

Boost :

Subject: Re: [boost] [Multiprecision] Benchmarking
From: Simonson, Lucanus J (lucanus.j.simonson_at_[hidden])
Date: 2012-06-14 12:24:52


Phil Endecott wrote:

>But we can't write this:
> int64_t a, b ...;
> int128_t r = a * b;
>because here a*b is a 64x64->64 multiply; instead we have to write something like
> r = static_cast<int128_t>(a)*b;
>so that our operator* overload is found; but this operator* doesn't know that its int128_t argument
>is actually a promoted int64_t, and it has to do an 128x64->128 or 128x128->128 multiply, requiring 8 or 16
>32x32->64 multiplies. The only chance of avoiding the extra work is if
>the compiler can do constant-propagation, but I have my doubts about that.
>Any thoughts?

If you used an expression template for the multiply you could dispatch r = a * b; to a three argument multiply template function from within the assignment operator that would inspect the precision of each addend and the sum types and choose the algorithm that does minimal work with minimal loss of precision. However, it would be a big effort and it wouldn't work if both arguments of the multiply were builtin types. I'm not suggesting it is a good idea, it is just the first thing I thought of. Calling multiply(r, a, b) directly might be a better idea.

This should really be done in the vector unit, and that will become even more true as better vector units come out. We need a standard mp specification so that the compiler/hardware vendors will provide library implementations that take advantage of existing and future hardware to accelerate their use. Boost can't be expected to provide an optimal multi-precision type that makes full use of the hardware, but we can be expected to come up with an interface that will help the standard committee specify a standard mp. The hardware vendors will happily compete to provide the highest possible performance implementation of the standard and the code that uses that hardware will be portable.

By the way, gmp is only slow because of the allocations. Try declaring all of the gmp_int variables instead of letting there be temporaries and then make them static so that they recycle their storage rather than going to the heap for every use. I don't know if it will end up being the fastest, but at least it will be a fair comparison of algorithms instead of measuring primarily allocation time.

Regards,
Luke


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk