Subject: Re: [boost] Looking for some "real world" extended precision integer arithmetic tests
From: John Maddock (boost.regex_at_[hidden])
Date: 2012-03-22 05:28:30
>>From your benchmark I see that you didn't provide results for
> boost::multiprecision::fixed_int, but rather for boost::multiprecision::**
> backends::cpp_int_backend. I tried to update boost multiprecision branch,
> but boost/mutliprecision/backends directory appeared to be empty.
> Many thanks for the use case - in point of fact it's proved pretty hard to
>> get a general purpose solution anywhere near your special-purpose code -
>> simply because your code is so simple
> I wouldn't say so. The only simple thing about my extended_int
> implementation is that it implements 3 basic operations: addition,
> subtraction and multiplication. But I don't see any problems on extending
> it to support division and other arithmetic operations. My point is that
> for fixed integer class it's not a good idea to represent negative numbers
> as two's complement. The main point of my benchmark was to compare fixed
> int implementations. And those should not have much performance difference
> in construction of temporary variables.
Ah, you probably misunderstood what I meant.
Let me give you a concrete example: Your multiplication code is fast,
probably as fast as it can be in your particular use case, the limb counts
never grow too large, so there are actually very few integer ops executed
for a multiply.
Now it so happens that multiplying by an integer (or big number with a
single limb) can be optimised with it's own simpler routine, that's *much*
quicker (as in order of magnitude quicker) for larger limb counts. But....
hooking that code into cpp_int made your benchmark run slightly slower...
the cause is the extra check in there to see if the optimisation can be
taken! Basically, in most cases, your routines are called with small
numbers that execute so few integer ops, that seemingly trivial changes can
have quite detrimental effects on runtime.
There are other similar examples, but basically my case is that to make the
integer type general purpose there are actually lots of these special case
checks, each of which are there for a very good reason, but which
collectively easily account for that 10% slowdown in my code compared to
yours. In short there's no free lunch, what's an optimisation for X is
often a dis-optimisation for Y.
> I think what we are seeing in Andrii's use case may generalize across
>> quite a few different big number use cases.
> Exactly, the main point of usage of fixed big integer class is that it
> doesn't require you to think about temporaries creation (except
> stack, but that's another topic). I would say that for math computations
> is the best option in 90% of cases.
> It would be good for Boost to have both big int implementations: with
> allocated backend and heap allocated backend. I was able to contact
> Kapoulkine, who's big int implementation is under
> Arseniy agreed to work with Boost group to solve any problems that prevent
> the standardization. I will try to come up with benchmarks of the current
> implementation based on the Voronoi code. It would be very nice to combine
> that implementation with boost multiprecision.
Ah, I should have been clearer what I've done here: the old 2's complement
fixed_int class has gone. There is now "one true" C++ big-int
implementation that handles both arbitrary precision and fixed precision
cases using signed-magnitude representation. Again, another compromise:
sign magnitude is actually very slightly slower than the old 2's complement
code for some operations, but has enough other advantages to make it
worthwhile (so you sold me on that). It also has a template parameter that
lets you tune the size of the internal cache (the bit that's used before any
memory allocation takes place). Again this is a compromise: testing with my
Miller Rabin code (another very important use case), increasing the internal
cache size could actually make things worse if you happened to choose an
"unfortunate" value compared to the size of the int's being tested.
However, increasing some more so that no memory allocation ever happens then
drops runtime down dramatically.
Basically what I'm saying here, is there is no one true configuration, that
offers the very best performance in all cases, if there was, we'd probably
all be using GMP ;-)
PS, the next step is to start stealing some more special case code from the
SOC project.... I fully expect that to make your benchmarks run very
slightly more slowly, but is necessary for other use cases....