Subject: Re: [boost] Looking for some "real world" extended precision integer arithmetic tests
From: John Maddock (boost.regex_at_[hidden])
Date: 2012-03-22 12:59:15


>> There are other similar examples, but basically my case is that to make
>> the integer type general purpose there are actually lots of these special
>> case checks, each of which are there for a very good reason, but which
>> collectively easily account for that 10% slowdown in my code compared to
>> yours.
>
> Well, as I noticed before it's not just 10%, because the algorithm
> execution time should be subtracted. Nevertheless I completely understand
> that it is not possible to get code as fast as mine. I would consider a
> generalized implementation that is 50% slower to be pretty good (but not
> three times slower, as before). It seems that you managed to do that, so
> accept my congratulations! I am willing to have a look at your
> implementation; is it already in the sandbox?

Yes. There are currently problems with the random number code not compiling
under GCC, but apart from that it's all in pretty good shape (I hope!).
Docs here:
http://svn.boost.org/svn/boost/sandbox/big_number/libs/multiprecision/doc/html/boost_multiprecision/tut/ints.html

>> Again, another compromise: sign magnitude is actually very slightly slower
>> than the old 2's complement code for some operations, but has enough other
>> advantages to make it worthwhile (so you sold me on that)
>
> Could you list those? I am willing to test :-)

Well, the main one is that it is easier to code, and in particular to track
only the limbs that are actually used, which proved too brain-twisting for me
with a 2's complement representation ;-)
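To illustrate the point (a hypothetical sketch, not the sandbox code's actual layout): with sign magnitude the sign is just a flag stored next to the magnitude limbs, so the used-limb count is simply the magnitude's length, and negation never touches a limb - unlike 2's complement, where the sign lives in the high bits and negation can change every limb.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical sketch: sign-magnitude keeps the sign as a flag next to the
// magnitude limbs.  The number of limbs in use is just mag.size(), and
// negation flips the flag without touching any limb - the two properties
// that make used-limb tracking straightforward.
struct sm_int {
    bool negative = false;
    std::vector<std::uint32_t> mag;   // little-endian magnitude limbs
};

inline bool is_zero(const sm_int& a) {
    for (std::uint32_t l : a.mag)
        if (l) return false;
    return true;
}

inline sm_int negate(sm_int a) {
    if (!is_zero(a))                  // keep zero in its canonical "+0" form
        a.negative = !a.negative;
    return a;
}
```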

>> It also has a template parameter that lets you tune the size of the
>> internal cache (the bit that's used before any memory allocation takes
>> place).
>
> I am not sure what the internal cache is here. Need to have a look at your
> implementation. Did you implement two separate backends: stack allocated
> and heap allocated? Or did you come up with a generalized implementation
> that uses a fixed array for small numbers and switches to a dynamic array
> if it exceeds its capacity?

No, just one implementation, but it implements the equivalent of the
small-string optimization - which is to say, the header for the dynamically
allocated buffer is in a union with a small internal buffer. The internal
buffer is used first, then if the number grows too large a dynamic buffer is
used. The template parameter controls the size of this internal cache - it
defaults to the same size as the dynamic buffer header, but can be made
larger if that helps.
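As a rough sketch of that layout (hypothetical names and code, not what the sandbox actually contains): the heap "header" (pointer plus capacity) shares storage with the inline limb array through a union, the inline array is used until the value outgrows it, and the limbs then migrate to a heap buffer. The template parameter here plays the role of the tunable internal cache size.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical sketch of the small-string-optimization-style layout the post
// describes: up to InternalLimbs limbs are stored inline; the heap header
// (pointer + capacity) lives in a union with that inline buffer, so no space
// is wasted, and a dynamic buffer is only allocated once the value outgrows
// the internal cache.
template <std::size_t InternalLimbs>
class limb_vector {
    using limb_t = std::uint32_t;
    struct heap_header { limb_t* data; std::size_t capacity; };
    union {
        limb_t      internal[InternalLimbs]; // used first
        heap_header heap;                    // used once we outgrow the cache
    };
    std::size_t size_ = 0;
    bool on_heap_ = false;

public:
    limb_vector() {}
    ~limb_vector() { if (on_heap_) delete[] heap.data; }
    limb_vector(const limb_vector&) = delete;
    limb_vector& operator=(const limb_vector&) = delete;

    std::size_t size() const { return size_; }
    bool uses_heap() const { return on_heap_; }
    limb_t* limbs() { return on_heap_ ? heap.data : internal; }

    void push_back(limb_t v) {
        if (!on_heap_ && size_ == InternalLimbs) {
            // Grown past the internal cache: migrate to a dynamic buffer.
            std::size_t cap = InternalLimbs * 2;
            limb_t* p = new limb_t[cap];
            std::memcpy(p, internal, sizeof(internal));
            heap.data = p;
            heap.capacity = cap;
            on_heap_ = true;
        } else if (on_heap_ && size_ == heap.capacity) {
            std::size_t cap = heap.capacity * 2;
            limb_t* p = new limb_t[cap];
            std::memcpy(p, heap.data, size_ * sizeof(limb_t));
            delete[] heap.data;
            heap.data = p;
            heap.capacity = cap;
        }
        limbs()[size_++] = v;
    }
};
```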

Finally, if the allocator is void, then no dynamic allocation ever takes
place and we have a fixed precision type; some of the core code gets
simplified in this case as well.
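That dispatch could look something like this minimal sketch (hypothetical code, not the sandbox implementation): a partial specialization on a void allocator selects a plain fixed array and signals overflow instead of growing, while any real allocator selects a growable store.

```cpp
#include <cstddef>
#include <memory>
#include <stdexcept>
#include <vector>

// Hypothetical sketch of the "Allocator == void means fixed precision"
// dispatch: the same front end selects a fixed-size limb store (which can
// never allocate) or a growable one, depending on the Allocator parameter.
template <std::size_t Limbs, class Allocator>
struct limb_store {                     // general case: may grow on the heap
    std::vector<unsigned, Allocator> limbs;
    static const bool is_fixed = false;
    void resize(std::size_t n) { limbs.resize(n); }
};

template <std::size_t Limbs>
struct limb_store<Limbs, void> {        // Allocator == void: fixed precision
    unsigned limbs[Limbs];
    static const bool is_fixed = true;
    void resize(std::size_t n) {
        // A real implementation would track the used-limb count; this sketch
        // only shows that growth past the fixed capacity is an error.
        if (n > Limbs) throw std::overflow_error("fixed-precision overflow");
    }
};
```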

Otherwise, an unbounded integer with a large cache provides almost the best
of both worlds - very nearly as fast as the fixed precision version, but no
need to worry about overflow.

>> Basically what I'm saying here, is there is no one true configuration,
>> that offers the very best performance in all cases, if there was, we'd
>> probably all be using GMP ;-)
>
> Completely agree. But from an advanced user's perspective it's good to be
> able to specify the configuration (stack or heap allocation, limb size,
> etc). For example, it is possible to rewrite all the Voronoi code using
> GMP almost without any performance loss. But this would require splitting
> huge math expressions into atomic formulas, thus reducing code readability
> and maintainability. That's the main reason I decided to switch to a fixed
> bigint.
>
>> PS, the next step is to start stealing some more special case code from
>> the SOC project.... I fully expect that to make your benchmarks run very
>> slightly more slowly, but it's necessary for other use cases...
>
> I'd like to have a look at your backend implementation. Judging from your
> Voronoi benchmark, it already seems well done, so it would be good to move
> parts of that SOC project to your current implementation. One of the
> useful things we can think of is FFT multiplication (which is fast for
> large numbers). For medium size numbers I would think of Karatsuba
> multiplication. I would suggest a generalized multiplication that combines
> those two plus the usual quadratic multiplication for small numbers. I can
> help with testing those and will ask Arseniy if he's interested in
> migrating his FFT code. Does that sound good to you?
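The combined scheme being proposed could be sketched like this (hypothetical code, from neither project; the FFT tier for very large operands is omitted, and the threshold is illustrative rather than tuned): schoolbook multiplication below a cutoff, Karatsuba recursion above it, over little-endian vectors of 16-bit limbs.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical sketch of a hybrid multiply: quadratic schoolbook below a
// threshold, Karatsuba above it.  Numbers are little-endian vectors of
// 16-bit limbs (stored in 32-bit slots so products fit in 64 bits).
using limbs = std::vector<std::uint32_t>;   // each element < 2^16
const std::size_t KARATSUBA_THRESHOLD = 8;  // illustrative, not tuned

void trim(limbs& a) { while (a.size() > 1 && a.back() == 0) a.pop_back(); }

limbs add(const limbs& a, const limbs& b) {
    limbs r(std::max(a.size(), b.size()) + 1, 0);
    std::uint32_t carry = 0;
    for (std::size_t i = 0; i < r.size(); ++i) {
        std::uint32_t s = carry;
        if (i < a.size()) s += a[i];
        if (i < b.size()) s += b[i];
        r[i] = s & 0xFFFF; carry = s >> 16;
    }
    trim(r); return r;
}

limbs sub(const limbs& a, const limbs& b) {  // requires a >= b
    limbs r(a.size(), 0);
    std::int64_t borrow = 0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        std::int64_t d = (std::int64_t)a[i] - borrow
                       - (i < b.size() ? (std::int64_t)b[i] : 0);
        if (d < 0) { d += 0x10000; borrow = 1; } else borrow = 0;
        r[i] = (std::uint32_t)d;
    }
    trim(r); return r;
}

limbs schoolbook(const limbs& a, const limbs& b) {
    limbs r(a.size() + b.size(), 0);
    for (std::size_t i = 0; i < a.size(); ++i) {
        std::uint64_t carry = 0;
        for (std::size_t j = 0; j < b.size(); ++j) {
            std::uint64_t t = r[i + j] + (std::uint64_t)a[i] * b[j] + carry;
            r[i + j] = t & 0xFFFF; carry = t >> 16;
        }
        r[i + b.size()] += (std::uint32_t)carry;
    }
    trim(r); return r;
}

limbs shift(limbs a, std::size_t n) {        // multiply by 2^(16*n)
    a.insert(a.begin(), n, 0); trim(a); return a;
}

limbs multiply(const limbs& a, const limbs& b) {
    if (std::min(a.size(), b.size()) < KARATSUBA_THRESHOLD)
        return schoolbook(a, b);
    // Split a = a0 + a1*B^half, b = b0 + b1*B^half (B = 2^16), then
    // a*b = z0 + z1*B^half + z2*B^(2*half) with only three recursive calls.
    std::size_t half = std::max(a.size(), b.size()) / 2;
    limbs a0(a.begin(), a.begin() + std::min(half, a.size()));
    limbs a1(a.begin() + std::min(half, a.size()), a.end());
    limbs b0(b.begin(), b.begin() + std::min(half, b.size()));
    limbs b1(b.begin() + std::min(half, b.size()), b.end());
    if (a1.empty()) a1.push_back(0);
    if (b1.empty()) b1.push_back(0);
    limbs z0 = multiply(a0, b0);
    limbs z2 = multiply(a1, b1);
    limbs z1 = sub(sub(multiply(add(a0, a1), add(b0, b1)), z0), z2);
    return add(add(z0, shift(z1, half)), shift(z2, 2 * half));
}
```

A real implementation would of course tune the threshold per platform and add an FFT tier above Karatsuba, as suggested.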

For sure, those are the two main routines that I want to pinch from his
code - and if Arseniy can help with that, that would be great - I wasn't
looking forward to having to get my head around those routines ;-)

There are some other specialized routines (for squaring for example) that I
also need to implement, as well as experiment with column-first
multiplication to see whether it gains anything....
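The idea behind the squaring specialization can be sketched as follows (hypothetical code, not the routines under discussion): since a[i]*a[j] == a[j]*a[i], a dedicated squaring routine computes each cross product once and doubles it, roughly halving the limb multiplications compared with a general multiply(a, a). This sketch also accumulates into columns before a single carry-propagation pass, loosely in the column-first style mentioned above.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical squaring sketch over little-endian base-2^16 limbs: diagonal
// terms a[i]^2 are computed once each, cross terms a[i]*a[j] (i < j) once
// and doubled.  Columns are accumulated in 64-bit slots, then carries are
// propagated in one final pass.
using limbs = std::vector<std::uint32_t>;   // each element < 2^16

limbs square(const limbs& a) {
    std::size_t n = a.size();
    std::vector<std::uint64_t> col(2 * n, 0);
    for (std::size_t i = 0; i < n; ++i) {
        col[2 * i] += (std::uint64_t)a[i] * a[i];          // diagonal term
        for (std::size_t j = i + 1; j < n; ++j)
            col[i + j] += 2 * ((std::uint64_t)a[i] * a[j]); // doubled cross term
    }
    limbs r(2 * n, 0);
    std::uint64_t carry = 0;
    for (std::size_t k = 0; k < 2 * n; ++k) {
        std::uint64_t t = col[k] + carry;
        r[k] = t & 0xFFFF;
        carry = t >> 16;
    }
    while (r.size() > 1 && r.back() == 0) r.pop_back();
    return r;
}
```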

Cheers, John.


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk