Boost :

Subject: Re: [boost] Large Integer Library
From: Michael Tryhorn (mjtryhorn_at_[hidden])
Date: 2012-06-30 15:47:59

Dear all,

Some responses to questions asked so far:

> That can't be right. Addition of two n-bit numbers requires
> processing all n bits and is therefore O(n). The fastest
> known algorithm for multiplying two n-bit numbers is slower
> than O(n).

I may be wrong, but my understanding was that at least for an
addition operation, while certainly implemented via adders in
hardware (and hence O(n)), we would not call it so in algorithms
that use it. Let me give an example. If I wrote a function
thus:

int add_these(int a, int b)
{
return a+b;
}

I would expect that rather than converting my '+' to a loop,
the compiler would turn it into assembly 'LOAD/ADD/STORE'. My
evaluation of the complexity of the above algorithm itself (if
we can call it such) would be to label it 'O(1)' rather than 'O(n)'.
For reference, let me quote the code from large_int::operator+= :

large_int<T, U>& large_int<T, U>::operator+= (const this_type& val)
{
// Take a copy of *this, to modify
this_type ret(*this);

// Perform the addition
ret.m_lo += val.m_lo;
ret.m_hi += val.m_hi;
if( ret.m_lo < m_lo )
{
++ret.m_hi;
}

// Commit the result
return( *this = ret );
}

Although a little more complex, additions should only ever perform a
single pass of this function. As for multiplication, the operator*=
implementation uses a for loop which loops at most 'n' times, where
'n' is the number of binary bits in the integer, making use of +=
(above) and the shift operators.

> Could the programmer be given access (through member functions) to the
> two members of a large_int?

Absolutely. In fact, I used to allow this and removed the functionality
to make large_int more 'integer like'. I wouldn't add this as member
functions (as this would be inconsistent with the built-in integer types).
I would, however, either place this functionality into large_int_traits
as accessors, in the boost::large_int namespace as free-functions, or both.

> The typedef for l[u]int128_t does not seem to use the builtin
> corresponding type when available, could this be done?

Again, yes, absolutely. Provided these types support all the required
operations as well as std::numeric_limits, this could be done via
preprocessor macros to detect the necessary compiler support (as it
already done where 64bit is unavailable).

> I had "undefined symbol" problems when using the builtin 128 bits
> integers (provided by gcc) in the signature of functions in a shared
> library (it was a module created with Boost.Python, the problem occurred
> when importing this module within python). The only solution I found was
> to wrap integer-like types inside a (templated) class, which looks like a
> particular case of your library (this would be "large_int<EMPTY_TYPE,
U>").
> Would LargeInt be an appropriate place to solve this problem?

I can't speak from experience on this in particular, however where I had a
need to make an existing integral type "slightly larger", I did it with
this little trick:

typedef boost::large_int::large_int<unsigned int, T> my_int_type;

Where 'T' is the type you'd be wrapping. This extends 'T' by combining
it with an unsigned low_part_type.

> Why use Boost.Operator for some operators but not all (e.g. comparison
> operators)?

It was to do with integral promotion, and I have to admit here that I made
a mistake in my earlier email where I stated that all arithmetic operators
were fine with mixing their lhs and rhs operator sizes. This is not the
case for arithmetic operators, but it is for comparison ('<', '<=', '==',
and so on). Where two large_int values which differ in size are compared,
the smaller of the two must be promoted to the same type as the larger so
the comparison may take place without losing integer information. If I
were to use Boost.Operator to define these operators automatically, I
don't think such promotion would be possible.

> I have not found through the doc what is the result of
> luint128_t +luint128_t
> luint128_t +luint256_t
> Could you point me where I can find these kind of information?

If you are referring to the arithmetic operator '+', luint128_t +luint128_t
would produce a new luint128_t value. Just like their 64bit equivalent,
the result can overflow. As for luint128_t + luint256_t, this is where I
follow on from the above answer and it gets a little tricky. As the 128bit
integer is the left-hand-side, I believe that boost::addable<T> will return
the left-hand-side type of the expression. It might be possible to overcome
this by *not* inherting from boost::integer_arithmetic and defining all
operators manually with some MPL-like type selection. This might be good
for a version 1.1! For the moment, if you *know* that a 256bit result is
needed, using operator+= with a 256bit large_int left-hand-side will most
certainly produce a 256bit result. Thanks for the input - I'll add notes
on this to the documentation.

(Regarding "Fixed at compile time values")
> Please note the "for example" part.
> Fixed precision integers are fixed to the number of bits in
> Boost.Multiprecision the same as yours are.

If you mean that the optional selection of an integer type by the number of
decimal digits required rather than the underlying type, then fair enough.
Is that what you mean? Boost.LargeInt types are always created by a number
of binary digits at the moment. Not being too familiar with
Boost.Multiprecision, could I please ask if it is possible to select an
integral type from a number of binary bits, and is there any restriction on
power-of-2 vs. non-power-of-2?

> "A header-only Boost license version is always available (if somewhat
slower)."
> So no difference then? ;-)

I may have misread your documentation, but I thought you said that some
functionality was dependent upon third party restricted license libraries,
but that 'a header-only Boost license version is always available
(if somewhat slower)'. I took this to mean that Boost.Multiprecision's
optimal performance was only available where the restricted licencing for
those additonal libraries was acceptable to the end user. Please correct
me if I misunderstood this point. Do you mean slower to compile? Or,
slower to run? Boost.LargeInt is actually pretty light-weight on headers;
it short in code, doesn't include many headers itself and even leaves out
I/O the headers where not needed.

> You can turn expression templates off in the multiprecision lib, in which
> case it behaves as yours does - in fact the expression templates are
turned
> off for the fixed precision integer part, as it adds nothing there.

Fair enough.

> Congratulations, the Fields medal is clearly on it's way!
> Seriously, you might want to think again on those complexities...

No need to be rude. My reasoning for the O(x) complexities is explained
above.

(Regarding comparison against literal values)
> Likewise the multiprecision lib (in fact promotion to the extended type
first
> isn't strictly necessary - you can do things more efficiently than that).

How could this be done more efficiently? Comparison against a literal in
Boost.LargeInt uses the large_int constructor, which is again O(1) (subject
to my understanding of complexity calculations :-P). To convert a 'long'
to a
'large_int' for these comparisons, it uses the above 'make it larger' type
trick to create a temporary - which is itself composed of just two integers,
allocated from the stack or registers as the compiler sees fit.

> While I think there is a place for a fixed size integer library in Boost
that's
> "as simple as possible" (on the grounds that for smaller fixed sizes,
simple
> is often faster than more complex schemes, even if the complex schemes
> have better big-O performance), this method of implementation does
> appear to be quite slow, though I confess I've only done a cursory
comparison.

I agree that some benchmarking of Boost.LargeInt against the header-only
equivalent of Boost.Multiprecision is certainly required, and I will take a
look at the results you have supplied. I am glad you think, as I do, that
there is a place for a fixed size integer library in Boost, and I am keen
that
it is as useful to developers as possible.

Thanks everyone for all the feedback so far,

Michael J. Tryhorn

On 30 June 2012 19:06, John Maddock <boost.regex_at_[hidden]> wrote:

> Michael,
>
> While I think there is a place for a fixed size integer library in Boost
> that's "as simple as possible" (on the grounds that for smaller fixed
> sizes, simple is often faster than more complex schemes, even if the
> complex schemes have better big-O performance), this method of
> implementation does appear to be quite slow, though I confess I've only
> done a cursory comparison.
>
> So for comparison, I measured how long it took to compute 100 dot products
> of 1000-element vectors. The vectors were filled with values that "used" a
> specified number of bits (to simulate the typical use case were most values
> are quite small, and just a few actually require an extended precision
> type), here's the results compiled on Win32 with VC10 and /Ox:
>
> Multiprecision mp_int128:
> 32 bit time = 0.0100556 seconds
> 64 bit time = 0.02594 seconds
> 96 bit time = 0.0229292 seconds
> 128 bit time = 0.0142151 seconds
> Multiprecision mp_int256:
> 32 bit time = 0.00376926 seconds
> 64 bit time = 0.0102474 seconds
> 96 bit time = 0.01383 seconds
> 128 bit time = 0.0192363 seconds
>
> large_int<unsigned long long, long long>:
> 32 bit time = 0.0498975 seconds
> 64 bit time = 0.111413 seconds
> 96 bit time = 0.183727 seconds
> 128 bit time = 0.251469 seconds
> large_int<large_int<unsigned long long, unsigned long long>,
> large_int<unsigned long long, long long> >:
> 32 bit time = 1.08451 seconds
> 64 bit time = 2.43866 seconds
> 96 bit time = 3.79658 seconds
> 128 bit time = 5.20211 seconds
>
> Note how the times taken for mp_intXXX grow largely based on bits actually
> used, not bits specified in the type. But also, even the case where I
> expected large_int to do well - in fact where it really should win in this
> kind of comparison - for part filled 128-bit int's, it's still over 4 times
> slower than the multiprecision code :-(
>
> My guess is this is a consequence of the implementation method (as a
> doubled-type rather than an array of limbs)? If so that's actually quite
> useful information to know.
>
> Regards, John
>
> PS I see you're using bit by bit multiplication - easy to understand and
> code, but it's glacially slow in general, I tried this for integer division
> early on and it ran like treacle (on a very cold day)!
>
> PPS code follows:
>
> #include <boost/chrono.hpp>
> #include <boost/multiprecision/cpp_int.**hpp>
> #include <boost/large_int.hpp>
>
> template <class Clock>
> struct stopwatch
> {
> typedef typename Clock::duration duration;
> stopwatch()
> {
> m_start = Clock::now();
> }
> duration elapsed()
> {
> return Clock::now() - m_start;
> }
> void reset()
> {
> m_start = Clock::now();
> }
>
> private:
> typename Clock::time_point m_start;
> };
>
> template <class T>
> void time()
> {
> std::vector<T> v1(1000, 0x12345FF), v2(1000, 0x12345F9), v3(1000, 0);
> boost::chrono::duration<**double> time;
> stopwatch<boost::chrono::high_**resolution_clock> c;
> for(unsigned i = 0; i < 100; ++i)
> {
> for(unsigned i = 0; i < v1.size(); ++i)
> {
> v3[i] = v1[i] * v2[i];
> }
> }
> time = c.elapsed();
> std::cout << "32 bit time = " << time << std::endl;
>
> for(unsigned i = 0; i < v1.size(); ++i)
> {
> v1[i] <<= 32;
> v1[i] += 0x12345FF;
> v2[i] <<= 32;
> v2[i] += 0x12345F9;
> }
> c.reset();
> for(unsigned i = 0; i < 100; ++i)
> {
> for(unsigned i = 0; i < v1.size(); ++i)
> {
> v3[i] = v1[i] * v2[i];
> }
> }
> time = c.elapsed();
> std::cout << "64 bit time = " << time << std::endl;
>
> for(unsigned i = 0; i < v1.size(); ++i)
> {
> v1[i] <<= 32;
> v1[i] += 0x12345FF;
> v2[i] <<= 32;
> v2[i] += 0x12345F9;
> }
> c.reset();
> for(unsigned i = 0; i < 100; ++i)
> {
> for(unsigned i = 0; i < v1.size(); ++i)
> {
> v3[i] = v1[i] * v2[i];
> }
> }
> time = c.elapsed();
> std::cout << "96 bit time = " << time << std::endl;
>
> for(unsigned i = 0; i < v1.size(); ++i)
> {
> v1[i] <<= 32;
> v1[i] += 0x12345FF;
> v2[i] <<= 32;
> v2[i] += 0x12345F9;
> }
> c.reset();
> for(unsigned i = 0; i < 100; ++i)
> {
> for(unsigned i = 0; i < v1.size(); ++i)
> {
> v3[i] = v1[i] * v2[i];
> }
> }
> time = c.elapsed();
> std::cout << "128 bit time = " << time << std::endl;
>
> }
>
> int _tmain(int argc, _TCHAR* argv[])
> {
> using namespace boost::multiprecision;
>
> time<mp_int128_t>();
> time<mp_int256_t>();
>
>
> typedef boost::large_int::large_int<**unsigned long long, long long>
> t128;
> typedef boost::large_int::large_int<**unsigned long long, unsigned long
> long> ut128;
> typedef boost::large_int::large_int<**ut128, t128> t256;
>
> time<t128>();
> time<t256>();
>
> return 0;
>
> }
>
>
>
> ______________________________**_________________
> Unsubscribe & other changes: http://lists.boost.org/**
> mailman/listinfo.cgi/boost<http://lists.boost.org/mailman/listinfo.cgi/boost>
>