Boost logo

Boost :

Subject: Re: [boost] Large Integer Library
From: John Maddock (boost.regex_at_[hidden])
Date: 2012-06-30 14:06:37


Michael,

While I think there is a place for a fixed size integer library in Boost
that's "as simple as possible" (on the grounds that for smaller fixed sizes,
simple is often faster than more complex schemes, even if the complex
schemes have better big-O performance), this method of implementation does
appear to be quite slow, though I confess I've only done a cursory
comparison.

So for comparison, I measured how long it took to compute 100 dot products
of 1000-element vectors. The vectors were filled with values that "used" a
specified number of bits (to simulate the typical use case were most values
are quite small, and just a few actually require an extended precision
type), here's the results compiled on Win32 with VC10 and /Ox:

Multiprecision mp_int128:
32 bit time = 0.0100556 seconds
64 bit time = 0.02594 seconds
96 bit time = 0.0229292 seconds
128 bit time = 0.0142151 seconds
Multiprecision mp_int256:
32 bit time = 0.00376926 seconds
64 bit time = 0.0102474 seconds
96 bit time = 0.01383 seconds
128 bit time = 0.0192363 seconds

large_int<unsigned long long, long long>:
32 bit time = 0.0498975 seconds
64 bit time = 0.111413 seconds
96 bit time = 0.183727 seconds
128 bit time = 0.251469 seconds
large_int<large_int<unsigned long long, unsigned long long>,
large_int<unsigned long long, long long> >:
32 bit time = 1.08451 seconds
64 bit time = 2.43866 seconds
96 bit time = 3.79658 seconds
128 bit time = 5.20211 seconds

Note how the times taken for mp_intXXX grow largely based on bits actually
used, not bits specified in the type. But also, even the case where I
expected large_int to do well - in fact where it really should win in this
kind of comparison - for part filled 128-bit int's, it's still over 4 times
slower than the multiprecision code :-(

My guess is this is a consequence of the implementation method (as a
doubled-type rather than an array of limbs)? If so that's actually quite
useful information to know.

Regards, John

PS I see you're using bit by bit multiplication - easy to understand and
code, but it's glacially slow in general, I tried this for integer division
early on and it ran like treacle (on a very cold day)!

PPS code follows:

#include <boost/chrono.hpp>
#include <boost/multiprecision/cpp_int.hpp>
#include <boost/large_int.hpp>

template <class Clock>
struct stopwatch
{
   typedef typename Clock::duration duration;
   stopwatch()
   {
      m_start = Clock::now();
   }
   duration elapsed()
   {
      return Clock::now() - m_start;
   }
   void reset()
   {
      m_start = Clock::now();
   }

private:
   typename Clock::time_point m_start;
};

template <class T>
void time()
{
   std::vector<T> v1(1000, 0x12345FF), v2(1000, 0x12345F9), v3(1000, 0);
   boost::chrono::duration<double> time;
   stopwatch<boost::chrono::high_resolution_clock> c;
   for(unsigned i = 0; i < 100; ++i)
   {
      for(unsigned i = 0; i < v1.size(); ++i)
      {
         v3[i] = v1[i] * v2[i];
      }
   }
   time = c.elapsed();
   std::cout << "32 bit time = " << time << std::endl;

   for(unsigned i = 0; i < v1.size(); ++i)
   {
      v1[i] <<= 32;
      v1[i] += 0x12345FF;
      v2[i] <<= 32;
      v2[i] += 0x12345F9;
   }
   c.reset();
   for(unsigned i = 0; i < 100; ++i)
   {
      for(unsigned i = 0; i < v1.size(); ++i)
      {
         v3[i] = v1[i] * v2[i];
      }
   }
   time = c.elapsed();
   std::cout << "64 bit time = " << time << std::endl;

   for(unsigned i = 0; i < v1.size(); ++i)
   {
      v1[i] <<= 32;
      v1[i] += 0x12345FF;
      v2[i] <<= 32;
      v2[i] += 0x12345F9;
   }
   c.reset();
   for(unsigned i = 0; i < 100; ++i)
   {
      for(unsigned i = 0; i < v1.size(); ++i)
      {
         v3[i] = v1[i] * v2[i];
      }
   }
   time = c.elapsed();
   std::cout << "96 bit time = " << time << std::endl;

   for(unsigned i = 0; i < v1.size(); ++i)
   {
      v1[i] <<= 32;
      v1[i] += 0x12345FF;
      v2[i] <<= 32;
      v2[i] += 0x12345F9;
   }
   c.reset();
   for(unsigned i = 0; i < 100; ++i)
   {
      for(unsigned i = 0; i < v1.size(); ++i)
      {
         v3[i] = v1[i] * v2[i];
      }
   }
   time = c.elapsed();
   std::cout << "128 bit time = " << time << std::endl;

}

int _tmain(int argc, _TCHAR* argv[])
{
   using namespace boost::multiprecision;

   time<mp_int128_t>();
   time<mp_int256_t>();

   typedef boost::large_int::large_int<unsigned long long, long long> t128;
   typedef boost::large_int::large_int<unsigned long long, unsigned long
long> ut128;
   typedef boost::large_int::large_int<ut128, t128> t256;

   time<t128>();
   time<t256>();

   return 0;
}


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk