Subject: [boost] [xint] Performance of fixed-size large integers
From: Phil Endecott (spam_from_boost_dev_at_[hidden])
Date: 2011-03-03 14:04:44


Dear All,

I've never needed to use arbitrary-precision integers, but I have
sometimes wanted large fixed-size integers like uint128_t, and so I've
had a quick look at that aspect of the proposed Xint.

The docs' front page has a link under "Detailed Usage Information" to
"Fixed-Length Integers" - which is not really very detailed, and has
examples in the notes like "integer_t<8>" that don't work. I
established quickly enough that the correct syntax is
integer_t<fixedlength<8>>. I also noted that the conventions here
don't match the built-in (u)intN_t types (think int64_t vs. int63_t,
and the "wrap-around" behaviour on overflow). Perhaps there is some
rationale for this, but I would be happier to see it consistent with
the built-in types.
(It might also be appropriate to think about how this could work with
the existing Boost.Integer.)
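
For illustration, the kind of shorthand I would hope for is something
that mirrors the built-in names. A rough sketch (these aliases are
hypothetical, not something XInt provides, and whether they would
behave as unsigned like the built-in types depends on XInt's options,
which is part of my point):

// Hypothetical convenience aliases mirroring the built-in uintN_t naming;
// with the review version of XInt one spells out the options form by hand.
typedef boost::xint::integer_t<
   boost::xint::options::fixedlength<96> >  xint_uint96_t;
typedef boost::xint::integer_t<
   boost::xint::options::fixedlength<128> > xint_uint128_t;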

I have done a quick test of the performance of a 96-bit type as follows:

   typedef boost::xint::integer_t<
       boost::xint::options::fixedlength<96> > uint96_t;

   uint96_t a = 0;
   uint96_t b = 0;
   for (int i=1; i<loops; ++i) {
     a = a + a + (a & b) + i;
     b = b + i;
   }

This is an entirely contrived example, but the point is to minimise the
work that I have to do to create something to compare it with; by using
only operator+ and operator&, I can quickly hack together this:

struct uint96_t {
   uint32_t top;
   uint32_t mid;
   uint32_t bottom;

   uint96_t() {}
   uint96_t(uint32_t val): top(0), mid(0), bottom(val) {}
};

uint96_t operator&(uint96_t a, uint96_t b)
{
   uint96_t r;
   r.top = a.top & b.top;
   r.mid = a.mid & b.mid;
   r.bottom = a.bottom & b.bottom;
   return r;
}

uint96_t operator+(uint96_t a, uint96_t b)
{
   uint96_t r;
   __asm__(
     "adds %0, %3, %6\n" // Add and set carry flag
     "adcs %1, %4, %7\n" // Add with carry-in and set carry flag
     "adc %2, %5, %8\n" // Add with carry-in.
   : "=r" (r.bottom), // %0
     "=r" (r.mid), // %1
     "=r" (r.top) // %2
   : "r" (a.bottom), // %3
     "r" (a.mid), // %4
     "r" (a.top), // %5
     "r" (b.bottom), // %6
     "r" (b.mid), // %7
     "r" (b.top) // %8
   : "cc"
   );
   return r;
}

It seems to me that this is several hundred times faster than the Xint
code (on my 1.2 GHz ARM test system). Yes, it requires assembler to do
multi-word addition with carry, but we can write a less efficient
portable version:

uint96_t operator+(uint96_t a, uint96_t b)
{
   uint96_t r;
   r.bottom = a.bottom + b.bottom;
   int carry0 = r.bottom < a.bottom ? 1 : 0;
   r.mid = a.mid + b.mid + carry0;
   // Carry out of the middle word: either the sum wrapped below a.mid, or
   // b.mid + carry0 was exactly 0x100000000 and wrapped it back to a.mid.
   int carry1 = (r.mid < a.mid || (carry0 && r.mid == a.mid)) ? 1 : 0;
   r.top = a.top + b.top + carry1;
   return r;
}

This is about half the speed of the assembler, but still hundreds of
times faster than the Xint code.
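
For anyone wanting to reproduce the comparison, the shape of the
harness is roughly this (a sketch, not the exact code I timed; the
loop count, the use of clock() and the 'sink' trick are arbitrary
choices of mine):

// Sketch of a reproduction harness: times the contrived loop for one
// 96-bit implementation.  Instantiate it once with the XInt typedef and
// once with the hand-rolled struct above to compare.
#include <ctime>
#include <cstdio>

template <typename UInt96>
double time_loop(int loops, UInt96& sink)
{
   UInt96 a = 0;
   UInt96 b = 0;
   std::clock_t start = std::clock();
   for (int i = 1; i < loops; ++i) {
      a = a + a + (a & b) + i;
      b = b + i;
   }
   std::clock_t stop = std::clock();
   sink = a + b;  // hand the result back so the work can't be optimised away
   return double(stop - start) / CLOCKS_PER_SEC;
}

int main()
{
   uint96_t result(0);  // whichever uint96_t definition is in scope
   std::printf("time = %.3f s\n", time_loop(10000000, result));
   // ...do something observable with 'result' here (e.g. print a word of it).
}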

Based on that, I think the claim that the library is "fast" on its
front page needs substantiating. I encourage anyone with experience of
arbitrary-precision variable-length numbers to run some quick
benchmarks. (Or, maybe I'm doing something horribly wrong here. I
haven't even checked that I get the right answer.)
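
A quick way to check the addition, for what it's worth, would be to
compare the hand-rolled operator+ against a reference that propagates
carries through 64-bit intermediates. An untested sketch, reusing the
uint96_t struct above:

#include <stdint.h>
#include <cstdlib>
#include <cassert>

// Reference addition that carries through 64-bit intermediates.
uint96_t add_reference(uint96_t a, uint96_t b)
{
   uint96_t r;
   uint64_t s = (uint64_t)a.bottom + b.bottom;
   r.bottom = (uint32_t)s;
   s = (uint64_t)a.mid + b.mid + (s >> 32);
   r.mid = (uint32_t)s;
   r.top = a.top + b.top + (uint32_t)(s >> 32);
   return r;
}

// Stitch a full 32-bit value out of std::rand(), which may give as few as
// 15 random bits per call; without this, carries would rarely be exercised.
uint32_t rand32()
{
   return ((uint32_t)std::rand() << 17) ^
          ((uint32_t)std::rand() << 9) ^
           (uint32_t)std::rand();
}

void check_addition()
{
   for (int i = 0; i < 1000000; ++i) {
      uint96_t a, b;
      a.top = rand32(); a.mid = rand32(); a.bottom = rand32();
      b.top = rand32(); b.mid = rand32(); b.bottom = rand32();
      uint96_t x = a + b;               // implementation under test
      uint96_t y = add_reference(a, b);
      assert(x.top == y.top && x.mid == y.mid && x.bottom == y.bottom);
   }
}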

Other reviews have reported that in this fixed-length case the library
still stores some overhead in addition to the actual data. That's
unfortunate, as I would like to be able to, for example, store values
in a memory-mapped file or use them in Beman's proposed b-tree library.
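
To make that requirement concrete: for memory-mapped use the type needs
a fixed, padding-free layout of the obvious size, which one can assert
at compile time. The hand-rolled struct passes this check; a
fixed-length XInt type that carries extra bookkeeping presumably would
not (sketch):

#include <boost/static_assert.hpp>

// Three 32-bit words and nothing else: exactly 12 bytes, no hidden state,
// so the value can be written byte-for-byte into a memory-mapped file.
BOOST_STATIC_ASSERT(sizeof(uint96_t) == 12);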

I don't want to write a review that says "This library doesn't do what
I want, therefore it should be rejected". No doubt others will find
applications where it will be useful. But I think the performance when
used for fixed-size integers should be investigated.

Regards, Phil.

