
Boost : 
Subject: Re: [boost] [Multiprecision] Benchmarking
From: Jeffrey Lee Hellrung, Jr. (jeffrey.hellrung_at_[hidden])
Date: 20120614 13:12:12
On Thu, Jun 14, 2012 at 8:40 AM, Phil Endecott <
spam_from_boost_dev_at_[hidden]> wrote:
> Dear All,
>
> I have attempted to evaluate the proposed multiprecision library by
> applying it to the "Dealaunay Flip" function for 32bit integer
> coordinates, as this is one of the few things that I've ever implemented
> that needed more than 64bit integers:
>
> inline bool delaunay_test(int32_t ax, int32_t ay, int32_t bx, int32_t by,
> int32_t cx, int32_t cy, int32_t dx, int32_t dy)
> {
> // Test whether the quadrilateral ABCD's diagonal AC should be flipped to
> BD.
> // This is the Cline & Renka method.
> // Flip if the sum of the angles ABC and CDA is greater than 180 degrees.
> // Equivalently, flip if sin(ABC + CDA) < 0.
> // Trig identity: cos(ABC) * sin(CDA) + sin(ABC) * cos(CDA) < 0
> // We can use scalar and vector products to find sin and cos, and simplify
> // to the following code.
> // Numerical robustness is important. This code addresses it by
> performing
> // exact calculations with large integer types.
>
> i64 ax64 = ax, ay64 = ay, bx64 = bx, by64 = by,
> cx64 = cx, cy64 = cy, dx64 = dx, dy64 = dy;
>
> i64 cos_abc = (ax64bx64)*(cx64bx64) + (ay64by64)*(cy64by64);
> i64 cos_cda = (cx64dx64)*(ax64dx64) + (cy64dy64)*(ay64dy64);
> if (cos_abc >= 0 && cos_cda >= 0) return false;
> if (cos_abc < 0 && cos_cda < 0) return true;
>
> i64 sin_abc = (ax64bx64)*(cy64by64)  (cx64bx64)*(ay64by64);
> i64 sin_cda = (cx64dx64)*(ay64dy64)  (ax64dx64)*(cy64dy64);
>
> i128 sin_sum = static_cast<i128>(sin_abc)***cos_cda
> + static_cast<i128>(cos_abc)***sin_cda;
>
> return sin_sum < 0;
> }
>
>
> As you can see, with 32bit inputs, the first set of computations yield
> 64bit results and about half of calls terminate at this stage. Otherwise
> a calculation yielding a 128bit result is carried out. In practice, the
> results of the subtractions are typically small and the 128bit value
> rarely has more than 64 significant bits.
>
> I have compared the following types used for i64 and i128 in the code
> above:
>
> (1) int64_t and int64_t (which gives WRONG ANSWERS, but is useful for
> comparison)
> (2) int64_t and my own "quick and dirty" int128_t.
> (3) cpp_int.
> (4) int64_t and cpp_int.
> (5) cpp_int without expression templates.
> (6) int64_t and cpp_int without expression templates.
> (7) int64_t and mp_int128_t.
> (8) mpz_int.
> (9) int64_t and mpz_int.
>
> Mean times per function call are as follows:
>
> (1) 39.3 ns
> (2) 61.9 ns
> (3) 513 ns
> (4) 103 ns
> (5) 446 ns
> (6) 102 ns
> (7) 102 ns
> (8) 2500 ns
> (9) 680 ns
>
> (AMD Athlon(tm) II X2 250e Processor @ 3 GHz; g++ 4.6.1.)
>
> Note that:
>  GMP is slowest, despite oftrepeated claims about its good performance.
>
I think Luke made a good point about attempting to minimize allocations,
but nonetheless I suppose this might not be too surprising; I've been under
the impression that GMP is for *big* ints.
>  Expression templates make the code slower.
>
Hmmm...John: comments?
>  Even the fastest Boost.Multiprecision type, mp_int128_t, is
> significantly slower than my own quick hack.
>
> This is all a bit disappointing. Perhaps the problem is that I'm dealing
> only with 128 bits and the intended application is for much larger values?
>
I think so, and you probably identified a weak point of the library below.
> I am reminded of my experience with xint last year. See e.g. this thread:
>
> http://thread.gmane.org/gmane.**comp.lib.boost.devel/215565
>
> The current proposal seems much better  xint was hundreds of times slower
> than my adhoc code  but I still find it discouraging. I suspect that the
> performance difference may be attributable to the use of signmagnitude
> representation; as a result, the implementation is full of signtesting
> like this:
>
> template <unsigned MinBits, bool Signed, class Allocator>
> inline void eval_subtract(cpp_int_backend<**MinBits, Signed, Allocator>&
> result, const signed_limb_type& o)
> {
> if(o)
> {
> if(o < 0)
> eval_add(result, static_cast<limb_type>(o));
> else
> eval_subtract(result, static_cast<limb_type>(o));
> }
> }
>
> I think I understand the motivation for signmagnitude for variablelength
> representation, but I don't see any benefit in the case of e.g. mp_int128_t.
>
John, perhaps this is something that can be configurable?
> Any thoughts?
>
> My test code is here: http://chezphil.org/tmp/boost_**mp_review.tbz2
>
> $ make
> $ cat nn100000.dat  benchmark
>
> (Note that this needs a fix to the resize() issue reported earlier in
> order to get accurate answers, but that doesn't much alter the execution
> time.)
>
>
> Regards, Phil.
>
Thanks, Phil, for investigating the viability of using Multiprecision for
your application!
Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk