Subject: Re: [boost] [Multiprecision] Benchmarking
From: John Maddock (boost.regex_at_[hidden])
Date: 2012-06-15 05:44:10


>> For sure - expression templates are about eliminating temporaries - if
>> the temporaries are cheap (cpp_int containing small values and no
>> dynamic memory allocation) then there's no benefit and probably a hit
>> from bringing in all that machinery. That's why expression templates
>> are disabled for the fixed size cpp_int typedefs.
>
> I'd like to propose an alternate viewpoint: in the context of numerics,
> expression templates are about delaying evaluation to gather more
> context, so you can perform the computation more efficiently. From that
> point-of-view there's absolutely no reason that they should ever make
> code slower. Just generate the appropriate code for the types and
> expressions involved.

Sigh... I hear what you're saying but it's not that simple.

Consider cpp_int: it uses something akin to the "small string optimization"
for small numbers, and then only switches to dynamic allocation for larger
values.
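
Roughly speaking, the storage looks something like this (a simplified
sketch of the idea only - not cpp_int's actual layout):

#include <cstdint>

struct sketch_int
{
   static const unsigned internal_limb_count = 2;
   unsigned size;        // number of limbs currently in use
   bool     is_internal; // true while the value fits in the buffer
   union
   {
      std::uint32_t  buffer[internal_limb_count]; // small values: no allocation
      std::uint32_t* limbs;                       // large values: heap storage
   };
};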

Now consider a simple expression:

a = b * c * d;

The optimal evaluation strategy depends upon the size of the values in b, c
and d: for small values that don't require memory allocation, temporaries
and copying are cheap, so it might be best to do:

t = b * c;
a = t * d;

But for larger values, the cost of memory allocation and copying starts to
dominate, so:

a = b * c;
a *= d;

might be best (though it depends a bit on how multiplication is
implemented).

For larger values still, the multiplication itself is so expensive that all
methods perform about the same.
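
To make that concrete, the choice would have to look something like this at
runtime (purely a sketch - the "smallness" test is a stand-in, not anything
cpp_int actually exposes):

#include <boost/multiprecision/cpp_int.hpp>

using boost::multiprecision::cpp_int;

void multiply3(cpp_int& a, const cpp_int& b, const cpp_int& c, const cpp_int& d)
{
   // Arbitrary cut-off standing in for "fits in the internal buffer".
   static const cpp_int small_limit = cpp_int(1) << 64;

   if (abs(b) < small_limit && abs(c) < small_limit && abs(d) < small_limit)
   {
      // Small values: temporaries don't allocate, so copying is cheap.
      cpp_int t = b * c;
      a = t * d;
   }
   else
   {
      // Large values: skip the extra temporary and reuse 'a' as workspace.
      a = b * c;
      a *= d;
   }
}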

So you can't choose between these at compile time. You could clearly add
some backend-specific runtime logic to choose between them, but that hits
another issue:

In the benchmark in question, we're looking at arithmetic on small values
(mostly integer sized) that just happen to be stored in a cpp_int. It's an
important use case, and cpp_int is many, many times more efficient than GMP
thanks to not having to allocate memory for small values, but it's slower
than "naive" simple code. The problem is that the cost of performing the
arithmetic is already as cheap as it could possibly be, algorithmically
speaking, so for cpp_int the cost is completely dominated by the other
runtime logic used to select the best method. Let me give you two concrete
examples so you can tell me how to avoid them :-)

1) When multiplying in the cpp_int backend, if one value is a single limb
long then we can optimize the multiplication routine by dispatching to the
special "multiply by an int" code. When the other value is medium sized,
it's a big win (2 or 3 times improvement), but for the trivial case that
often occurs in computational geometry, the cost of the extra if statement
and function call can easily add a 20% hit or more (there's a rough sketch
of the shape of this dispatch after these two examples). One thing I could
experiment with is overloads for the multiplication/addition of small
fixed-size integer types - that might help mp_int128_t, but not
variable-sized types like cpp_int.
2) In the expression template unpacking and evaluation, one crucial piece of
information is "does the LHS also appear on the RHS". Sometimes if it does
we can optimize: for example, "a = a + b" gets transformed into "a += b".
Other times, if the LHS does not appear in the RHS, we can trash the LHS
value and use it as working space, saving a temporary (again, there's a
sketch of this check below). But... it's a runtime calculation to figure
that out; it messes up inlining as well as taking a small amount of time.
For non-trivial values which require memory allocation etc., it makes not
one jot of difference, but in the case in question it completely dominates
the runtime compared to what is, in effect, just an int*int or whatever.
Just curious, but can constexpr partially solve this at all? We would need
a way to get the address of the variables in the expression into the
template arg list so that questions like this could be asked at compile
time. Even then I believe that getting the cost of expression template
unpacking down to zero is likely to be a fool's errand.
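
For (1), the shape of the dispatch is roughly this (a toy bignum purely for
illustration - not cpp_int's real representation or code):

#include <cstddef>
#include <cstdint>
#include <vector>

struct toy_int
{
   std::vector<std::uint32_t> limbs; // least-significant limb first
};

// Fast path: one pass, one hardware multiply per limb of 'a'.
// (Assumes r does not alias a.)
void multiply_by_limb(toy_int& r, const toy_int& a, std::uint32_t b)
{
   r.limbs.assign(a.limbs.size() + 1, 0);
   std::uint64_t carry = 0;
   for (std::size_t i = 0; i < a.limbs.size(); ++i)
   {
      std::uint64_t t = std::uint64_t(a.limbs[i]) * b + carry;
      r.limbs[i] = static_cast<std::uint32_t>(t);
      carry = t >> 32;
   }
   r.limbs[a.limbs.size()] = static_cast<std::uint32_t>(carry);
}

// General schoolbook multiply, O(n*m). (Assumes r aliases neither input.)
void multiply_generic(toy_int& r, const toy_int& a, const toy_int& b)
{
   r.limbs.assign(a.limbs.size() + b.limbs.size(), 0);
   for (std::size_t i = 0; i < a.limbs.size(); ++i)
   {
      std::uint64_t carry = 0;
      for (std::size_t j = 0; j < b.limbs.size(); ++j)
      {
         std::uint64_t t = std::uint64_t(a.limbs[i]) * b.limbs[j]
                           + r.limbs[i + j] + carry;
         r.limbs[i + j] = static_cast<std::uint32_t>(t);
         carry = t >> 32;
      }
      r.limbs[i + b.limbs.size()] = static_cast<std::uint32_t>(carry);
   }
}

// The dispatch itself: a big win when 'a' is medium sized, but when both
// operands are a single limb the branch and extra call are pure overhead.
void multiply(toy_int& r, const toy_int& a, const toy_int& b)
{
   if (b.limbs.size() == 1)
      multiply_by_limb(r, a, b.limbs[0]);
   else
      multiply_generic(r, a, b);
}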
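
And for (2), the aliasing check amounts to something like this (again a
stripped-down illustration, not the real expression template machinery):

#include <boost/multiprecision/cpp_int.hpp>

using boost::multiprecision::cpp_int;

// A stand-in for the node an expression template would build for "b + c".
struct add_expr
{
   const cpp_int* lhs;
   const cpp_int* rhs;
};

inline add_expr make_add(const cpp_int& a, const cpp_int& b)
{
   add_expr e = { &a, &b };
   return e;
}

// Evaluating "target = lhs + rhs": check at runtime whether the target
// also appears on the right hand side.
// Usage: assign(a, make_add(a, b)); // behaves like a += b
void assign(cpp_int& target, const add_expr& e)
{
   if (&target == e.lhs)
      target += *e.rhs;      // a = a + b  ->  a += b
   else if (&target == e.rhs)
      target += *e.lhs;      // a = b + a  ->  a += b
   else
   {
      // Not aliased: trash the target and use it as working space,
      // saving a temporary.
      target = *e.lhs;
      target += *e.rhs;
   }
   // For large values the branches are noise; when the operands are
   // single machine words they can dominate the cost of the addition.
}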

John.

