
Boost : 
Subject: Re: [boost] [Multiprecision] Benchmarking
From: Dave Abrahams (dave_at_[hidden])
Date: 20120615 14:49:16
on Fri Jun 15 2012, John Maddock <boost.regexATvirgin.net> wrote:
>>> For sure  expression templates are about eliminating temporaries  if
>>> the temporaries are cheap (cpp_int containing small values and no
>>> dynamic memory allocation) then there's no benefit and probably a hit
>>> from bringing in all that machinary. That's why expression templates
>>> are disabled for the fixed size cpp_int typedefs.
>>
>> I'd like to propose an alternate viewpoint: in the context of numerics,
>> expression templates are about delaying evaluation to gather more
>> context, so you can perform the computation more efficiently. From that
>> pointofview there's absolutely no reason that they should ever make
>> code slower. Just generate the appropriate code for the types and
>> expressions involved.
>
> Sigh... I hear what you're saying but it's not that simple.
>
> Consider cpp_int: it uses something akin to the "small string
> optimization" for small numbers, and then only switches to dynamic
> allocation for larger values.
>
> Now consider a simple expression:
>
> a = b * c * d;
>
> The optimal evaluation strategy depends upon the size of the values in
> b, c and d: for small values that don't require memory allocation,
> temporaries, and copying is cheap, so it might be best to do:
>
> t = b * c;
> a = t * d;
>
> But for larger values, the cost of memory allocation and copying
> starts to dominate so:
>
> a = b * c;
> a *= d;
>
> Might be best (though it depends a bit on how multiplication is
> implemented).
>
> For larger values still, the cost of the multiplication is so
> expensive, that all methods are the same.
That last fact doesn't matter, then.
> So you can't choose between these at compile time.
Well you surely can. You can make a choice that is no worse than
whatever is being described as "not using expression templates." My
point is that there's no reason for anyone to say "using expression
templates slowed this down."
> You could clearly add some backendspecific runtime logic to choose
> between them, but that hits another issue:
>
> In the benchmark in question, we're looking at arithmetic on small
> values (mostly integer sized) that just happen to be stored in a
> cpp_int. It's an important use case, and cpp_int is many many times
> more efficient than gmp thanks to not having to allocate memory for
> small values, but it's slower than "naive" simple code. The problem
> is that the cost of performing the arithmetic is as cheap as it could
> possibly be algorithmically speaking, so for cpp_int, the cost is
> completely dominated by other runtime logic used to select the best
> method.
Tough sitch. For cases like that you need to ask the user for hints.
> Let me give you two concrete examples so you can tell me how to avoid
> the them :)
>
> 1) When multiplying in the cpp_int backend, if one value is a single
> limb long then we can optimize the multiplication routine by
> dispatching to the special "multiply by an int" code. When the other
> value is medium sized, it's a big win (2 or 3 times improvement), but
> for the trivial case that often occurs in computational geometry, the
> cost of the extra if statement and function call can easily add a 20%
> hit or more. One thing I could experiment with is overloads for the
> multiplication/addition of small fixed sized integer types  that
> might help mp_int128_t, but not variable sized types like cpp_int.
>
> 2) In the expression template unpacking and evaluation, one crucial
> piece of information is "does the LHS also appear on the RHS".
> Sometime if it does we can optimize, for example "a = a + b" gets
> transformed into "a += b", other times if the LHS does not appear in
> the RHS we can trash the LHS value and use it as working space, saving
> a temporary.
My favorite approach goes something like this:
a [noalias]+= b
> But... it's a runtime calculation to figure that out, it messes up
> inlining as well as taking a small amount of time.
The other thing you can do is to aggregate everlarger computations into
single expressions so that you have a hope of gathering enough context
together to do better optimizations.
> For nontrivial values which require memory allocation etc, it makes
> not one jot of difference, but in the case in question, it completely
> dominates the runtime compared to what is in effect, just an int*int
> or whatever. Just curious, but can constexp partially solve this at
> all? We would need a way to get the address of the variables in the
> expression into the template arg list
Forget it :(. The great downfall of constexpr is that every constexpr
function must be valid even when the arguments aren't constexprs.
 Dave Abrahams BoostPro Computing http://www.boostpro.com
Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk