Boost logo

Boost :

Subject: Re: [boost] Name and Namespace forPotential BoostExtended Floating-Point
From: Simonson, Lucanus J (lucanus.j.simonson_at_[hidden])
Date: 2011-09-02 14:34:40


John Maddock wrote:
>> Gmp mpq_class expressions allocate for temporaries in compound
>> expressions.
>
> Nod, one thing I've worked hard on is eliminating temporaries inside
> expressions whenever possible, so if you write say:
>
> a = b + c + d + e + f;
>
> Then you get no temporaries at all - everthing gets accumulated inside
> variable a.
>
> However, something like:
>
> a = b*c + d*e;
>
> does require one temporary for *one* of the multiplications (the
> other uses "a" as working space).

I really like this, and I think it may be about the best we can do. If you could somehow abstract out the compile time logic of doing this with an expression template system from your application of it to bignum then it could become a boost library extension that might find a home in proto (assuming you use proto). I'd imagine Joel F. would like to discuss this more with you. You've already pushed the expression templates about as far as they can go to accomplish this, and farther than I ever have.

> One thing I've thought of is walking the expression tree at compile
> time to calculate the largest number of temporaries in the longest
> "branch", then creating these as an array at the outermost level of
> the expression evaluation and passing them down, so that:
>
> a = b*c + b*d + b*e + b*f;
>
> still only needs one temporary - which gets reused for each branch.
>
> This needs some more thought though - to make sure I don't shoot
> myself in the foot and reuse something that shouldn't be.

You could cache the array of temporaries in "a" for reuse when "a" is reused.

bignum a, b, c, d, e, f;
for (int i = 0; i < 1000; ++i) {
 b = B[i];
 c = C[i];
 d = D[i];
 e = E[i];
 f = F[i];
 a = b*c + b*d + b*e + b*f;
 foo(a);
}

Allocates a,b,c,d,e,f and two temporaries for a total of 8 allocations instead of 6006 with gmpq_class.

The only problem is that it is more natural to decare a-f inside the loop or just write:

for (int i = 0; i < 1000; ++i) {
 foo( b[i]*c[i] + b[i]*d[i] + b[i]*e[i] + b[i]*f[i]);
}

We could make the array mutable and cache it in any of the arguments of the expression, but that wouldn't help in this case, and wouldn't help if they were all rvalues.

for (int i = 0; i < 1000; ++i) {
 foo( b(i)*c(i) + b(i)*d(i) + b(i)*e(i) + b(i)*f(i));
}

If we try to fall back to a custom allocator we have all the problems with thread safety and have only really accomplished improving on normal heap allocation by applying a pool. The problem with creating an areana of gmpq_class objects is that what we really want is to cache the object that has the right *size* buffer internally to store the result of the operation. In general that is different from one temporary to the next in the code. If you use an areana you get an aribtrary object and you end up growing all their buffers to store the largest temporary, which is a lot more allocations and a lot more memory consumed. Also, with the areana you can't release the memory without putting that burden on the library user.

In the end there are problems with any approach that goes beyond what you've already done and I'm not entirely sure what our best option is. I'm afraid we can never abstract away the management of memory from the user of the code and give them something that can be as efficient as they could be if they managed their allocations explictly.

Regards,
Luke


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk