|
Boost : |
From: David Abrahams (david.abrahams_at_[hidden])
Date: 2002-06-30 20:57:29
From: "Michael Stevens" <support-1_at_[hidden]>
> >
> >
> >From: "Benedikt Weber" <weber_at_[hidden]>
> >
> >Joerg
> >
> >>> It's a proven (but not well known ;-) technique to reintroduce
> >>> temporaries at the right places into the expression, i.e. writing A
*vector
> >>
> >>
> >>> (B * v).
> >>
> >>
> >I am glad there is at least a possibility to force the creation of a
> >temporary. However, the user should not have to be concerned about such
> >details. I never did anything with expression templates and I only
recently
> >read about them in the context of the Spirit parser. However, I think it
> >should be possible to change the design so a temporary is automatically
> >introduced where appropriate (say for the prod() functions). I urge you
to
> >think about a possibility to change that. It would be a real benefit for
the
> >library.
> >
> Benedikt's discussion regarding the evaluation complexities is very
> thought provoking. However automatically introducing temporaries would
> be a VERY bad thing.
Let's look a little more closely at his cases. If I remember my linear
algebra:
1. x = A*(B*v)
Assuming v is a column vector here, B*v and A*(B*v) are also column vectors
of the same dimension as v.
say c = B*v, then
x[0] = A[0,0]*c[0] + A[0,1]*c[1] + A[0,2]*c[2]... + A[0,N]*c[N]
x[1] = A[1,0]*c[0] + A[1,1]*c[1] + A[1,2]*c[2]... + A[1,N]*c[N]
Can we avoid creating the temporary vector c?
Yes, we can create a zero vector for the result, then compute the elements
of c one-by-one (just B[i] dot v), and multiply them by the corresponding
column of A, accumulating the result into x. However, the cost is that we
make N passes over x, reading old values and writing new ones. If we create
a temporary for c, we make multiple passes over c, but we only read. Reads
tend to be much faster than writes, so at least in the dense case, it may
make sense to create the temporary. Note that creating the temporary has no
effect on the result at all. In sparse matrix multiplication the trade-offs
are different, because it usually requires scatter/gather -- but of course
that always uses a temporary.
Complexities:
B*v => O(N^2)
A*c => O(N^2)
A*(B*v) => O(N^2) + O(N^2) => O(N^2)
2. x = (A*B)*v
A*B is a matrix C. x is still a column vector with the same dimensions as
v.
x[0] = C[0,0]*v[0] + C[0,1]*v[1] + C[0,2]*v[2]... + C[0,N]*v[N]
x[1] = C[1,0]*v[0] + C[1,1]*v[1] + C[1,2]*v[2]... + C[1,N]*v[N]
Can we eliminate the temporary matrix C?
Well, we only need one row of C at a time, and in fact we just need to be
able to compute single elements of that row because we can quickly multiply
each element by the corresponding element of v and accumulate that into the
element of x we're computing. Each element of C can be computed as the
dot-product of a row of A and a column of B. So, yes, we can eliminate the
temporaries completely in this case.
Complexities:
A*B => O(N^3)
C*v => O(N^2)
(A*B)*v => O(N^3) + O(N^2) => O(N^3)
3. A*B*v
We can't tell this from case 2, unless someone made it possible to overload
parentheses while I wasn't looking ;-)
So I only have one question left: should we interpret A*B*v and (A*B)*v the
same as A*(B*v)? I think that if you care about the effects of limited
precision, the answer has to be no. We can always rely on the parentheses
to get the effects we need.
> I say this for four reasons:
> a) The semantics of expression evaluation need to be clear and well
> defined. This is important to me, as I need to know the complexity and
---------------------------------------------------------^
"worst-case"?
IOW, I presume you won't mind if an operation goes faster than expected.
> likely numerical accuracy of the result. Certainly uBLAS with ET
> certainly has very different semantics (and syntax) then without ET.
Really, the syntax is different? Can you give an example? That really
surprises me, considering they're turning ETs on and off with NDEBUG. If
that were the case, most programs would stop compiling in release mode.
> Infact they seem to be two extremes of the possible stratergies. Both
> Benidict's points and David's worries regarding changes to template
> matching are relvant here.
How are changes to template matching relevant?
> I believe uBLAS's ET semantics are that the expression is evaluated as a
> WHOLE. That is each element of the result is computed using a complete
> expression formed from all the element on which it depends. Correct me
> if I am wrong here!
It's pretty hard to correct such a vague statement. Can you be specific
about what you mean?
> Certainly it is wrong to think about various left or
> right associative interpretations of prod in this context.
> I believe this matches my expectation for clear and well definied
> semantics. I cannot think of a way to automatically add temporaries that
> would not muddle this horribly.
I think I illustrated how you can evaluate A*(B*v) both with and without
temporary vectors, giving identical results but with somewhat different
efficiency trade-offs.
> b) There is no "Best" solution to introducing temporaris. They vary
> in complexity and numerical accuracy.
I think automatically introduced temporaries should never affect accuracy.
> c) There are simple ways to explicity define when you want a a
> temporary. Either use more then one expression OR add explict
constructors.
I think the library should ultimately be tuned to choose the most efficient
strategy which doesn't affect accuracy.
> d) The temporaries would inevitably be allocated from the heap. My
> experience with embedded programming makes we object to anything using
> the heap without my explict consent!
Sometimes a little heap-allocated storage results in a huge speedup. See
scatter/gather for sparse matrices, for example (or the adaptive algorithms
in the standard library). I would not want to make operations much less
efficient just to avoid the objections of an anti-heap embedded programmer.
-Dave
Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk