Boost logo

Boost :

From: Joerg Walter (jhr.walter_at_[hidden])
Date: 2002-06-30 05:26:01


----- Original Message -----
From: "Benedikt Weber" <weber_at_[hidden]>
Newsgroups: gmane.comp.lib.boost.devel
To: <boost_at_[hidden]>
Sent: Sunday, June 30, 2002 12:02 AM
Subject: [boost] uBLAS and expression templates

> I would like to discuss one important aspect of expression templates for
> matrix algebra, that has not been touched so far. The documentation and
> examples shown in uBLAS give the impression that expression templates
> always lead to more efficient calculations.

Sorry for that, they don't.

> That is, however, only true for some
> expressions. For many expressions, expression templates perform much worse
> than the regular evaluation. Incidentally, all examples in uBLAS are taken
> from BLAS and these do indeed perform well with expression
templates.uBLAS,
> however, is very flexible and allows for many more complicated expressions
> and I am sure users want to take advantage of that possibility.

Agreed.

> Expression templates are a clever technique to avoid temporaries during
the
> evaluation of matrix/vector expressions. While avoiding temporaries can
> speed up the evaluation, it can also increase the number of operations
> dramatically. Consider the example A*B*v, where A and B are matrices and v
> is a vector. First take the case without expression templates. The
> expression should be evaluated as A*(B*v), because then you have only two
> matrix/vector products. Taking all dimensions as N, this will lead to
2*N^2
> flops. If you evaluate it as (A*B)*v, you have one matrix/matrix product
and
> one matrix/vector product, which is much worse in terms of number of
> operations, namely N^3+N^2. Now do the same thing with expression
templates.
> Evaluating A*(B*v) will lead to N^3 flops, because the vector B*v is not
> stored as a temporary but recalculated N times. Evaluating (A*B)*v also
> gives N^3 flops, because each matrix row in (A*B) is evaluated N times. So
> with expression templates the performance is much worse than without.

Correct. It's a proven (but not well known ;-) technique to reintroduce
temporaries at the right places into the expression, i.e. writing A * vector
(B * v). But I see that the prod() functions (matrix-vector and
matrix-matrix) play a special role w.r.t. computational complexity.

> On my
> Pentium 3, taking N=100 and evaluating the expression 1000 times takes
> _with_ expression templates (#define NUMERICS_USE_ET):
> A*(B*v): 6.59 s
> (A*B)*v): 7.39 s
> and _without_ expression templates:
> A*(B*v): 0.21 s
> (A*B)*v): 13.62 s

Ok, here's a slightly extented sample:
---------
#include <iostream>
#include <boost/timer.hpp>
#include <boost/numeric/ublas/config.h>
#undef NUMERICS_USE_ET
#include <boost/numeric/ublas/vector.h>
#include <boost/numeric/ublas/matrix.h>
#include <boost/numeric/ublas/io.h>

using namespace boost;
using namespace boost::numerics;

int main () {
    timer t;
    matrix<double> A (100, 100), B (100, 100);
    vector<double> v (100), r (100);
    for (int i = 0; i < 100; ++ i) {
         for (int j = 0; j < 100; ++ j) {
             A (i, j) = B (i, j) = 100 * i + j;
         }
         v (i) = i;
    }
    t.restart ();
    for (int run = 0; run < 1000; ++ run)
        r.assign (prod (prod (A, B), v));
    std::cout << t.elapsed () << std::endl;
    t.restart ();
    for (int run = 0; run < 1000; ++ run)
        r.assign (prod (A, prod (B, v)));
    std::cout << t.elapsed () << std::endl;
    t.restart ();
    for (int run = 0; run < 1000; ++ run)
        r.assign (prod (matrix<double> (prod (A, B)), v));
    std::cout << t.elapsed () << std::endl;
    t.restart ();
    for (int run = 0; run < 1000; ++ run)
        r.assign (prod (A, vector<double> (prod (B, v))));
    std::cout << t.elapsed () << std::endl;
    return 0;
}

---------

Without ETs (#undef NUMERICS_USE_ET) this gives on my platform (Linux, GCC
3.1)

6.34
0.12
6.34
0.12

With ETs I get

6.2
5.9
6.3
0.12

> Considering only the number of operations, the last number should
> theoretically be of the same order as the calculations with expression
> templates, so we gain a factor of 2 here from expression templates. But
> this depends on how efficient the evaluation is done without expression
> templates. It's quite tricky to understand what's going on exactly from
the
> code, and I did not get it yet up to now. I did the calulations with
> Codewarrior because MSVC gives internal compiler errors, as also indicated
> in config.h. (For MSVC, NUMERICS_USE_ET is therefore always defined in
> config.h, so be careful.)
>
> As I understand it, and looking at a few tests, neither matrix/matrix
> multiplication nor matrix/vector multiplication gain much efficiency from
> expression templates. A factor of two was sometimes observed as in the
> examples above, but this could be due to an inefficient implementation of
> the the case without expression templates (correct me, if I am wrong). So
> my first guess is, you should use expression templates only for addition,
> subtraction, assignment and scalar multiplication with vectors or matrices
> (and, of course, the corresponding +=, -=, *=). I hope it is possible to
> have only some of the operations evaluated by expression templates. And I
> hope the problem can be solved by just having some operations being
> treated differently. If you would have to analyse the whole expression and
> optimize it, this would probably be much more involved.
>
> I think this is a serious problem and should be considered in more detail.

Yes, I'm currently thinking about changing the interface for prod() to make
the above problem more clear.

> I really like the library, especially for its flexibility to evaluate
> expressions with matrices of different shapes and different types of
> entries and it would be a shame if users could not write down more
> complicated expressions without loosing efficiency.

Thanks very much for spotting this.

Regards

Joerg


Boost list run by bdawes at acm.org, david.abrahams at rcn.com, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk