Boost logo

Boost :

From: Joerg Walter (jhr.walter_at_[hidden])
Date: 2002-06-30 05:26:01


----- Original Message -----
From: "Benedikt Weber" <weber_at_[hidden]>
Newsgroups: gmane.comp.lib.boost.devel
To: <boost_at_[hidden]>
Sent: Sunday, June 30, 2002 12:02 AM
Subject: [boost] uBLAS and expression templates

> I would like to discuss one important aspect of expression templates for
> matrix algebra, that has not been touched so far. The documentation and
> examples shown in uBLAS give the impression that expression templates
> always lead to more efficient calculations.

Sorry for that, they don't.

> That is, however, only true for some
> expressions. For many expressions, expression templates perform much worse
> than the regular evaluation. Incidentally, all examples in uBLAS are taken
> from BLAS and these do indeed perform well with expression
templates.uBLAS,
> however, is very flexible and allows for many more complicated expressions
> and I am sure users want to take advantage of that possibility.

Agreed.

> Expression templates are a clever technique to avoid temporaries during
the
> evaluation of matrix/vector expressions. While avoiding temporaries can
> speed up the evaluation, it can also increase the number of operations
> dramatically. Consider the example A*B*v, where A and B are matrices and v
> is a vector. First take the case without expression templates. The
> expression should be evaluated as A*(B*v), because then you have only two
> matrix/vector products. Taking all dimensions as N, this will lead to
2*N^2
> flops. If you evaluate it as (A*B)*v, you have one matrix/matrix product
and
> one matrix/vector product, which is much worse in terms of number of
> operations, namely N^3+N^2. Now do the same thing with expression
templates.
> Evaluating A*(B*v) will lead to N^3 flops, because the vector B*v is not
> stored as a temporary but recalculated N times. Evaluating (A*B)*v also
> gives N^3 flops, because each matrix row in (A*B) is evaluated N times. So
> with expression templates the performance is much worse than without.

Correct. It's a proven (but not well known ;-) technique to reintroduce
temporaries at the right places into the expression, i.e. writing A * vector
(B * v). But I see that the prod() functions (matrix-vector and
matrix-matrix) play a special role w.r.t. computational complexity.

> On my
> Pentium 3, taking N=100 and evaluating the expression 1000 times takes
> _with_ expression templates (#define NUMERICS_USE_ET):
> A*(B*v): 6.59 s
> (A*B)*v): 7.39 s
> and _without_ expression templates:
> A*(B*v): 0.21 s
> (A*B)*v): 13.62 s

Ok, here's a slightly extented sample:
---------
#include <iostream>
#include <boost/timer.hpp>
#include <boost/numeric/ublas/config.h>
#undef NUMERICS_USE_ET
#include <boost/numeric/ublas/vector.h>
#include <boost/numeric/ublas/matrix.h>
#include <boost/numeric/ublas/io.h>

using namespace boost;
using namespace boost::numerics;

int main () {
    timer t;
    matrix<double> A (100, 100), B (100, 100);
    vector<double> v (100), r (100);
    for (int i = 0; i < 100; ++ i) {
         for (int j = 0; j < 100; ++ j) {
             A (i, j) = B (i, j) = 100 * i + j;
         }
         v (i) = i;
    }
    t.restart ();
    for (int run = 0; run < 1000; ++ run)
        r.assign (prod (prod (A, B), v));
    std::cout << t.elapsed () << std::endl;
    t.restart ();
    for (int run = 0; run < 1000; ++ run)
        r.assign (prod (A, prod (B, v)));
    std::cout << t.elapsed () << std::endl;
    t.restart ();
    for (int run = 0; run < 1000; ++ run)
        r.assign (prod (matrix<double> (prod (A, B)), v));
    std::cout << t.elapsed () << std::endl;
    t.restart ();
    for (int run = 0; run < 1000; ++ run)
        r.assign (prod (A, vector<double> (prod (B, v))));
    std::cout << t.elapsed () << std::endl;
    return 0;
}

---------

Without ETs (#undef NUMERICS_USE_ET) this gives on my platform (Linux, GCC
3.1)

6.34
0.12
6.34
0.12

With ETs I get

6.2
5.9
6.3
0.12

> Considering only the number of operations, the last number should
> theoretically be of the same order as the calculations with expression
> templates, so we gain a factor of 2 here from expression templates. But
> this depends on how efficient the evaluation is done without expression
> templates. It's quite tricky to understand what's going on exactly from
the
> code, and I did not get it yet up to now. I did the calulations with
> Codewarrior because MSVC gives internal compiler errors, as also indicated
> in config.h. (For MSVC, NUMERICS_USE_ET is therefore always defined in
> config.h, so be careful.)
>
> As I understand it, and looking at a few tests, neither matrix/matrix
> multiplication nor matrix/vector multiplication gain much efficiency from
> expression templates. A factor of two was sometimes observed as in the
> examples above, but this could be due to an inefficient implementation of
> the the case without expression templates (correct me, if I am wrong). So
> my first guess is, you should use expression templates only for addition,
> subtraction, assignment and scalar multiplication with vectors or matrices
> (and, of course, the corresponding +=, -=, *=). I hope it is possible to
> have only some of the operations evaluated by expression templates. And I
> hope the problem can be solved by just having some operations being
> treated differently. If you would have to analyse the whole expression and
> optimize it, this would probably be much more involved.
>
> I think this is a serious problem and should be considered in more detail.

Yes, I'm currently thinking about changing the interface for prod() to make
the above problem more clear.

> I really like the library, especially for its flexibility to evaluate
> expressions with matrices of different shapes and different types of
> entries and it would be a shame if users could not write down more
> complicated expressions without loosing efficiency.

Thanks very much for spotting this.

Regards

Joerg


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk