From: jhr.walter_at_[hidden]
Date: 2002-11-22 14:52:37

Hi Alexei,

you wrote:

> I did some performance tests with uBLAS and MTL. Here are some
> results for those who are interested.
> The following tests were run:
> 1) uBLAS dense matrix multiplication:
> - row_major * row_major: ures.assign(prod(ur, ur))
> - row_major * column_major: ures.assign(prod(ur, uc))
> - column_major * row_major: ures.assign(prod(uc, ur))
> 2) MTL dense matrix multiplication:
> - row_major * row_major: mult(mr, mr, mres)
> - row_major * column_major: mult(mr, mc, mres)
> - column_major * row_major: mult(mc, mr, mres)
> 3) C matrix multiplication (basic multiplication algorithm for C
> array).
> 4) uBLAS sparse matrix multiplication (20% non-zero elements):
> - row_major * row_major: ures.assign(prod(ur, ur))
> - row_major * column_major: ures.assign(prod(ur, uc))
> - column_major * row_major: ures.assign(prod(uc, ur))
> ures is a dense matrix.
> 5) MTL sparse matrix multiplication (20% non-zero elements):
> - row_major * row_major: mult(mr, mr, mres)
> - row_major * column_major: mult(mr, mc, mres)
> - column_major * row_major: mult(mc, mr, mres)
> mres is a dense matrix.
> All tests were run on Windows 2000, using gcc 3.2 with the -O3
> optimization flag and Boost 1.29.
> Here are some results:
> 1) uBLAS and MTL have approximately the same performance for dense
> matrix mult. uBLAS is a bit faster with small matrices (< 50-100),
> MTL is faster with large ones (>100).
> 2) Plain C array multiplication is 5-6 times faster than both uBLAS
> and MTL. (!!!)

This is too much abstraction penalty for uBLAS. Did you define -DNDEBUG
(enabling expression templates and disabling bounds and type checks)?
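As a rough illustration of why NDEBUG matters here, the following is a minimal sketch of an element accessor guarded by a debug check. The class checked_matrix and the helper checks_enabled() are hypothetical names made up for this example; this is not how uBLAS itself is implemented, it only shows the general mechanism of checks that disappear once NDEBUG is defined.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical minimal dense matrix (row-major); NOT a uBLAS type.
// It only illustrates checks that are compiled out under -DNDEBUG.
class checked_matrix {
public:
    checked_matrix(std::size_t rows, std::size_t cols)
        : rows_(rows), cols_(cols), data_(rows * cols, 0.0) {}

    std::size_t rows() const { return rows_; }
    std::size_t cols() const { return cols_; }

    double& operator()(std::size_t i, std::size_t j) {
        // assert() expands to nothing when NDEBUG is defined, so an
        // optimized build pays nothing for this bounds check.
        assert(i < rows_ && j < cols_);
        return data_[i * cols_ + j];
    }

    double operator()(std::size_t i, std::size_t j) const {
        assert(i < rows_ && j < cols_);
        return data_[i * cols_ + j];
    }

private:
    std::size_t rows_;
    std::size_t cols_;
    std::vector<double> data_;
};

// Reports whether this translation unit was compiled with checks enabled.
inline bool checks_enabled() {
#ifdef NDEBUG
    return false;
#else
    return true;
#endif
}
```

Building the benchmark with something like g++ -O3 -DNDEBUG would remove the check from the inner loop of any multiplication routine that goes through operator(); without the define, every element access carries the extra comparison.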

> 3) If I use my own simple mult function to multiply uBLAS or MTL
> matrices:
> template <typename Mat>
> void mat_mat_mult(const Mat& m1, const Mat& m2, Mat& res, int rank) {
>     for (int i = 0; i < rank; ++i)
>         for (int j = 0; j < rank; ++j) {
>             res(i, j) = 0;
>             for (int k = 0; k < rank; ++k)
>                 res(i, j) += m1(i, k) * m2(k, j);
>         }
> }
> It runs 2 times faster than the native uBLAS and MTL implementations of
> multiplication. (Iterator overhead?)

uBLAS normally doesn't use iterators when multiplying dense matrices.

> 4) Dense matrix performance doesn't depend on row orientation for
> either uBLAS or MTL.

That may be so if your matrices fit into the cache; otherwise one should
probably use blocked operations.
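To make "blocked operations" concrete, here is a minimal sketch of a blocked (tiled) multiplication of square row-major matrices stored in std::vector<double>. The function name blocked_mult, the flat storage layout and the default block size of 32 are assumptions made for this example only; this is neither uBLAS nor MTL code, and a sensible block size depends on the cache of the machine at hand.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical helper: c = a * b for n x n row-major matrices.
// The work is split into block x block tiles so that the parts of
// a, b and c touched by the inner loops can stay in the cache.
void blocked_mult(const std::vector<double>& a,
                  const std::vector<double>& b,
                  std::vector<double>& c,
                  std::size_t n,
                  std::size_t block = 32) {
    if (block == 0) block = 1;      // avoid an endless loop on a bad argument
    c.assign(n * n, 0.0);           // size and clear the result

    for (std::size_t ii = 0; ii < n; ii += block) {
        const std::size_t i_end = std::min(ii + block, n);
        for (std::size_t kk = 0; kk < n; kk += block) {
            const std::size_t k_end = std::min(kk + block, n);
            for (std::size_t jj = 0; jj < n; jj += block) {
                const std::size_t j_end = std::min(jj + block, n);
                // Multiply one tile of a by one tile of b and
                // accumulate into the matching tile of c.
                for (std::size_t i = ii; i < i_end; ++i) {
                    for (std::size_t k = kk; k < k_end; ++k) {
                        const double aik = a[i * n + k];
                        for (std::size_t j = jj; j < j_end; ++j) {
                            c[i * n + j] += aik * b[k * n + j];
                        }
                    }
                }
            }
        }
    }
}
```

The result is the same as with the plain triple loop; only the order in which elements are visited changes. Whether this pays off depends on the matrix size relative to the cache, which is why small matrices may show no dependence on orientation at all.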

> 5) MTL sparse matrix mult doesn't depend on row orientation and is in
> any case much (2 times) faster than the best case for uBLAS.
> 6) uBLAS gives best performance for sparse matrices in row_major *
> column_major case (it is 3 times faster than column_major *
> row_major).
> 7) When I tried the latest uBLAS from Boost CVS, it ran much (up to 2
> times) slower for sparse matrices than 1.29.

There are more debug runtime checks for sparse matrices now (see the remark
regarding NDEBUG).

> I would need to do other tests to draw any real conclusions, but
> preliminarily it seems to me that the abstraction penalty for both
> uBLAS and MTL is too big.

Best regards


