

From: boost (boost_at_[hidden])
Date: 2002-06-27 15:25:19


On Thursday 27 June 2002 17:47, Martin Weiser wrote:

> Although the abstraction penalty is mostly insignificant compared to
> equivalent handwritten code, there remains a huge performance gap to
> optimized numerical kernels. I've done a preliminary comparison of dgemm
> performance with different matrix sizes N (see the linked
> graph), measuring the flops/s
> multiplying two square matrices of size N (N=1,...,1000). The competitors
> are a vendor optimized BLAS implementation (Sun Performance Library) and
> a straightforward naive "C" implementation. The environment is GCC 3.1
> (options -O3 -funroll-loops) on UltraSparc 10/300.
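For context, the "straightforward naive C" competitor in such a benchmark is typically just the textbook triple loop over row-major arrays. A minimal sketch (the function name and layout are my own, not from the benchmark code):

```cpp
#include <cstddef>

// Naive O(N^3) matrix multiply, C = A * B, for row-major square
// matrices of size n. This is the kind of plain kernel an optimized
// BLAS dgemm is measured against.
void naive_dgemm(std::size_t n, const double* A, const double* B, double* C) {
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t j = 0; j < n; ++j) {
            double sum = 0.0;
            for (std::size_t k = 0; k < n; ++k)
                sum += A[i * n + k] * B[k * n + j];
            C[i * n + j] = sum;
        }
}
```

The inner loop strides through B column-wise, which is exactly the cache-unfriendly access pattern that optimized kernels avoid.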

Thanks for producing this graph; I was planning to do the same kind of
comparison at the weekend, so you saved my weekend :)

In my application, the difference between plain ublas (gcc-2.95.4, amd/athlon,
hp pa-2.0) and a version where most ublas prod() expressions are replaced by
ATLAS dgemm (vendor BLAS on the HP) is an overall improvement of a factor of 5
to 10. The matrix sizes are ~100x100 and below, so the use of dgemm is a
must. Using xgemm instead of prod() for matrices that are suited to
xgemm is fairly simple (using #ifdef).
However, ublas is more general and much more readable than BLAS,
so I hope ublas can be improved. Couldn't one take the concepts of ATLAS
to improve ublas? [I'm not an expert in blocking techniques, just a user of
BLAS.] What about the expression templates used in MTL / Blitz++?
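The core blocking idea that ATLAS-style kernels rely on can be illustrated without any library: tile the i/j/k loops so that small blocks of A, B and C are reused while they still sit in cache. A hedged sketch (block size 64 is an arbitrary illustrative choice, and real kernels add register blocking, copying and autotuning on top):

```cpp
#include <algorithm>
#include <cstddef>

// Cache-blocked C = A * B for row-major square matrices of size n.
// Each BS x BS tile of A, B and C is reused while resident in cache,
// which is the basic "blocking technique" behind ATLAS-like kernels.
void blocked_dgemm(std::size_t n, const double* A, const double* B, double* C) {
    const std::size_t BS = 64;  // illustrative tile size, not tuned
    for (std::size_t i = 0; i < n * n; ++i) C[i] = 0.0;
    for (std::size_t ii = 0; ii < n; ii += BS)
        for (std::size_t kk = 0; kk < n; kk += BS)
            for (std::size_t jj = 0; jj < n; jj += BS)
                // Multiply the (ii,kk) tile of A with the (kk,jj) tile
                // of B, accumulating into the (ii,jj) tile of C.
                for (std::size_t i = ii; i < std::min(ii + BS, n); ++i)
                    for (std::size_t k = kk; k < std::min(kk + BS, n); ++k) {
                        const double a = A[i * n + k];
                        for (std::size_t j = jj; j < std::min(jj + BS, n); ++j)
                            C[i * n + j] += a * B[k * n + j];
                    }
}
```

Whether such blocking can be grafted onto ublas expression templates without losing their generality is exactly the open question here.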

Best wishes,
