From: boost (boost_at_[hidden])
Date: 2001-11-20 17:19:58
On Tuesday 20 November 2001 16:33, walter_at_[hidden] wrote:
> > Are there any blocking techniques in the current ublas library ?
> One of the objectives for ublas is to reduce the abstraction penalty
> when using vector and matrix abstractions. Therefore we use Todd
> Veldhuizen's expression template technique to eliminate temporaries
> and fuse loops and the Barton-Nackman trick to avoid virtual function
> call overhead.
Thanks for the clarification.
In my problem a major part of the computing time is spent in matrix-matrix
multiplications of various forms, with dimensions ranging from tiny up to
~1000, and a typical size of 100..400. (All these small blocks are part
of a big matrix, and the vectors are stored in a dyadic product of
2 vector spaces, i.e. represented by a list of matrices.)

> For large matrices it's indicated (as you mention) to go the other
> way: reintroduce temporaries and split loops. So I imagine, that the
> usage of blocking techniques is a possible extension of ublas.
> BTW, netlib BLAS doesn't implement blocking, but LAPACK does AFAIK ;-)
> > I'm asking since I would need a performance which is comparable
> > to BLAS/ATLAS.
> Do you want to get netlib (reference) BLAS performance? Do you want
I haven't been in computational physics for 4 years now; I reentered
this field only 2 weeks ago. In my old code I used vendor-supplied
BLAS (e.g. IBM ESSL, ... ).
> to get ATLAS performance without any platform specific optimization?
> If we tune for a certain platform, which compiler/operating system
> combination do you prefer?
That's a pretty hard question. The program will at least run on
Athlon PCs (Linux cluster), IBM RS/6000 (SP) Power2/3/4, HP PA, and
I'd be happy if I could replace (specialize) a few ublas routines
with ATLAS or vendor-supplied BLAS routines in order to perform benchmarks,
Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk