Boost logo

Ublas :

From: Simon Clift (ssclift_at_[hidden])
Date: 2007-03-15 09:33:57

On Mon, 2007-03-12 at 18:29 +0100, Preben Hagh Strunge Holm wrote:
> Hmm... most of them are actually sparse (compressed matrix).
> For future optimizations I better try these other proposed methods out.

I tried an experiment a while ago to see if I could speed up sparse row
compressed matrix multiplication. I explicitly set pointers to the
uBLAS data() fields, then coded the matrix-vector multiply using raw
native (double*) and (size_t*) types for the coefficients and row/column

I also tried manually unrolling the loops.

Using GCC 4.0 with -O2 there was no difference that I could detect
between the speed of the plain uBLAS code and the "hand optimized" code,
even on small cases that ought to have fit into cache. The uBLAS code
was evidently straightforward enough for the compiler to do a good job.

For dense cases I get decent speed using the uBLAS<->ATLAS interface
from the CVS version.
I had a chance, at a numerical conference last year, to chat with one of
the IBM people who was working on Power/Cell compilers. From that
conversation it seems that sparse matrix operations is not something
they have worked on (which is a shame).

I've seen one article about replacing the ia[] and ja[] matrices of a
CSR sparse matrix with size_t* and double* pointers into a target ja[]
and x[] array. That limits the application of the resulting structure
to a multiply only, and with one target x[] vector only.

For large problems I'm usually memory-bandwidth bound anyway, so clever
one-CPU optimizations don't seem to help me very much.

Simon Clift                       ssclift at cs dot uwaterloo dot ca
Ph.D. Student, Computational Finance, Scientific Computation Group
David R. Cheriton School of Computer Science
University of Waterloo, Waterloo, Ontario, Canada  N2L 3G1