
uBLAS:
From: Simon Clift (ssclift_at_[hidden])
Date: 2007-03-15 09:33:57
On Mon, 2007-03-12 at 18:29 +0100, Preben Hagh Strunge Holm wrote:
> Hmm... most of them are actually sparse (compressed matrix).
>
> For future optimizations I'd better try these other proposed methods out.
I tried an experiment a while ago to see if I could speed up sparse
row-compressed matrix multiplication. I explicitly set pointers to the
uBLAS data() fields, then coded the matrix-vector multiply using raw
native (double*) and (size_t*) types for the coefficients and row/column
indices.
I also tried manually unrolling the loops.
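For reference, a minimal sketch of the kind of raw-pointer CSR matrix-vector multiply described above (the function and variable names are illustrative, not from the original code):

```cpp
#include <cassert>
#include <cstddef>

// Hand-coded CSR (compressed sparse row) matrix-vector product y = A*x.
// For row i, the nonzeros are val[ia[i] .. ia[i+1]-1], with their
// column indices in ja[] over the same range.
void csr_matvec(std::size_t n,
                const std::size_t* ia,  // row pointers, length n+1
                const std::size_t* ja,  // column indices, length nnz
                const double* val,      // nonzero coefficients, length nnz
                const double* x,        // input vector
                double* y)              // output vector, length n
{
    for (std::size_t i = 0; i < n; ++i) {
        double sum = 0.0;
        for (std::size_t k = ia[i]; k < ia[i + 1]; ++k)
            sum += val[k] * x[ja[k]];
        y[i] = sum;
    }
}
```

With -O2, a modern compiler typically vectorizes or pipelines this inner loop about as well as the equivalent uBLAS expression, which is consistent with the result reported here.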
Using GCC 4.0 with -O2 there was no difference that I could detect
between the speed of the plain uBLAS code and the "hand-optimized" code,
even on small cases that ought to have fit into cache. The uBLAS code
was evidently straightforward enough for the compiler to do a good job.
For dense cases I get decent speed using the uBLAS <-> ATLAS interface
from the CVS version.
I had a chance, at a numerical conference last year, to chat with one of
the IBM people who was working on Power/Cell compilers. From that
conversation it seems that sparse matrix operations are not something
they have worked on (which is a shame).
I've seen one article about replacing the ia[] and ja[] index arrays of
a CSR sparse matrix with size_t* and double* pointers directly into a
target ja[] and x[] array. That limits the application of the resulting
structure to a multiply only, and against that one target x[] vector only.
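A rough sketch of that idea, assuming the simplest variant where each column index is pre-resolved to a pointer into one fixed x[] (all names here are hypothetical):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Bind a CSR matrix to one specific target vector x: each column index
// ja[k] is replaced by a pointer &x[ja[k]].  The result can only be
// used to multiply against that same x[].
std::vector<const double*> bind_columns(std::size_t nnz,
                                        const std::size_t* ja,
                                        const double* x)
{
    std::vector<const double*> xp(nnz);
    for (std::size_t k = 0; k < nnz; ++k)
        xp[k] = x + ja[k];
    return xp;
}

// Matrix-vector multiply using the pre-resolved pointers: the inner
// loop dereferences xp[k] instead of performing the x[ja[k]] lookup.
void bound_matvec(std::size_t n,
                  const std::size_t* ia,    // row pointers, length n+1
                  const double* val,        // nonzero coefficients
                  const double* const* xp,  // from bind_columns()
                  double* y)                // output vector, length n
{
    for (std::size_t i = 0; i < n; ++i) {
        double sum = 0.0;
        for (std::size_t k = ia[i]; k < ia[i + 1]; ++k)
            sum += val[k] * *xp[k];
        y[i] = sum;
    }
}
```

The trade-off is exactly the one noted above: the indirection is cheaper per element, but the structure is welded to a single x[] and is useless for anything but that multiply.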
For large problems I'm usually memory-bandwidth bound anyway, so clever
single-CPU optimizations don't seem to help me very much.
--
Simon Clift                         ssclift at cs dot uwaterloo dot ca
Ph.D. Student, Computational Finance, Scientific Computation Group
David R. Cheriton School of Computer Science
University of Waterloo, Waterloo, Ontario, Canada  N2L 3G1