## Glas :## Re: [glas] summary of fonctionnalities |

**From:** Richard Vuduc (*richie_at_[hidden]*)

**Date:** 2005-02-16 22:45:36

**Next message:**Andreas Pokorny: "[glas] Diploma/Master Thesis about safe scalar products and optimized linear algebra expressions"**Previous message:**Evgenii Rudnyi: "[glas] Re: summary of fonctionnalities"**Maybe in reply to:**Yves Renard: "[glas] summary of fonctionnalities"

Hello all,

My colleagues at Berkeley and I are developing OSKI, an automatically

tuned low-level library of sparse matrix kernels in the spirit of recent

libraries like ATLAS/PHiPAC for dense matrix kernels and FFTW/SPIRAL for

signal processing kernels. Kernels we consider include sparse

matrix-vector multiply and sparse triangular solve, as well as other

kernels with better inherent cache locality (e.g., transpose(A)*A*x).

OSKI specifically automates the selection of a sparse data structure at

run-time, given your sparse matrix in a general format (e.g., compressed

sparse row or compressed sparse column) and your machine. We plan to

release version 1 within the next few weeks. You might be interested in

skimming slides from an overview talk, as well as a (draft) document

describing the interface to this library, both available here:

http://bebop.cs.berkeley.edu/oski

We welcome your questions and feedback.

The following issues may be relevant to the sparse component of the GLAS

activity:

* Although the "tuning" happens in OSKI at run-time when the matrix is

known, we rely on having libraries of pre-compiled kernels available to

choose from. A generic approach might be an interesting way to express

and then generate these implementations, in which the data structure and

kernel are fixed (e.g., sparse matrix-vector multiply using a 4x2 block

compressed sparse row format).

* OSKI defines an interface for a few "higher-level" kernels with more

opportunities to exploit data reuse, such as: (1) multiplying a dense

vector not just by a sparse matrix A, but by transpose(A)*A, (2)

simultaneous multiplication of vectors by A and transpose(A), and (3)

multiplication by powers of a matrix (e.g., A^k * x, for positive

integer k). We chose these based on our knowledge of their use, but

clearly there is no end to the possibilities of kernels users might

require. Again, perhaps a generic approach would be of use to address

this issue.

* An important technique in optimizing sparse kernel implementations is

reordering the matrix. Explicit representations of permutation matrices

would be useful.

Regarding the Sparse BLAS specifically, there is a reference implementation:

See package/algorithm "818". We designed OSKI so that one could produce

a Sparse BLAS implementation based on it, as the reference

implementation does not do any kind of optimization. As far as I know,

there are no vendor implementations of the Sparse BLAS either.

Finally, I'd like to mention the related Kokkos C++ matrix library,

being developed as a part of the Trilinos software package:

http://software.sandia.gov/trilinos/packages/kokkos/

--rich

Evgenii Rudnyi wrote:

>

> >Also note that the BLAS do not provide (nice) software for the

> > sparse case

> >(unless the performance has improved). Can this be confirmed

> >by someone?

>

> Well, there is some document

>

> BLAS Technical Forum Standard

> Chapter 3: Sparse BLAS

>

> http://www.netlib.org/blas/blast-forum/chapter3.pdf

>

> but I have never seen this in action.

>

> Best wishes,

>

> Evgenii Rudnyi

> --

> http://Evgenii.Rudnyi.Ru/