Re: [glas] summary of fonctionnalities
From: Richard Vuduc (richie_at_[hidden])
Date: 2005-02-16 22:45:36
My colleagues at Berkeley and I are developing OSKI, an automatically
tuned low-level library of sparse matrix kernels in the spirit of recent
libraries like ATLAS/PHiPAC for dense matrix kernels and FFTW/SPIRAL for
signal processing kernels. Kernels we consider include sparse
matrix-vector multiply and sparse triangular solve, as well as other
kernels with better inherent cache locality (e.g., transpose(A)*A*x).
OSKI specifically automates the selection of a sparse data structure at
run-time, given your sparse matrix in a general format (e.g., compressed
sparse row or compressed sparse column) and your machine. We plan to
release version 1 within the next few weeks. You might be interested in
skimming slides from an overview talk, as well as a (draft) document
describing the interface to this library, both available here:
We welcome your questions and feedback.
The following issues may be relevant to the sparse component of the GLAS
* Although the "tuning" happens in OSKI at run-time when the matrix is
known, we rely on having libraries of pre-compiled kernels available to
choose from. A generic approach might be an interesting way to express
and then generate these implementations, in which the data structure and
kernel are fixed (e.g., sparse matrix-vector multiply using a 4x2 block
compressed sparse row format).
* OSKI defines an interface for a few "higher-level" kernels with more
opportunities to exploit data reuse, such as: (1) multiplying a dense
vector not just by a sparse matrix A, but by transpose(A)*A, (2)
simultaneous multiplication of vectors by A and transpose(A), and (3)
multiplication by powers of a matrix (e.g., A^k * x, for positive
integer k). We chose these based on our knowledge of their use, but
clearly there is no end to the possibilities of kernels users might
require. Again, perhaps a generic approach would be of use to address
* An important technique in optimizing sparse kernel implementations is
reordering the matrix. Explicit representations of permutation matrices
would be useful.
Regarding the Sparse BLAS specifically, there is a reference implementation:
See package/algorithm "818". We designed OSKI so that one could produce
a Sparse BLAS implementation based on it, as the reference
implementation does not do any kind of optimization. As far as I know,
there are no vendor implementations of the Sparse BLAS either.
Finally, I'd like to mention the related Kokkos C++ matrix library,
being developed as a part of the Trilinos software package:
Evgenii Rudnyi wrote:
> >Also note that the BLAS do not provide (nice) software for the
> > sparse case
> >(unless the performance has improved). Can this be confirmed
> >by someone?
> Well, there is some document
> BLAS Technical Forum Standard
> Chapter 3: Sparse BLAS
> but I have never seen this in action.
> Best wishes,
> Evgenii Rudnyi