From: lums_at_[hidden]
Date: 2001-03-17 08:38:50
--- In boost_at_y..., Jeremy Siek <jsiek_at_l...> wrote:
> BTW, this kind of process led to the development of the
> original BLAS... they are the basic operations needed in the
> direct dense matrix solvers provided by LAPACK.
>
> I'd challenge those interested in boost numerics libraries to
> come forward with concrete lists of problems and algorithms for
> a particular domain. With those in hand we'll be able to
> determine scope and priority for various pieces, and it will
> provide a firmer ground for discussing interface and
> implementation issues.
>
> Personally, I'll contribute to the list of algorithms for the
> traditional linear algebra domain.
Let me take a broad stab at a couple of different levels. At the
highest level, the canonical set of problem classes for numerical
linear algebra consists of:
o) Solving linear systems of equations
o) Solving least-squares problems (e.g., over- and under-determined
   systems)
o) SVD
o) Eigensystems
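
In library terms, these four classes would presumably become four
top-level entry points. Purely as a hypothetical interface sketch --
none of these names or signatures come from an existing proposal:

// Hypothetical top-level interface, one entry point per problem class.
// Every name and signature here is illustrative only -- this is merely
// what the four problem classes might look like as library entry points.

template <class Matrix, class Vector>
void solve(const Matrix& A, const Vector& b, Vector& x);     // A x = b

template <class Matrix, class Vector>
void least_squares(const Matrix& A, const Vector& b,         // min ||A x - b||
                   Vector& x);                               // over/under determined

template <class Matrix>
void svd(const Matrix& A, Matrix& U, Matrix& S, Matrix& V);  // A = U S V^T

template <class Matrix, class Vector>
void eigensystem(const Matrix& A, Vector& values,            // A v = lambda v
                 Matrix& vectors);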
Within each of these problem classes, solution methods fall along the
lines of application area and problem representation. E.g., there
will be different algorithms for symmetric versus unsymmetric
systems, dense versus sparse, direct versus iterative. I suspect the
result of this survey will be that it would be nice to have an
algorithm for each possible combination of dense/sparse,
symmetric/unsymmetric, and direct/iterative -- with the possible
exception of dense+iterative, although even that is not unheard of.
LAPACK, for example, covers all of these combinations for the dense
case.
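
To make that concrete, here is one way the combinatorial space could
be captured in C++: each matrix type advertises its properties as
nested tags, and overload resolution picks the solver. This is only a
minimal sketch of the idea; all of the type and function names
(symmetric_tag, solve_impl, and so on) are hypothetical, not taken
from any existing library.

// Hypothetical sketch: selecting a solver by matrix properties via
// tag dispatch. All names are illustrative only.
#include <iostream>

struct symmetric_tag {};
struct unsymmetric_tag {};
struct dense_tag {};
struct sparse_tag {};

// Each matrix type advertises its shape and storage as nested typedefs.
struct dense_symmetric_matrix {
    typedef symmetric_tag shape;
    typedef dense_tag     storage;
    // ... actual storage omitted ...
};
struct sparse_unsymmetric_matrix {
    typedef unsymmetric_tag shape;
    typedef sparse_tag      storage;
    // ... actual storage omitted ...
};

// One overload per (shape, storage) leaf of the hierarchy below.
template <class Matrix, class Vector>
void solve_impl(Matrix&, Vector&, symmetric_tag, dense_tag) {
    std::cout << "Cholesky factorization + forward/back solves\n";
}
template <class Matrix, class Vector>
void solve_impl(Matrix&, Vector&, unsymmetric_tag, sparse_tag) {
    std::cout << "sparse LU factorization with partial pivoting\n";
}

// Front end: dispatches on the tags supplied by the matrix type.
template <class Matrix, class Vector>
void solve(Matrix& A, Vector& b) {
    solve_impl(A, b, typename Matrix::shape(), typename Matrix::storage());
}

int main() {
    dense_symmetric_matrix    A;
    sparse_unsymmetric_matrix B;
    double b[3] = {1, 2, 3};
    solve(A, b);   // resolves to the Cholesky path
    solve(B, b);   // resolves to the sparse LU path
}

With this shape, supporting a new combination means adding an
overload rather than touching the front end.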
Going one level deeper, one would want to provide algorithms at all
the leaf nodes of the following hierarchy. I list some example
classical algorithms for each leaf -- note that there are a number of
implementation variations (left- versus right-looking Cholesky
factorizations, e.g.); a concrete sketch of the first leaf follows
the outline. Note also that beyond linear systems, direct methods for
sparse problems become more and more difficult because the popular
factorizations tend to destroy sparsity. There are sparsity-preserving
QR factorizations, but I am not aware of anything equivalent for the
SVD or eigenproblems.
I. Linear systems
   A. Direct
      1. Symmetric
         a. Dense  -- Cholesky factorization, forward + back solves
         b. Sparse -- Cholesky factorization, forward + back solves;
                      also needed are graph algorithms for ordering
      2. Unsymmetric
         a. Dense  -- LU factorization with partial pivoting,
                      forward + back solves
         b. Sparse -- LU factorization with partial pivoting,
                      forward + back solves
   B. Iterative
      1. Symmetric
         a. Dense  -- captured by the sparse case
         b. Sparse -- preconditioned conjugate gradient;
                      preconditioners needed
      2. Unsymmetric
         a. Dense  -- captured by the sparse case
         b. Sparse -- preconditioned Krylov algorithms such as GMRES,
                      QMR, BiCGSTAB, etc.; preconditioners needed
II. Least squares
   A. Direct
      1. Symmetric
         a. Dense  -- QR factorization
         b. Sparse -- QR factorization
      2. Unsymmetric
         a. Dense  -- QR factorization
         b. Sparse -- QR factorization
   B. Iterative
      1. Symmetric
         a. Dense
         b. Sparse -- various Krylov algorithms
      2. Unsymmetric
         a. Dense
         b. Sparse -- various Krylov algorithms
III. SVD
   A. Direct
      1. Symmetric
         a. Dense  -- classic SVD
         b. Sparse -- I don't know of anything here
      2. Unsymmetric
         a. Dense  -- classic SVD
         b. Sparse -- I don't know of anything here
   B. Iterative
      1. Symmetric
         a. Dense
         b. Sparse -- Krylov algorithms, useful primarily for the
                      largest singular values
      2. Unsymmetric
         a. Dense
         b. Sparse -- Krylov algorithms, useful primarily for the
                      largest singular values
IV. Eigensystems
   A. Direct
      1. Symmetric
         a. Dense  -- classical algorithms
         b. Sparse -- nothing off the top of my head
      2. Unsymmetric
         a. Dense  -- shifted QR
         b. Sparse -- nothing off the top of my head
   B. Iterative
      1. Symmetric
         a. Dense
         b. Sparse -- various Krylov algorithms, useful primarily for
                      extremal eigenvalues
      2. Unsymmetric
         a. Dense
         b. Sparse -- various Krylov algorithms, useful primarily for
                      extremal eigenvalues
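
Here is the promised sketch of the first leaf (I.A.1.a): an unblocked
dense Cholesky factorization plus the forward and back solves. It is
purely illustrative; a library version would use a proper matrix
abstraction, blocking for performance (see below), and better error
handling.

// Minimal sketch of leaf I.A.1.a: unblocked dense Cholesky (A = L L^T)
// plus forward and back substitution to solve A x = b.
#include <cmath>
#include <cstddef>
#include <stdexcept>
#include <vector>

typedef std::vector<std::vector<double> > Matrix;

// Overwrite the lower triangle of A with L such that A = L * L^T.
void cholesky(Matrix& A) {
    const std::size_t n = A.size();
    for (std::size_t j = 0; j < n; ++j) {
        for (std::size_t k = 0; k < j; ++k)       // fold in row j so far
            A[j][j] -= A[j][k] * A[j][k];
        if (A[j][j] <= 0.0)
            throw std::runtime_error("matrix not positive definite");
        A[j][j] = std::sqrt(A[j][j]);
        for (std::size_t i = j + 1; i < n; ++i) { // compute column j of L
            for (std::size_t k = 0; k < j; ++k)
                A[i][j] -= A[i][k] * A[j][k];
            A[i][j] /= A[j][j];
        }
    }
}

// Solve L y = b (forward), then L^T x = y (back); x is returned in b.
void solve_cholesky(const Matrix& L, std::vector<double>& b) {
    const std::size_t n = L.size();
    for (std::size_t i = 0; i < n; ++i) {         // forward solve
        for (std::size_t k = 0; k < i; ++k)
            b[i] -= L[i][k] * b[k];
        b[i] /= L[i][i];
    }
    for (std::size_t i = n; i-- > 0; ) {          // back solve with L^T
        for (std::size_t k = i + 1; k < n; ++k)
            b[i] -= L[k][i] * b[k];
        b[i] /= L[i][i];
    }
}

The sparse variant (I.A.1.b) has the same mathematical content but,
as noted in the outline, needs ordering algorithms first to limit
fill-in.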
In addition to this are issues related to high performance. For
instance, one wants not just an LU factorization, but a high-
performance LU factorization. In an ideal world one would simply
build the 'typical' algorithm on high-performance pieces, but this
might not always be possible. For LU factorization, a high-
performance version is realized by constructing a blocked algorithm
(to maximize cache reuse). This in turn depends on a non-blocked LU
as well as a high-performance matrix-matrix product. The performance
is all contained in the matrix-matrix product, but the algorithm
itself needs to be constructed in block fashion.
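
Here is a minimal sketch of that structure, assuming row-major
storage and omitting pivoting for brevity (a real LU must pivot). The
kernel names (trsm_lower_unit, gemm_minus) are made-up stand-ins for
the corresponding BLAS routines; in practice only the panel code
would stay naive, and the matrix-matrix product would be the tuned
kernel.

// Structural sketch of a right-looking blocked LU, no pivoting.
// Element (r, c) of a matrix with leading dimension lda is A[r*lda + c].

// Unblocked LU on an m x n panel, m >= n.
void unblocked_lu(double* A, int lda, int m, int n) {
    for (int k = 0; k < n; ++k)
        for (int i = k + 1; i < m; ++i) {
            A[i*lda + k] /= A[k*lda + k];              // multiplier (L entry)
            for (int c = k + 1; c < n; ++c)            // update rest of row
                A[i*lda + c] -= A[i*lda + k] * A[k*lda + c];
        }
}

// B <- L^{-1} B, L an m x m unit lower triangle (stand-in for trsm).
void trsm_lower_unit(const double* L, int lda, double* B, int ldb,
                     int m, int n) {
    for (int i = 0; i < m; ++i)
        for (int k = 0; k < i; ++k)
            for (int c = 0; c < n; ++c)
                B[i*ldb + c] -= L[i*lda + k] * B[k*ldb + c];
}

// C <- C - A*B, with C m x n, A m x k, B k x n (stand-in for gemm).
void gemm_minus(const double* A, int lda, const double* B, int ldb,
                double* C, int ldc, int m, int k, int n) {
    for (int i = 0; i < m; ++i)
        for (int p = 0; p < k; ++p)
            for (int c = 0; c < n; ++c)
                C[i*ldc + c] -= A[i*lda + p] * B[p*ldb + c];
}

// Blocked driver: factor a panel, triangular-solve the block row, then
// update the trailing submatrix. Nearly all the flops land in gemm_minus,
// which is where the performance lives.
void blocked_lu(double* A, int lda, int n, int nb = 64) {
    for (int j = 0; j < n; j += nb) {
        int jb   = (n - j < nb) ? (n - j) : nb;
        int rest = n - j - jb;                         // trailing dimension
        unblocked_lu(A + j*lda + j, lda, n - j, jb);   // 1. factor panel
        if (rest > 0) {
            trsm_lower_unit(A + j*lda + j, lda,        // 2. U12 block row
                            A + j*lda + j + jb, lda, jb, rest);
            gemm_minus(A + (j + jb)*lda + j, lda,      // 3. A22 -= L21*U12
                       A + j*lda + j + jb, lda,
                       A + (j + jb)*lda + j + jb, lda,
                       rest, jb, rest);
        }
    }
}

The block size nb would be tuned to the cache; the same panel/update
shape recurs in blocked Cholesky and QR.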
Finally, before we go pouring new wine into old wineskins, there is
one very interesting question I want to raise. Many of the
algorithms above are now "classic", with very well-studied
implementations. I have always wondered, however, whether new
algorithms (or new implementations) might emerge due to better
implementation machinery. E.g., there are numerous assumptions
underlying these algorithms and their implementations. Which of
those assumptions no longer apply?