
Boost : 
From: lums_at_[hidden]
Date: 2001-03-17 08:38:50
 In boost_at_y..., Jeremy Siek <jsiek_at_l...> wrote:
> BTW, this kind of process led to the development of the
> original BLAS... they are the basic operations needed in the direct
> dense matrix solvers provided by LAPACK.
>
> I'd challenge those interested in boost numerics libraries to come
> forward with concrete lists of problems and algorithms for a
> particular domain. With those in hand we'll be able to determine
> scope and priority for various pieces, and it will provide a firmer
> ground for discussing interface and implementation issues.
>
> Personally, I'll contribute to the list of algorithms for the
> traditional linear algebra domain.
Let me take a broad stab at a couple of different levels. At the
highest level, the canonical set of problem classes for numerical
linear algebra consists of:
o) Solving linear systems of equations
o) Solving least-squares problems (e.g., over- and under-determined
systems)
o) SVD
o) Eigensystems
Within each of these problem classes, solution methods fall along the
lines of application area and problem representation. E.g., there
will be different algorithms for symmetric versus unsymmetric
systems, dense or sparse, direct or iterative. I suspect that the
result of this survey will be that it would be nice to have an
algorithm for each possible combination of dense/sparse,
symmetric/unsymmetric, and direct/iterative, with the possible
exception of dense+iterative (although even that is not unheard of).
LAPACK, for example, covers all of these combinations for the dense
case.
Going one level deeper, one would want to provide algorithms at all
the leaf nodes of the following hierarchy. I list some example
classical algorithms for each leaf; note that there are a number of
implementation variations (e.g., left- vs. right-looking Cholesky
factorizations). Note also that beyond linear systems, direct
methods for sparse systems become more and more difficult because the
popular factorizations tend to destroy sparsity. There are sparsity-
preserving QR factorizations, but I am not aware of anything
equivalent for SVD or eigenproblems.
I. Linear systems
   A. Direct
      1. Symmetric
         a. Dense - Cholesky factorization, forward + back solves
         b. Sparse - Cholesky factorization, forward + back solves;
            also needed are graph algorithms for ordering
      2. Unsymmetric
         a. Dense - LU factorization with partial pivoting,
            forward + back solves
         b. Sparse - LU factorization with partial pivoting,
            forward + back solves
   B. Iterative
      1. Symmetric
         a. Dense - captured by the sparse case
         b. Sparse - preconditioned conjugate gradient;
            preconditioners needed
      2. Unsymmetric
         a. Dense - captured by the sparse case
         b. Sparse - preconditioned Krylov algorithms such as GMRES,
            QMR, BiCGSTAB, etc.; preconditioners needed
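To make branch I.A.1.a concrete, here is a minimal pure-Python sketch
of the dense symmetric direct path: an unblocked Cholesky
factorization A = L L^T followed by the forward and back solves. The
function names are my own, and this ignores everything a production
code needs (blocking for cache, checks for non-SPD input):

```python
# Sketch only: unblocked Cholesky (A = L L^T) plus the triangular
# solves, for a dense SPD matrix stored as a list of rows.

def cholesky(A):
    """Return lower-triangular L with A = L L^T (A must be SPD)."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = (A[i][i] - s) ** 0.5      # diagonal entry
            else:
                L[i][j] = (A[i][j] - s) / L[j][j]   # below-diagonal entry
    return L

def forward_solve(L, b):
    """Solve L y = b, L lower-triangular."""
    y = [0.0] * len(b)
    for i in range(len(b)):
        y[i] = (b[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i]
    return y

def back_solve(L, y):
    """Solve L^T x = y (L^T is upper-triangular)."""
    n = len(y)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x

def solve_spd(A, b):
    """Solve A x = b for SPD A: factor once, then two triangular solves."""
    L = cholesky(A)
    return back_solve(L, forward_solve(L, b))
```

The same three-phase structure (factor, forward solve, back solve) is
what the LAPACK drivers expose for this branch.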
II. Least squares
   A. Direct
      1. Symmetric
         a. Dense - QR factorization
         b. Sparse - QR factorization
      2. Unsymmetric
         a. Dense - QR factorization
         b. Sparse - QR factorization
   B. Iterative
      1. Symmetric
         a. Dense
         b. Sparse - various Krylov algorithms
      2. Unsymmetric
         a. Dense
         b. Sparse - various Krylov algorithms
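For the dense direct least-squares branch (II.A), a sketch of the
standard approach: factor A = QR, then solve R x = Q^T b by back
substitution, which minimizes ||A x - b||_2 for full-column-rank A. I
use modified Gram-Schmidt here because it is short; a production code
would more likely use Householder reflections. Names are illustrative:

```python
# Sketch only: least squares via modified Gram-Schmidt QR.

def qr_mgs(A):
    """Modified Gram-Schmidt QR of an m x n matrix (m >= n, full rank).
    Returns Q as a list of n orthonormal columns and R (n x n upper)."""
    m, n = len(A), len(A[0])
    V = [[A[i][j] for i in range(m)] for j in range(n)]  # V[j] = column j
    Q = [None] * n
    R = [[0.0] * n for _ in range(n)]
    for j in range(n):
        R[j][j] = sum(x * x for x in V[j]) ** 0.5
        Q[j] = [x / R[j][j] for x in V[j]]
        for k in range(j + 1, n):
            # project column k against Q[j] and subtract immediately
            R[j][k] = sum(Q[j][i] * V[k][i] for i in range(m))
            V[k] = [V[k][i] - R[j][k] * Q[j][i] for i in range(m)]
    return Q, R

def lstsq(A, b):
    """Solve min ||A x - b||_2 via QR and back substitution."""
    Q, R = qr_mgs(A)
    n, m = len(R), len(b)
    c = [sum(Q[j][i] * b[i] for i in range(m)) for j in range(n)]  # Q^T b
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (c[i] - sum(R[i][k] * x[k] for k in range(i + 1, n))) / R[i][i]
    return x
```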
III. SVD
   A. Direct
      1. Symmetric
         a. Dense - classic SVD
         b. Sparse - I don't know of anything here
      2. Unsymmetric
         a. Dense - classic SVD
         b. Sparse - I don't know of anything here
   B. Iterative
      1. Symmetric
         a. Dense
         b. Sparse - Krylov algorithms, useful primarily for the
            largest singular values
      2. Unsymmetric
         a. Dense
         b. Sparse - Krylov algorithms, useful primarily for the
            largest singular values
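For the iterative SVD branches (III.B), here is the simplest possible
sketch of the access pattern those methods rely on: power iteration
on A^T A, which touches A only through matrix-vector products and
estimates the largest singular value. This is a degenerate special
case of the Krylov idea; real codes use Lanczos-based methods, which
converge far faster. Names are mine:

```python
# Sketch only: largest singular value of A via power iteration on
# A^T A.  Since A^T A v -> (sigma_max^2) v, the norm of A^T A v for
# unit v converges to sigma_max^2.

def largest_singular_value(A, iters=200):
    """Estimate sigma_max(A) using only matrix-vector products."""
    m, n = len(A), len(A[0])
    v = [1.0] * n
    sigma = 0.0
    for _ in range(iters):
        w = [sum(A[i][j] * v[j] for j in range(n)) for i in range(m)]  # w = A v
        u = [sum(A[i][j] * w[i] for i in range(m)) for j in range(n)]  # u = A^T w
        norm = sum(x * x for x in u) ** 0.5
        v = [x / norm for x in u]          # renormalize the iterate
        sigma = norm ** 0.5                # ||A^T A v|| -> sigma_max^2
    return sigma
```

The point is the interface: nothing here needs A in any particular
storage format, only the ability to apply A and A^T, which is why this
branch survives in the sparse case where the direct factorizations do not.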
IV. Eigensystems
   A. Direct
      1. Symmetric
         a. Dense - classical algorithms
         b. Sparse - nothing off the top of my head
      2. Unsymmetric
         a. Dense - shifted QR
         b. Sparse - nothing off the top of my head
   B. Iterative
      1. Symmetric
         a. Dense
         b. Sparse - various Krylov algorithms, useful primarily for
            extremal eigenvalues
      2. Unsymmetric
         a. Dense
         b. Sparse - various Krylov algorithms, useful primarily for
            extremal eigenvalues
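Similarly for IV.B: what makes these methods attractive for sparse
problems is that they need only a matrix-vector product, never the
matrix itself. A power-iteration sketch with that matvec-only
interface (again, real codes would use Lanczos or Arnoldi; the
interface is the point, and the names are mine):

```python
# Sketch only: dominant eigenvalue of a symmetric operator given
# nothing but a matvec callback -- the interface an iterative sparse
# eigensolver actually requires.

def power_iteration(matvec, n, iters=100):
    """Estimate the largest-magnitude eigenvalue of a symmetric
    n x n operator presented as v -> A v."""
    v = [1.0] * n
    for _ in range(iters):
        w = matvec(v)
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]          # keep the iterate unit length
    w = matvec(v)
    # Rayleigh quotient v^T A v of the converged unit vector
    return sum(vi * wi for vi, wi in zip(v, w))
```

A Krylov eigensolver library built around this shape of callback never
has to commit to a sparse matrix format at all.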
In addition to this are issues related to high performance. For
instance, one not only wants an LU factorization, one wants a high-
performance LU factorization. In an ideal world one would just build
the 'typical' algorithm on high-performance pieces, but this might
not always be possible. For the LU factorization, a high-performance
version is realized by constructing a blocked algorithm (to maximize
cache reuse). This in turn depends on a non-blocked LU as well as a
high-performance matrix-matrix product. The performance is all
contained in the matrix-matrix product, but the algorithm itself
needs to be constructed in block fashion.
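To illustrate that blocked structure, here is a sketch of a
right-looking blocked LU, without pivoting for brevity (a real code
pivots): factor the diagonal panel with the unblocked algorithm,
update the block row and column with triangular solves, then apply
one matrix-matrix update to the trailing submatrix. In a real code
that last triple loop would be a single call to a tuned GEMM, and
that call is where all the performance lives. Function names are mine:

```python
# Sketch only: right-looking blocked LU, in place, no pivoting.
# On return A holds U on and above the diagonal and the strict lower
# part of unit-lower-triangular L below it.

def lu_unblocked(A, k0, nb):
    """Unblocked LU of the nb x nb diagonal block at (k0, k0), in place."""
    for k in range(nb):
        for i in range(k + 1, nb):
            A[k0 + i][k0 + k] /= A[k0 + k][k0 + k]
            for j in range(k + 1, nb):
                A[k0 + i][k0 + j] -= A[k0 + i][k0 + k] * A[k0 + k][k0 + j]

def lu_blocked(A, nb):
    """Blocked in-place LU; n must be a multiple of the block size nb."""
    n = len(A)
    for k0 in range(0, n, nb):
        lu_unblocked(A, k0, nb)
        # block row to the right: solve L11 * U12 = A12 (L11 unit lower)
        for j in range(k0 + nb, n):
            for i in range(k0, k0 + nb):
                for p in range(k0, i):
                    A[i][j] -= A[i][p] * A[p][j]
        # block column below: solve L21 * U11 = A21
        for i in range(k0 + nb, n):
            for jj in range(k0, k0 + nb):
                for p in range(k0, jj):
                    A[i][jj] -= A[i][p] * A[p][jj]
                A[i][jj] /= A[jj][jj]
        # trailing update A22 -= L21 * U12 -- the matrix-matrix product
        # where a tuned GEMM would go
        for i in range(k0 + nb, n):
            for j in range(k0 + nb, n):
                A[i][j] -= sum(A[i][p] * A[p][j]
                               for p in range(k0, k0 + nb))
```

The unblocked kernel and the blocked driver are the same algorithm;
only the order of operations changes, and that reordering is exactly
what exposes the large matrix-matrix product.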
Finally, before we go pouring new wine into old wineskins, there is
one very interesting question I want to raise. Many of the
algorithms above are now "classic", with very well-studied
implementations. I have always wondered, however, whether new
algorithms (or new implementations) might emerge due to better
implementation machinery. E.g., there are numerous assumptions
underlying these algorithms and their implementations. Which
assumptions no longer apply?
Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk