MTL Development Plan (preliminary) and Notes

Current Status

My work on FAST has taken me into the realm of how to represent sequences with cursors and property maps, and how to dispatch algorithms.

Index

1   Milestone 1

1.1   Create Basic Development Infrastructure

1.1.1   Accounts

Larry Meehan reports that accounts have been set up at:

  • magrathea.osl.iu.edu (main NFS file server for user directories)
  • milliways.osl.iu.edu (linux -- hosts boost mailing lists)
  • eddie.osl.iu.edu (currently down)
  • vogon.osl.iu.edu (linux)
  • earth.osl.iu.edu (solaris)
  • deep-thought.osl.iu.edu (solaris)
  • frood.osl.iu.edu and some other Mac systems.

(2004-11-9)

1.1.2   Testing System

We'll be using Boost.Build version 2 (BBv2) for all building/testing. I've invested a great deal of time recently in trying to grok BBv2, and am working closely with Vladimir Prus (its primary maintainer) to ensure that its documentation is comprehensible, which means going through a massive review/edit cycle.

A project has just been started to rewrite Boost.Build in Python, hopefully with an SCons substrate. The rewrite should yield many advantages, not least freeing Boost.Build developers from the shackles of the odd language built into Boost.Jam, and bringing much smarter target-updating logic. SCons is a wonderful build system, and several projects hosted at OSL have apparently started using it. That said, it is very low-level; we want the high-level, platform- and compiler-neutral functionality of Boost.Build.

(2005-1-25)

1.1.3   Documentation System

Right now we are using Docutils and reStructuredText for documentation. We have an automated system called litre (“literate reStructuredText”) for extracting and testing C++ examples. Serious consideration is being given to the idea of moving to quickbook, not least because we expect the codebase to be more understandable and maintainable. Translating litre to quickbook would require generating some Python bindings, though, since scripting-language integration is crucial.

(2005-1-25)

1.2   Bootstrap Design/Coding

We are iterating between generic interface design and low-level experiments that characterize the performance impact of interface design decisions.

1.3   Develop Fixed Algorithm Size Template Library (FAST)

Cursors have types that represent their positions. That is to say, a cursor has a different type from each of its neighbors.
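The idea can be sketched in code. The names below are hypothetical, not FAST's actual interface: a cursor template carries its position as a compile-time constant, so advancing yields a value of a different type.

```cpp
#include <cstddef>

// Illustrative sketch only: a cursor over a fixed-size array whose
// type encodes its position N, so each position has a distinct
// cursor type.
template <typename T, std::size_t N>
struct array_cursor
{
    T* base;  // start of the underlying array

    // Moving to the neighboring position yields a *different* type.
    array_cursor<T, N + 1> next() const { return {base}; }

    // Dereference reads the element at the encoded position.
    T& operator*() const { return base[N]; }
};

// Create the cursor for position 0 of a built-in array.
template <typename T, std::size_t Size>
array_cursor<T, 0> begin_cursor(T (&a)[Size]) { return {a}; }
```

Because each position is a distinct type, algorithms over such cursors are necessarily recursive templates rather than runtime loops.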

1.3.1   Implement Unrolled copy Algorithm

  1. Non-Homogeneous Sequences - This assumes that there is no single
     type that can be used to represent cursors for all positions in
     the sequence. A tuple of different types is a good example of
     such a sequence.

  2. Homogeneous Sequences - When a homogeneous representation of a
     cursor's position exists (e.g. a pointer or integer for a
     fixed-size array), the algorithm can be implemented much more
     efficiently at compile time, once the sequence length is known,
     by moving a homogeneous cursor each time the sequence is
     subdivided.

It should be possible to generalize the support for homogeneous sequences into something that will unroll dynamically-sized sequences as well as fixed-size ones.
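For the homogeneous case, the subdivision strategy can be sketched as follows. This is a minimal illustration, not FAST's implementation; the cursors are plain pointers and the length is a template parameter:

```cpp
#include <cstddef>

// Illustrative sketch: a fully unrolled copy of N elements. The
// range is subdivided recursively at compile time; the homogeneous
// cursors (here, pointers) are advanced by a compile-time offset at
// each subdivision, so no runtime loop remains.
template <std::size_t N>
struct unrolled_copy
{
    template <typename In, typename Out>
    static void apply(In in, Out out)
    {
        // Copy the first half, then the second half.
        unrolled_copy<N / 2>::apply(in, out);
        unrolled_copy<N - N / 2>::apply(in + N / 2, out + N / 2);
    }
};

template <>
struct unrolled_copy<1>  // base case: copy a single element
{
    template <typename In, typename Out>
    static void apply(In in, Out out) { *out = *in; }
};

template <>
struct unrolled_copy<0>  // base case: nothing to copy
{
    template <typename In, typename Out>
    static void apply(In, Out) {}
};
```

The non-homogeneous case would look similar, except that each recursive step would operate on cursors of different types (as with a tuple), so the offsets would be encoded in the cursor types rather than added to a pointer.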

1.3.2   Design Segmented Cursors and Property Maps

This is the cursor/property map equivalent to the segmented iterators described in [Austern98].

[Austern98] Matthew H. Austern, “Segmented Iterators and Hierarchical Algorithms,” Selected Papers from the International Seminar on Generic Programming, Lecture Notes in Computer Science, Vol. 1766, 1998, pp. 80-90. ISBN 3-540-41090-2. http://lafstern.org/matt/segmented.pdf

1.3.3   Implement Segmentation Optimization for copy

We don't want to unroll the largest homogeneous sequences completely. Instead it would be better to subdivide them into unrolled chunks, and iterate the unrolled chunks at runtime. Implement this optimization by imposing a segmented view over the fixed-size sequence. This optimization is basically the same as matrix blocking, but in-the-small.
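A sketch of the chunked scheme, under the assumption that an unrolled fixed-size copy is available (a minimal one is included here so the example is self-contained; none of these names are FAST's):

```cpp
#include <cstddef>

// Minimal compile-time-unrolled copy of exactly N elements.
template <std::size_t N>
struct copy_n_unrolled
{
    template <typename In, typename Out>
    static void apply(In in, Out out)
    {
        *out = *in;
        copy_n_unrolled<N - 1>::apply(in + 1, out + 1);
    }
};
template <>
struct copy_n_unrolled<0>
{
    template <typename In, typename Out>
    static void apply(In, Out) {}
};

// Illustrative sketch of the segmentation optimization: impose a
// segmented view with chunk size Chunk, iterate the chunks at
// runtime, and unroll only within each chunk.
template <std::size_t Chunk, typename In, typename Out>
void chunked_copy(In in, Out out, std::size_t n)
{
    for (; n >= Chunk; n -= Chunk, in += Chunk, out += Chunk)
        copy_n_unrolled<Chunk>::apply(in, out);  // unrolled chunk body
    for (; n > 0; --n)                           // runtime tail
        *out++ = *in++;
}
```

The chunk size Chunk is exactly the kind of parameter the tuning framework of section 1.4 would choose per platform.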

1.4   First Cut at ATLAS-like Tuning Framework

We can start by deciding the maximal amount of loop unrolling that's appropriate for various fixed-sized data structures. We can also decide loop unrolling for some regular variable-sized sequences.

1.7   Linear Algebra Concept Taxonomy

In which we define concepts such as Ring, Field, LinearOperator, LinearAlgebra, TransposableLinearOperator, AbelianGroup, HilbertSpace, BanachSpace, VectorSpace, and R-Module.

1.7.1   Dealing with the Imprecision of Floating-Point

(2005-1-27)

Traditional mathematical concepts are defined in terms of calculations on pure numbers that exhibit no rounding error, but the number types we use every day in numerical linear algebra (e.g., float and double) don't behave quite that well [High02]. In Section 7.1, subsection Equality of Jeremy Siek's preliminary documentation for his early prototype of this project, the notation

a =ε b

was used to mean “|a - b| < ε, where ε is some appropriate small number for the situation (like machine epsilon).” The problem with that is that it's too fuzzy. In particular, according to Andrew Lumsdaine, ordinary floating-point numbers don't actually model Field when that notation is used to describe the concept.
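As a predicate, the a =ε b relation is a one-liner; the default tolerance below is an arbitrary illustration, not a recommendation:

```cpp
#include <cmath>
#include <limits>

// Illustrative sketch of "a =ε b": |a - b| < ε. The default ε here
// (a small multiple of machine epsilon) is arbitrary; an absolute
// tolerance like this is only meaningful when a and b are near 1,
// which is part of why the relation is "too fuzzy" as a concept
// requirement.
inline bool approx_equal(double a, double b,
                         double eps = 4 * std::numeric_limits<double>::epsilon())
{
    return std::abs(a - b) < eps;
}
```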

One approach to this issue might be to expel the notion of imprecision from the concept taxonomy. Concepts like Field would require true equality, and we'd deal with the imprecision of floating-point by saying that if an algorithm requires one of its arguments to model Field and you pass a double (which isn't quite a model of Field), then naturally the algorithm doesn't produce the promised result. Instead, if you pass an approximation of a Field to the algorithm, it produces some approximation to the specified result.

That approach is unsatisfying because the error bounds of any algorithm used with real-life floating-point datatypes can be calculated, and we'd like our algorithm specifications to make some promises about the magnitude of those errors. Naturally, if you have violated an algorithm's requirements by passing a float where it expects a pure Field, the algorithm can't make any promises at all about the result! Looked at from the other side, if the algorithm can make some guarantees about the result it produces for some input, then whatever the specification says, the input must clearly satisfy some real, underlying requirement.

Only by keeping floating types in the concept taxonomy can we sensibly make guarantees about the precision of algorithms operating on those types. We assert that float and double model a concept called FieldWithError[1], of which Field is a refinement that requires perfect precision. Similar “-WithError” counterparts exist for all the basic algebraic concepts. Just as algorithms like std::binary_search require Forward Iterators but make stronger efficiency guarantees when passed Random Access Iterators, numerical algorithms can require their arguments to model the imprecise “-WithError” concepts and make stronger precision guarantees when operating on models of precise algebraic concepts.

This approach has the added benefit of allowing algorithms to be specialized based on refinement. For example, most L/U factorization algorithms involve pivoting steps designed to reduce the magnitude of errors induced by floating-point operations. However, when the element type models a precise algebraic concept (e.g. an infinite-precision rational number type), those pivoting steps are not required. A similar effect occurs in simulations where matrices with the same sparse structure are factored repeatedly: in calculating the sparse structure of the result, a boolean “fill” type that requires no pivoting can be used.
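One way such refinement-based specialization might look, sketched with tag dispatch. Here std::numeric_limits<T>::is_exact stands in for a real "models a precise algebraic concept" trait, and all the names are invented for illustration:

```cpp
#include <limits>
#include <type_traits>

// Tags standing in for the two levels of the concept taxonomy.
struct exact_tag {};       // precise algebraic concept (e.g. Field)
struct with_error_tag {};  // imprecise counterpart (e.g. FieldWithError)

// Classify an element type. A real library would use its own trait;
// is_exact is a convenient stand-in (true for int and rationals
// built on integers, false for float/double).
template <typename T>
using field_category = std::conditional_t<
    std::numeric_limits<T>::is_exact, exact_tag, with_error_tag>;

// Exact arithmetic: pivoting is unnecessary.
inline bool lu_uses_pivoting_impl(exact_tag) { return false; }

// Floating-point: pivot to bound rounding error.
inline bool lu_uses_pivoting_impl(with_error_tag) { return true; }

template <typename T>
bool lu_uses_pivoting() { return lu_uses_pivoting_impl(field_category<T>{}); }
```

A real L/U factorization would dispatch whole pivot-selection steps this way rather than returning a flag, but the selection mechanism is the same.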

Andrew Lumsdaine notes (2005-1-28) that

“Another simpler example of where things can be sped up in infinite precision case is in just adding up a list of numbers. To do this with high accuracy with floats you want to sort, normalize, etc. With infinite precision, you can just add them up.”

and

“We should probably also distinguish infinite precision from infinite length. I.e., integers can be added without error, but not if they overflow. So perhaps a Bounded concept as well. A float therefore models FinitePrecision and Bounded.”
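Lumsdaine's first example can be made concrete. Kahan's compensated summation is one standard instance of the extra work floating-point addition needs; with infinite precision, the correction term would always be zero. This sketch is illustrative, not part of the plan:

```cpp
#include <vector>

// Kahan's compensated summation: carry a running correction c that
// recovers low-order bits lost by each floating-point addition.
// Exact arithmetic would never lose those bits, so an algorithm
// specialized for a precise concept could "just add them up."
double kahan_sum(std::vector<double> const& xs)
{
    double sum = 0.0, c = 0.0;  // c accumulates lost low-order bits
    for (double x : xs)
    {
        double y = x - c;    // apply the pending correction
        double t = sum + y;  // low-order bits of y may be lost here
        c = (t - sum) - y;   // algebraically zero; recovers the loss
        sum = t;
    }
    return sum;
}
```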
[1] Pick a different name if you like.
[High02] Nicholas J. Higham, Accuracy and Stability of Numerical Algorithms, Second edition, SIAM, 2002, xxx+680 pp. ISBN 0-89871-521-0. http://www.ma.man.ac.uk/~higham/asna/

1.7.2   Deep vs. Shallow Copy Semantics

Unlike previous incarnations of MTL, we do not plan to use a handle-body implementation for matrices and vectors.

  • except for views and adapters, which explicitly do not own data, copy constructors should copy (no "handles"). Rationale: this models the well-understood behavior of mathematical primitives. Stack-based and heap-based objects have consistent behavior. As an upshot of both these facts, there is less chance of confusing bugs.

  • assignment operators should always copy. Views and adapters copy over their target elements when assigned. Rationale: ditto.

  • Efficiency issues can be handled using library implementations of move semantics. "Perfect" move semantics are possible in most modern compilers today, and with recent developments in the core working group that capability will become mandated (http://www.open-std.org/jtc1/sc22/wg21/docs/cwg_active.html#291) and even automatic (http://www.open-std.org/jtc1/sc22/wg21/docs/cwg_active.html#391). None of this was available when Jeremy wrote his paper.

  • Issues of views and reference binding (see http://www.osl.iu.edu/research/mtl/reference/html/MTL_Object_Model.html) can be dealt with by returning const views from adapter functions. For example:

    template <class MatrixType>
    const transpose_view<MatrixType> transpose(MatrixType& m);
    

    consider:

    typedef transpose_view<matrix<> > t;
    typedef transpose_view<matrix<> const> tc;
    

    For t, the library supplies const member functions, and free functions accepting t const&, that can mutate t's referent matrix.

    For tc, the library supplies only const member functions, and free functions accepting tc const&, that cannot mutate tc's referent matrix.
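The convention above can be sketched as follows. The matrix type and member names are invented stand-ins; the point is that the view is returned const, yet whether its element access can mutate tracks the constness of the referent, not of the view:

```cpp
#include <cstddef>

struct matrix2  // minimal 2x2 dense matrix, for illustration only
{
    double data[2][2];
    double& operator()(std::size_t i, std::size_t j)       { return data[i][j]; }
    double  operator()(std::size_t i, std::size_t j) const { return data[i][j]; }
};

// A transpose view does not own data; element access swaps indices.
// Its operator() is const, but the result's mutability follows the
// constness of Matrix, not of the view.
template <typename Matrix>
struct transpose_view
{
    Matrix* m;  // pointer to the (possibly const) referent

    decltype(auto) operator()(std::size_t i, std::size_t j) const
    {
        return (*m)(j, i);
    }
};

// Adapter function returning a const view, per the convention above.
template <typename Matrix>
const transpose_view<Matrix> transpose(Matrix& m) { return {&m}; }
```

With a non-const matrix a, `transpose(a)(0, 1) = 9.0` writes through to a; with a const matrix, the same expression fails to compile because element access yields a value rather than a reference.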

1.8   Algorithm Implementations

Enough support so that vectors model VectorSpace and vectors plus matrices model LinearAlgebra.

1.9   Expression Templates

Support operator notation for implemented algorithms.

2   Milestone 2

2.2   Expand Matrix Representations

Add Storage and corresponding Shape aspects.

2.2.1   Triangular and Banded

Note

Triangular can be seen as a special case of banded.

2.2.1.1   Packed Storage

Applies to banded and triangular shapes

2.2.1.2   Triangular packed storage

Applies to triangular shape

2.2.1.3   BLAS banded storage

Applies to banded shape

2.2.1.4   Tridiagonal shape

Applies to diagonal orientation

2.2.2   Symmetric

is this really a shape?

Note

re-use triangular packed storage for these

3   Milestone 3

3.1   Blocking Dense Matrix Matrix multiply

Note

probably involves blocked view of dense matrix

4   Milestone 4

4.1   Sparse Fixed Blocked CSR

New data structure modeling LinearAlgebra when combined with Vector. Blocking should be exploited for a fast matrix-vector product.

Note

Fast addition may be too hard to do.

5   Milestone 5

5.1   Sparse Variable Blocked CSR

New data structure modeling LinearAlgebra when combined with Vector. Blocking should be exploited for a fast matrix-vector product.

Note

Fast addition may be too hard to do.

6   Milestone 6

6.1   Generic LU Factorization

Note

Don't worry about making all combinations fast

7   Milestone 7

Incorporate parallelism in conjunction with parallel BGL