Boost logo

Ublas :

From: Gunter Winkler (guwi17_at_[hidden])
Date: 2007-03-02 07:43:16

On Friday 02 March 2007 13:21, Sourabh wrote:
> Hi,
> Can anyone please suggest me some optimizations in the loop to improve
> the overall speed of the following program ?
> The steps I am doing are:
> 1. Solve for Y the equation: (I-X) Y = B, where X is a lower triangular
> sparse matrix.
> 2. Multiply Y with lower triangular sparse matrix Z, A = Y*Z.
> 3. Add A to another vector.
> The sizes in the above steps are of order of 100.

Why don't you use dense matrices and fast BLAS (see This is usually extremly fast for small
matrices (read: smaller than the processor cache)

Without any knowledge of the structure of your matrices, no-one can make any
suggestions. The general answer is: Try to find an ordering of rows/columns
such that X and Y are band matrices and use banded BLAS.

> The above mentioned sequence is repeated by 2e9 times. So it is taking too
> much time.

Precompute as much as possible outside your loop. If necessary split the loop
and try to use large dense matrices and fast BLAS. (No, ublas is not fast -
ublas is _reliable_)

> I am sure if I optimize the inner-most loop, it can be made fast.
> What optimizations can be made in the loop ?

> int main (int argc, char* argv[])
> {
> unsigned int samples = atoi (argv[1]);
> unsigned int numpaths = atoi (argv[2]);
> unsigned short gatesperpath = atoi (argv[3]);
> for (unsigned int i = 0; i < samples; ++i) {
> for (unsigned int j = 0; j < numpaths ; ++ j) {

> mapped_matrix<double> lembda (gatesperpath, gatesperpath);
> identity_matrix<double> ident (gatesperpath);

are these matrices constant and independent of i and j? If so, initialize them
before the loop and use a compressed_matrix.

> slopes = solve (lembda, omegas, lower_tag ());

You mean unit_lower_tag(): Solve (I+lembda) x = omegas?

> vector <double> delays (gatesperpath);

move the construction out off the loop

> delays = prod (lembda, slopes);
> delays += omegas;

try: noalias( delays ) = omegas;
     axpy_prod( lambda, slopes, delays, false ); // from operation.hpp

> }
> }
> }