# Ublas :

From: Sourabh (sourabh_at_[hidden])
Date: 2007-03-03 01:52:27

On Fri, 2 Mar 2007, Gunter Winkler wrote:

> On Friday 02 March 2007 13:21, Sourabh wrote:
> > Hi,
> > Can anyone please suggest me some optimizations in the loop to improve
> > the overall speed of the following program ?
> >
> > The steps I am doing are:
> > 1. Solve for Y the equation: (I-X) Y = B, where X is a lower triangular
> > sparse matrix.
> >
> > 2. Multiply Y with lower triangular sparse matrix Z, A = Y*Z.
> > 3. Add A to another vector.
> >
> > The sizes in the above steps are of order of 100.
>
> Why don't you use dense matrices and fast BLAS (see
> http://math-atlas.sourceforge.net/)? This is usually extremly fast for small
> matrices (read: smaller than the processor cache)
>
> Without any knowledge of the structure of your matrices, no-one can make any
> suggestions. The general answer is: Try to find an ordering of rows/columns
> such that X and Y are band matrices and use banded BLAS.
>

Hi Gunter,
The structure of my matrices are as follows :

X is a lower triangular matrix of size (100 x 100) (in addtion, all
diagonal elements are also
zero), with at most 3 or 4 non-zero entries per column.

B is a dense vector of 100.
Y is to be solved (size obviously 100)

> > The above mentioned sequence is repeated by 2e9 times. So it is taking too
> > much time.
>
> Precompute as much as possible outside your loop. If necessary split the loop
> and try to use large dense matrices and fast BLAS. (No, ublas is not fast -
> ublas is _reliable_)
>

What is your opinion, is construction of these matrices take much of the
time ?

> > I am sure if I optimize the inner-most loop, it can be made fast.
> > What optimizations can be made in the loop ?
>
> > int main (int argc, char* argv[])
> > {
> > unsigned int samples = atoi (argv[1]);
> > unsigned int numpaths = atoi (argv[2]);
> > unsigned short gatesperpath = atoi (argv[3]);
> >
> > for (unsigned int i = 0; i < samples; ++i) {
> > for (unsigned int j = 0; j < numpaths ; ++ j) {
>
> > mapped_matrix<double> lembda (gatesperpath, gatesperpath);
> > identity_matrix<double> ident (gatesperpath);
>
> are these matrices constant and independent of i and j? If so, initialize them
> before the loop and use a compressed_matrix.

The matrix ident is obvioulsy const. I can move it above. However,
the matrix lembda is different for each j. However, since I compute delays
based on current value of lembda, I don't need previous values of lembda.
So, I can modify already constructed lembda in each j. Which approach is
more efficient :
1. Create lembda each time in the innermost loop
2. Create lembda once and modify it in each innermost loop.

(lembda does not vary with i).

>
> > slopes = solve (lembda, omegas, lower_tag ());
>
> You mean unit_lower_tag(): Solve (I+lembda) x = omegas?

I need to solve the equation (I-lembda) slopes = omegas

>
> > vector <double> delays (gatesperpath);
>
> move the construction out off the loop
>

Same question as above. The delays vector creation or assignment to delays
in each loop?

> > delays = prod (lembda, slopes);
> > delays += omegas;
>
> try: noalias( delays ) = omegas;
> axpy_prod( lambda, slopes, delays, false ); // from operation.hpp
>
> > }
> > }
> >
> > }
>
> mfg
> Gunter
>

Thank you very much for your help.

```--
-- Sourabh
```