
Ublas : 
From: Gunter Winkler (guwi17_at_[hidden])
Date: 20070302 07:43:16
On Friday 02 March 2007 13:21, Sourabh wrote:
> Hi,
> Can anyone please suggest me some optimizations in the loop to improve
> the overall speed of the following program ?
>
> The steps I am doing are:
> 1. Solve for Y the equation: (IX) Y = B, where X is a lower triangular
> sparse matrix.
>
> 2. Multiply Y with lower triangular sparse matrix Z, A = Y*Z.
> 3. Add A to another vector.
>
> The sizes in the above steps are of order of 100.
Why don't you use dense matrices and fast BLAS (see
http://mathatlas.sourceforge.net/)? This is usually extremly fast for small
matrices (read: smaller than the processor cache)
Without any knowledge of the structure of your matrices, noone can make any
suggestions. The general answer is: Try to find an ordering of rows/columns
such that X and Y are band matrices and use banded BLAS.
> The above mentioned sequence is repeated by 2e9 times. So it is taking too
> much time.
Precompute as much as possible outside your loop. If necessary split the loop
and try to use large dense matrices and fast BLAS. (No, ublas is not fast 
ublas is _reliable_)
> I am sure if I optimize the innermost loop, it can be made fast.
> What optimizations can be made in the loop ?
> int main (int argc, char* argv[])
> {
> unsigned int samples = atoi (argv[1]);
> unsigned int numpaths = atoi (argv[2]);
> unsigned short gatesperpath = atoi (argv[3]);
>
> for (unsigned int i = 0; i < samples; ++i) {
> for (unsigned int j = 0; j < numpaths ; ++ j) {
> mapped_matrix<double> lembda (gatesperpath, gatesperpath);
> identity_matrix<double> ident (gatesperpath);
are these matrices constant and independent of i and j? If so, initialize them
before the loop and use a compressed_matrix.
> slopes = solve (lembda, omegas, lower_tag ());
You mean unit_lower_tag(): Solve (I+lembda) x = omegas?
> vector <double> delays (gatesperpath);
move the construction out off the loop
> delays = prod (lembda, slopes);
> delays += omegas;
try: noalias( delays ) = omegas;
axpy_prod( lambda, slopes, delays, false ); // from operation.hpp
> }
> }
>
> }
mfg
Gunter