|
Ublas : |
From: Gunter Winkler (guwi17_at_[hidden])
Date: 2007-03-02 07:43:16
On Friday 02 March 2007 13:21, Sourabh wrote:
> Hi,
> Can anyone please suggest me some optimizations in the loop to improve
> the overall speed of the following program ?
>
> The steps I am doing are:
> 1. Solve for Y the equation: (I-X) Y = B, where X is a lower triangular
> sparse matrix.
>
> 2. Multiply Y with lower triangular sparse matrix Z, A = Y*Z.
> 3. Add A to another vector.
>
> The sizes in the above steps are of order of 100.
Why don't you use dense matrices and fast BLAS (see
http://math-atlas.sourceforge.net/)? This is usually extremly fast for small
matrices (read: smaller than the processor cache)
Without any knowledge of the structure of your matrices, no-one can make any
suggestions. The general answer is: Try to find an ordering of rows/columns
such that X and Y are band matrices and use banded BLAS.
> The above mentioned sequence is repeated by 2e9 times. So it is taking too
> much time.
Precompute as much as possible outside your loop. If necessary split the loop
and try to use large dense matrices and fast BLAS. (No, ublas is not fast -
ublas is _reliable_)
> I am sure if I optimize the inner-most loop, it can be made fast.
> What optimizations can be made in the loop ?
> int main (int argc, char* argv[])
> {
> unsigned int samples = atoi (argv[1]);
> unsigned int numpaths = atoi (argv[2]);
> unsigned short gatesperpath = atoi (argv[3]);
>
> for (unsigned int i = 0; i < samples; ++i) {
> for (unsigned int j = 0; j < numpaths ; ++ j) {
> mapped_matrix<double> lembda (gatesperpath, gatesperpath);
> identity_matrix<double> ident (gatesperpath);
are these matrices constant and independent of i and j? If so, initialize them
before the loop and use a compressed_matrix.
> slopes = solve (lembda, omegas, lower_tag ());
You mean unit_lower_tag(): Solve (I+lembda) x = omegas?
> vector <double> delays (gatesperpath);
move the construction out off the loop
> delays = prod (lembda, slopes);
> delays += omegas;
try: noalias( delays ) = omegas;
axpy_prod( lambda, slopes, delays, false ); // from operation.hpp
> }
> }
>
> }
mfg
Gunter