From: Manuel González Castro (iinmgc00_at_[hidden])
Date: 2008-01-22 08:02:18
> Am Montag, 21. Januar 2008 13:55 schrieb Per Abrahamsen:
>> A = B1 + B2 + B3 + C1 + C2 + C3
> Sparse addition is usually done in two steps (see cxsparse addition):
> 1) symbolic sum (count row lengths, estimate size of result, compute
> structure) 2) actual sum (lookups into a fixed structure are fast,
> structure changes are expensive)
An additional optimization can be achieved using compressed matrices:
a) Compute the resulting sparse structure of A.
b) Make the sparse pattern of B1,B2...C3 equal to that of A by adding artificial
non-zeros. In this way, all the involved matrices will have the same sparse
pattern.
c) Sum the value_data arrays of B1,B2...C3 with axpy and store the result
in the value_data array of A. No lookup is needed; it is just a fast dense
vector addition because the sparse patterns are equal.
This approach consumes more memory, but makes the evaluation faster in
some cases, depending on your sparse patterns and matrix sizes. I found
this technique to be around 75 times faster than plain ublas matrix
addition for my simulations.
> Something like 2 seconds in the solver,
> 20 seconds building the component matrixes, and 3 minutes
> adding them together.
The time needed for building the component matrices can also be reduced by
using a proxy for writing matrix elements:
a) Preprocessing stage: store in a vector the position in the value_data
array corresponding to each write operation ( B(i,j) = something ) in your
matrix building routine.
b) Assembly stage: write the matrix directly into its value_data array
( value_data[k] = something ) using the information computed in the
preprocessing stage.
This technique requires all the matrix write operations to be performed
in the same order: be careful if the building routine contains flow
control statements that change this order. I found this technique to be 3
times faster than conventional ublas write operations, which involve a
lookup in the compressed matrix's sparse pattern. But this result was
obtained for problems with simple building routines, where most of the
time was spent in element access, not in evaluating the right-hand side
of the write operation.
If you follow the recommendations posted on the list you may achieve:
2 seconds in the solver -> same
20 seconds building the component matrices -> 20/3 ≈ 7 seconds
3 minutes adding them together -> 180/75 ≈ 2.4 seconds
TOTAL time ≈ 11-12 seconds + preprocessing time (difficult to evaluate)