
Ublas:

From: Paul C. Leopardi (paul.leopardi_at_[hidden])
Date: 2007-01-22 06:29:35


Hi Gunter,
Reply below,
Best Paul
On Monday 22 January 2007 06:17, Gunter Winkler wrote:
> Paul C. Leopardi wrote:
> > Hi all,
> > Is the worst case time complexity of ublas::compressed_matrix operations
> > documented anywhere? For addition of square matrices where each of the
> > operands is half full with no non-zeros in common, using operator+=
> > from Boost 1.33.0 and 1.33.1, I'm getting about O(dim^2) for
> > ublas::matrix and about O(nnz * dim^2) for ublas::compressed_matrix.
> > Is this what should be expected in this case? Is it the worst case?
>
> The worst case for a compressed matrix is to insert elements from last
> (n,m) to first (0,0), because each insertion essentially inserts a new
> element into an array. This means (1) realloc if necessary (in amortized
> constant time), (2) shift all elements one position forward (in linear
> time, O(nnz)), (3) insert the new data at position 0. For this operation
> you should use a different storage scheme:
>
> coordinate_matrix: += means appending elements to an array (you need
> more memory, and later a sort/compress operation in O(nnz * (1 + log(nnz)))).
>
> generalized_vector_of_vector with compressed vectors: element lookup is
> O(log(nnz per row)) and worst case insert is O(nnz per row), average
> should be O(log(nnz per row))
>
> generalized_vector_of_vector with coordinate vectors: += (aka
> v.append_element(k,x)) is O(1), plus a final sort.
>
> I use the second one for my finite element assembly.
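
In code, the second scheme looks roughly like this (a minimal sketch of the
usual gvov instantiation; the template parameters and sizes below are my
assumption, not Gunter's actual setup):

#include <cstddef>
#include <boost/numeric/ublas/vector.hpp>
#include <boost/numeric/ublas/vector_sparse.hpp>
#include <boost/numeric/ublas/vector_of_vector.hpp>

namespace ublas = boost::numeric::ublas;

int main()
{
    const std::size_t n = 1000;
    // Each row is its own compressed_vector, so an insert only shifts
    // entries within that row: O(nnz per row) worst case, as described.
    ublas::generalized_vector_of_vector<
        double, ublas::row_major,
        ublas::vector<ublas::compressed_vector<double> > > K(n, n);
    K(3, 7) += 1.0; // accumulate, as in finite element assembly
    K(3, 3) += 2.0;
    return 0;
}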
>
> btw: the construction of a compressed_matrix from a coordinate matrix
> is an O(nnz) operation. This could even be done in place (but that is not
> implemented, because I have no idea how to do it without too many ugly
> tricks). Please have a look at the constructors of compressed_matrix.
> (I even have a near-optimal implementation of
> compressed_matrix.assign_temporary(coordinate_matrix) in my files.)
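
That coordinate-to-compressed construction looks roughly like this (a
minimal sketch using the stock append_element and converting constructor;
I am assuming these interfaces rather than Gunter's assign_temporary
version):

#include <cstddef>
#include <boost/numeric/ublas/matrix_sparse.hpp>

namespace ublas = boost::numeric::ublas;

int main()
{
    const std::size_t n = 1000;
    ublas::coordinate_matrix<double> A(n, n);
    A.append_element(5, 2, 1.0); // amortized O(1) append
    A.append_element(0, 9, 3.0); // out-of-order is fine; sorted later
    // The converting constructor sorts A's element list and copies it,
    // which is where the O(nnz) cost comes from.
    ublas::compressed_matrix<double> C(A);
    return 0;
}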

OK, I see that with compressed_matrix and the current addition algorithm, we
have worst-case complexity of about O((nnz(lhs)+nnz(rhs))*nnz(rhs)). But the
addition could use the same storage scheme with a different algorithm and
achieve O(nnz(lhs)+nnz(rhs)), at the expense of some temporary storage.

The way to achieve O(nnz(lhs)+nnz(rhs)) for addition is a three-step
algorithm (sketched in code below):
1: Estimate nnz(lhs+rhs). This takes O(1) to O(nnz(lhs)+nnz(rhs)), depending
on whether we want the worst case or an accurate estimate.
2: Allocate storage of size nnz(lhs+rhs) for the result.
3: Iterate over lhs and rhs, adding into the result. The iteration should be
O(nnz(lhs)+nnz(rhs)), assuming that the lhs and rhs storage schemes are
suitably sorted. Insertion into the result should be O(1).
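
Step 3 is just a two-way merge. Here is a sketch over plain sorted
index/value arrays, as a stand-in for the real compressed storage (the
Sparse struct and add function below are illustrative only, not uBLAS API):

#include <cstddef>
#include <vector>

// Hypothetical sorted-sparse format: parallel arrays of linearized
// indices (strictly increasing) and their values.
struct Sparse {
    std::vector<std::size_t> idx;
    std::vector<double> val; // val[k] belongs to idx[k]
};

Sparse add(const Sparse& lhs, const Sparse& rhs)
{
    Sparse out;
    // Steps 1+2: worst-case estimate and one up-front allocation, so
    // every push_back below is O(1) and never reallocates.
    out.idx.reserve(lhs.idx.size() + rhs.idx.size());
    out.val.reserve(lhs.val.size() + rhs.val.size());
    std::size_t i = 0, j = 0;
    // Step 3: each element of lhs and rhs is visited exactly once,
    // so the merge is O(nnz(lhs) + nnz(rhs)).
    while (i < lhs.idx.size() && j < rhs.idx.size()) {
        if (lhs.idx[i] < rhs.idx[j]) {
            out.idx.push_back(lhs.idx[i]); out.val.push_back(lhs.val[i]); ++i;
        } else if (rhs.idx[j] < lhs.idx[i]) {
            out.idx.push_back(rhs.idx[j]); out.val.push_back(rhs.val[j]); ++j;
        } else { // common index: values are summed, so nnz can shrink
            out.idx.push_back(lhs.idx[i]);
            out.val.push_back(lhs.val[i] + rhs.val[j]);
            ++i; ++j;
        }
    }
    for (; i < lhs.idx.size(); ++i) { out.idx.push_back(lhs.idx[i]); out.val.push_back(lhs.val[i]); }
    for (; j < rhs.idx.size(); ++j) { out.idx.push_back(rhs.idx[j]); out.val.push_back(rhs.val[j]); }
    return out;
}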

Is there anything I have missed? Is there any reason that this algorithm
cannot be implemented and made optional via the #define and #ifdef mechanism?