Boost logo

Ublas :

From: Paul C. Leopardi (paul.leopardi_at_[hidden])
Date: 2007-01-23 16:41:09


Hi Gunter,
Thanks for your reply. My response is below.
Best, Paul
On Wednesday 24 January 2007 01:57, Gunter Winkler wrote:
> Am Montag, 22. Januar 2007 12:29 schrieb Paul C. Leopardi:
> > OK, I see with compressed_matrix, with the current addition
> > algorithm, we have worst case complexity like
> > O((nnz(lhs)+nnz(rhs))*nnz(rhs)). But the addition could use the same
> > storage scheme and a different algorithm and achieve
> > O(nnz(lhs)+nnz(rhs)), at the expense of some temporary storage.
> >
> > The way to achieve O(nnz(lhs)+nnz(rhs)) for addition is a 3 step
> > algorithm: 1: Estimate nnz(lhs+rhs). This takes O(1) to
> > O(nnz(lhs)+nnz(rhs)) depending on whether we want the worst case or
> > an accurate estimate.
> > 2: Allocate storage of size nnz(lhs+rhs) for the result.
> > 3: Iterate over lhs and rhs, adding into the result. The iteration
> > should be O(nnz(lhs)+nnz(rhs)) assuming that the lhs and rhs storage
> > scheme is suitably sorted. Insertion into the result should be O(1).
>
> Ok. Someone should implement a free function "compressed_plus_assign(A,
> X)" which computes A += X in a better way than now. Then we can think
> about how to automatically dispatch to this function.
>
> I think a good compromise is to do the summation with 2 passes over the
> Elements of A.
> Pass 1: (forward) update all common elements and count the number of
> elements to be inserted. (optional: remove these elements from X)
> Pass 2: (backward) move elements of A (from last to first) to the new
> positions and insert the elements of X accordingly.
>
> (Maybe one shifts the update to the second pass.) The advantage of the
> exact new size is, that we exactly know where each element of A is
> stored after the addition.
>
> What do you think?

Let me see if I understand what you are proposing.

After Pass 1, we know the total length of the new A. I assume tha between Pass
1 and Pass 2 we have reallocated the space for A. Most likely this means that
we have created a new A of the right size and have either:
i. kept the old A; or
ii. copied the elements from the old A to the same places in the new A and
deleted the old A.

From what you have said about Pass 2, I assume that you mean to do ii above.
In Pass 2, we scan the elements of old A and X backwards and insert the
appropriate result in the correct place in new A. The "correct place" is
always one position less than the previous insertion. We scan backwards so
that if old A is stored in the low positions of new A as per ii above, we do
not overwrite any values of old A which we have not yet scanned.
For compressed_matrix, this algorithm should work.

I have repeated my test for mapped_marix and coordinate_matrix as well.
Results follow. They indicate that addition of coordinate_matrix is even
slower than compressed_matrix. I suppose your new algorithm could also be
used for coordinate_matrix?

leopardi_at_linfinit:~/src/test_add> ./test_add 8
Dense: ublas::matrix
Matrix addition test:
Dim = 4, CPU = 0.00; Inner product = 0; Fill = 0.5
Dim = 8, CPU = 0.00; Inner product = 0; Fill = 0.5
Dim = 16, CPU = 0.00; Inner product = 0; Fill = 0.5
Dim = 32, CPU = 0.00; Inner product = 0; Fill = 0.5
Dim = 64, CPU = 0.03; Inner product = 0; Fill = 0.5
Dim = 128, CPU = 0.14; Inner product = 0; Fill = 0.5
Dim = 256, CPU = 0.87; Inner product = 0; Fill = 0.5
Mapped: ublas::mapped_matrix
Matrix addition test:
Dim = 4, CPU = 0.00; Inner product = 0; Fill = 0.5
Dim = 8, CPU = 0.03; Inner product = 0; Fill = 0.5
Dim = 16, CPU = 0.09; Inner product = 0; Fill = 0.5
Dim = 32, CPU = 0.35; Inner product = 0; Fill = 0.5
Dim = 64, CPU = 1.37; Inner product = 0; Fill = 0.5
Dim = 128, CPU = 7.70; Inner product = 0; Fill = 0.5
Dim = 256, CPU = 33.03; Inner product = 0; Fill = 0.5
Compressed: ublas::compressed_matrix
Matrix addition test:
Dim = 4, CPU = 0.00; Inner product = 0; Fill = 0.5
Dim = 8, CPU = 0.00; Inner product = 0; Fill = 0.5
Dim = 16, CPU = 0.04; Inner product = 0; Fill = 0.5
Dim = 32, CPU = 0.37; Inner product = 0; Fill = 0.5
Dim = 64, CPU = 4.90; Inner product = 0; Fill = 0.5
Dim = 128, CPU = 114.44; Inner product = 0; Fill = 0.5
Dim = 256, CPU = 2160.00; Inner product = 0; Fill = 0.5
Coordinate: ublas::coordinate_matrix
Matrix addition test:
Dim = 4, CPU = 0.00; Inner product = 0; Fill = 0.5
Dim = 8, CPU = 0.06; Inner product = 0; Fill = 0.5
Dim = 16, CPU = 0.53; Inner product = 0; Fill = 0.5
Dim = 32, CPU = 8.26; Inner product = 0; Fill = 0.5
Dim = 64, CPU = 222.00; Inner product = 0; Fill = 0.5
Dim = 128, CPU = 4360.00; Inner product = 0; Fill = 0.5
Dim = 256, CPU = 133870.00; Inner product = 0; Fill = 0.5