Subject: ublas

From: Gunter Winkler (guwi17_at_[hidden])
Date: 2006-07-05 08:12:30

On Wednesday 05 July 2006 13:35, Nico Galoppo wrote:
> Actually, I'm quite happy with the gvov random fill efficiency (0.09 s
> actually, 0.9 was a typo). :)
> But anyway, for assignment, I first resize the compressed_matrix and then
> use assign:

Ok. Resizing does not do any storage allocation for a sparse type. The
constructor (size1, size2, nnz) preallocates storage for nnz non-zeroes. This
is not a real problem, though, because the allocation is done automatically in
"amortized constant time", which means the container essentially doubles its
storage on each allocation step. You can explicitly call reserve(nnz, preserve)
to adjust the allocated storage.

You could modify vector_of_vector.hpp around line 80 from

 for (size_type i = 0; i < sizeM; ++ i) // create size1 vector elements
   data_.insert_element (i, vector_data_value_type ()) .resize (sizem, false);

to something like this (untested):

 for (size_type i = 0; i < sizeM; ++ i) { // create size1 vector elements
   vector_data_value_type & ref =
     data_.insert_element (i, vector_data_value_type ());
   ref.resize (sizem, false);
   ref.reserve (non_zeros / sizeM); // preallocate an average row up front
 }

> tmp.resize(n,n);
> // do random inserts in tmp
> C.resize(n,n);
> C.assign(tmp);
> This last call takes about 29s. I am not very familiar with ublas
> internals, so I'm not sure how hard it would be for me to change the
> constructor of vector_data_value_type.

This is quite long. You should try to copy the data using C.push_back(i, j, v),
like I did in my example program. The copy time should be (much) less than the
fill time.

> On a side note, are you saying that I should consider using
> coordinate_matrix instead of the above approach, because the performance of
> my algorithms will hardly change? Basically, all I'm using the sparse
> matrix for, is a (non-preconditioned) linear CG. The CG algorithm was coded
> by me, using atlas bindings. I'm considering switching to lapack bindings +
> GOTO blas linking.

As long as you only use ublas, the CG time of a CRS is about 80% of that of a
COO (for my example program). Using an optimized BLAS for the inner products
does not help much, because the axpy_prod dominates the total time. If you
succeed in speeding up axpy_prod (see operation.hpp) using (sparse) BLAS, I
would be glad to receive a copy.