Have you checked out this: http://www.boost.org/doc/libs/1_42_0/libs/numeric/ublas/doc/operations_overview.htm#speed
I guess your case was more or less mentioned there.

-Tito.


On 6 December 2012 22:46, George Slavov <gpslavov@gmail.com> wrote:
Hi Jörn,

Thanks for your response. I have made some surprising discoveries since I read your email. I looked up your code and tried it out but got no speed boost in my case which is (row major) * (col major) = (col major). By the way, I get a printout ucucuc. Shouldn't there be a case urucuc? I didn't see such a case handled in your code.

The next thing I tried is to produce a copy of trans(A) and place it in a col major compressed matrix, so the product is actually (col major) * (col major) = (col major) and this is lightning fast! We're talking a fraction of a second whereas sparse_prod is about half a second, significantly slower. The majority of time is now spent constructing the transpose matrix but that's still roughly a third of the computation time of the old product. Now I'm trying to figure out if there is a way to construct the transpose in col major format faster, but I have no ideas yet.

I'm really baffled why row major * col major is a bad combination. If I have A*B = C, each entry of C is computed as the inner product of a row of A and a column of B, right? That would be the order in which elements are accessed, so I don't understand. I am really curious about this so any thoughts you can give me would be appreciated.

Best,
George


On Thu, Dec 6, 2012 at 4:47 AM, "Ungermann, Jörn" <j.ungermann@fz-juelich.de> wrote:
Hi George,

this is not a good storage layout for the matrix product.
There are two major costs involved:
1) finding the right entries in A^T and A to multiply with one another.
2) writing the stuff into the target matrix. If this cannot be done
consecutively, performance will seriously suffer from copy operations.

My first tip would be to use a coordinate_matrix as result matrix and only
copy the result to a compressed matrix after the product is done.
Secondly, copying the transpose matrix into a column_major matrix might
help, too, but this depends more on the employed algorithm, and I forgot,
which kernel is used by sparse_prod.

If these tips are not sufficient or you cannot spend the additional memory,
you might want to search the mailing list for a patch proposal of mine that
offers a series of different product kernels suited for a wide range of
matrix types. All of these are at least as efficient as the ublas kernels
and some improve the performance dramatically. The mail contains a list of
measurements for products involving various types.

Either way, you might think about just not doing the matrix product. If you
can avoid it (and one mostly can), this would be the best solution.

If you have further questions, you may contact me.

Cheers,
Jörn



> -----Original Message-----
> From: ublas-bounces@lists.boost.org [mailto:ublas-
> bounces@lists.boost.org] On Behalf Of George Slavov
> Sent: Dienstag, 4. Dezember 2012 22:03
> To: ublas@lists.boost.org
> Subject: [ublas] Efficiency of product of two sparse matrices
>
> In my project I need to form the product A^T A where A is a
> compressed_matrix, but it's proving to be a big bottleneck in my code,
> taking on the order of 10 seconds. The result is also very sparse. The
> matrix has a size of about 13000x13000 and has about 37000 nonzero
> entries. The product of sparse matrices is generally not sparse, but my
> matrix is structured enough that the product has roughly the same
> sparsity as A.
>
> The code I'm using is basically.
>
> typedef ublas::compressed_matrix<T, ublas::column_major, 0,
> ublas::unbounded_array<unsigned int>, ublas::unbounded_array<T> >
> sparse_matrix;
>
> sparse_matrix A(13000, 13000)
> sparse_matrix result(13000, 13000)
> // fill A
> sparse_prod(trans(A), A, result, false)
>
>
> Are there any conditions on the use of sparse_prod which I'm not aware
> of with regard to matrix orientations? My A is column major so the
> transpose is row_major. I would have thought that would be the optimal
> storage to form the product since sums range over rows of A^T and
> columns of A.
>
> Regards,
> George