Hi Jörn,

Oh how I wish I could use row-major matrices throughout my code but UMFPACK needs column-major matrices.

I'd really like to understand why my urucuc case is difficult though. Here's how I envision it should work. Suppose A is row-major and B is column-major and C=A*B is column-major, all of which are of type compressed_matrix. This makes column slicing in B very efficient. Write B=[b_1, b_2, ..., b_n] where each b_i is a column vector. Then the product is

A*B = A * [b_1, b_2, ..., b_n] = [A*b_1, A*b_2, ..., A*b_n]

where each A*b_i is the i'th column of C. Each such multiplication is a row-major matrix and column vector multiplication which is the ideal arrangement and the result is in column-major format. You can literally push_back entries of A*b_i in order into C and it will be constant time insertion.

I'm going to try to implement this but I'd like to make sure I'm not being stupid and missing some obvious problem.

I've also spent some time thinking about how I could possibly avoid taking this matrix product like you suggested, but I really don't see a way. The system I'm trying to solve looks like

(B + A^T A)x = b

I can't see a way to break this up into two separate solves. A is not square so it's not of full rank. The addition seems absolutely necessary to me.

Best,
George

On Mon, Dec 10, 2012 at 8:52 AM, "Ungermann, Jörn" wrote:

Hi George,

The key is reducing the number of matrix accesses here. It is usually much more effective to search through one matrix only once and perform an actual multiplication for each access to the second.

For row times row one keeps an element of the first matrix (upper left) and multiplies it with the uppermost row of the right matrix to get entries for the toprow of the target matrix.

Key is here to have a random-access matrix for the target matrix, otherwise your mileage may vary wildly depending on the sparsity of the involved matrices.

To speed up the transpose,  try inserting another coordinate_matrix with proper majority, i.e.

Compressed_matrix<col_major> A;

Coordinate_matrix<row_major> AT_prime(trans(A));

Compressed_matrix<row_major> AT(AT_prime);

Please keep in mind that the compressed_matrix type is designed to be only effective for a) reducing memory and b) fast multiplication with (dense) vectors.

Any kind of construction work is usually better performed with coordinate_matrix (my preferred choice) or one of the other random access types contained in ublas.

Cheers,

Jörn

PS: Please bear in mind that one really nearly never needs this kind of product in practice. Actually, people got very creative in finding ways to not need this, because the performance of such products is generally and provably very poor.

From: ublas-bounces@lists.boost.org [mailto:ublas-bounces@lists.boost.org] On Behalf Of George Slavov
Sent: Donnerstag, 6. Dezember 2012 22:47
To: ublas mailing list
Subject: Re: [ublas] Efficiency of product of two sparse matrices

Hi Jörn,

Thanks for your response. I have made some surprising discoveries since I read your email. I looked up your code and tried it out but got no speed boost in my case which is (row major) * (col major) = (col major). By the way, I get a printout ucucuc. Shouldn't there be a case urucuc? I didn't see such a case handled in your code.

The next thing I tried is to produce a copy of trans(A) and place it in a col major compressed matrix, so the product is actually (col major) * (col major) = (col major) and this is lightning fast! We're talking a fraction of a second whereas sparse_prod is about half a second, significantly slower. The majority of time is now spent constructing the transpose matrix but that's still roughly a third of the computation time of the old product. Now I'm trying to figure out if there is a way to construct the transpose in col major format faster, but I have no ideas yet.

I'm really baffled why row major * col major is a bad combination. If I have A*B = C, each entry of C is computed as the inner product of a row of A and a column of B, right? That would be the order in which elements are accessed, so I don't understand. I am really curious about this so any thoughts you can give me would be appreciated.

Best,

George

On Thu, Dec 6, 2012 at 4:47 AM, "Ungermann, Jörn" <j.ungermann@fz-juelich.de> wrote:

Hi George,

this is not a good storage layout for the matrix product.
There are two major costs involved:
1) finding the right entries in A^T and A to multiply with one another.
2) writing the stuff into the target matrix. If this cannot be done
consecutively, performance will seriously suffer from copy operations.

My first tip would be to use a coordinate_matrix as result matrix and only
copy the result to a compressed matrix after the product is done.
Secondly, copying the transpose matrix into a column_major matrix might
help, too, but this depends more on the employed algorithm, and I forgot,
which kernel is used by sparse_prod.

If these tips are not sufficient or you cannot spend the additional memory,
you might want to search the mailing list for a patch proposal of mine that
offers a series of different product kernels suited for a wide range of
matrix types. All of these are at least as efficient as the ublas kernels
and some improve the performance dramatically. The mail contains a list of
measurements for products involving various types.

Either way, you might think about just not doing the matrix product. If you
can avoid it (and one mostly can), this would be the best solution.

If you have further questions, you may contact me.

Cheers,
Jörn

> -----Original Message-----
> From: ublas-bounces@lists.boost.org [mailto:ublas-
> bounces@lists.boost.org] On Behalf Of George Slavov
> Sent: Dienstag, 4. Dezember 2012 22:03
> To: ublas@lists.boost.org
> Subject: [ublas] Efficiency of product of two sparse matrices
>
> In my project I need to form the product A^T A where A is a
> compressed_matrix, but it's proving to be a big bottleneck in my code,
> taking on the order of 10 seconds. The result is also very sparse. The
> matrix has a size of about 13000x13000 and has about 37000 nonzero
> entries. The product of sparse matrices is generally not sparse, but my
> matrix is structured enough that the product has roughly the same
> sparsity as A.
>
> The code I'm using is basically.
>
> typedef ublas::compressed_matrix<T, ublas::column_major, 0,
> ublas::unbounded_array<unsigned int>, ublas::unbounded_array<T> >
> sparse_matrix;
>
> sparse_matrix A(13000, 13000)
> sparse_matrix result(13000, 13000)
> // fill A
> sparse_prod(trans(A), A, result, false)
>
>
> Are there any conditions on the use of sparse_prod which I'm not aware
> of with regard to matrix orientations? My A is column major so the
> transpose is row_major. I would have thought that would be the optimal
> storage to form the product since sums range over rows of A^T and
> columns of A.
>
> Regards,
> George

> _______________________________________________
> ublas mailing list
> ublas@lists.boost.org
> http://lists.boost.org/mailman/listinfo.cgi/ublas
> Sent to: j.ungermann@fz-juelich.de

_______________________________________________
ublas mailing list
ublas@lists.boost.org
http://lists.boost.org/mailman/listinfo.cgi/ublas
Sent to: gpslavov@gmail.com

_______________________________________________
ublas mailing list
ublas@lists.boost.org
http://lists.boost.org/mailman/listinfo.cgi/ublas
Sent to: gpslavov@gmail.com