
Ublas : 
Subject: Re: [ublas] Efficiency of product of two sparse matrices
From: George Slavov (gpslavov_at_[hidden])
Date: 20121211 12:48:58
Hi JÃ¶rn,
Oh how I wish I could use rowmajor matrices throughout my code but UMFPACK
needs columnmajor matrices.
I'd really like to understand why my urucuc case is difficult though.
Here's how I envision it should work. Suppose A is rowmajor and B is
columnmajor and C=A*B is columnmajor, all of which are of type
compressed_matrix. This makes column slicing in B very efficient. Write
B=[b_1, b_2, ..., b_n] where each b_i is a column vector. Then the product
is
A*B = A * [b_1, b_2, ..., b_n] = [A*b_1, A*b_2, ..., A*b_n]
where each A*b_i is the i'th column of C. Each such multiplication is a
rowmajor matrix and column vector multiplication which is the ideal
arrangement and the result is in columnmajor format. You can literally
push_back entries of A*b_i in order into C and it will be constant time
insertion.
I'm going to try to implement this but I'd like to make sure I'm not being
stupid and missing some obvious problem.
I've also spent some time thinking about how I could possibly avoid taking
this matrix product like you suggested, but I really don't see a way. The
system I'm trying to solve looks like
(B + A^T A)x = b
I can't see a way to break this up into two separate solves. A is not
square so it's not of full rank. The addition seems absolutely necessary to
me.
Best,
George
On Mon, Dec 10, 2012 at 8:52 AM, "Ungermann, JÃ¶rn" <
j.ungermann_at_[hidden]> wrote:
> Hi George,****
>
> ** **
>
> The key is reducing the number of matrix accesses here. It is usually much
> more effective to search through one matrix only once and perform an actual
> multiplication for each access to the second.****
>
> For row times row one keeps an element of the first matrix (upper left)
> and multiplies it with the uppermost row of the right matrix to get entries
> for the toprow of the target matrix.****
>
> Key is here to have a randomaccess matrix for the target matrix,
> otherwise your mileage may vary wildly depending on the sparsity of the
> involved matrices.****
>
> ** **
>
> To speed up the transpose, try inserting another coordinate_matrix with
> proper majority, i.e.****
>
> Compressed_matrix<col_major> A;****
>
> Coordinate_matrix<row_major> AT_prime(trans(A));****
>
> Compressed_matrix<row_major> AT(AT_prime);****
>
> ** **
>
> Please keep in mind that the compressed_matrix type is designed to be only
> effective for a) reducing memory and b) fast multiplication with (dense)
> vectors.****
>
> Any kind of construction work is usually better performed with
> coordinate_matrix (my preferred choice) or one of the other random access
> types contained in ublas.****
>
> ** **
>
> Cheers,****
>
> JÃ¶rn****
>
> ** **
>
> PS: Please bear in mind that one really nearly never needs this kind of
> product in practice. Actually, people got very creative in finding ways to
> not need this, because the performance of such products is generally and
> provably very poor.****
>
> ** **
>
> *From:* ublasbounces_at_[hidden] [mailto:
> ublasbounces_at_[hidden]] *On Behalf Of *George Slavov
> *Sent:* Donnerstag, 6. Dezember 2012 22:47
> *To:* ublas mailing list
> *Subject:* Re: [ublas] Efficiency of product of two sparse matrices****
>
> ** **
>
> Hi JÃ¶rn,****
>
> ** **
>
> Thanks for your response. I have made some surprising discoveries since I
> read your email. I looked up your code and tried it out but got no speed
> boost in my case which is (row major) * (col major) = (col major). By the
> way, I get a printout ucucuc. Shouldn't there be a case urucuc? I didn't
> see such a case handled in your code.****
>
> ** **
>
> The next thing I tried is to produce a copy of trans(A) and place it in a
> col major compressed matrix, so the product is actually (col major) * (col
> major) = (col major) and this is lightning fast! We're talking a fraction
> of a second whereas sparse_prod is about half a second, significantly
> slower. The majority of time is now spent constructing the transpose matrix
> but that's still roughly a third of the computation time of the old
> product. Now I'm trying to figure out if there is a way to construct the
> transpose in col major format faster, but I have no ideas yet.****
>
> ** **
>
> I'm really baffled why row major * col major is a bad combination. If I
> have A*B = C, each entry of C is computed as the inner product of a row of
> A and a column of B, right? That would be the order in which elements are
> accessed, so I don't understand. I am really curious about this so any
> thoughts you can give me would be appreciated.****
>
> ** **
>
> Best,****
>
> George****
>
> ** **
>
> On Thu, Dec 6, 2012 at 4:47 AM, "Ungermann, JÃ¶rn" <
> j.ungermann_at_[hidden]> wrote:****
>
> Hi George,
>
> this is not a good storage layout for the matrix product.
> There are two major costs involved:
> 1) finding the right entries in A^T and A to multiply with one another.
> 2) writing the stuff into the target matrix. If this cannot be done
> consecutively, performance will seriously suffer from copy operations.
>
> My first tip would be to use a coordinate_matrix as result matrix and only
> copy the result to a compressed matrix after the product is done.
> Secondly, copying the transpose matrix into a column_major matrix might
> help, too, but this depends more on the employed algorithm, and I forgot,
> which kernel is used by sparse_prod.
>
> If these tips are not sufficient or you cannot spend the additional memory,
> you might want to search the mailing list for a patch proposal of mine that
> offers a series of different product kernels suited for a wide range of
> matrix types. All of these are at least as efficient as the ublas kernels
> and some improve the performance dramatically. The mail contains a list of
> measurements for products involving various types.
>
> Either way, you might think about just not doing the matrix product. If you
> can avoid it (and one mostly can), this would be the best solution.
>
> If you have further questions, you may contact me.
>
> Cheers,
> JÃ¶rn****
>
>
>
>
> > Original Message
> > From: ublasbounces_at_[hidden] [mailto:ublas
> > bounces_at_[hidden]] On Behalf Of George Slavov
> > Sent: Dienstag, 4. Dezember 2012 22:03
> > To: ublas_at_[hidden]
> > Subject: [ublas] Efficiency of product of two sparse matrices
> >
> > In my project I need to form the product A^T A where A is a
> > compressed_matrix, but it's proving to be a big bottleneck in my code,
> > taking on the order of 10 seconds. The result is also very sparse. The
> > matrix has a size of about 13000x13000 and has about 37000 nonzero
> > entries. The product of sparse matrices is generally not sparse, but my
> > matrix is structured enough that the product has roughly the same
> > sparsity as A.
> >
> > The code I'm using is basically.
> >
> > typedef ublas::compressed_matrix<T, ublas::column_major, 0,
> > ublas::unbounded_array<unsigned int>, ublas::unbounded_array<T> >
> > sparse_matrix;
> >
> > sparse_matrix A(13000, 13000)
> > sparse_matrix result(13000, 13000)
> > // fill A
> > sparse_prod(trans(A), A, result, false)
> >
> >
> > Are there any conditions on the use of sparse_prod which I'm not aware
> > of with regard to matrix orientations? My A is column major so the
> > transpose is row_major. I would have thought that would be the optimal
> > storage to form the product since sums range over rows of A^T and
> > columns of A.
> >
> > Regards,
> > George****
>
> > _______________________________________________
> > ublas mailing list
> > ublas_at_[hidden]
> > http://lists.boost.org/mailman/listinfo.cgi/ublas
> > Sent to: j.ungermann_at_[hidden]
>
> _______________________________________________
> ublas mailing list
> ublas_at_[hidden]
> http://lists.boost.org/mailman/listinfo.cgi/ublas
> Sent to: gpslavov_at_[hidden]****
>
> ** **
>
> _______________________________________________
> ublas mailing list
> ublas_at_[hidden]
> http://lists.boost.org/mailman/listinfo.cgi/ublas
> Sent to: gpslavov_at_[hidden]
>