Subject: Re: [ublas] Matrix multiplication performance
From: Michael Lehn (michael.lehn_at_[hidden])
Date: 2016-01-28 15:47:35

Next message: Karl Meerbergen: "Re: [ublas] Matrix multiplication performance"
Previous message: Riccardo Rossi: "Re: [ublas] Matrix multiplication performance"
In reply to: Riccardo Rossi: "Re: [ublas] Matrix multiplication performance"
Next in thread: Karl Meerbergen: "Re: [ublas] Matrix multiplication performance"
Reply: Karl Meerbergen: "Re: [ublas] Matrix multiplication performance"

On 28 Jan 2016, at 21:15, Riccardo Rossi <rrossi_at_[hidden]> wrote:

> i am impressed. 6* on a cuadcore!!
>

Thanks, but actually two quad cores ;-)

And with more than 6 threads it requires a more fine-grained method to scale well. You have to consider
hierarchies of thread groups, e.g. one group is responsible for packing a block and afterwards multiplying
it multithreaded. At the moment it's like one group with too many members.
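
As a rough illustration of the group idea (a sketch only, not the code behind the numbers discussed in this thread): the rows of A are split among a few groups, and within each group the members first pack the group's block and then multiply it together. The name gemm_grouped, the parameters groups and members, and the row-wise split are assumptions made for the example. The "packing" is a plain copy, and threads are spawned and joined per phase for brevity, where a tuned implementation would reorder the data into panels and keep persistent threads synchronized by barriers.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical sketch: C += A*B, all row-major, A is m x k, B is k x n.
// The rows of A are split among `groups`; each group has `members` threads.
void gemm_grouped(std::size_t m, std::size_t n, std::size_t k,
                  const std::vector<double>& A, const std::vector<double>& B,
                  std::vector<double>& C,
                  std::size_t groups, std::size_t members)
{
    assert(groups > 0 && members > 0);
    assert(A.size() == m * k && B.size() == k * n && C.size() == m * n);

    // Runs work(t) for t = 0..members-1 on separate threads, waits for all.
    auto run_members = [members](auto work) {
        std::vector<std::thread> pool;
        for (std::size_t t = 0; t < members; ++t) pool.emplace_back(work, t);
        for (auto& th : pool) th.join();
    };

    auto group_task = [&](std::size_t g) {
        const std::size_t r0 = g * m / groups;        // first row of this group
        const std::size_t r1 = (g + 1) * m / groups;  // one past the last row
        const std::size_t rows = r1 - r0;
        std::vector<double> packed(rows * k);         // group-local buffer

        // Phase 1: the members cooperatively pack the group's rows of A.
        run_members([&](std::size_t t) {
            for (std::size_t i = t; i < rows; i += members)
                std::copy(A.data() + (r0 + i) * k,
                          A.data() + (r0 + i + 1) * k,
                          packed.data() + i * k);
        });

        // Phase 2: the members multiply the packed block; each member
        // updates its own rows of C, so no two threads write the same entry.
        run_members([&](std::size_t t) {
            for (std::size_t i = t; i < rows; i += members)
                for (std::size_t l = 0; l < k; ++l)
                    for (std::size_t j = 0; j < n; ++j)
                        C[(r0 + i) * n + j] += packed[i * k + l] * B[l * n + j];
        });
    };

    // One leader thread per group; the groups work independently.
    std::vector<std::thread> leaders;
    for (std::size_t g = 0; g < groups; ++g) leaders.emplace_back(group_task, g);
    for (auto& th : leaders) th.join();
}
```

With two quad cores one might try gemm_grouped(m, n, k, A, B, C, 2, 4), i.e. one group per socket, but whether that split scales well would have to be measured.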

> do you also do sparse linear algebra by chance?

Sorry, not directly. I just looked at libraries like SuperLU and UMFPACK, though not as closely as at other BLAS libraries. But
from my impression this could also be done much more elegantly in C++. The big headache in these libraries is that they basically
have the same code repeated for float, double, complex<float> and complex<double>. Just using C++ as "C plus function templates" would
make it much easier. And the performance-relevant part in these libraries is again a fast dense BLAS.
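
To make the "C plus function templates" point concrete, here is a minimal sketch: a single function template that covers all four element types. The name gemm_naive and its signature are made up for the example and are not taken from SuperLU, UMFPACK or any BLAS, and the triple loop is deliberately unoptimized.

```cpp
#include <cassert>
#include <complex>
#include <cstddef>
#include <vector>

// Hypothetical helper: C += A*B for row-major matrices,
// where A is m x k, B is k x n and C is m x n.
// One definition serves float, double, complex<float> and complex<double>.
template <typename T>
void gemm_naive(std::size_t m, std::size_t n, std::size_t k,
                const std::vector<T>& A, const std::vector<T>& B,
                std::vector<T>& C)
{
    assert(A.size() == m * k && B.size() == k * n && C.size() == m * n);
    for (std::size_t i = 0; i < m; ++i) {
        for (std::size_t l = 0; l < k; ++l) {
            const T a = A[i * k + l];          // reuse A(i,l) across the row
            for (std::size_t j = 0; j < n; ++j)
                C[i * n + j] += a * B[l * n + j];
        }
    }
}
```

A C library typically needs four nearly identical copies of such a routine, one per element type. Here the compiler generates them on demand, e.g. gemm_naive<double> or gemm_naive<std::complex<float>>.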

Cheers,

Michael
