Subject: Re: [ublas] considering the speed of axpy_prod
From: Umut Tabak (u.tabak_at_[hidden])
Date: 2012-01-02 10:44:50
On 01/02/2012 04:39 PM, Ungermann, Jörn wrote:
> Dear Oswin,
> the matrix-matrix multiplication is not really optimized.
> Please refer to my mail from 2010 for details.
> The performance of the product kernels really depends on the storage order (row- or column-major) of all three involved matrices and becomes *really* complicated once you take into account all flavours of sparse matrices.
> It is ridiculously easy to program a matrix-matrix multiplication routine that is fast for any given, specific combination of involved matrices, but really, really hard to be performant for a wide range of types and combinations with a restricted set of kernels.
> We went forward and implemented cache-optimal, SSE-based routines for our common matrix-vector / matrix-matrix product types (about 2000 LoC, quite fun to do). But this stuff wouldn't fit into uBLAS.
Dear Joern and Oswin,
Because of these issues, I would like to point out that I have almost
completely abandoned uBLAS, except for some minor tasks.
I am not advertising MTL4; however, it is more intuitive to use and
easier to interface with external libraries such as Intel MKL for
dense/sparse matrix operations.
Just as a side note: since I lost too much time with uBLAS, I do not
want anyone else to go through the same experience. Take a look at MTL4;
I expect that you will not be disappointed.
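
To illustrate the quoted point about storage order, here is a minimal
sketch in plain C++. It is not uBLAS or MTL4 code: the type DenseRowMajor
and the functions multiply_ijk and multiply_ikj are hypothetical names
made up for this example, and only dense row-major matrices are assumed.
Both kernels compute the same product; they differ only in loop order,
which decides whether the innermost loop walks memory with unit stride.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper for this sketch: a dense matrix stored row-major.
struct DenseRowMajor {
    std::size_t rows, cols;
    std::vector<double> data;

    DenseRowMajor(std::size_t r, std::size_t c)
        : rows(r), cols(c), data(r * c, 0.0) {}

    double& operator()(std::size_t i, std::size_t j) { return data[i * cols + j]; }
    double operator()(std::size_t i, std::size_t j) const { return data[i * cols + j]; }
};

// i-j-k order: the innermost loop walks down a column of B,
// i.e. with stride B.cols in a row-major layout.
DenseRowMajor multiply_ijk(const DenseRowMajor& A, const DenseRowMajor& B) {
    assert(A.cols == B.rows);
    DenseRowMajor C(A.rows, B.cols);
    for (std::size_t i = 0; i < A.rows; ++i)
        for (std::size_t j = 0; j < B.cols; ++j) {
            double sum = 0.0;
            for (std::size_t k = 0; k < A.cols; ++k)
                sum += A(i, k) * B(k, j);
            C(i, j) = sum;
        }
    return C;
}

// i-k-j order: the innermost loop walks along a row of B and a row of C,
// i.e. with unit stride in a row-major layout.
DenseRowMajor multiply_ikj(const DenseRowMajor& A, const DenseRowMajor& B) {
    assert(A.cols == B.rows);
    DenseRowMajor C(A.rows, B.cols);
    for (std::size_t i = 0; i < A.rows; ++i)
        for (std::size_t k = 0; k < A.cols; ++k) {
            const double a = A(i, k);
            for (std::size_t j = 0; j < B.cols; ++j)
                C(i, j) += a * B(k, j);
        }
    return C;
}
```

For large row-major operands the second variant is usually the more
cache-friendly one, but if B were column-major the preference would tend
to flip. That is one small instance of why a single generic kernel is
hard to make fast for every combination of matrix types.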