
Subject: Re: [ublas] considering the speed of axpy_prod
From: Umut Tabak (u.tabak_at_[hidden])
Date: 2012-01-02 10:44:50
On 01/02/2012 04:39 PM, Ungermann, Jörn wrote:
> Dear Oswin,
>
> the matrix-matrix multiplication is not really optimized.
> Please refer to my mail from 2010 for details
> http://lists.boost.org/MailArchives/ublas/2010/03/4091.php
>
> The performance of the product kernels depends heavily on the storage order (row- or column-major) of all three involved matrices, and it becomes *really* complicated once you take into account all flavours of sparse matrices.
> It is ridiculously easy to program a matrix-matrix multiplication routine that is fast for any given, specific combination of involved matrices, but really, really hard to be performant for a wide range of types and combinations with a restricted set of kernels.
>
> We went forward and implemented cache-optimal, SSE-based routines for our common matrix-vector / matrix-matrix product types (about 2000 LoC, quite fun to do). But this stuff wouldn't fit into uBLAS.
>
>
Dear Joern and Oswin,
Because of these issues, I would like to point out that I have almost
completely abandoned uBLAS, except for some minor tasks.
I am not advertising MTL4, but it is more intuitive to use and
easier to interface with external libraries such as Intel MKL for
dense/sparse matrix operations.
Just as a side note: since I lost too much time with uBLAS, I did not
want anyone else to go through the same experience. Take a look at MTL4; I am
guessing that you will not be disappointed.
Best regards,
Umut