
Subject: Re: [ublas] considering the speed of axpy_prod
From: Oswin Krause (Oswin.Krause_at_[hidden])
Date: 2012-01-03 02:56:49


Hi again,

I just had to benchmark the vector axpy_prod again.

Same benchmark as before; I just changed the arguments and results to
vectors. I couldn't believe my eyes and checked the results three times.

matRows = matColumns = 512
iterations=10000

(1) 22.5s
(2) 20.8s
(3) 9.6s
(4) 2.8s

I conclude:
1. There is no suitably fast implementation of the matrix-vector product
xA with A being row_major in uBLAS at all.
2. axpy_prod is around a factor of 8 slower for Ax, and around a factor
of 2 slower for xA.
3. Never use axpy_prod for dense vector arguments.
Never. I would appreciate a big fat warning in the documentation.

Now the fun thing: use matrices of size 1 instead of vector arguments (old
test with numVectors = 1, same numbers as before):

(1) 3.3s
(2) 9.8s
(3) 9.6s
(4) 2.8s

I would consider this to be a bug.

Greetings,
Oswin

> On 01/02/2012 04:39 PM, Ungermann, Jörn wrote:
>> Dear Oswin,
>>
>> the matrix-matrix multiplication is not really optimized.
>> Please refer to my mail from 2010 for details
>> http://lists.boost.org/MailArchives/ublas/2010/03/4091.php
>>
>> The performance of the product kernels really depends on the majority
>> (row_major/column_major) of all three involved matrices and becomes
>> *really* complicated once you take into account all flavours of sparse
>> matrices.
>> It is ridiculously easy to program a matrix-matrix multiplication
>> routine that is fast for any given, specific combination of involved
>> matrices, but really, really hard to make performant for a wide range of
>> types and combinations with a restricted set of kernels.
>>
>> We went ahead and implemented cache-optimal, SSE-based routines for
>> our common matrix-vector / matrix-matrix product types (about 2000 LoC,
>> quite fun to do). But this stuff wouldn't fit into uBLAS.
>>
>>
> Dear Joern and Oswin,
>
> Because of these issues, I would like to point out that I have
> completely left uBlas, except for some minor stuff.
>
> I am not advertising MTL4; however, it is more intuitive to use and
> easier to interface with external libraries such as Intel MKL for
> dense/sparse matrix operations.
>
> Just as a side note: since I lost too much time with uBlas, I did not
> want anyone else to experience the same. Take a look at MTL4; I am
> guessing that you will not be disappointed.
>
> Best regards,
> Umut
> _______________________________________________
> ublas mailing list
> ublas_at_[hidden]
> http://lists.boost.org/mailman/listinfo.cgi/ublas
> Sent to: Oswin.Krause_at_[hidden]
>