Subject: Re: [ublas] Matrix multiplication performance
From: Oswin Krause (Oswin.Krause_at_[hidden])
Date: 2016-01-24 04:18:12
Hi,
I would still vote for the route of rewriting uBLAS based on BLAS bindings
and providing a reasonable default implementation that also works well
without memory assumptions. The main reason is that having only a fast
gemm implementation does not really improve things, given that BLAS
level 3 is quite a large beast.
I'm still willing to donate my partial uBLAS rewrite; unfortunately I am
a bit short on time to polish it (I just finished my PhD and have a huge
load of work on my desk). But if someone opened a git branch for it, I
could try to make the code ready (porting my implementation back to
Boost namespaces, etc.).
On 2016-01-23 18:53, palik imre wrote:
> Hi All,
>
> what's next? I mean what is the development process for ublas?
>
> Now we have a C-like implementation that outperforms both the
> mainline and the branch version (axpy_prod). What will we do with
> that?
>
> As far as I see we have the following options:
>
> 1) Create a C++ template magic implementation out of it. But for
> this, at the least we would need compile-time access to the target
> instruction set. Any idea how to do that?
>
> 2) Create a compiled library implementation out of it, and choose the
> implementation run-time based on the CPU capabilities.
>
> 3) Include some good defaults/defines, and hope the user will use
> them.
>
> 4) Don't include it, and do something completely different.
>
> What do you think?
>
> Cheers,
>
> Imre
>
> _______________________________________________
> ublas mailing list
> ublas_at_[hidden]
> http://lists.boost.org/mailman/listinfo.cgi/ublas
> Sent to: Oswin.Krause_at_[hidden]