Subject: Re: [ublas] GSOC 2013
From: Riccardo Rossi (rrossi_at_[hidden])
Date: 2013-03-26 17:06:42

Next message: Hoang Giang Bui: "Re: [ublas] GSOC 2013"
Previous message: oswin krause: "Re: [ublas] GSOC 2013"
In reply to: oswin krause: "Re: [ublas] GSOC 2013"
Next in thread: Hoang Giang Bui: "Re: [ublas] GSOC 2013"
Reply: Hoang Giang Bui: "Re: [ublas] GSOC 2013"

Dear Oswin,
while you are for sure right on the performance side, you should
recognise that linking to blas is the proverbial "pain in the ass". ever
tried to do it in windows 64 without having a commercial fortran compiler?
try, enjoy and report...

on my side ublas is really good as it allows you to do operations with small
matrices (either of fixed size or of variable size but still small) in a
simple and portable way.

such operations should use the cache effectively and hence NOT be memory
bound. for this reason eigen, blitz++, etc can be faster than blas...

i really wish there was some work on making ublas competitive with
eigen for such small matrices. my only point is that i would NOT mix the
concepts of vectors and matrices.

furthermore if i were to wish for something, then i would really love to have the
possibility to have such small matrices as elements of a CSR matrix... then
you would have a computation bound spmv (or spmm) which would be very nice
for many applications.
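
A minimal sketch of the idea in the paragraph above, written in standard C++ only (it is not uBLAS code): a CSR layout whose stored entries are dense N x N blocks, with N fixed at compile time. The names `BlockCsr` and `spmv` are hypothetical and do not come from any library mentioned in this thread. Whether the product really becomes computation bound depends on N and on the hardware; the sketch only illustrates the data layout and the inner small-matrix kernel.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical block-CSR container: every stored entry is a dense N x N block.
template <std::size_t N>
struct BlockCsr {
    using Block = std::array<double, N * N>;  // row-major N x N block
    std::size_t block_rows = 0;               // number of block rows
    std::vector<std::size_t> row_ptr;         // size block_rows + 1
    std::vector<std::size_t> col_idx;         // block column of each stored block
    std::vector<Block> values;                // one dense block per stored entry
};

// y = A * x, with x and y stored as contiguous chunks of N scalars.
template <std::size_t N>
std::vector<double> spmv(const BlockCsr<N>& A, const std::vector<double>& x) {
    assert(A.row_ptr.size() == A.block_rows + 1);
    assert(A.col_idx.size() == A.values.size());
    std::vector<double> y(A.block_rows * N, 0.0);
    for (std::size_t br = 0; br < A.block_rows; ++br) {
        for (std::size_t k = A.row_ptr[br]; k < A.row_ptr[br + 1]; ++k) {
            const auto& blk = A.values[k];
            const std::size_t bc = A.col_idx[k];
            assert((bc + 1) * N <= x.size());
            // Dense N x N block times N-vector. N is a compile-time constant,
            // so the compiler is free to unroll these loops.
            for (std::size_t i = 0; i < N; ++i) {
                double sum = 0.0;
                for (std::size_t j = 0; j < N; ++j)
                    sum += blk[i * N + j] * x[bc * N + j];
                y[br * N + i] += sum;
            }
        }
    }
    return y;
}
```

Compared with scalar CSR, each column index is amortised over N*N multiply-adds, which is the reason the block variant does more arithmetic per byte of index data read.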

anyhow...
greetings to everyone

Riccardo

On Tue, Mar 26, 2013 at 8:56 PM, oswin krause <
oswin.krause_at_[hidden]> wrote:

> Hi,
>
> there is one more thing i want to comment, and this is on the more serious
> side:
>
>
> On 23.03.2013 16:15, Nasos Iliopoulos wrote:
>
> David,
> Since mdsd:array is a generic multi-dimensional container it is not bound
> to algebraic operations. I expect that with proper aligned memory
> allocation and SSE algorithms (it is easy to add a custom storage container
> that supports that) it will be as fast as MKL, GotoBLAS, Eigen or
> armadillo. I believe that within that context, a GSOC project will need to
> include both the matrix container and the SSE algorithms tasks, or even
> AVX. (http://en.wikipedia.org/wiki/Advanced_Vector_Extensions)
>
>
> and also the starting post from David itself:
>
>
> On 23.03.2013 13:47, David Bellot wrote:
>
> OK, the idea behind this is to have a clean framework to enable
> optimization based on SSE, Neon, multi-core, ... you name it.
>
>
> Just to make this clear: in the current state of the library, SSE, AVX,
> multi core computation etc won't cut it as soon as the arguments involved
> are bigger than ~32KB. In this case, uBLAS performance is memory bound. Thus
> we will only wait more efficiently for the next block of memory. And even if
> it were not, the way ublas is designed makes it impossible to use
> vectorization aside from the c-style functions like axpy_prod, which can in
> 99% of all relevant cases be mapped on BLAS2/BLAS3 calls of the optimized
> C libraries (which give you AVX/SSE and OpenMP for free). If you expect that
> SSE helps you when computing your
>
> A+=prod(B,C);
>
> then you will be desperately disappointed in the current design.
>
> Now maybe some of you are thinking: "But all fast linear algebra libraries
> are using SSE, so you must be wrong". Simple answer: these libraries are
> not memory bound as they optimize for that (you can experience this
> yourself by comparing the performance of copying a big matrix to
> transposing it. Then try the transposition block-wise: allocate a small
> buffer, say 16x16 elements, and then read 16x16 blocks from the matrix,
> write them transposed into the buffer and then copy the buffer to the
> correct spot in the target matrix. this gives a factor 7 speed-up on my
> machine. no SSE, no AVX.).
>
> Don't trust me, trust the writer of the gotoblas library:
>
> Goto, Kazushige, and Robert A. van de Geijn. "Anatomy of high-performance
> matrix multiplication." *ACM Transactions on Mathematical Software (TOMS)*
> 34.3 (2008): 12.
>
> We all don't have enough time to implement fast linear algebra algorithms.
> Instead we should fall back to the numeric bindings as often as possible
> and use the power of expression templates to generate an optimal sequence
> of BLAS2/BLAS3 calls.
>
> I would also like to take part in that if it happens.
>
> Greetings,
> Oswin
>
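
The block-wise transposition experiment described in the quoted message can be sketched roughly as follows, in standard C++ with a row-major `std::vector<double>` standing in for the matrix. The function name `blocked_transpose` and the `block` parameter are hypothetical, and the factor 7 speed-up quoted above is Oswin's measurement on his machine, not something this sketch guarantees.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Transpose a row-major rows x cols matrix tile by tile.
// `block` is the tile edge length (16 in the quoted description).
std::vector<double> blocked_transpose(const std::vector<double>& src,
                                      std::size_t rows, std::size_t cols,
                                      std::size_t block = 16) {
    assert(src.size() == rows * cols);
    assert(block > 0);
    std::vector<double> dst(rows * cols);       // cols x rows, row-major
    std::vector<double> buffer(block * block);  // small scratch tile

    for (std::size_t i0 = 0; i0 < rows; i0 += block) {
        const std::size_t bi = std::min(block, rows - i0);
        for (std::size_t j0 = 0; j0 < cols; j0 += block) {
            const std::size_t bj = std::min(block, cols - j0);
            // Read the bi x bj tile row by row and store it transposed
            // (as a bj x bi tile) in the buffer.
            for (std::size_t i = 0; i < bi; ++i)
                for (std::size_t j = 0; j < bj; ++j)
                    buffer[j * bi + i] = src[(i0 + i) * cols + (j0 + j)];
            // Copy the transposed tile to its spot in the target matrix.
            for (std::size_t j = 0; j < bj; ++j)
                for (std::size_t i = 0; i < bi; ++i)
                    dst[(j0 + j) * rows + (i0 + i)] = buffer[j * bi + i];
        }
    }
    return dst;
}
```

Timing this against a plain double loop over a matrix much larger than the cache is one way to repeat the comparison; the right tile size is likely to depend on the cache of the machine at hand.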

-- 
Dr. Riccardo Rossi, Civil Engineer
Member of Kratos Team
International Center for Numerical Methods in Engineering - CIMNE
Campus Norte, Edificio C1
c/ Gran Capitán s/n
08034 Barcelona, España
Tel:        (+34) 93 401 56 96
Fax:       (+34) 93.401.6517
web:       www.cimne.com
