Dear Oswin,
           while you are surely right on the performance side, you should recognise that linking to BLAS is the proverbial "pain in the ass". Ever tried to do it on 64-bit Windows without a commercial Fortran compiler? Try, enjoy and report...

On my side, uBLAS is really good as it allows you to do operations with small matrices (either of fixed size, or of variable size but still small) in a simple and portable way.
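
For concreteness, this is the kind of usage I have in mind (a minimal sketch using uBLAS's bounded_matrix, whose storage lives on the stack so nothing touches the heap):

    #include <boost/numeric/ublas/matrix.hpp>

    namespace ublas = boost::numeric::ublas;

    int main() {
        // fixed 3x3 size: the storage is a stack array, no allocation at all
        ublas::bounded_matrix<double, 3, 3> A, B, C;
        for (unsigned i = 0; i < 3; ++i)
            for (unsigned j = 0; j < 3; ++j) {
                A(i, j) = i + j;
                B(i, j) = (i == j) ? 1.0 : 0.0; // identity
            }
        noalias(C) = prod(A, B); // small product, entirely cache resident
        return 0;
    }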

Such operations should use the cache effectively and hence NOT be memory bound. For this reason Eigen, Blitz++, etc. can be faster than BLAS...

I really wish there were some work on making uBLAS competitive with Eigen for such small matrices. My only point is that I would NOT mix the concepts of vectors and of matrices.


Furthermore, if I were to wish for something, I would really love to have the possibility of having such small matrices as the elements of a CSR matrix... then you would have a compute-bound SpMV (or SpMM), which would be very nice for many applications. A sketch of what I mean follows.
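
(Hypothetical names, purely to fix the idea: a CSR structure whose stored entries are small fixed-size dense blocks, so each stored block costs N*N multiply-adds against N*N + N loads:)

    #include <vector>
    #include <boost/numeric/ublas/matrix.hpp>
    #include <boost/numeric/ublas/vector.hpp>

    namespace ublas = boost::numeric::ublas;

    // block-CSR layout: every stored entry is a small dense NxN block
    template <std::size_t N>
    struct BlockCSR {
        std::vector<std::size_t> row_ptr;   // size: num_block_rows + 1
        std::vector<std::size_t> col_index; // one entry per stored block
        std::vector<ublas::bounded_matrix<double, N, N> > blocks;
    };

    // block SpMV: y = A * x, with x and y split into N-sized chunks
    template <std::size_t N>
    void spmv(const BlockCSR<N>& A,
              const std::vector<ublas::bounded_vector<double, N> >& x,
              std::vector<ublas::bounded_vector<double, N> >& y)
    {
        for (std::size_t i = 0; i + 1 < A.row_ptr.size(); ++i) {
            y[i].clear();
            for (std::size_t k = A.row_ptr[i]; k < A.row_ptr[i + 1]; ++k)
                noalias(y[i]) += prod(A.blocks[k], x[A.col_index[k]]);
        }
    }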

Anyhow...
Greetings to everyone,

Riccardo




On Tue, Mar 26, 2013 at 8:56 PM, oswin krause <oswin.krause@ruhr-uni-bochum.de> wrote:
Hi,

there is one more thing I want to comment on, and it is on the more serious side:


On 23.03.2013 16:15, Nasos Iliopoulos wrote:
David,
Since mdsd:array is a generic multi-dimensional container, it is not bound to algebraic operations. I expect that with proper aligned memory allocation and SSE algorithms (it is easy to add a custom storage container that supports that) it will be as fast as MKL, GotoBLAS, Eigen or Armadillo. I believe that within that context, a GSoC project will need to include both the matrix container and the SSE algorithm tasks, or even AVX (http://en.wikipedia.org/wiki/Advanced_Vector_Extensions).
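
(As a side note, a minimal sketch of what such a custom storage container could look like; the aligned_allocator here is the one from Boost.Align, used purely as an illustration of the idea:)

    #include <boost/align/aligned_allocator.hpp>
    #include <boost/numeric/ublas/matrix.hpp>

    namespace ublas = boost::numeric::ublas;

    // 32-byte alignment matches AVX load/store requirements
    typedef boost::alignment::aligned_allocator<double, 32> avx_allocator;
    typedef ublas::unbounded_array<double, avx_allocator> aligned_storage;
    typedef ublas::matrix<double, ublas::row_major, aligned_storage> aligned_matrix;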


and also the starting post from David itself:


On 23.03.2013 13:47, David Bellot wrote:
OK, the idea behind this is to have a clean framework to enable optimization based on SSE, Neon, multi-core, ... you name it.

Just to make this clear: in the current state of the library, SSE, AVX, multi-core computation etc. won't cut it as soon as the arguments involved are bigger than ~32KB. In that case, uBLAS performance is memory bound, so we would only be waiting more efficiently for the next block of memory. And even if it were not, the way uBLAS is designed makes it impossible to use vectorization aside from the C-style functions like axpy_prod, which in 99% of all relevant cases can be mapped onto BLAS2/BLAS3 calls of the optimized C libraries (which give you AVX/SSE and OpenMP for free). If you expect that SSE helps you when computing your

A += prod(B, C);

then you will be desperately disappointed by the current design.
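
(For reference, the C-style route that does map onto a single BLAS3 call looks like this; axpy_prod with init == false accumulates into the target:)

    #include <boost/numeric/ublas/matrix.hpp>
    #include <boost/numeric/ublas/operation.hpp>

    namespace ublas = boost::numeric::ublas;

    void accumulate(ublas::matrix<double>& A,
                    const ublas::matrix<double>& B,
                    const ublas::matrix<double>& C)
    {
        // computes A += B * C in one pass, no expression-template temporary
        ublas::axpy_prod(B, C, A, false);
    }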

Now maybe some of you are thinking: "But all fast linear algebra libraries are using SSE, so you must be wrong." Simple answer: these libraries are not memory bound, because that is exactly what they optimize for. You can experience this yourself by comparing the performance of copying a big matrix to that of transposing it. Then try the transposition block-wise: allocate a small buffer, say 16x16 elements, read 16x16 blocks from the source matrix, write them transposed into the buffer, and then copy the buffer to the correct spot in the target matrix. This gives a factor-7 speed-up on my machine. No SSE, no AVX.
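
(A minimal sketch of that blocked transpose, assuming square row-major storage:)

    #include <algorithm>
    #include <cstddef>

    // cache-blocked transpose: B = A^T, both n x n, row-major
    void transpose_blocked(const double* A, double* B, std::size_t n)
    {
        const std::size_t BS = 16; // the 16x16 buffer described above
        double buf[16 * 16];
        for (std::size_t ib = 0; ib < n; ib += BS)
            for (std::size_t jb = 0; jb < n; jb += BS) {
                const std::size_t bi = std::min(BS, n - ib);
                const std::size_t bj = std::min(BS, n - jb);
                // read one block of A, writing it transposed into the buffer
                for (std::size_t i = 0; i < bi; ++i)
                    for (std::size_t j = 0; j < bj; ++j)
                        buf[j * BS + i] = A[(ib + i) * n + (jb + j)];
                // copy the buffer out; the writes into B are now contiguous
                for (std::size_t j = 0; j < bj; ++j)
                    for (std::size_t i = 0; i < bi; ++i)
                        B[(jb + j) * n + (ib + i)] = buf[j * BS + i];
            }
    }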

Don't trust me, trust the author of the GotoBLAS library:

Goto, Kazushige, and Robert A. van de Geijn. "Anatomy of high-performance matrix multiplication." ACM Transactions on Mathematical Software (TOMS) 34.3 (2008): 12.

None of us has enough time to implement fast linear algebra algorithms. Instead we should fall back on the numeric bindings as often as possible and use the power of expression templates to generate an optimal sequence of BLAS2/BLAS3 calls.
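
(To make the mapping concrete: the A += prod(B, C) from above is exactly one gemm call with alpha = beta = 1; here via the plain C interface, with the row-major shapes as assumptions:)

    #include <cblas.h>

    // A (m x n) += B (m x k) * C (k x n), all row-major, double precision
    void accumulate(double* A, const double* B, const double* C,
                    int m, int k, int n)
    {
        cblas_dgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans,
                    m, n, k,
                    1.0, B, k,  // alpha = 1
                         C, n,
                    1.0, A, n); // beta = 1 -> accumulate into A
    }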

I would also like to take part in that if it happens.

Greetings,
Oswin





--
Dr. Riccardo Rossi, Civil Engineer
Member of Kratos Team
International Center for Numerical Methods in Engineering - CIMNE
Campus Norte, Edificio C1
c/ Gran Capitán s/n
08034 Barcelona, España
Tel: (+34) 93 401 56 96
Fax: (+34) 93 401 65 17
web: www.cimne.com