Subject: Re: [boost] SIMD implementation of uBLAS
From: Nasos Iliopoulos (nasos_i_at_[hidden])
Date: 2013-05-29 10:58:25

Next message: Daniel Pfeifer: "Re: [boost] [gsoc] Getting started with a repository (Was: Re: Live read-only GIT mirrors of Boost trunk SVN)"
Previous message: David Bellot: "Re: [boost] SIMD implementation of uBLAS"
In reply to: Mathias Gaunard: "Re: [boost] SIMD implementation of uBLAS"
Next in thread: David Bellot: "Re: [boost] SIMD implementation of uBLAS"

That is one of the core purposes of the GSOC project: to provide fast
algorithms, especially for operations like matrix-matrix multiplication,
rather than to optimize the whole infrastructure.

Regarding the simple cases, do you mean that with your compiler uBLAS is
slower than, for example, Eigen on this piece of code?

#include <iostream>
#include <chrono>
#include <Eigen/Dense>
#include <boost/numeric/ublas/matrix.hpp>

using boost::numeric::ublas::noalias;

std::chrono::high_resolution_clock::time_point now() {
    return std::chrono::high_resolution_clock::now();
}

// Microseconds elapsed since the given time point.
double duration_since(const std::chrono::high_resolution_clock::time_point &since) {
    return std::chrono::duration_cast<std::chrono::microseconds>(now() - since).count();
}

typedef double value_type;
typedef boost::numeric::ublas::matrix<value_type> ublas_matrix_type;
typedef Eigen::Matrix<value_type, Eigen::Dynamic, Eigen::Dynamic> eigen_matrix_type;

#define SIZE 200
#define ITERATIONS 3000

int main() {

    eigen_matrix_type EA(SIZE,SIZE), EB(SIZE,SIZE), EC(SIZE,SIZE), ED(SIZE,SIZE);
    ublas_matrix_type UA(SIZE,SIZE), UB(SIZE,SIZE), UC(SIZE,SIZE), UD(SIZE,SIZE);

    // Fill the operands. The accumulators EA and UA are zeroed explicitly:
    // += reads them, and a sized constructor does not guarantee zeroed elements.
    for (auto i=0; i!=SIZE; i++) for (auto j=0; j!=SIZE; j++) {
        EA(i,j)=0; UA(i,j)=0;
        EB(i,j)=i+3*j; EC(i,j)=i+5*j+2; ED(i,j)=2*i+3*j;
        UB(i,j)=i+3*j; UC(i,j)=i+5*j+2; UD(i,j)=2*i+3*j;
    }

    // Time the Eigen expression; duration_since() returns microseconds.
    auto start = now();
    for (auto i=0; i!=ITERATIONS; i++) EA.noalias() += 2*EB+3*(EC+ED);
    auto dur = (double)duration_since(start)/1000;
    std::cout << EA(SIZE-1,SIZE-1) << " Duration EIGEN: " << dur << " msec\n";

    // Time the same expression in uBLAS.
    start = now();
    for (auto i=0; i!=ITERATIONS; i++) noalias(UA) += 2*UB+3*(UC+UD);
    dur = (double)duration_since(start)/1000;
    std::cout << UA(SIZE-1,SIZE-1) << " Duration uBLAS: " << dur << " msec\n";

    return 0;
}
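
For reference, here is a minimal sketch of the element-wise loop that both timed statements would be expected to reduce to once the expression templates are inlined. It assumes dense storage in flat arrays and uses only the standard library; the names fused_update and corner_after are hypothetical and not part of Eigen or uBLAS.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: one pass of A += 2*B + 3*(C + D) over flat storage.
// No temporaries are created; each element is read and updated once.
void fused_update(std::vector<double>& a,
                  const std::vector<double>& b,
                  const std::vector<double>& c,
                  const std::vector<double>& d) {
    const std::size_t n = a.size();
    for (std::size_t k = 0; k != n; ++k)
        a[k] += 2 * b[k] + 3 * (c[k] + d[k]);
}

// Hypothetical helper: repeats the benchmark's arithmetic for a
// size x size problem, starting from A = 0, and returns the
// bottom-right element of A.
double corner_after(std::size_t size, std::size_t iterations) {
    std::vector<double> a(size * size, 0.0), b(size * size),
                        c(size * size), d(size * size);
    for (std::size_t i = 0; i != size; ++i)
        for (std::size_t j = 0; j != size; ++j) {
            const std::size_t k = i * size + j;  // row-major index
            b[k] = i + 3 * j;
            c[k] = i + 5 * j + 2;
            d[k] = 2 * i + 3 * j;
        }
    for (std::size_t it = 0; it != iterations; ++it)
        fused_update(a, b, c, d);
    return a[size * size - 1];
}
```

With size 200 and 3000 iterations this arithmetic gives 24495000 for the last element, which is the 2.4495e+07 that both libraries print.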

$ g++ -v
Using built-in specs.
COLLECT_GCC=g++
COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-linux-gnu/4.7/lto-wrapper
Target: x86_64-linux-gnu
Configured with: ../src/configure -v --with-pkgversion='Ubuntu/Linaro
4.7.2-2ubuntu1' --with-bugurl=file:///usr/share/doc/gcc-4.7/README.Bugs
--enable-languages=c,c++,go,fortran,objc,obj-c++ --prefix=/usr
--program-suffix=-4.7 --enable-shared --enable-linker-build-id
--with-system-zlib --libexecdir=/usr/lib --without-included-gettext
--enable-threads=posix --with-gxx-include-dir=/usr/include/c++/4.7
--libdir=/usr/lib --enable-nls --with-sysroot=/ --enable-clocale=gnu
--enable-libstdcxx-debug --enable-libstdcxx-time=yes
--enable-gnu-unique-object --enable-plugin --enable-objc-gc
--disable-werror --with-arch-32=i686 --with-tune=generic
--enable-checking=release --build=x86_64-linux-gnu
--host=x86_64-linux-gnu --target=x86_64-linux-gnu

$ g++ -DNDEBUG -O3 -std=c++0x main.cpp -o benchmarks

$ ./benchmarks
2.4495e+07 Duration EIGEN: 160.901 msec
2.4495e+07 Duration uBLAS: 160.86 msec
$ ./benchmarks
2.4495e+07 Duration EIGEN: 165.348 msec
2.4495e+07 Duration uBLAS: 168.003 msec
$ ./benchmarks
2.4495e+07 Duration EIGEN: 161.826 msec
2.4495e+07 Duration uBLAS: 160.674 msec
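
The runs above vary by a few percent from one invocation to the next. One way to reduce that noise inside a single run is to repeat each measurement and report the fastest repetition. This is only a sketch under that assumption, not what the benchmark above does; best_msec is a hypothetical helper name.

```cpp
#include <algorithm>
#include <chrono>
#include <limits>

// Hypothetical helper: calls `work` `repeats` times and returns the
// fastest wall-clock time of a single call, in milliseconds.
template <class F>
double best_msec(F work, int repeats) {
    double best = std::numeric_limits<double>::infinity();
    for (int r = 0; r != repeats; ++r) {
        const auto start = std::chrono::steady_clock::now();
        work();
        const auto stop = std::chrono::steady_clock::now();
        const std::chrono::duration<double, std::milli> elapsed = stop - start;
        best = std::min(best, elapsed.count());
    }
    return best;
}
```

Each timed loop could then be wrapped in a lambda and passed in as `work`. The accumulator would need to be reset before every repetition so that the printed values stay comparable.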

Best regards,
Nasos

On 05/29/2013 09:59 AM, Mathias Gaunard wrote:
> On 29/05/13 15:00, Nasos Iliopoulos wrote:
>
>> We are also seeking ways of making the uBLAS expression templates more
>> transparent to the compiler so that auto-vectorization can kick in -
>> which it does in certain cases and provides a very nice performance
>> boost on par with explicitly vectorized libraries.
>>
>> As a matter of fact, I am surprised by the progress of compilers'
>> auto-vectorization facilities over the last few years, which makes me
>> -doubt- the need for explicit vectorization any more. The GSOC project
>> will make it clear for us. An added benefit of relying on the compiler
>> is that future vector instructions come for free. A disadvantage is of
>> course that auto-vectorization is not guaranteed to work, but I find
>> that it rarely fails.
>
> Yet according to a variety of benchmarks, performance of uBLAS is very
> bad when compared to other similar libraries (Eigen, Armadillo,
> Blitz++, Blaze, or even our own library NT2) even for simple cases and
> with aggressive optimization settings.
