Subject: [ublas] uBlas and OpenMP
From: Nasos Iliopoulos (nasos_i_at_[hidden])
Date: 2011-03-28 14:41:59
Hello all,
Since this issue was raised on this list, I ran some tests with uBlas and OpenMP. For anyone interested in adding OpenMP support to uBlas, I think the findings below are a good starting point.
Because uBlas forwards assignment-operator calls to a small set of assignment functions, what needs to change are the for loops in those functions. For this test I added an OpenMP pragma at line 264 of uBlas' vector_assign.hpp (rev. 70666):
From:
    #ifndef BOOST_UBLAS_USE_DUFF_DEVICE
        for (size_type i = 0; i < size; ++ i)
            functor_type::apply (v (i), e () (i));
    #else
To:
    #ifndef BOOST_UBLAS_USE_DUFF_DEVICE
        #pragma omp parallel for
        for (size_type i = 0; i < size; ++ i)
            functor_type::apply (v (i), e () (i));
    #else
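The point of the change above is that every element-wise assignment ends up in one loop, so a single pragma parallelizes all of them. The following is a minimal standalone sketch of that idea, not uBlas code: `vec`, `sum_expr`, `assign_functor` and `assign_loop` are hypothetical, heavily simplified stand-ins for uBlas' vector, expression templates, assignment functor and vector_assign. It uses a signed loop index because OpenMP 2.5 did not accept unsigned loop variables (OpenMP 3.0 does). Without -fopenmp the pragma is simply ignored and the loop runs serially.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a uBlas assignment functor: apply(t, s) does t = s.
struct assign_functor {
    static void apply(double &t, double s) { t = s; }
};

// Hypothetical lazy expression for a + b; element i is computed on demand.
template <class E1, class E2>
struct sum_expr {
    const E1 &e1;
    const E2 &e2;
    std::size_t size() const { return e1.size(); }
    double operator()(std::size_t i) const { return e1(i) + e2(i); }
};

struct vec {
    std::vector<double> data;

    explicit vec(std::size_t n, double init = 0.0) : data(n, init) {}
    std::size_t size() const { return data.size(); }
    double &operator()(std::size_t i) { return data[i]; }
    double operator()(std::size_t i) const { return data[i]; }

    // The single element-wise loop every assignment is funnelled into,
    // playing the role of the loop in vector_assign.hpp.
    template <class F, class E>
    static void assign_loop(vec &v, const E &e) {
        assert(v.size() == e.size());
        const long n = static_cast<long>(v.size());
        #pragma omp parallel for
        for (long i = 0; i < n; ++i)
            F::apply(v(i), e(i));  // iterations are independent
    }

    // Assignment from any expression is forwarded to assign_loop.
    template <class E>
    vec &operator=(const E &e) {
        assign_loop<assign_functor>(*this, e);
        return *this;
    }
};

inline sum_expr<vec, vec> operator+(const vec &a, const vec &b) {
    return sum_expr<vec, vec>{a, b};
}
```

Usage mirrors the test program: with `vec a(4, 1.0), b(4, 2.0), c(4);`, the statement `c = a + b;` evaluates the sum lazily inside the parallel loop and leaves every element of `c` equal to 3.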
The test program was:
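#include <boost/numeric/ublas/vector.hpp>
#include <iostream>

int main() {
    const std::size_t N = 20000000;
    // Give a and b explicit values: vector<double>(N) leaves its elements uninitialized.
    boost::numeric::ublas::vector<double> a(N, 1.0), b(N, 2.0), c(N);
    // Repeat the addition so that the (now parallel) assignment loop dominates the run time.
    for (std::size_t i = 0; i != 500; ++i)
        c = a + b;  // the assignment goes through uBlas' vector_assign
    std::cout << c(1) << std::endl;  // prints 3
    return 0;
}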
This was compiled with the following GCC flags (in addition to the usual ones):
-O2 -DNDEBUG -fopenmp -msse3 -mfpmath=sse -ffast-math -ftree-vectorize -ftree-vectorizer-verbose=1
and linked with (again in addition to the usual flags):
-fopenmp
Before running the test, set the OMP_NUM_THREADS environment variable to the number of threads OpenMP should use, e.g. for 4 cores on an Ubuntu machine (bash):
export OMP_NUM_THREADS=4
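Besides watching top, one can check from inside a program how many threads OpenMP will actually use. A minimal sketch, assuming nothing beyond the standard OpenMP runtime routines `omp_get_max_threads` and `omp_get_num_threads`; the two wrapper functions are hypothetical helpers. The `_OPENMP` guard lets the same file build without -fopenmp, in which case both report 1.

```cpp
#include <cassert>
#ifdef _OPENMP
#include <omp.h>
#endif

// Threads a parallel region would use (reflects OMP_NUM_THREADS when set).
// Returns 1 when built without -fopenmp.
inline int openmp_max_threads() {
#ifdef _OPENMP
    const int n = omp_get_max_threads();
    assert(n >= 1);
    return n;
#else
    return 1;
#endif
}

// Threads actually running inside a parallel region; one thread reports the
// team size (the single construct ends with an implicit barrier).
inline int threads_in_parallel_region() {
    int n = 1;
#ifdef _OPENMP
    #pragma omp parallel
    {
        #pragma omp single
        n = omp_get_num_threads();
    }
#endif
    return n;
}
```

With `export OMP_NUM_THREADS=4` and a -fopenmp build, both functions would be expected to return 4.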
I didn't run any benchmarks, but top shows 400% CPU utilization on a 4-core i7, i.e. all 4 cores are busy.
Furthermore, inspecting the generated assembly (GCC 4.4.3) shows that this build also produces packed SSE instructions for the addition:
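Since full CPU utilization does not by itself show a speedup, a timing run is the natural next step. A minimal sketch of such a micro-benchmark, assuming plain `std::vector<double>` as a stand-in for `ublas::vector` so it does not depend on Boost; `time_add_loop` is a hypothetical helper, and the loop is written out by hand with the same pragma rather than going through uBlas.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

// Times `reps` passes of c[i] = a[i] + b[i] and returns the average
// wall-clock seconds per pass, measured with std::chrono::steady_clock.
inline double time_add_loop(std::vector<double> &c,
                            const std::vector<double> &a,
                            const std::vector<double> &b,
                            int reps) {
    assert(a.size() == b.size() && b.size() == c.size() && reps > 0);
    const long n = static_cast<long>(c.size());
    const auto start = std::chrono::steady_clock::now();
    for (int r = 0; r < reps; ++r) {
        #pragma omp parallel for
        for (long i = 0; i < n; ++i)
            c[i] = a[i] + b[i];
    }
    const std::chrono::duration<double> elapsed =
        std::chrono::steady_clock::now() - start;
    return elapsed.count() / reps;
}
```

Building this once with and once without -fopenmp (or running with different OMP_NUM_THREADS values) and comparing the returned times gives the actual speedup. For vectors as large as N = 20000000, the addition is likely limited by memory bandwidth rather than arithmetic, so the speedup may be well below 4x even when top reports 400%.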
movsd (%rbp,%rax), %xmm1
movsd (%rbx,%rax), %xmm2
movhpd 8(%rbp,%rax), %xmm1
movhpd 8(%rbx,%rax), %xmm2
movapd %xmm1, %xmm0
addpd %xmm2, %xmm0
movapd %xmm0, (%r12,%rax)
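To read the listing above: each operand is loaded two doubles at a time (movsd + movhpd), added with a single packed `addpd`, and stored. The sketch below is a hand-written counterpart using SSE2 intrinsics, for illustration only; it is not how uBlas or GCC produce the code, and `add_sse2` is a hypothetical name. It uses unaligned loads and stores, whereas the compiler's `movapd` store requires 16-byte alignment. Without SSE2 it falls back to the scalar loop.

```cpp
#include <cassert>
#include <cstddef>
#ifdef __SSE2__
#include <emmintrin.h>
#endif

// c[i] = a[i] + b[i], processing two doubles per step when SSE2 is available.
inline void add_sse2(double *c, const double *a, const double *b, std::size_t n) {
    assert(n == 0 || (c && a && b));
    std::size_t i = 0;
#ifdef __SSE2__
    for (; i + 2 <= n; i += 2) {
        const __m128d va = _mm_loadu_pd(a + i);  // same effect as movsd + movhpd
        const __m128d vb = _mm_loadu_pd(b + i);
        _mm_storeu_pd(c + i, _mm_add_pd(va, vb));  // addpd, then store
    }
#endif
    for (; i < n; ++i)  // scalar tail (or the whole loop without SSE2)
        c[i] = a[i] + b[i];
}
```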
I hope this gives a starting point to anyone with some free time to write uBlas patches. Please let me know whether these instructions are enough to get a basic parallel example running with uBlas.
Best,
Nasos