Subject: Re: [ublas] uBlas and OpenMP
From: Andrey Asadchev (asadchev_at_[hidden])
Date: 2011-03-28 14:52:37


Hello.

> From:
>
> #ifndef BOOST_UBLAS_USE_DUFF_DEVICE
>         for (size_type i = 0; i < size; ++ i)
>             functor_type::apply (v (i), e () (i));
> #else
>
> To:
>
> #ifndef BOOST_UBLAS_USE_DUFF_DEVICE
>         #pragma omp parallel for
>         for (size_type i = 0; i < size; ++ i)
>             functor_type::apply (v (i), e () (i));
> #else
>
> The test program was:
>
> #include <boost/numeric/ublas/vector.hpp>
> #include <iostream>
> int main() {
>     const std::size_t N=20000000;
>     boost::numeric::ublas::vector<double> a(N), b(N), c(N);
>     for (std::size_t i=0; i!=500; i++)
>         c=a+b;
>     std::cout << c(1) << std::endl;
>     return 0;
> }
>

....
>
> I didn't run any benchmarks, but top shows 400% utilization on a 4-core i7
> (hence all 4 cores working).

The default OpenMP schedule is implementation-defined. If it comes out
as static with chunk size 1, neighbouring iterations land on different
threads, which will cause major cache waste through false sharing.
Set the chunk size to at least 16; N divided by the number of threads
is even better.
top is a poor way of measuring performance. Try ompP instrumentation
with oprofile.
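
A minimal sketch of the chunk-size suggestion, written against plain
std::vector so it stands alone rather than patching the uBLAS header.
The names add_chunked and add_blocked are hypothetical, and the comment
about cache lines assumes 8-byte doubles and 64-byte lines. Build with
-fopenmp on GCC or Clang; without it the pragmas are ignored and the
loops run serially.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of uBLAS): c = a + b, handing each thread
// 16 consecutive elements at a time. Assuming 8-byte doubles and 64-byte
// cache lines, 16 elements span two full lines, so threads rarely write
// to the same line.
void add_chunked(const std::vector<double>& a,
                 const std::vector<double>& b,
                 std::vector<double>& c)
{
    assert(a.size() == c.size() && b.size() == c.size());
    // Signed loop index keeps the loop valid for older OpenMP versions.
    const long n = static_cast<long>(c.size());
    #pragma omp parallel for schedule(static, 16)
    for (long i = 0; i < n; ++i)
        c[i] = a[i] + b[i];
}

// Hypothetical helper: same loop with schedule(static) and no chunk size.
// The iterations are then split into roughly equal contiguous blocks, at
// most one per thread, which is the "N / number of threads" case.
void add_blocked(const std::vector<double>& a,
                 const std::vector<double>& b,
                 std::vector<double>& c)
{
    assert(a.size() == c.size() && b.size() == c.size());
    const long n = static_cast<long>(c.size());
    #pragma omp parallel for schedule(static)
    for (long i = 0; i < n; ++i)
        c[i] = a[i] + b[i];
}
```

The same schedule clause could be appended to the pragma in the quoted
patch. Which variant is faster depends on the machine, so it is worth
timing both rather than reading it off top.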