Subject: Re: [ublas] uBlas and OpenMP
From: Andrey Asadchev (asadchev_at_[hidden])
Date: 2011-03-28 14:52:37
Hello.
> From:
>
> #ifndef BOOST_UBLAS_USE_DUFF_DEVICE
> for (size_type i = 0; i < size; ++ i)
> functor_type::apply (v (i), e () (i));
> #else
>
> To:
>
> #ifndef BOOST_UBLAS_USE_DUFF_DEVICE
> #pragma omp parallel for
> for (size_type i = 0; i < size; ++ i)
> functor_type::apply (v (i), e () (i));
> #else
>
> The test program was:
>
> #include <boost/numeric/ublas/vector.hpp>
> #include <iostream>
> int main() {
> const std::size_t N=20000000;
> boost::numeric::ublas::vector<double> a(N), b(N), c(N);
> for (std::size_t i=0; i!=500; i++)
> c=a+b;
> std::cout << c(1) << std::endl;
> return 0;
> }
>
....
>
> I didn't run any benchmarks, but top shows 400% utilization on a 4-core i7
> (hence all 4 cores working).
With OpenMP the default schedule is static with a chunk size of 1 (IIRC), which will
cause major cache waste through false sharing.
Set the chunk size to at least 16; N divided by the number of threads is even better.
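In a standalone loop the explicit schedule clause would look roughly like this
(a minimal sketch with plain std::vector rather than the uBlas functor machinery;
the chunk computation is just illustrative):

#include <omp.h>
#include <vector>
#include <cstddef>
#include <iostream>

int main() {
    const std::size_t N = 20000000;
    std::vector<double> a(N, 1.0), b(N, 2.0), c(N);

    // One contiguous chunk per thread, so neighbouring threads never
    // write to elements that share a cache line.
    const std::size_t chunk = N / omp_get_max_threads();

    #pragma omp parallel for schedule(static, chunk)
    for (std::size_t i = 0; i < N; ++i)
        c[i] = a[i] + b[i];

    std::cout << c[1] << std::endl;
    return 0;
}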
top is a poor way of measuring performance; try ompP instrumentation together
with oprofile.
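As a quick sanity check before setting up ompP/oprofile (and already far more
meaningful than top's %CPU), you can also time the loop itself with
omp_get_wtime() and compare runs at different OMP_NUM_THREADS; the snippet
below is just an illustrative variant of your test program:

#include <omp.h>
#include <boost/numeric/ublas/vector.hpp>
#include <algorithm>
#include <iostream>

int main() {
    const std::size_t N = 20000000;
    boost::numeric::ublas::vector<double> a(N), b(N), c(N);
    std::fill(a.begin(), a.end(), 1.0);
    std::fill(b.begin(), b.end(), 2.0);

    double t0 = omp_get_wtime();
    for (std::size_t i = 0; i != 500; ++i)
        c = a + b;
    double t1 = omp_get_wtime();

    // Wall-clock seconds for the 500 assignments; compare a run with
    // OMP_NUM_THREADS=1 against OMP_NUM_THREADS=4 to see the real speedup.
    std::cout << (t1 - t0) << " s, c(1) = " << c(1) << std::endl;
    return 0;
}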