Subject: Re: [ublas] uBlas and OpenMP
From: Nasos Iliopoulos (nasos_i_at_[hidden])
Date: 2011-03-28 15:40:27
Great to know!
I don't feel that the example I posted earlier is by any means complete or optimized. I just pointed out the places in uBlas where one may want to start adding OpenMP parallelization.
In any case, if you have any further experience to share, it would be very useful!
Thanks,
Nasos
On Mar 28, 2011, at 2:52 PM, Andrey Asadchev wrote:
> Hello.
>
>> From:
>>
>> #ifndef BOOST_UBLAS_USE_DUFF_DEVICE
>>     for (size_type i = 0; i < size; ++ i)
>>         functor_type::apply (v (i), e () (i));
>> #else
>>
>> To:
>>
>> #ifndef BOOST_UBLAS_USE_DUFF_DEVICE
>> #pragma omp parallel for
>>     for (size_type i = 0; i < size; ++ i)
>>         functor_type::apply (v (i), e () (i));
>> #else
>>
>> The test program was:
>>
>> #include <boost/numeric/ublas/vector.hpp>
>> #include <iostream>
>> int main() {
>>     const std::size_t N = 20000000;
>>     boost::numeric::ublas::vector<double> a(N), b(N), c(N);
>>     for (std::size_t i = 0; i != 500; i++)
>>         c = a + b;
>>     std::cout << c(1) << std::endl;
>>     return 0;
>> }
>>
>
> ....
>>
>> I didn't run any benchmarks, but top shows 400% utilization on a 4-core i7
>> (hence all 4 cores working).
>
> In OpenMP the default schedule is static with chunk size 1 (iirc), which
> will cause major cache waste through false sharing.
> Set the chunk size to at least 16; N/omp_num_threads is even better.
> top is a poor way of measuring performance. Try ompP instrumentation
> with oprofile.
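
For reference, here is a minimal sketch of the scheduling suggestion above. It is not uBlas code: it assumes plain std::vector in place of the uBlas expression machinery, and the names add_static, add_chunked and time_seconds are made up for illustration. As far as I know the OpenMP specification leaves the default schedule implementation-defined, so spelling out the schedule clause avoids relying on it. The timer is only a crude wall-clock check, not a replacement for a real profiler.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

// Hypothetical helper: c[i] = a[i] + b[i].
// schedule(static) without a chunk size gives each thread one contiguous
// block of roughly N / num_threads iterations.
void add_static(const std::vector<double>& a,
                const std::vector<double>& b,
                std::vector<double>& c)
{
    assert(a.size() == b.size() && b.size() == c.size());
    const long n = static_cast<long>(c.size());
    #pragma omp parallel for schedule(static)
    for (long i = 0; i < n; ++i)
        c[i] = a[i] + b[i];
}

// Hypothetical helper: same loop, but with an explicit chunk size.
// Chunks are handed to the threads round-robin; with 8-byte doubles and
// 64-byte cache lines (an assumption about the hardware), a chunk of 16
// or more keeps neighbouring threads mostly off the same cache line.
void add_chunked(const std::vector<double>& a,
                 const std::vector<double>& b,
                 std::vector<double>& c,
                 int chunk)
{
    assert(chunk > 0);
    assert(a.size() == b.size() && b.size() == c.size());
    const long n = static_cast<long>(c.size());
    #pragma omp parallel for schedule(static, chunk)
    for (long i = 0; i < n; ++i)
        c[i] = a[i] + b[i];
}

// Hypothetical helper: wall-clock seconds taken by one call of f().
template <class F>
double time_seconds(F f)
{
    const auto start = std::chrono::steady_clock::now();
    f();
    const std::chrono::duration<double> elapsed =
        std::chrono::steady_clock::now() - start;
    return elapsed.count();
}
```

Built with OpenMP enabled (e.g. -fopenmp with GCC), the two variants can be timed with something like time_seconds([&] { add_chunked(a, b, c, 16); }) and compared for different chunk sizes. Without OpenMP the pragmas are ignored and both loops run serially.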