Subject: Re: [ublas] [PATCH 3/3] boost::ublas increasing the range of BLAS level 3 benchmarks
From: Nasos Iliopoulos (nasos_i_at_[hidden])
Date: 2016-03-07 10:47:18
std::thread would also be really easy to implement using lambda
functions. I would consider OpenMP acceptable as long as it is not the
default. A couple of years ago I benchmarked code using both approaches,
and std::threads were slower, I assume because of inefficient
implementations on some platforms. I am unsure of the situation now, but
it should be easy to find out.
A #pragma omp style loop implemented with std::thread could look like
this (I didn't test this code):
#include <cstddef>
#include <iostream>
#include <thread>
#include <vector>

std::size_t num_threads = 4;
std::vector<std::thread> workers;
workers.reserve( num_threads ); // Maybe this can provide some speedup.
std::vector<double> v( num_threads * 10 );
for ( std::size_t i = 0; i != num_threads; i++ )
    workers.push_back( std::thread( [ i, &v ]() {
        auto index = i * 10;
        std::cout << "thread " << i << std::endl;
        for ( std::size_t j = 0; j != 10; j++ )
            v[ index + j ] = j; // each thread writes its own 10-element slice
    } ) );
for ( auto &w : workers )
    w.join();
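For comparison, a sketch of the same loop in OpenMP form (an assumption on my part: compiled with -fopenmp; without that flag the pragma is simply ignored and the loop runs serially, with the same result):

```cpp
#include <cstddef>
#include <vector>

// Fill a vector in parallel: slice i (10 elements) gets the values 0..9.
// Iterations write disjoint slices, so there are no data races.
std::vector<double> fill_parallel( std::size_t num_threads )
{
    std::vector<double> v( num_threads * 10 );
    #pragma omp parallel for
    for ( std::size_t i = 0; i < num_threads; i++ )
        for ( std::size_t j = 0; j != 10; j++ )
            v[ i * 10 + j ] = j;
    return v;
}
```

Here the scheduling and joining are handled by the runtime, which is why the OpenMP form is so much shorter.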
All the above can be abstracted to perform kernel operations.
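As a sketch of such an abstraction (a hypothetical parallel_for helper, not anything that exists in ublas), the thread bookkeeping can be factored out so a kernel only supplies the per-chunk body:

```cpp
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical helper: split [0, n) into up to num_threads contiguous
// chunks and run body(begin, end) for each chunk on its own thread.
void parallel_for( std::size_t n, std::size_t num_threads,
                   const std::function<void(std::size_t, std::size_t)> &body )
{
    std::vector<std::thread> workers;
    workers.reserve( num_threads );
    const std::size_t chunk = ( n + num_threads - 1 ) / num_threads;
    for ( std::size_t t = 0; t != num_threads; t++ ) {
        const std::size_t begin = t * chunk;
        const std::size_t end = begin + chunk < n ? begin + chunk : n;
        if ( begin >= end )
            break; // no work left for the remaining threads
        workers.push_back( std::thread( [ begin, end, &body ]() {
            body( begin, end );
        } ) );
    }
    for ( auto &w : workers )
        w.join();
}
```

A kernel would then be called as e.g. parallel_for( v.size(), 4, [&v]( std::size_t b, std::size_t e ) { for ( auto k = b; k != e; k++ ) v[ k ] = k % 10; } );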
On 03/06/2016 03:58 PM, palik imre wrote:
> It just occurred to me, that based on the descriptor struct it would be
> possible to choose between parallel and serial implementations of the
> Anybody would be interested in having something like that in ublas?
> Would an OpenMP parallel implementation be accepted to the library?
> On Sunday, 6 March 2016, 10:43, palik imre <imre_palik_at_[hidden]> wrote:
> Fork is here:
> pull request is sent.
> ublas mailing list
> Sent to: athanasios.iliopoulos.ctr.gr_at_[hidden]