Subject: Re: [ublas] [PATCH 3/3] boost::ublas increasing the range of BLAS level 3 benchmarks
From: Nasos Iliopoulos (nasos_i_at_[hidden])
Date: 2016-03-07 10:47:18
std::thread would also be really easy to implement using lambda
functions. I would consider OpenMP acceptable as long as it is not the
default. A couple of years ago I benchmarked code using both
approaches, and std::threads were slower, I assume because of
inefficient implementations on some platforms. I am unsure of the
situation now, but it should be easy to find out.
A `pragma omp`-style loop implemented with std::thread would look
something like this (I didn't test this code):
#include <iostream>
#include <thread>
#include <vector>

std::size_t num_threads = 4;
std::vector<std::thread> workers;
workers.reserve(num_threads); // Maybe this can provide some speedup.
std::vector<double> v(num_threads * 10);
for (std::size_t i = 0; i != num_threads; i++)
{
    workers.push_back(std::thread([i, &v]()
    {
        auto index = i * 10;
        std::cout << "thread " << i << std::endl;
        for (std::size_t j = 0; j != 10; j++)
        {
            v[index + j] = j; // std::vector uses operator[], not ()
        }
    }));
}
for (auto &w : workers)
{
    w.join();
}
All the above can be abstracted to perform kernel operations.
- Nasos
On 03/06/2016 03:58 PM, palik imre wrote:
> It just occurred to me that, based on the descriptor struct, it would be
> possible to choose between parallel and serial implementations of the
> kernels.
>
> Anybody would be interested in having something like that in ublas?
>
> Would an OpenMP parallel implementation be accepted to the library?
>
> Thanks,
>
> Imre
>
>
> On Sunday, 6 March 2016, 10:43, palik imre <imre_palik_at_[hidden]> wrote:
>
>
> Fork is here:
> https://github.com/imre-palik/ublas/tree/feature/ublas00004_simd_gemm
>
> pull request is sent.
>
>
> _______________________________________________
> ublas mailing list
> ublas_at_[hidden]
> http://lists.boost.org/mailman/listinfo.cgi/ublas
> Sent to: athanasios.iliopoulos.ctr.gr_at_[hidden]