Subject: Re: [ublas] [PATCH 3/3] boost::ublas increasing the range of BLAS level 3 benchmarks
From: Nasos Iliopoulos (nasos_i_at_[hidden])
Date: 2016-03-07 10:47:18

Next message: Riccardo Rossi: "Re: [ublas] [PATCH 3/3] boost::ublas increasing the range of BLAS level 3 benchmarks"
Previous message: palik imre: "Re: [ublas] [PATCH 3/3] boost::ublas increasing the range of BLAS level 3 benchmarks"
In reply to: palik imre: "Re: [ublas] [PATCH 3/3] boost::ublas increasing the range of BLAS level 3 benchmarks"
Next in thread: Riccardo Rossi: "Re: [ublas] [PATCH 3/3] boost::ublas increasing the range of BLAS level 3 benchmarks"

A std::thread version would also be really easy to implement using lambda
functions. I would consider OpenMP acceptable as long as it is not the
default. A couple of years ago I benchmarked code using both approaches,
and the std::thread version was slower, I assume because the
implementations were not efficient on all platforms. I am unsure of the
situation now, but it should be easy to find out.

A "#pragma omp parallel for" type of loop implemented with std::thread
should look something like this (I didn't test this code):

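     // Requires <cstddef>, <iostream>, <thread> and <vector>.
     const std::size_t num_threads = 4;
     std::vector<std::thread> workers;
     workers.reserve( num_threads ); // Avoids reallocation while threads are added.
     std::vector<double> v( num_threads * 10 );

     for ( std::size_t i = 0; i != num_threads; i++ )
     {
         // i is captured by value and v by reference. Each thread writes
         // only to its own 10-element slice of v, so no locking is needed.
         workers.emplace_back([ i, &v ]()
         {
             const auto index = i * 10;
             // Output from different threads may interleave.
             std::cout << "thread " << i << std::endl;
             for ( std::size_t j = 0; j != 10; j++ )
             {
                 v[ index + j ] = j; // std::vector is indexed with [], not ()
             }
         });
     }

     // Wait for all workers to finish before v is read.
     for ( auto &w : workers )
     {
         w.join();
     }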

All the above can be abstracted to perform kernel operations.
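
One way such an abstraction might look is sketched below. This is only an illustration and not code from uBLAS or from the patch under discussion: the names parallel_for and fill_demo are hypothetical, and the even split of the index range into contiguous chunks is an assumption. The helper runs a callable on each chunk in its own std::thread and joins all threads before returning.

```cpp
#include <algorithm>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical helper (not part of uBLAS): splits [0, n) into contiguous
// chunks and calls body(begin, end) for each chunk in its own std::thread.
template <class Body>
void parallel_for(std::size_t n, std::size_t num_threads, Body body)
{
    if (num_threads == 0) num_threads = 1;
    // Ceiling division, so the last chunk may be shorter than the others.
    const std::size_t chunk = (n + num_threads - 1) / num_threads;

    std::vector<std::thread> workers;
    workers.reserve(num_threads);

    for (std::size_t begin = 0; begin < n; begin += chunk)
    {
        const std::size_t end = std::min(n, begin + chunk);
        // body is shared by reference; it must be safe to call concurrently.
        workers.emplace_back([begin, end, &body]() { body(begin, end); });
    }

    // Wait for every chunk to finish before returning to the caller.
    for (auto &w : workers) w.join();
}

// Usage sketch: the same fill as in the loop above, written with the helper.
std::vector<double> fill_demo(std::size_t n, std::size_t num_threads)
{
    std::vector<double> v(n);
    parallel_for(n, num_threads, [&v](std::size_t begin, std::size_t end)
    {
        // Each thread writes only to its own index range of v.
        for (std::size_t j = begin; j != end; ++j)
            v[j] = static_cast<double>(j % 10);
    });
    return v;
}
```

Calling fill_demo(40, 4) from main reproduces the 4 x 10 fill of the earlier loop. A kernel would replace the inner loop body with its own work on the range [begin, end), and a serial version could simply call body(0, n) directly.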

- Nasos

On 03/06/2016 03:58 PM, palik imre wrote:
> It just occurred to me that, based on the descriptor struct, it would be
> possible to choose between parallel and serial implementations of the
> kernels.
>
> Would anybody be interested in having something like that in ublas?
>
> Would an OpenMP parallel implementation be accepted into the library?
>
> Thanks,
>
> Imre
>
>
> On Sunday, 6 March 2016, 10:43, palik imre <imre_palik_at_[hidden]> wrote:
>
>
> Fork is here:
> https://github.com/imre-palik/ublas/tree/feature/ublas00004_simd_gemm
>
> pull request is sent.