std::thread would also be quite easy to implement using lambda functions. I would consider OpenMP acceptable as long as it is not the default. A couple of years ago I benchmarked code using both approaches, and std::thread was slower, presumably because of inefficient implementations on some platforms. I am unsure of the situation now, but it should be easy to find out.

A pragma-omp-style loop implemented with std::thread would look something like this (I haven't tested this code):

    // Needs <thread>, <vector>, and <iostream>.
    std::size_t num_threads = 4;
    std::vector<std::thread> workers;
    workers.reserve( num_threads ); // Avoids reallocations; may give a small speedup.
    std::vector<double> v( num_threads * 10 );

    for ( std::size_t i = 0; i != num_threads; ++i )
    {
        workers.push_back( std::thread( [ i, &v ]()
        {
            std::cout << "thread " << i << std::endl; // Output may interleave.
            for ( std::size_t j = 0; j != 10; ++j )
            {
                v[ i * 10 + j ] = j; // operator[], not (): v is a std::vector.
            }
        } ) );
    }

    for ( auto &w : workers )
    {
        w.join();
    }
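For comparison, the OpenMP version of the same fill collapses all the thread bookkeeping into one pragma. This is a sketch under my own naming (fill_parallel is not an existing ublas interface), and it stays correct, just serial, if compiled without -fopenmp:

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: fills num_threads * 10 entries with 0..9 per
// chunk, mirroring the std::thread version above.
std::vector<double> fill_parallel( std::size_t num_threads )
{
    std::vector<double> v( num_threads * 10 );

    // The pragma distributes iterations across threads; without
    // -fopenmp it is ignored and the loop runs serially.
    #pragma omp parallel for
    for ( long i = 0; i < static_cast<long>( v.size() ); ++i )
    {
        v[ i ] = static_cast<double>( i % 10 );
    }
    return v;
}
```

The signed loop index is deliberate: older OpenMP versions only accept signed induction variables in a `parallel for`.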

All the above can be abstracted to perform kernel operations.
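A minimal sketch of such an abstraction, assuming nothing beyond the standard library (parallel_for is my own name, not an existing ublas API): split the index range into contiguous chunks and hand each chunk to a kernel running in its own std::thread.

```cpp
#include <algorithm>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical helper: runs f(begin, end) on roughly equal chunks
// of [0, n), one chunk per thread, and joins before returning.
template <typename Kernel>
void parallel_for( std::size_t n, std::size_t num_threads, Kernel f )
{
    std::vector<std::thread> workers;
    workers.reserve( num_threads );

    const std::size_t chunk = ( n + num_threads - 1 ) / num_threads;
    for ( std::size_t t = 0; t != num_threads; ++t )
    {
        const std::size_t begin = t * chunk;
        const std::size_t end   = std::min( begin + chunk, n );
        if ( begin >= end )
            break; // More threads than work; stop spawning.
        workers.push_back( std::thread( [ begin, end, &f ]()
        {
            f( begin, end );
        } ) );
    }
    for ( auto &w : workers )
        w.join();
}
```

The loop from the example above then shrinks to a single call, e.g. `parallel_for( v.size(), 4, [&v]( std::size_t b, std::size_t e ){ for ( std::size_t k = b; k != e; ++k ) v[ k ] = k % 10; } );`.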

- Nasos

On 03/06/2016 03:58 PM, palik imre wrote:
It just occurred to me that, based on the descriptor struct, it would be possible to choose between parallel and serial implementations of the kernels.

Anybody would be interested in having something like that in ublas?

Would an OpenMP parallel implementation be accepted to the library?

Thanks,

Imre


_______________________________________________
ublas mailing list
ublas@lists.boost.org
http://lists.boost.org/mailman/listinfo.cgi/ublas
Sent to: athanasios.iliopoulos.ctr.gr@nrl.navy.mil