Hi All,
I believe that including OpenMP in uBLAS is potentially very interesting. Nevertheless, how will you ensure
that OpenMP is used only for large matrices and vectors, without imposing a performance penalty on small ones?
For example, I use uBLAS dense matrices inside parallel regions, so if you put OpenMP inside the library you would destroy the performance of my code...
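
To make the concern concrete, here is a minimal sketch of one possible guard. It uses plain std::vector rather than uBLAS types so that it stands alone, and the names guarded_axpy and parallel_threshold, as well as the cutoff value, are hypothetical and would have to be tuned by benchmarking. The loop stays serial when the vector is short or when the caller is already inside a parallel region:

```cpp
#include <vector>

#ifdef _OPENMP
#include <omp.h>
#else
// Fallback so the sketch also builds when OpenMP is not enabled.
inline int omp_in_parallel() { return 0; }
#endif

// Hypothetical cutoff: below this size the threading overhead is assumed
// to outweigh the gain. The right value depends on the machine.
const long parallel_threshold = 10000;

// y += a * x, parallel only for large vectors called from serial code.
void guarded_axpy(double a, const std::vector<double>& x, std::vector<double>& y)
{
    const long n = static_cast<long>(x.size());

    // Small problem, or we are already inside a user's parallel region:
    // run the plain serial loop and open no parallel region at all.
    if (n <= parallel_threshold || omp_in_parallel()) {
        for (long i = 0; i < n; ++i)
            y[i] += a * x[i];
        return;
    }

    #pragma omp parallel for
    for (long i = 0; i < n; ++i)
        y[i] += a * x[i];
}
```

Even so, the extra branch and the call to omp_in_parallel() are paid on every operation, which is part of why an explicit opt-in may be preferable.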
Maybe one solution would be to define a parallel matrix/vector type so that the user can opt in to it explicitly. Indeed, what we currently do is redefine the axpy product and the dot products ourselves,
so that we can use them ONLY where we want them to be parallel.
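
As an illustration of this opt-in approach (a sketch only, not our actual code), the parallel versions can live in their own namespace so that nothing becomes parallel unless the caller asks for it. The namespace par and both function names are hypothetical, and std::vector with row-major storage stands in for the uBLAS types:

```cpp
#include <cstddef>
#include <vector>

namespace par {  // hypothetical namespace for the explicitly parallel kernels

// Parallel dot product using an OpenMP reduction.
double dot(const std::vector<double>& x, const std::vector<double>& y)
{
    const long n = static_cast<long>(x.size());
    double sum = 0.0;
    #pragma omp parallel for reduction(+:sum)
    for (long i = 0; i < n; ++i)
        sum += x[i] * y[i];
    return sum;
}

// y += A * x for a dense row-major matrix A with `rows` rows and
// x.size() columns. Each row is independent, so rows are shared
// among the threads.
void mat_vec_add(const std::vector<double>& A, std::size_t rows,
                 const std::vector<double>& x, std::vector<double>& y)
{
    const long m = static_cast<long>(rows);
    const long n = static_cast<long>(x.size());
    #pragma omp parallel for
    for (long i = 0; i < m; ++i) {
        double acc = 0.0;
        for (long j = 0; j < n; ++j)
            acc += A[i * n + j] * x[j];
        y[i] += acc;
    }
}

}  // namespace par
```

Code running inside an existing parallel region keeps calling the ordinary serial operations, while the top-level solver calls par::dot and par::mat_vec_add where it wants threads.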
As a secondary comment, please consider that in order to squeeze performance out of modern machines it is FUNDAMENTAL to do allocation in parallel...
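
A sketch of what I mean, assuming a NUMA machine where the operating system places each memory page near the thread that first writes to it (the usual "first touch" policy on Linux, but that is an assumption about the platform). The function name make_first_touch is hypothetical. The point is that the first write to the storage happens in a parallel loop with the same static schedule the later compute loops would use:

```cpp
#include <cstddef>
#include <memory>

// Allocate n doubles and initialise them in parallel, so that each thread
// is the first to write the part of the array it is expected to work on.
std::unique_ptr<double[]> make_first_touch(std::size_t n, double value)
{
    // new double[n] leaves the elements uninitialised, so no thread has
    // written to the pages yet.
    std::unique_ptr<double[]> data(new double[n]);

    double* p = data.get();
    const long count = static_cast<long>(n);

    // A static schedule gives each thread one contiguous chunk; later
    // loops should use the same schedule to benefit from the placement.
    #pragma omp parallel for schedule(static)
    for (long i = 0; i < count; ++i)
        p[i] = value;

    return data;
}
```

By contrast, a container that zero-fills its storage in its constructor writes every page from the constructing thread, which is the situation I would like to avoid.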
Well... that's all for now.
greetings
Riccardo
On Tue, 29 Mar 2011 00:20:08 +0400, Matwey V. Kornilov wrote:
Try changing the number of active threads; for instance, type: export OMP_NUM_THREADS=2 before running your program. Depending on how OpenMP schedules the work, performance can degrade as the number of threads increases.

Nasos Iliopoulos wrote:
Great to know! I don't feel that the example I posted earlier is by any means complete, optimized or whatever. I just pointed out the locations in uBLAS where one may want to start looking to add OpenMP parallelization. In any case, if you have any further experience to share it would be very useful!

Thanks,
Nasos