Hello all,

I think introducing OpenMP should be fairly straightforward: tweak the dispatching loops in the functional.hpp functors (and wherever else loops exist) to use "#pragma omp for" in place of the current while or plain for loops. I wish I had time to implement it (I may attempt to post a test, though).
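
To make the idea concrete, here is a minimal sketch of the kind of loop change I mean. It is not uBlas code: axpy_omp is a hypothetical stand-alone helper operating on std::vector, and the actual loops in functional.hpp are structured differently. Note that OpenMP only parallelizes counted for loops, so a while loop would first have to be rewritten in that form.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper (not part of uBlas): computes y[i] += a * x[i].
// Build with OpenMP enabled (e.g. -fopenmp on gcc) for the pragma to take
// effect; without it the pragma is ignored and the loop runs serially.
inline void axpy_omp(double a, const std::vector<double>& x, std::vector<double>& y)
{
    assert(x.size() == y.size());

    // A signed loop index keeps the loop valid for older OpenMP versions too.
    const long n = static_cast<long>(x.size());

    // Each iteration touches only element i, so iterations are independent
    // and can be split across threads without synchronization.
    #pragma omp parallel for
    for (long i = 0; i < n; ++i) {
        y[i] += a * x[i];
    }
}
```

Calling axpy_omp(3.0, x, y) with x filled with 2.0 and y filled with 1.0 leaves every element of y equal to 7.0, regardless of the number of threads.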

As far as SSE is concerned, there may be an easy way with auto-vectorization, but the hard part would be getting the compiler to unwind the uBlas expression template structure. In hand-made tests I have seen auto-vectorization and OpenMP working nicely on uBlas containers and giving better performance than Eigen3 (basically because Eigen3 is not using OpenMP efficiently atm). I may post those results in the near future along with some source code. Unfortunately the whole auto-vectorization feature is very compiler specific, and I find it easier to get running with gcc than with MSVC or icpc. Another issue with auto-vectorization is type alignment. For more info look at:

http://gcc.gnu.org/projects/tree-ssa/vectorization.html
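
As a rough illustration of what "compiler friendly" means here, the sketch below shows a loop shape that auto-vectorizers generally handle well: contiguous data, unit stride, and no aliasing between inputs and output. The names add_arrays and SKETCH_RESTRICT are hypothetical and are not taken from uBlas; whether a given compiler actually vectorizes the loop depends on its version and flags (with gcc, typically -O3, or -O2 -ftree-vectorize).

```cpp
#include <cstddef>

// "restrict" is not standard C++; map a hypothetical macro onto the
// compiler-specific spelling, or to nothing where none is known.
#if defined(__GNUC__)
#  define SKETCH_RESTRICT __restrict__
#elif defined(_MSC_VER)
#  define SKETCH_RESTRICT __restrict
#else
#  define SKETCH_RESTRICT
#endif

// Hypothetical kernel: out[i] = a[i] + b[i].
// The restrict qualifiers promise the compiler that the three arrays do not
// overlap, which removes one common obstacle to vectorization.
inline void add_arrays(const float* SKETCH_RESTRICT a,
                       const float* SKETCH_RESTRICT b,
                       float* SKETCH_RESTRICT out,
                       std::size_t n)
{
    for (std::size_t i = 0; i < n; ++i) {
        out[i] = a[i] + b[i];
    }
}
```

On the alignment side, one option with a C++11 compiler is to declare buffers with an explicit alignment, for example "alignas(16) float a[1024];", so that the compiler can use aligned SSE loads and stores when it knows about the alignment.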

Furthermore, I have seen what look like attempts in functional.hpp to enable SIMD auto-vectorization by providing compiler-friendly syntax (check the BOOST_UBLAS_USE_SIMD define), but I never tried to see how this works or whether it actually boosts performance.

Finally, I agree with David that a more permanent solution would probably be Boost::SIMD.

Best,
Nasos


On Mar 24, 2011, at 6:14 AM, David Bellot wrote:

not for the moment.
In fact, there is a GSoC project for porting part of NT2 to Boost. It will be something like Boost::SIMD I think.
So I think it would be better to integrate this future library into ublas rather than having our own implementation.
The reason is that those guys at NT2 already have a rock-solid vector implementation running on multiple architectures (SSE, Altivec, Cell processor and I think ARM). So the benefit would be immediate for us.

However, if you plan to do something with OpenMP, that would be great. Eigen has it (or some sort of multi-core capabilities).
We need that too. I would start by looking at the GNU parallel STL implementation to have an idea.

Any suggestions?

Cheers,
David
____________________
David Bellot, PhD
http://david.bellot.free.fr
http://ai-owl.blogspot.com




On Thu, Mar 24, 2011 at 10:39, Philipp Kraus <philipp.kraus@flashpixx.de> wrote:
Hello,

Do the ublas components use SSE (Streaming SIMD Extensions)? If yes, which version? Or must / should I create my own support class?


Thanks

Phil


_______________________________________________
ublas mailing list
ublas@lists.boost.org
http://lists.boost.org/mailman/listinfo.cgi/ublas
Sent to: david.bellot@gmail.com
