Subject: Re: [ublas] SSE support
From: Nasos Iliopoulos (nasos_i_at_[hidden])
Date: 2011-03-28 11:32:33
I think that introducing OpenMP should be fairly straightforward: tweak the dispatching loops in the functional.hpp functors (and wherever else loops exist) to use #pragma omp for instead of while or plain for loops. I wish I had time to implement it (I may attempt to post a test, though).
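To make the idea concrete, here is a minimal sketch of the kind of loop change I have in mind. It is not the actual functional.hpp code: axpy_assign is a hypothetical stand-in for an element-wise assignment loop, and std::vector stands in for a uBlas container.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a uBLAS-style element-wise assignment loop;
// this is NOT the code in functional.hpp, only an illustration.
template <class T>
void axpy_assign(std::vector<T>& y, T a, const std::vector<T>& x)
{
    assert(x.size() == y.size());
    // A signed loop index keeps the loop valid for older OpenMP versions.
    const long n = static_cast<long>(y.size());
    // Each iteration writes a distinct element, so iterations are independent
    // and can be divided among threads. When OpenMP is not enabled at compile
    // time the pragma is ignored and the loop runs serially.
    #pragma omp parallel for
    for (long i = 0; i < n; ++i)
        y[i] += a * x[i];
}
```

With gcc this needs -fopenmp to actually run in parallel. For small vectors the thread start-up cost will likely outweigh any gain, so a real implementation would probably want a size threshold.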
As far as SSE is concerned, there may be an easy way with auto-vectorization, but the hard part would be getting the compiler to unwind the uBlas expression template structure. In hand-made tests I have seen auto-vectorization and OpenMP working nicely on uBlas containers and giving better performance than Eigen3 (basically because Eigen3 is not using OpenMP efficiently atm). I may post those results in the near future along with some source code. Unfortunately, the whole auto-vectorization feature is very compiler specific, and I find it easier to get running with gcc than with MSVC or icpc. Another issue with auto-vectorization is type alignment. For more info look at:
Furthermore, I have seen some attempts in functional.hpp that probably aim to enable SIMD auto-vectorization by providing compiler-friendly syntax (check the BOOST_UBLAS_USE_SIMD define), but I never tried to see how this works or whether it actually boosts performance.
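As a rough illustration of what "compiler friendly" and the alignment issue mean in practice, here is a small sketch. It is not taken from uBlas; scaled_add and aligned_block are hypothetical names, and alignas assumes a C++11 compiler.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical kernel, not part of uBLAS: z[i] = x[i] + a * y[i].
// A countable loop over contiguous storage with unit stride and no function
// calls in the body is the form auto-vectorizers tend to handle most readily.
inline void scaled_add(double* z, const double* x, const double* y,
                       double a, std::size_t n)
{
    for (std::size_t i = 0; i < n; ++i)
        z[i] = x[i] + a * y[i];
}

// Hypothetical storage type. 16-byte alignment matches the width of an SSE
// register (two doubles), which lets the compiler use aligned loads and stores.
struct alignas(16) aligned_block
{
    double data[256];
};
```

With gcc, something like -O3 -msse2 should be enough to try this. Because the pointers are not marked as non-aliasing, the compiler may have to add a runtime overlap check before taking the vectorized path. Getting the same plain loop out of nested expression templates is the difficult part.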
Finally, I would agree with David that a more permanent solution would probably be Boost::SIMD.
On Mar 24, 2011, at 6:14 AM, David Bellot wrote:
> not for the moment.
> In fact, there is a GSoC project for porting part of NT2 to Boost. It will be something like Boost::SIMD I think.
> So I think it would be better to integrate this future library into ublas rather than having our own implementation.
> The reason is that those guys at NT2 already have a rock-solid vector implementation running on multiple architectures (SSE, Altivec, Cell processor and I think ARM). So the benefit would be immediate for us.
> However, if you plan to do something with OpenMP, that would be great. Eigen has it (or some sort of multi-core capabilities).
> We need that too. I would start by looking at the GNU parallel STL implementation to have an idea.
> Any suggestions ?
> David Bellot, PhD
> On Thu, Mar 24, 2011 at 10:39, Philipp Kraus <philipp.kraus_at_[hidden]> wrote:
> do the ublas components use SSE (Streaming SIMD Extensions)? If yes, which version? Or must / should I create my own support class?
> ublas mailing list
> Sent to: david.bellot_at_[hidden]