Subject: Re: [boost] [GSOC]SIMD Library
From: Gruenke, Matt (mgruenke_at_[hidden])
Date: 2011-03-30 02:04:09


My experience is mostly with MMX and integer SSE2. A useful approach I've used in the past was to create type-safe wrappers for the various intrinsics. These mostly took the form of overloaded inline functions, though I used templates whenever an immediate integer operand was required. These overloads enabled me to write higher-level templates that supported multiple vector types, even if they were sometimes machine-specific. The templates enabled optimizations of degenerate & special cases.
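
A minimal sketch of the wrapper idea described above, assuming an x86 target with SSE2. The type and function names (vi16x8, vi32x4, add, shift_left, times_three, low16, low32) are hypothetical illustrations, not the code referred to in this message.

```cpp
#include <emmintrin.h>  // SSE2 intrinsics (x86 only)
#include <cstdint>

// Hypothetical wrapper types: the same 128-bit register, but a distinct
// C++ type per element layout, so the compiler can tell them apart.
struct vi16x8 { __m128i v; };  // 8 x 16-bit integers
struct vi32x4 { __m128i v; };  // 4 x 32-bit integers

// Overloaded inline functions: the static type selects the intrinsic.
inline vi16x8 add(vi16x8 a, vi16x8 b) { return { _mm_add_epi16(a.v, b.v) }; }
inline vi32x4 add(vi32x4 a, vi32x4 b) { return { _mm_add_epi32(a.v, b.v) }; }

// Templates where the instruction needs an immediate integer operand.
template <int N> inline vi16x8 shift_left(vi16x8 a) { return { _mm_slli_epi16(a.v, N) }; }
template <int N> inline vi32x4 shift_left(vi32x4 a) { return { _mm_slli_epi32(a.v, N) }; }

// A higher-level template that works for any wrapped type with an add() overload.
template <typename V> inline V times_three(V a) { return add(add(a, a), a); }

// Small helpers to read back the lowest element for inspection.
inline std::int16_t low16(vi16x8 a) {
    return static_cast<std::int16_t>(_mm_extract_epi16(a.v, 0));
}
inline std::int32_t low32(vi32x4 a) { return _mm_cvtsi128_si32(a.v); }
```

With these definitions, times_three(vi32x4{ _mm_set1_epi32(5) }) and times_three(vi16x8{ _mm_set1_epi16(3) }) go through the same template yet use different instructions, and passing a vi16x8 where a vi32x4 is expected fails to compile.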
 
Even though this isn't as sophisticated as what Proto can do, I think it will be useful to have a fall-back for cases where there are specialized instructions that aren't easily written as expressions, or where it's difficult to get Proto to generate the instruction sequence you want. Besides type-safety, the wrappers make the code much more readable than using the native intrinsics.
 
A few functions and templates implemented idioms and tricks for doing common tasks, like loading a vector with zeros (hint: xor anything with itself) or filling the vector with copies of a single value (I think this is called splat in AltiVec). Another reason to wrap these idioms is that each architecture can then use its own optimized implementation. Some of these generics were:

                template< typename V > V zero(); // generates a vector of 0's
                template< typename V > V full_mask(); // sets all bits to 1
                template< int i, typename V > T get_element( V ); // T is the element type of V
                template< int i, typename V > V set_element( V, T ); // T is the element type of V
                template< int i, typename V, typename T > V set_element_general( V, T );
                template< int i, int j, ..., typename V > V shuffle( V ); // rearranges the elements in V.
                template< int n, typename V, typename T > V load( T * ); // loads n lowest elements of T[]
                template< int n, typename V, typename T > void store( V, T * ); // stores n lowest elements of V
                template< typename V > void store_uncached( V, V * ); // avoids cache pollution
                template< typename T, typename V > T horizontal_sum( V ); // sum of all elements in V
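
A sketch of how a few of these generics might be specialized for one vector type, assuming SSE2 on x86. The wrapper type vi32x4 and the helper splat are hypothetical, and the bodies are one possible implementation rather than the ones referred to in this message.

```cpp
#include <emmintrin.h>  // SSE2 intrinsics (x86 only)
#include <cstdint>

struct vi32x4 { __m128i v; };  // hypothetical wrapper: 4 x 32-bit integers

// Primary templates are only declared; each vector type supplies specializations.
template <typename V> V zero();
template <typename V> V full_mask();
template <typename T, typename V> T horizontal_sum(V);

template <> inline vi32x4 zero<vi32x4>() {
    return { _mm_setzero_si128() };  // compilers typically emit pxor reg, reg
}

template <> inline vi32x4 full_mask<vi32x4>() {
    __m128i z = _mm_setzero_si128();
    return { _mm_cmpeq_epi32(z, z) };  // x == x holds in every lane, so all bits are set
}

// Fill every lane with the same value ("splat").
inline vi32x4 splat(std::int32_t x) { return { _mm_set1_epi32(x) }; }

// The lane index is a template parameter because the byte shift needs an immediate.
template <int I> inline std::int32_t get_element(vi32x4 a) {
    static_assert(I >= 0 && I < 4, "lane index out of range");
    return _mm_cvtsi128_si32(_mm_srli_si128(a.v, I * 4));
}

template <> inline std::int32_t horizontal_sum<std::int32_t, vi32x4>(vi32x4 a) {
    // Add the upper 64-bit half onto the lower half, then the two remaining lanes.
    __m128i s = _mm_add_epi32(a.v, _mm_shuffle_epi32(a.v, _MM_SHUFFLE(1, 0, 3, 2)));
    s = _mm_add_epi32(s, _mm_shuffle_epi32(s, _MM_SHUFFLE(2, 3, 0, 1)));
    return _mm_cvtsi128_si32(s);
}
```

For a vector holding the lanes 1, 2, 3, 4, horizontal_sum<std::int32_t>(v) yields 10 and get_element<3>(v) yields 4. Generic code written against zero<V>() and horizontal_sum<T>(v) would not need to change if another vector type provided its own specializations.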
                 

I'm also a fan of having a set of common, optimized 1-D operations, such as buffer packing/interleaving & unpacking/deinterleaving, extract/insert columns, convolution, dot-product, SAD, FFT, etc. Keep it low-level, though. IMO, any sort of high-level abstraction that ships data off to different accelerator back-ends, like GPUs, is a different animal and should go in a different library.
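
As one illustration of such a low-level 1-D operation, here is a sketch of a sum of absolute differences (SAD) over two byte buffers, assuming SSE2 on x86. The function name sad_u8 and its signature are hypothetical.

```cpp
#include <emmintrin.h>  // SSE2 intrinsics (x86 only)
#include <cstddef>
#include <cstdint>

// Sum of absolute differences of two unsigned 8-bit buffers of length n.
inline std::uint64_t sad_u8(const std::uint8_t* a, const std::uint8_t* b, std::size_t n) {
    __m128i acc = _mm_setzero_si128();
    std::size_t i = 0;

    // Main loop: 16 bytes per iteration. _mm_sad_epu8 produces two partial
    // sums, one in each 64-bit half of the result.
    for (; i + 16 <= n; i += 16) {
        __m128i va = _mm_loadu_si128(reinterpret_cast<const __m128i*>(a + i));
        __m128i vb = _mm_loadu_si128(reinterpret_cast<const __m128i*>(b + i));
        acc = _mm_add_epi64(acc, _mm_sad_epu8(va, vb));
    }

    // Fold the upper half onto the lower half and read back the low 64 bits.
    acc = _mm_add_epi64(acc, _mm_srli_si128(acc, 8));
    std::uint64_t total = 0;
    _mm_storel_epi64(reinterpret_cast<__m128i*>(&total), acc);

    // Scalar tail for the remaining n % 16 elements.
    for (; i < n; ++i) {
        total += a[i] > b[i] ? a[i] - b[i] : b[i] - a[i];
    }
    return total;
}
```

The unaligned loads keep the interface simple; a tuned version might add an aligned fast path, which is the kind of detail a shared library routine can hide from callers.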
 
 
Matt
 

________________________________

From: boost-bounces_at_[hidden] on behalf of Faiçal Tchirou
Sent: Tue 3/29/2011 7:16 AM
To: boost_at_[hidden]
Subject: [boost] [GSOC]SIMD Library

Hi everyone,
I recently asked here for documentation about the SIMD Library project and received some guides and reference links from Joel Falcou. Now I have some questions about the library itself. Is the goal of the project to build a library similar to libSIMDx86 but more focused on the AltiVec instruction set? Joel also advised me to take a look at Proto. I read that Proto is used to build DSELs. Will Proto be used to map the library's high-level classes and methods to native SIMD instructions? What about the library itself? Which modules have to be developed?
Thanks.
                                         