Subject: Re: [boost] Accelerating algorithms with SIMD - Segmented iterators and alternatives
From: Smith, Jacob N (jacob.n.smith_at_[hidden])
Date: 2010-10-11 19:12:45
> -----Original Message-----
> From: boost-bounces_at_[hidden] [mailto:boost-
> bounces_at_[hidden]] On Behalf Of Simonson, Lucanus J
> Sent: Monday, October 11, 2010 3:34 PM
> To: boost_at_[hidden]
> Subject: Re: [boost] Accelerating algorithms with SIMD - Segmented
> iterators and alternatives
> Mathias Gaunard wrote:
> > On 11/10/2010 17:54, DE wrote:
> >> hi there
> >> if you focus on simd-aware - and more specifically x86 simd -
> >> implementation _DON'T_ read further
> >> consider OpenCL as a general way to speed up computations (which
> >> has simd as one of its backends, as well as gpu shader units etc.)
> >> i think it will be much more general and useful
> > OpenCL is a possible implementation, but we chose to call the
> > various SIMD instructions ourselves to have more control over the
> > toolchain and the end result.
> > OpenCL will however be our main backend for GPU targets, which we will
> > be supporting in the future. Or maybe we will just target them through
> > Clang and LLVM.
> > NT2 (also known as the crazy frenchman library), upon which this
> > is based, only supports x86 (SSE, ..., SSE4, AVX) and PowerPC
> > (AltiVec).
> > ARM (NEON, VFP) is being added.
> > An effort has been made in its design so that instructions can
> > register themselves for a particular (type, cardinal) pair, all of
> > which are ranked according to a category to select the best candidate.
> > It heavily uses meta-programming, including rewritten bits of MPL to
> > improve its compile-time performance.
> > It also uses expression templates with Proto to detect certain
> > patterns, such as fused multiply-add, for which x86 is introducing
> > new instructions soon.
> > Therefore it is very tunable and extensible.
> > The bit I wanted to discuss here, however, is not the implementation,
> > but rather the interface that the library provides to the programmer.
> > We aim to provide an interface in modern C++ that integrates well
> > with the standard library and Boost, in order to allow developers to
> > make use of SIMD in an easy and fairly high-level fashion, potentially
> > using meta-programming to write an algorithm with parametric register
> > and cache sizes.
> > OpenCL is not an interface that satisfies those criteria.
> Have you seen the ct thing coming from Intel? I just learned about it
> last week.
> It looks like a container library for vector processing in C++ with a
> JIT compilation runtime environment as part of the library. As a guy
> who appreciates compilers, languages and libraries, there is a lot to
> like there. This area is evolving very fast.
I have surely missed a presentation or discussion elsewhere, but why would a segmented-iterator/SIMD-oriented-iterator approach (or ct) be better or worse than a well-tuned valarray? Granted, valarray has a notably sparse interface, providing (less than) the bare bones needed for data-parallel programming, and on top of that it relies on a member-function strategy. Still, "fixing" valarray seems like a more robust starting point than immediately striking out on a different strategy.