Subject: Re: [boost] SIMD libraries -- was: [xint] Fourth release, requesting preliminary review again
From: Simonson, Lucanus J (lucanus.j.simonson_at_[hidden])
Date: 2010-06-11 20:18:38


joel falcou wrote:
> Simonson, Lucanus J wrote:
>> You can wrap usage of intrinsics with templates and create simd data
>> types in C++ that use SSE when available and emulate it in software
>> when not available. Doing so would be a boost library unto itself,
>> which has been proposed many times and generally not well received
>> because it favors whichever instruction set it chooses to target
>> and is therefore not general. It would also be a maintenance
>> headache because new revisions of the instruction set would require
>> constant revisions of the library to keep up. For complicated
>> reasons that become obvious once you comparatively study several
>> vector instruction sets it is not practical (or really feasible for
>> that matter) to target multiple instruction sets with such a
>> library. You write your algorithms differently to take advantage of
>> different instructions.

> Except that, after a while, you can see all this as a truly generic
> layer on top of registers. I suggest you get hold of what we
> presented this year at BoostCon. With the proper abstraction, we were
> able to handle all SSE flavors as well as AltiVec. The maintenance
> effort is marginal. As an exercise we worked on an XOP binding; it
> took less than a week to get running. The main idea was to separate
> the low-level, extension-specific binding from a generic, EDSL-based
> concept of "pack of N elements of type T" in which compile-time
> transforms restructure SIMD expressions toward their natural form for
> the current extension.
>
> The error is to make a Boost.SIMD thing. You need a global overhaul
> of functions for scalar and vector registers, as scalars are basically
> vector registers with one element. Once you're there, everything
> flows properly.
>
> As for the maintenance, an IS doesn't change. New ISs are made for
> new processors. At that point either they are completely separate
> (AltiVec vs. SPU AltiVec) or they share an ancestry (SSE2, 3, 4, AVX
> for example). Both cases are easily manageable.

Obviously something like valarray can be implemented. It has been part of the standard for years, though most people don't implement it, use it, or even know it exists. That is different from what I was talking about, which is a DSEL that exposes all of the features of a SIMD instruction set. It is doable for one IS, and probably for one family, but I'm more than skeptical that it can be generalized and unified across SIMD instruction sets. Since most software targets a specific platform, however, it is almost a moot point. There is very little room between people who want to target a wide variety of hardware yet care enough about performance to want SIMD, and people who care so much about performance that they write all their code by hand in assembly for each target. I expect GMP falls solidly in the second category, for example. There is a big difference between using SSE instructions and getting the most out of them. In general you have to change your algorithm to fit the architecture, which means different algorithms for different architectures.
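
To make the kind of wrapper under discussion concrete, here is a minimal sketch that covers a single instruction set with a software fallback. The `pack4f` type and the `load`/`store` helpers are hypothetical names made up for illustration; they are not the API of NT2 or of any proposed Boost library. Only the `_mm_*` calls are real SSE intrinsics.

```cpp
#include <cassert>
#include <cstddef>

// Use SSE when the compiler reports it, otherwise emulate in software.
#if defined(__SSE__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 1)
#  include <xmmintrin.h>
#  define PACK4F_USE_SSE 1
#else
#  define PACK4F_USE_SSE 0
#endif

// Hypothetical type: four floats that are processed together.
struct pack4f {
#if PACK4F_USE_SSE
    __m128 v;     // one SSE register
#else
    float v[4];   // plain array for the emulated path
#endif
};

// Read four consecutive floats (no alignment requirement).
inline pack4f load(const float* p) {
    assert(p != nullptr);
    pack4f r;
#if PACK4F_USE_SSE
    r.v = _mm_loadu_ps(p);
#else
    for (std::size_t i = 0; i < 4; ++i) r.v[i] = p[i];
#endif
    return r;
}

// Write four consecutive floats (no alignment requirement).
inline void store(float* p, const pack4f& a) {
    assert(p != nullptr);
#if PACK4F_USE_SSE
    _mm_storeu_ps(p, a.v);
#else
    for (std::size_t i = 0; i < 4; ++i) p[i] = a.v[i];
#endif
}

// Element-wise addition.
inline pack4f operator+(const pack4f& a, const pack4f& b) {
    pack4f r;
#if PACK4F_USE_SSE
    r.v = _mm_add_ps(a.v, b.v);
#else
    for (std::size_t i = 0; i < 4; ++i) r.v[i] = a.v[i] + b.v[i];
#endif
    return r;
}

// Element-wise multiplication.
inline pack4f operator*(const pack4f& a, const pack4f& b) {
    pack4f r;
#if PACK4F_USE_SSE
    r.v = _mm_mul_ps(a.v, b.v);
#else
    for (std::size_t i = 0; i < 4; ++i) r.v[i] = a.v[i] * b.v[i];
#endif
    return r;
}
```

A caller would write something like `store(out, load(a) * load(b) + load(c));`. The sketch covers only the operations that every instruction set agrees on; anything specific to one IS (shuffles, saturating arithmetic, predicates) has no place in it yet, which is where the difficulty I am describing starts.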

Does your SIMD wrapper use expression templates to convert a * b + c to a fused multiply-add? If it doesn't, it is wrong because it misses an important optimization; if it does, it is also wrong, because a fused multiply-add rounds only once and so produces a result with different precision. Does it expose the overflow/carry flags produced by arithmetic? What about predicates? I agree it is pretty obvious to start out from a "pack of N elements of type T", but that's just the start, not the whole enchilada.
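
The fused multiply-add point can be checked with scalar code. This is a minimal sketch, not taken from any SIMD library: the function names are hypothetical, the inputs are picked by hand so that the exact product 1 - 2^-54 falls halfway between two doubles, and it assumes IEEE 754 doubles with the default round-to-nearest mode. `std::fma` is the standard C++11 function from `<cmath>`.

```cpp
#include <cassert>
#include <cmath>

// Unfused: the product is rounded to double before the add.
// The volatile intermediate keeps the compiler from contracting
// the multiply and the add into one fused instruction.
inline double mul_then_add(double a, double b, double c) {
    volatile double product = a * b;
    return product + c;
}

// Fused: a * b + c is computed exactly and rounded once.
inline double fused_mul_add(double a, double b, double c) {
    return std::fma(a, b, c);
}

// Hand-picked inputs: (1 + 2^-27) * (1 - 2^-27) = 1 - 2^-54,
// which is not representable and rounds to 1.0 when stored.
inline void demo() {
    const double eps = std::ldexp(1.0, -27);
    const double a = 1.0 + eps;
    const double b = 1.0 - eps;
    const double c = -1.0;

    // The rounded product is 1.0, so the sum is exactly zero.
    assert(mul_then_add(a, b, c) == 0.0);
    // The fused form keeps the low bits: the result is -2^-54.
    assert(fused_mul_add(a, b, c) == -std::ldexp(1.0, -54));
}
```

Neither answer is wrong in isolation, but a wrapper that silently picks one of them changes the numerical behaviour of the expression the user wrote.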

Please don't mistake my skepticism for criticism. I am actually pretty curious to learn more about what you did with SIMD and I'm sorry I missed BoostCon this year. Discussions like this work much better in person. I've followed some of your posts in the past about NT2 and uBLAS and the recent matrix library submission.

Regards,
Luke

