Subject: Re: [boost] [OT?] SIMD and Auto-Vectorization (was Re: How to structurate libraries ?)
From: Simonson, Lucanus J (lucanus.j.simonson_at_[hidden])
Date: 2009-01-22 11:25:07
John Maddock wrote:
>>> Well, the whole discussion is about "optimal." If one doesn't care
>>> about "optimal" then a compiler will do just fine all the time and
>>> there's no need for a DSEL, asm or ugly gcc intrinsics.
>> What if we replace optimal with optimized?
>> Surely library code that gives a 4x speedup is desirable to have
>> even if you can hand generate code that gives you a 5x. Getting a 4x
>> speedup over naive simd-less in simple vector operations and still
>> being able to concentrate on the problem at hand instead of low
>> level optimization details sounds fantastic to me.
> Indeed, and like I keep saying: show us the code that produces the
> speedup and we can all stop arguing and start rejoicing :-)
The compiler doesn't do better than a good assembly programmer *can* do by hand; it does as well as a good assembly programmer *usually* does. Arguing for the compiler to perform SIMD optimization is in and of itself arguing for giving up optimal for good enough.

Rather than pointing to the code that produces the speedup, we can point to the application domains where the speedup is realized. Video encoding is one place where SIMD becomes a big deal. When processors are compared on benchmarks, there is very often a pronounced performance difference in video encoding that doesn't show up in the other benchmarks, because the video-encoding benchmark uses SSE. As John points out, these fantastic performance benefits just aren't in the offing for most applications.

Sometimes I write C++ code that looks a little like Verilog and I think to myself, wow, this would be screamingly fast in hardware. For video encoding and other very important applications they actually do implement such functions in hardware; they know exactly where in the code they want to use that hardware.

I think it makes perfect sense for Adobe (for instance) to implement a library of image processing primitives based on SIMD that detects the hardware at runtime and chooses the appropriate implementation for it, so that the binary, rather than the source code, is portable across x86 platforms. It does not, however, make sense for them to open source such a library, because it is difficult enough to implement that it represents a competitive advantage. SIMD is hard, and if you stand to benefit from it, you have to use it in order to compete.

I sympathise with the OP; I think an open source SIMD library could be a real help to guys like him. But he should be asking the hardware manufacturers for support rather than Boost.
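To make the "detect the hardware at runtime" idea concrete, here is a minimal sketch, assuming GCC or Clang on x86: the `target("avx")` function attribute and the `__builtin_cpu_supports` builtin are compiler extensions, not standard C++. The names `add_arrays`, `add_avx` and `add_scalar` are hypothetical, chosen for illustration, and are not any vendor's API. The AVX path is compiled into the same binary as the scalar path, and the choice between them is made once, on first call, so a single binary runs on CPUs with and without AVX.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

#if (defined(__GNUC__) || defined(__clang__)) && (defined(__x86_64__) || defined(__i386__))
#include <immintrin.h>
#define HAVE_X86_DISPATCH 1  // assumption: GCC/Clang on x86, where the builtins below exist
#endif

// Portable scalar fallback: out[i] = a[i] + b[i].
static void add_scalar(const float* a, const float* b, float* out, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) out[i] = a[i] + b[i];
}

#ifdef HAVE_X86_DISPATCH
// Compiled for AVX even when the rest of the binary targets a baseline CPU.
__attribute__((target("avx")))
static void add_avx(const float* a, const float* b, float* out, std::size_t n) {
    std::size_t i = 0;
    for (; i + 8 <= n; i += 8) {  // eight floats per 256-bit register
        __m256 va = _mm256_loadu_ps(a + i);
        __m256 vb = _mm256_loadu_ps(b + i);
        _mm256_storeu_ps(out + i, _mm256_add_ps(va, vb));
    }
    for (; i < n; ++i) out[i] = a[i] + b[i];  // scalar tail
}
#endif

using AddFn = void (*)(const float*, const float*, float*, std::size_t);

// Query the CPU and pick an implementation.
static AddFn select_add() {
#ifdef HAVE_X86_DISPATCH
    __builtin_cpu_init();
    if (__builtin_cpu_supports("avx")) return add_avx;
#endif
    return add_scalar;
}

// Public entry point: dispatch is resolved once (thread-safe static init in C++11).
void add_arrays(const float* a, const float* b, float* out, std::size_t n) {
    static const AddFn impl = select_add();
    impl(a, b, out, n);
}

// Convenience overload for std::vector inputs of equal length.
std::vector<float> add_arrays(const std::vector<float>& a, const std::vector<float>& b) {
    assert(a.size() == b.size());
    std::vector<float> out(a.size());
    add_arrays(a.data(), b.data(), out.data(), a.size());
    return out;
}
```

A real image-processing library would dispatch many primitives this way, possibly across several instruction-set levels. The point is only that the caller sees one function and the binary, not the source, carries the per-CPU choice.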
A SIMD library could be seen as comparable to TBB, for example, but the number of applications where it could be applied is much smaller, so it might be easier to just write the intrinsics in the few places where it actually matters: the places where you want hand-crafted code anyway.
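As an illustration of writing the intrinsics directly in one hot spot, here is a sketch of a dot product, assuming GCC or Clang, which define `__SSE__` when SSE code generation is enabled (always the case on x86-64). On other targets the preprocessor drops the SSE path and only the scalar loop remains. `dot` is a hypothetical example function, not taken from any library.

```cpp
#include <cassert>
#include <cstddef>

#if defined(__SSE__)
#include <xmmintrin.h>  // SSE intrinsics: __m128, _mm_loadu_ps, _mm_mul_ps, ...
#endif

// Hypothetical hot-spot routine: dot product of two float arrays of length n.
float dot(const float* a, const float* b, std::size_t n) {
    assert(n == 0 || (a != nullptr && b != nullptr));
    std::size_t i = 0;
    float total = 0.0f;
#if defined(__SSE__)
    // Four multiply-adds per iteration, accumulated in one 128-bit register.
    __m128 acc = _mm_setzero_ps();
    for (; i + 4 <= n; i += 4) {
        __m128 va = _mm_loadu_ps(a + i);  // unaligned loads: no alignment demands on callers
        __m128 vb = _mm_loadu_ps(b + i);
        acc = _mm_add_ps(acc, _mm_mul_ps(va, vb));
    }
    // Horizontal reduction of the four partial sums.
    float lanes[4];
    _mm_storeu_ps(lanes, acc);
    total = (lanes[0] + lanes[1]) + (lanes[2] + lanes[3]);
#endif
    // Scalar tail, or the whole array when SSE is unavailable.
    for (; i < n; ++i) total += a[i] * b[i];
    return total;
}
```

Note that the SSE path adds the products in a different order than the scalar loop, so for general floating-point inputs the two paths can differ in the last bits. That is one of the details that makes hand-written SIMD harder to get right than it looks.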