Subject: Re: [boost] Going forward with Boost.SIMD
Date: 2013-04-24 16:47:30
Mathias Gaunard <mathias.gaunard_at_[hidden]> writes:
> Automatic parallelization will never beat code optimized by
> experts. Experts program each type of parallelism by taking into
> account its specificities.
That is hyperbole. "Never" is a strong word.
> An interesting point in favor of a library is also memory layout. A
> C++ compiler cannot change the memory layout on its own to make it
> more friendly to vectorize. By providing the right types and
> primitives to the user, he is made aware of the issues at hand and
> empowered with the ability to explicitly state how a given algorithm
> is to be vectorized.
I agree that libraries to make data shaping easier are useful!
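To make the layout point concrete, here is a minimal sketch of the kind of data shaping meant, an array-of-structures to structure-of-arrays conversion. The names (PointAoS, PointsSoA, to_soa, scale_x) are hypothetical and are not taken from Boost.SIMD.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Array-of-structures: the x, y, z of one point are adjacent in memory.
struct PointAoS { float x, y, z; };

// Structure-of-arrays: all x values are contiguous, likewise y and z.
// (Hypothetical type for illustration only.)
struct PointsSoA {
    std::vector<float> x, y, z;
};

// Reshape AoS into SoA so that a loop over one component is unit-stride.
PointsSoA to_soa(const std::vector<PointAoS>& in) {
    PointsSoA out;
    out.x.reserve(in.size());
    out.y.reserve(in.size());
    out.z.reserve(in.size());
    for (const PointAoS& p : in) {
        out.x.push_back(p.x);
        out.y.push_back(p.y);
        out.z.push_back(p.z);
    }
    assert(out.x.size() == in.size());
    return out;
}

// Unit-stride loop over one component: no gathers or strided loads needed.
void scale_x(PointsSoA& pts, float s) {
    for (std::size_t i = 0; i < pts.x.size(); ++i)
        pts.x[i] *= s;
}
```

The compiler cannot make this change on its own because the layout of PointAoS is visible to the rest of the program; the reshaping has to be expressed in source.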
>> For specialized operations like horizontal add, saturating arithmetic,
>> etc. we will need intrinsics or functions that will be necessarily
> The proposal suggests providing vectorized variants of all
> mathematical functions in the C++ standard (the Boost.SIMD library
> covers C99, TR1 and more). That's quite a lot of functions.
But not the special ones I mentioned.
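For reference, a scalar sketch of the semantics of two of those special operations. The function names are made up for illustration; a real target would map them to dedicated instructions.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Saturating unsigned 8-bit add: clamps at 255 instead of wrapping around.
std::uint8_t sat_add_u8(std::uint8_t a, std::uint8_t b) {
    const unsigned sum = static_cast<unsigned>(a) + b;
    assert(sum <= 510u);  // two 8-bit values cannot exceed this
    return static_cast<std::uint8_t>(sum > 255u ? 255u : sum);
}

// Horizontal add: reduces the lanes of a single vector to one scalar,
// as opposed to the usual lane-wise ("vertical") add of two vectors.
template <std::size_t N>
float horizontal_add(const std::array<float, N>& v) {
    float s = 0.0f;
    for (float x : v)
        s += x;
    return s;
}
```

Neither operation has a direct spelling in standard C++ arithmetic, which is why they tend to surface as intrinsics or library functions.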
> Should all these functions be made compiler built-ins? That doesn't
> sound like a very scalable and extensible approach.
I dunno, we do a lot of that here.
>> Vector masks fundamentally change the model. They drastically affect
>> control flow.
> Some processors have had predication at the scalar level for quite
> some time. It hasn't drastically changed the way people program.
Scalar predication hasn't changed the way people program because
compilers do the if-conversion. As it should be with vectors.
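A minimal sketch of what if-conversion means here, written out by hand for illustration. The function names are hypothetical; in practice the compiler performs this rewrite itself.

```cpp
#include <cassert>
#include <cstddef>

// Branchy form: what the programmer writes.
void clamp_negative_branchy(float* a, std::size_t n) {
    assert(a != nullptr || n == 0);
    for (std::size_t i = 0; i < n; ++i) {
        if (a[i] < 0.0f)
            a[i] = 0.0f;
    }
}

// If-converted form: the branch becomes a predicate that feeds a select.
// Control dependence has been turned into data dependence, so every
// iteration executes the same straight-line code.
void clamp_negative_selected(float* a, std::size_t n) {
    assert(a != nullptr || n == 0);
    for (std::size_t i = 0; i < n; ++i) {
        const bool p = a[i] < 0.0f;   // the predicate
        a[i] = p ? 0.0f : a[i];       // the select (blend)
    }
}
```

The user keeps writing the first form; the second is what predicated or vector code generation starts from.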
> It is similar to doing two instructions in one (any instruction can
> also do a blend for free), and optimizing those instructions done
> separately into one is something that a compiler should be able to do
> pretty well. It doesn't sound very unlike what a compiler must do for
> VLIW codegen to me, but then I have little knowledge of compilers.
I have trouble seeing how one would use the SIMD library to make it
easier to write predicated vector code. Can you sketch it out?
> The fact that it is the library doesn't mean that the compiler
> shouldn't perform on vector types the same optimizations that it does
> on scalar ones.
Of course it will. But the library user has already made the choice of
what to vectorize. Many times it will be the right choice, but not
> While I can see the benefit of this feature for a compiler that wants
> to generate SIMD for arbitrary code, dedicated SIMD code will not
> depend on this too much that it cannot be covered by a couple of
> additional functions.
Predication allows much more efficient vectorization of many common
idioms. A SIMD library without support for it will miss those idioms,
and the compiler auto-vectorizer will get better performance.
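One such idiom is a guarded update, where inactive lanes must neither fault nor be written. The sketch below emulates masked strips in plain C++; the width W and the names masked_div and guarded_divide are assumptions for illustration, not any particular ISA or library API.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

constexpr std::size_t W = 8;  // assumed vector length in lanes (hypothetical)

// Emulates one masked vector divide: lanes whose mask bit is clear are
// neither read nor written, so they cannot fault or clobber memory.
void masked_div(float* a, const float* b, const std::array<bool, W>& mask) {
    for (std::size_t l = 0; l < W; ++l)
        if (mask[l])
            a[l] = a[l] / b[l];
}

// Scalar source being vectorized:
//     for (i = 0; i < n; ++i) if (b[i] != 0) a[i] /= b[i];
// The mask covers both the condition and the loop tail, so no scalar
// remainder loop is needed.
void guarded_divide(float* a, const float* b, std::size_t n) {
    assert((a != nullptr && b != nullptr) || n == 0);
    for (std::size_t i = 0; i < n; i += W) {
        std::array<bool, W> mask;
        for (std::size_t l = 0; l < W; ++l) {
            const std::size_t idx = i + l;
            mask[l] = idx < n && b[idx] != 0.0f;
        }
        masked_div(a + i, b + i, mask);
    }
}
```

Without masks, the same loop needs a blend plus some way to keep the division from trapping on the inactive lanes, and a separate epilogue for the tail.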
>> Longer vectors can also dramatically change the generated code. It is
>> *not* simply a matter of using larger strips for stripmined loops. One
>> often will want to vectorize different loops in a nest based on the
>> hardware's maximum vector length.
> I don't see what the problem is here.
> This is C++. You can write generic code for arbitrary vector
> lengths. It is up to the user to use generative programming techniques
> to make his code depend on this parameter and be portable. The library
> tries to make this as easy as possible.
So the user has to write multiple versions of loop nests, potentially
one for each target architecture? I don't see the advantage of this
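For reference, a minimal sketch of what "generic code for arbitrary vector lengths" could look like: a stripmined loop with the strip width as a template parameter. The name saxpy and the parameter W are illustrative assumptions, not Boost.SIMD interfaces.

```cpp
#include <cassert>
#include <cstddef>

// Stripmined y += alpha * x, generic over the strip width W.
template <std::size_t W>
void saxpy(float alpha, const float* x, float* y, std::size_t n) {
    static_assert(W > 0, "strip width must be positive");
    assert((x != nullptr && y != nullptr) || n == 0);
    std::size_t i = 0;
    // Full strips: the inner loop has a fixed trip count of W lanes.
    for (; i + W <= n; i += W)
        for (std::size_t l = 0; l < W; ++l)
            y[i + l] += alpha * x[i + l];
    // Scalar remainder for the last n % W elements.
    for (; i < n; ++i)
        y[i] += alpha * x[i];
}
```

Instantiating saxpy<4> or saxpy<8> only changes the strip size. It does not change which loop of a nest gets vectorized, which is the decision the maximum vector length can influence.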
>> A library-based short vector model like the SIMD library is very
>> non-portable from a performance perspective.
> From my experience, it is still fairly reliable. There are differences
> in performance, but they're mostly due to differences in the hardware
> capabilities at solving a particular application domain well.
Well yes, that's one of the main issues.