Subject: Re: [boost] [gsoc] boost.simd news from the front.
From: David A. Greene (greened_at_[hidden])
Date: 2011-06-10 18:09:49
Joel falcou <joel.falcou_at_[hidden]> writes:
> On 10/06/11 15:16, David A. Greene wrote:
>> Almost everything the compiler needs to vectorize well that it does not
>> get from most language syntax can be summed up by two concepts: aliasing
>> and alignment.
> No. How can a compiler vectorize a function in another binary .o?
The compiler. Someone had to build that .o.
> Like who is gonna vectorize cos and its ilk ?
The library vendor. Lots of vendors produce special vector versions of
their math routines.
> If you read the slides, you would have seen that pack is like the
> messenger of the whole simd range system which is fitting right into
> *higher level of abstraction* and not some piggy-backing of the compiler.
It's not a high level of abstraction. It's a very low level one. Users
are barely willing to restructure loops to enable vectorization. Many
will be unwilling to rewrite them completely. On the other hand, the
data show that they are quite willing to add directives here and there.
>> pack<> does address alignment, but it's overkill. It's also
>> pessimistic. One does not always need aligned data to vectorize, so the
>> conditions placed on pack<> are too restrictive. Furthermore, the
>> alignment information pack<> does convey will likely get lost in the
>> depths of the compiler, leading to suboptimal code generation unless
>> that alignment information is available elsewhere (and it often is).
> Well, my benchmarks disagree with this. See this old post of mine from
> one year ago about the same subject. If getting 95% of peak performance
> is pessimistic, then sorry.
On what code? It's quite easy to achieve that on something like DGEMM,
which is an embarrassingly vectorizable kernel.
>> What's under the operators on pack<>? Is it assembly code?
> No, as naked assembly prevents proper inlining and other register-based
> compiler optimisations. We use whatever intrinsic is available for the
> current compiler/architecture at hand.
That's effectively assembly code.
>> I wonder how pack<T> can know the best vector length. That is highly,
>> highly code- and implementation-dependent.
> No. On SSEx machines, SIMD vectors are 128 bits, which means pack<T,
> 16/sizeof(T)> is optimal, so a simple meta-function finds it.
No. On SSEx machines, a vector of 32-bit floats can have 1, 2, 3 or 4
elements. Consider AVX. This is _not_ an easy problem to solve. It is not
always the right answer to vectorize using the full available vector length.
>> How does simd::where define pack<> elements of the result where the
>> condition is false? Often the best solution is to leave them undefined
>> but your example seems to require maintaining current values.
> This makes no sense. False is [0 ... 0], True is [~0 ... ~0]. Period.
> SIMD is all about branchless, so everything is computed in the whole
> vector. It seems to me you didn't get that pack is NOT a data container
> but a layer above SIMD registers that then gets hidden under the concept
> of ranges.
I know what a pack<> is. Perhaps I wasn't clear. If I have an
operation (say, negation) under where() in which the even condition
elements are true and the odd condition elements are false, what is the
produced result for the odd elements of the result vector?
>> How portable is Boost.simd? By portable I mean, how easy is it to move
>> the code from one machine to another and get the same level of performance?
> Works on gcc, msvc, sse and altivec, and we started looking at ARM NEON.
> Most of these have the same level of performance.
What happens if you move the code from Nehalem to Barcelona? How about
from an NVIDIA GPU to Nehalem?
> I'll keep my archaic stuff giving me a x4-x8 speed-up rather than
> waiting for the compiler-based solution nobody has been able to give me
> since 1999 ...
Compilers have been doing this since the '70s. gcc is not an adequate
compiler in this respect, but it is slowly getting there.
> We already had this discussion two years ago, so I am not keen to go
> over it all again, as it clearly seems you are just retelling the same
> FUD as last time.
It's not FUD. It's my experience.