Subject: Re: [boost] Going forward with Boost.SIMD
From: Andreas Schäfer (gentryx_at_[hidden])
Date: 2013-04-24 16:18:27


Hi,

On 13:00 Wed 24 Apr , dag_at_[hidden] wrote:
> All of the scalar and complex arithmetic using simple binary operators
> can be easily vectorized if the compiler has knowledge about
> dependencies. That is why I suggest standardizing keywords, attributes
> and/or pragmas rather than a specific parallel model provided by a
> library. The former is more general and gives the compiler more freedom
> during code generation.

It seems like the auto-parallelizing compiler is constantly just a
couple of years away. I know there is progress, but apparently the
complexity of today's architectures counteracts this.

> But see that's exactly the problem. Look at the X1. It has multiple
> levels of parallelism. So does Intel MIC and GPUs. The compiler has to
> balance multiple parallel models simultaneously. When you hard-code
> vector loops you remove some of the compiler's freedom to transform
> loops and improve parallelism.

But isn't the current programming model broken? If you let the
programmer write loops which the compiler will aim to parallelize,
then the programmer will still always think of the iterations as
running sequentially, thus creating an "impedance mismatch".
Programming models such as Intel's ispc or Nvidia's CUDA fare so well
because they expose an acceptable amount of parallelism to the user,
while simultaneously maintaining some leeway for the compiler.
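
To make the mismatch concrete, here is a minimal sketch in plain C++. It
is not taken from ispc, CUDA or Boost.SIMD, and all function names are
made up for illustration. The first function is written the way one
naturally thinks about a loop, and iteration i depends on iteration i-1.
The second is written per element, roughly in the spirit of a kernel, so
the independence of the iterations is visible in the code itself.

```cpp
#include <cstddef>
#include <vector>

// Sequential thinking: each iteration reads the accumulator left behind
// by the previous one, so the iterations cannot simply run side by side.
std::vector<float> running_sum(const std::vector<float>& in)
{
    std::vector<float> out(in.size());
    float acc = 0.0f;
    for (std::size_t i = 0; i < in.size(); ++i) {
        acc += in[i];   // loop-carried dependency on acc
        out[i] = acc;
    }
    return out;
}

// Per-element thinking: the body only sees its own index, so every
// invocation is independent by construction.
inline void saxpy_element(std::size_t i, float a,
                          const float* x, const float* y, float* out)
{
    out[i] = a * x[i] + y[i];
}

// Hypothetical driver (assumes x and y have the same length). It is a
// plain loop here, but the elements could be visited in any order, or
// mapped to SIMD lanes or threads, without changing the result.
std::vector<float> saxpy(float a, const std::vector<float>& x,
                         const std::vector<float>& y)
{
    std::vector<float> out(x.size());
    for (std::size_t i = 0; i < x.size(); ++i) {
        saxpy_element(i, a, x.data(), y.data(), out.data());
    }
    return out;
}
```

In the second form the programmer states the independence up front
instead of leaving the compiler to prove it from a loop that was written
with sequential execution in mind.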

> A library-based short vector model like the SIMD library is very
> non-portable from a performance perspective. It is exactly for this
> reason that things like OpenACC are rapidly replacing CUDA in production
> codes. Libraries are great for a lot of things. General parallel code
> generation is not one of them.

CUDA is being rapidly replaced by things like OpenACC? Hmm, in my
world people are still rubbing their eyes as they slowly realize that
this "#pragma omp parallel for" gives them poor speedups, even on
quad-core UMA nodes. And seeing how "well" the auto-magical offload
mode on MIC works, they are very suspicious of things like OpenACC.

Best
-Andreas

--
==========================================================
Andreas Schäfer
HPC and Grid Computing
Chair of Computer Science 3
Friedrich-Alexander-Universität Erlangen-Nürnberg, Germany
+49 9131 85-27910
PGP/GPG key via keyserver
http://www.libgeodecomp.org
==========================================================
(\___/)
(+'.'+)
(")_(")
This is Bunny. Copy and paste Bunny into your
signature to help him gain world domination!


