Subject: Re: [boost] Going forward with Boost.SIMD
From: Mathias Gaunard (mathias.gaunard_at_[hidden])
Date: 2013-04-25 04:37:33
On 24/04/13 23:00, dag_at_[hidden] wrote:
> Compilers exist in the field today that generate CPU/GPU code that
> outperforms hand-coded CUDA. Compilers exist in the field today that
> vectorize and parallelize code that outperforms hand-parallelized code.
That just means that the hand-parallelized code was badly done.
Can you beat optimized libraries like CUBLAS or CUFFT?
Can you generate an optimized GPU sort from the code of std::sort?
I have seen the published results of many different types of
auto-parallelization technology. Even when specifically engineered to
parallelize specific algorithms, they still don't beat the state-of-the-art
optimized implementation, and sometimes are quite far from it.
> Hand-tuned scalar code can beat compiler-generated code yet we don't
> advocate people write in asm all the time.
There is no need to go down to ASM to optimize scalar code; you can
optimize in C or C++.
A simple optimization like scalarization, for example, is not done
reliably by today's compilers, and doing it manually can help performance.
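To make that concrete, here is a minimal sketch, assuming "scalarization" is read as scalar replacement (keeping a repeatedly accessed memory location in a local variable). The function names are hypothetical. In the first version the compiler generally has to assume that out may alias a or x, so it cannot safely keep out[i] in a register across the inner loop.

```cpp
#include <cstddef>

// Baseline: out[i] is updated through memory on every inner iteration.
// Since out, a and x are all float pointers, the compiler must assume
// they may overlap and usually reloads/stores out[i] each time.
void matvec_naive(const float* a, const float* x, float* out,
                  std::size_t rows, std::size_t cols)
{
    for (std::size_t i = 0; i < rows; ++i) {
        out[i] = 0.0f;
        for (std::size_t j = 0; j < cols; ++j)
            out[i] += a[i * cols + j] * x[j];
    }
}

// Manual scalar replacement: accumulate in a local and store once per row.
// The arithmetic order is unchanged, so for non-overlapping buffers the
// result is the same as the baseline.
void matvec_scalarized(const float* a, const float* x, float* out,
                       std::size_t rows, std::size_t cols)
{
    for (std::size_t i = 0; i < rows; ++i) {
        float acc = 0.0f;  // lives in a register
        for (std::size_t j = 0; j < cols; ++j)
            acc += a[i * cols + j] * x[j];
        out[i] = acc;      // single store
    }
}
```

Note that the two versions are only equivalent when out does not overlap the inputs, which is exactly the guarantee the compiler lacks.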
Likewise, doing register rotation explicitly can also help performance.
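As an illustration, here is a sketch of explicit register rotation on a 3-point sum, assuming the term means carrying already-loaded values across iterations in locals rather than reloading them. The function names are hypothetical and only interior elements are written.

```cpp
#include <cstddef>

// Baseline: three loads per output element.
void stencil_naive(const float* in, float* out, std::size_t n)
{
    for (std::size_t i = 1; i + 1 < n; ++i)
        out[i] = in[i - 1] + in[i] + in[i + 1];
}

// Rotated: the two values shared with the previous iteration stay in
// locals, so each iteration performs a single new load.
void stencil_rotated(const float* in, float* out, std::size_t n)
{
    if (n < 3) return;       // no interior element to compute
    float prev = in[0];
    float cur  = in[1];
    for (std::size_t i = 1; i + 1 < n; ++i) {
        const float next = in[i + 1];  // only load in this iteration
        out[i] = prev + cur + next;    // same operand order as the baseline
        prev = cur;                    // rotate the window by one
        cur  = next;
    }
}
```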
Unrolling or pipelining can also be done at the source level, and give
performance benefits even on modern out-of-order architectures.
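A rough sketch of source-level unrolling, using a dot product with four independent accumulators (hypothetical function names; the unroll factor of 4 is an arbitrary choice for illustration). Splitting the sum breaks the single dependency chain, and since it reassociates floating-point additions, a compiler will normally not do it by itself unless given something like -ffast-math.

```cpp
#include <cstddef>

// Baseline: one accumulator, so every addition waits on the previous one.
float dot_simple(const float* a, const float* b, std::size_t n)
{
    float s = 0.0f;
    for (std::size_t i = 0; i < n; ++i)
        s += a[i] * b[i];
    return s;
}

// Unrolled by 4 with independent accumulators. The summation order
// differs from the baseline, so rounding may differ slightly.
float dot_unrolled4(const float* a, const float* b, std::size_t n)
{
    float s0 = 0.0f, s1 = 0.0f, s2 = 0.0f, s3 = 0.0f;
    std::size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        s0 += a[i]     * b[i];
        s1 += a[i + 1] * b[i + 1];
        s2 += a[i + 2] * b[i + 2];
        s3 += a[i + 3] * b[i + 3];
    }
    float s = (s0 + s1) + (s2 + s3);
    for (; i < n; ++i)   // remainder when n is not a multiple of 4
        s += a[i] * b[i];
    return s;
}
```

Whether this actually pays off depends on the target and the compiler, so it is worth measuring before keeping it.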
It's all a matter of how important a specific piece of code is and how
much work it would take to make it faster.
> CUDA *is* being replaced by OpenACC in our customers' codes. Not
> overnight, but every month we see more use of OpenACC.
I don't know much about Cray, but I would think that your customers
probably do not represent the whole of CUDA users at large.