Subject: Re: [boost] Going forward with Boost.SIMD
From: Mathias Gaunard (mathias.gaunard_at_[hidden])
Date: 2013-04-21 06:34:14
On 19/04/13 06:55, Andrey Semashev wrote:
> In my experience, compilers are reluctant to pattern-match intrinsics and
> replace them with other intrinsics (which is a good thing).
> So if the user's code a*b+c*d maps to two _mm_mullo_epi16/_mm_mulhi_epi16
> pairs and an _mm_add_epi32, then that's what you'll get in the output
> instead of a single _mm_madd_epi16. Note also that _mm_madd_epi16 requires
> a special layout of its operands in the xmm register elements, which is
> another obstacle to this optimization.
_mm_madd_epi16 is not a vertical operation (it multiplies 16-bit elements
and then adds adjacent pairs of the 32-bit products), so it's a fairly
special function, and you can't expect the compiler to recognize cases
where it can use it.
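To make the "not vertical" point concrete, here is a small sketch, assuming an x86 target with SSE2, that compares _mm_madd_epi16 with a scalar loop computing the same thing. The helper names madd_sse2 and madd_scalar are hypothetical. Each 32-bit output lane is the sum of two adjacent 16-bit products, so output lane i depends on input lanes 2i and 2i+1 rather than on lane i alone.

```cpp
#include <emmintrin.h>  // SSE2: _mm_loadu_si128, _mm_madd_epi16, _mm_storeu_si128
#include <cstdint>

// r[i] = a[2i]*b[2i] + a[2i+1]*b[2i+1] for i in [0, 4): eight 16-bit
// products collapse into four 32-bit sums, so result lanes do not line
// up with input lanes (a horizontal step inside the instruction).
void madd_sse2(const std::int16_t* a, const std::int16_t* b, std::int32_t* r) {
    __m128i va = _mm_loadu_si128(reinterpret_cast<const __m128i*>(a));
    __m128i vb = _mm_loadu_si128(reinterpret_cast<const __m128i*>(b));
    _mm_storeu_si128(reinterpret_cast<__m128i*>(r), _mm_madd_epi16(va, vb));
}

// Scalar reference for the same operation.
void madd_scalar(const std::int16_t* a, const std::int16_t* b, std::int32_t* r) {
    for (int i = 0; i < 4; ++i)
        r[i] = std::int32_t(a[2 * i]) * b[2 * i]
             + std::int32_t(a[2 * i + 1]) * b[2 * i + 1];
}
```

With a = {1, 2, 3, 4, 5, 6, 7, 8} and b = {10, 20, 30, 40, 50, 60, 70, 80}, both functions produce {50, 250, 610, 1130}.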
_mm_macc_epi16 is the vertical one (XOP only), and quite a bit easier on
There are fma and correct_fma functions in any case.
> Regarding FMA, this is probably easier for compilers, but due to the
> difference in accuracy I don't expect compilers to perform this optimization
> lightly (i.e. without a specific compiler switch explicitly allowing it).
A compiler is allowed to use higher precision for intermediate results
whenever it wants. This is also what allows compilers to use 80-bit (x87
extended) precision for operations on float or double.
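The accuracy difference comes from rounding: a separate multiply and add round twice, while an FMA rounds once, as if the product were kept at higher precision. A minimal sketch in standard C++, using std::fma from <cmath> rather than Boost.SIMD's fma/correct_fma; the function names are hypothetical.

```cpp
#include <cmath>  // std::fma, std::ldexp

// Unfused: storing the product to a volatile double forces it to be
// rounded to double before the addition, and also prevents the compiler
// from contracting the two operations into a single FMA.
double mul_then_add(double a, double b, double c) {
    volatile double p = a * b;  // first rounding
    return p + c;               // second rounding
}

// Fused: a*b + c computed with a single rounding at the end.
double fused(double a, double b, double c) {
    return std::fma(a, b, c);
}
```

For a = 1 + 2^-27, b = 1 - 2^-27 and c = -1, the exact product 1 - 2^-54 lies halfway between 1 - 2^-53 and 1.0 and rounds to 1.0 (ties-to-even), so mul_then_add returns 0.0 while fused returns -2^-54 (about -5.55e-17).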
> I think special operations like FMA, madd, hadd/hsub, avg and min/max should
> be provided as functions. It might also be helpful to be able to convert
> packs to compiler-specific types such as __m128i, and back, in order to use
> more specialized intrinsics that are not available as functions, or to
> interoperate with inline assembler.
> What I also forgot to ask is how the paper and Boost.SIMD handle overflowing
> and saturating integer arithmetic. I assume the operators on packs implement
> overflowing operations, since that's how scalar operations work. Is it
> possible to do saturating operations then?
The standard proposal tried to keep things simple; the library itself
has quite a few more things.
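The two requests above, access to the native register type and saturating arithmetic, can be sketched with a small hypothetical wrapper. The names i16x8, splat, native, lane, adds and dot_pairs are made up for illustration and are not Boost.SIMD's API. The sketch assumes x86 with SSE2 and simply forwards to _mm_add_epi16 (wrapping), _mm_adds_epi16 (saturating) and _mm_madd_epi16.

```cpp
#include <emmintrin.h>  // SSE2 intrinsics (assumes an x86 target)
#include <cstdint>

// Hypothetical pack of eight int16 lanes; NOT Boost.SIMD's pack type.
class i16x8 {
public:
    // Wrap an existing native register (interop from intrinsics/asm).
    explicit i16x8(__m128i v) : v_(v) {}

    // Broadcast one value to all eight lanes.
    static i16x8 splat(std::int16_t x) { return i16x8(_mm_set1_epi16(x)); }

    // Unwrap to the native register to call intrinsics the wrapper lacks.
    __m128i native() const { return v_; }

    // Read lane i (0 <= i < 8).
    std::int16_t lane(int i) const {
        alignas(16) std::int16_t buf[8];
        _mm_store_si128(reinterpret_cast<__m128i*>(buf), v_);
        return buf[i];
    }

    // Plain operator: wraps modulo 2^16 on overflow.
    friend i16x8 operator+(i16x8 a, i16x8 b) {
        return i16x8(_mm_add_epi16(a.v_, b.v_));
    }

private:
    __m128i v_;
};

// Saturating add as a named function: results clamp to [-32768, 32767].
inline i16x8 adds(i16x8 a, i16x8 b) {
    return i16x8(_mm_adds_epi16(a.native(), b.native()));
}

// Interop example: use an intrinsic the wrapper does not expose.
// Returns four int32 lanes, each a[2i]*b[2i] + a[2i+1]*b[2i+1].
inline __m128i dot_pairs(i16x8 a, i16x8 b) {
    return _mm_madd_epi16(a.native(), b.native());
}
```

Adding 1 to a pack of 32767 gives -32768 in every lane through operator+, but 32767 through adds. Keeping the overflowing behavior on the operator and giving saturation a distinct name follows the convention the question assumes.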