Subject: Re: [boost] Going forward with Boost.SIMD
From: jtl (jtlapreste_at_[hidden])
Date: 2013-04-19 02:31:25

Next message: Michael Marcin: "Re: [boost] [gsoc-2013] Physics Library Abstraction Layer"
Previous message: Andrey Semashev: "Re: [boost] Going forward with Boost.SIMD"
In reply to: Andrey Semashev: "Re: [boost] Going forward with Boost.SIMD"
Next in thread: Mathias Gaunard: "Re: [boost] Going forward with Boost.SIMD"

On 19/04/2013 07:55, Andrey Semashev wrote:
> On Friday 19 April 2013 01:21:58 Marc Glisse wrote:
>> On Thu, 18 Apr 2013, Andrey Semashev wrote:
>>> 3. It supports division and modulus for integers?
>> Why not?
>>
>>> Is it supported by any hardware?
>> At least some special cases are, like division by a power of 2.
> I think these special cases are better coded explicitly.
>
>> And if the
>> divisor is constant, you can also let the implementation handle turning it
>> into a multiplication.
> Does the compiler do that with user-defined operators (which are user-defined
> in case of packs)? Or do you mean the implementation of the operator will
> handle that? The latter means that the division will be very slow, but ok,
> since the division is slow even in hardware...
>
>>> 4. How would advanced operations be implemented, such as FMA and integer
>>> madd? Is it through additional library provided functions? IMHO, the
>>> availability of these operations is often crucial for performance of the
>>> user's algorithm, if it is more complicated than just accumulating
>>> integers.
>> If you only want fma as a fast way to compute a+b*c, you could just let
>> your compiler optimize an addition and a multiplication to fma. They are
>> not bad at that. If you rely on the extra accuracy of fma, then library
>> functions seem necessary.
> According to my experience, compilers are reluctant at pattern matching the
> intrinsics and replacing them with other intrinsics (which is a good thing).
> So if the user's code a*b+c*d is equivalent to two
> _mm_mullo_epi16/_mm_mulhi_epi16 and _mm_add_epi32 then that's what you'll get
> in the output instead of a single _mm_madd_epi16. Note also that
> _mm_madd_epi16 requires a special layout of its operands in xmm register
> elements, which is also a blocker for the compiler optimization.
>
> Regarding FMA, this is probably easier for compilers, but due to the
> difference in accuracy I don't expect compilers to perform this optimization
> lightly (i.e. without a specific compiler switch explicitly allowing it). And
> a switch, being a global option, may not be suitable in every place of the
> application. So having a way to explicitly express programmer's intention is
> useful here too.
>
> I think special operations like FMA, madd, hadd/hsub, avg, min/max should be
> provided as functions. Also, it might be helpful to be able to convert packs
> to the compiler-specific types, like __m128i, and back to be able to use other
> more special intrinsics that are not available as functions or interoperate
> with inline assembler.
>
> What I also forgot to ask is how the paper and Boost.SIMD handle overflowing
> and saturating integer arithmetics? I assume, the operators on packs implement
> overflowing operations since that's how scalar operations work. Is it possible
> to do saturating operations then?
>
They are already present in Boost.SIMD.

Overflowing operations are the current operators + - * / plus abs and neg.
Saturating operations are abss, adds, subs, muls, divs and negs (the final s
standing for saturated).

abss and negs differ from abs and neg in how they handle Valmin for
signed integers: they map Valmin -> Valmax
(versus Valmin -> Valmin for the standard ones).

We also have saturate<A>(a), which returns the value of a saturated to
the range of the type A
(the available types in Boost.SIMD are signed and unsigned integers
of 8, 16, 32 and 64 bits, and packs
of such).

All operations in Boost.SIMD are coded so that, if no suitable
intrinsic exists or if using
intrinsics brings no speed benefit, they fall back to
mapping the available scalar implementation over each element of
the SIMD vectors (note, however, that this is very uncommon).
This is the case for integer division of 64-bit integers (without
going too far into implementation details, it is often faster, when possible, to
use floating-point division intrinsics to implement integer division on today's
processors).

Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk