

Subject: [boost] Back to Boost.SIMD - Some performances ...
From: Joel Falcou (joel.falcou_at_[hidden])
Date: 2009-03-26 12:13:58


I'm still working on a potential Boost.SIMD proposal despite the
apparent lack of interest on the list. The last discussion suggested
that actual performance figures might be of interest, so here are some (see
the end of this mail). The table shows, for a subset of non-trivial functions,
the number of cycles needed to compute one value in scalar code and with SSE2,
the precision of the SIMD version, and the resulting speed-up.
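For readers who want to reproduce this kind of figure, here is a minimal sketch of a cycles-per-value measurement. It is not the harness used for the table; it assumes an x86 target with GCC or Clang, where __rdtsc() from <x86intrin.h> reads the time-stamp counter, and cycles_per_value is a hypothetical helper name.

```cpp
#include <x86intrin.h>  // __rdtsc() on GCC/Clang for x86 (MSVC: <intrin.h>)
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper: average time-stamp-counter ticks spent per value when
// applying f to every element of in. The TSC counts reference cycles, which can
// differ from core cycles under frequency scaling, and a single pass without
// warm-up or repetition is only a rough estimate.
template <class F>
double cycles_per_value(F f, const std::vector<float>& in, std::vector<float>& out)
{
    assert(!in.empty() && out.size() >= in.size());
    unsigned long long start = __rdtsc();
    for (std::size_t i = 0; i < in.size(); ++i)
        out[i] = f(in[i]);
    unsigned long long stop = __rdtsc();
    return static_cast<double>(stop - start) / static_cast<double>(in.size());
}
```

Passing a lambda that wraps std::sin gives the scalar figure; the SIMD figure would come from the same loop stepping four floats at a time, and the ratio of the two is the speed-up.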

Most of the speed-ups are super-linear (above 4, the number of floats in an
SSE2 register) because either:
1/ the libc algorithm is poorly implemented, or
2/ non-SIMD architectural differences between the SSE2 unit and the scalar
FPU lead to additional speed-up.
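The super-linearity can be read off the table directly. Assuming the speed-up column is the ratio of scalar to vector cycles (the figures agree up to rounding), the linear bound for float on SSE2 is the four lanes of a 128-bit register:

```latex
\begin{aligned}
S &= \frac{c_{\mathrm{scalar}}}{c_{\mathrm{vector}}},
  \qquad S_{\mathrm{linear}} = \frac{128\ \text{bits}}{32\ \text{bits}} = 4 \\
S_{\mathtt{asin\_}} &= \frac{256.5}{11.6} \approx 22.1 > 4 \quad \text{(super-linear)} \\
S_{\mathtt{cbrt\_}} &= \frac{152.5}{39.7} \approx 3.8 < 4 \quad \text{(sub-linear)}
\end{aligned}
```

So asin_ beats the four-lane bound by more than a factor of five, which only causes 1/ or 2/ above can explain.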

Most transcendental functions use a SIMD version of the old yet useful
Cephes C library, which is based on various polynomial approximations.

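Cephes-style approximations mostly boil down to evaluating a polynomial after some range reduction. A minimal sketch of that core step on four floats at once, using Horner's scheme with SSE intrinsics; horner_ps is a hypothetical name and the coefficients are supplied by the caller, not taken from Cephes:

```cpp
#include <xmmintrin.h>  // SSE: __m128, _mm_set1_ps, _mm_mul_ps, _mm_add_ps
#include <cassert>
#include <cstddef>

// Evaluate c[0] + c[1]*x + ... + c[n-1]*x^(n-1) in each of the four lanes of x
// with Horner's scheme. Coefficients are hypothetical caller-supplied values.
inline __m128 horner_ps(__m128 x, const float* c, std::size_t n)
{
    assert(n > 0);
    __m128 acc = _mm_set1_ps(c[n - 1]);  // splat the highest-order coefficient
    for (std::size_t i = n - 1; i-- > 0;)
        // SSE2 has no FMA: a separate multiply and add, each rounded.
        // On AltiVec this step would be a single vec_madd(acc, x, c_i).
        acc = _mm_add_ps(_mm_mul_ps(acc, x), _mm_set1_ps(c[i]));
    return acc;
}
```

With c = {1, 2, 3} (that is, 1 + 2x + 3x^2) and lanes x = 0, 1, 2, 3, the result is 1, 6, 17, 34. On AltiVec the multiply and add in the loop body collapse into one fused instruction per coefficient.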
Results on AltiVec processors are roughly the same, except for
transcendental functions, where using a proper FMA instead of a sequence of
multiply and add instructions improves performance. For trivial functions like
+, -, *, /, I was happily surprised that gcc is indeed able to generate SIMD
code. Alas, the speed-up of gcc's auto-vectorized code never exceeds 2.54,
while our code reaches 3.5.
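To make the gcc comparison concrete, here are the two kinds of code for element-wise addition: a plain loop that gcc may auto-vectorize at -O3, and a hand-written SSE version. Both function names are hypothetical, and the sketch says nothing about which is faster on a given machine.

```cpp
#include <xmmintrin.h>  // SSE: _mm_loadu_ps, _mm_add_ps, _mm_storeu_ps
#include <cassert>
#include <cstddef>

// Plain loop: a candidate for gcc's auto-vectorizer at -O3.
void add_scalar(float* r, const float* a, const float* b, std::size_t n)
{
    for (std::size_t i = 0; i < n; ++i)
        r[i] = a[i] + b[i];
}

// Hand-written SSE: four floats per _mm_add_ps, then a scalar tail
// for the remaining 0 to 3 elements. Unaligned loads/stores are used
// so the pointers need no particular alignment.
void add_sse(float* r, const float* a, const float* b, std::size_t n)
{
    std::size_t i = 0;
    for (; i + 4 <= n; i += 4)
        _mm_storeu_ps(r + i, _mm_add_ps(_mm_loadu_ps(a + i), _mm_loadu_ps(b + i)));
    for (; i < n; ++i)
        r[i] = a[i] + b[i];
}
```

Both produce identical results; the difference the post measures lies only in the instructions the compiler ends up emitting.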

Concerning the interface and the support of odd-ball vector sizes in a
platform-independent fashion, we followed Matthias' remark and provide a
vec<T,C> class in which the vector cardinal (number of elements) can be
specified; it defaults to the native cardinal of the type. Types like
vec<double,5> are handled as boost::array and provide the same interface and
set of functions. Every function can be applied either to any vec<T,C> type
or to any native SIMD type (__m128 in SSEx or vector xxx in AltiVec).
Syntactic sugar like v = v+4 is provided and performs constant splatting
before SIMD evaluation.
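A minimal sketch of the vec<T,C> idea under stated assumptions: native_cardinal, splat and the members below are hypothetical names, the native register width is assumed to be 128 bits, and every size is stored in a std::array (a stand-in for boost::array) rather than mapped onto __m128 or an AltiVec vector. It shows the defaulted cardinal, an odd-ball size like vec<double,5>, and v + 4 splatting the constant before the element-wise operation.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical trait: how many T fit in one 128-bit register
// (4 floats or 2 doubles, as on SSE2 and AltiVec).
template <class T>
struct native_cardinal
{
    static constexpr std::size_t value = 16 / sizeof(T);
};

// Hypothetical vec<T, C>: C defaults to the native cardinal. This sketch stores
// every size in a std::array; a real library would map native sizes onto
// __m128 or an AltiVec vector instead.
template <class T, std::size_t C = native_cardinal<T>::value>
struct vec
{
    using value_type = T;
    std::array<T, C> data;

    T& operator[](std::size_t i) { return data[i]; }
    const T& operator[](std::size_t i) const { return data[i]; }
};

// Constant splatting: copy one scalar into every lane.
template <class T, std::size_t C>
vec<T, C> splat(T s)
{
    vec<T, C> r;
    r.data.fill(s);
    return r;
}

// Element-wise addition, defined for any cardinal, odd-ball ones included.
template <class T, std::size_t C>
vec<T, C> operator+(const vec<T, C>& a, const vec<T, C>& b)
{
    vec<T, C> r;
    for (std::size_t i = 0; i < C; ++i)
        r[i] = a[i] + b[i];
    return r;
}

// v + s: the scalar parameter is a non-deduced context, so v + 4 on a
// vec<float> converts 4 to float, splats it, then adds lane by lane.
template <class T, std::size_t C>
vec<T, C> operator+(const vec<T, C>& a, typename vec<T, C>::value_type s)
{
    return a + splat<T, C>(s);
}
```

Making the scalar argument non-deduced is what lets v + 4 compile with an int literal; deducing T from both operands would fail on the float/int mismatch.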

This still has to be boostified and made independent of the larger
project it currently depends on.
Once done, a preliminary version will be uploaded to the Vault.
Current target architectures are:
- SSE2, SSE3, SSSE3
- AltiVec for PPC and Cell processors (a patched version of Boost is needed)
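One plausible way to pick among these targets at compile time, sketched with the macros GCC and Clang predefine for the active instruction set (__SSE2__, __SSE3__, __SSSE3__, and __ALTIVEC__ with -maltivec). simd_backend is a hypothetical name; the actual project may dispatch differently.

```cpp
#include <cassert>

// Hypothetical compile-time backend selection. The x86 checks run from most
// to least capable so the best enabled instruction set wins; AltiVec is a
// separate architecture and is checked first.
inline const char* simd_backend()
{
#if defined(__ALTIVEC__)
    return "AltiVec";
#elif defined(__SSSE3__)
    return "SSSE3";
#elif defined(__SSE3__)
    return "SSE3";
#elif defined(__SSE2__)
    return "SSE2";
#else
    return "scalar";
#endif
}
```

Compiling the same file with -msse2, -msse3 or -mssse3 (or -maltivec on PowerPC) changes which branch is taken, with no runtime cost.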

Comments and questions welcome.

Cycles per value and precision for float, scalar vs. SSE2 (ulp, rms and peak
describe the error of the SIMD version):

                   | Scalar | Vector                              |
Function     Type  | cycles | cycles ulp       rms      peak | Speed-up
-----------------------------------------------------------------------
abs_         float |    2.0 |    0.8   0  0.00e+00  0.00e+00 |      2.4
acosh_       float |  148.2 |   30.2   1  9.15e-09  1.19e-07 |      4.9
acos_        float |  261.8 |   14.7   3  7.01e-08  2.38e-07 |     17.8
arg_         float |    5.0 |    1.2   0  0.00e+00  0.00e+00 |      4.2
asinh_       float |  152.8 |   32.4   1  1.22e-08  1.19e-07 |      4.7
asin_        float |  256.5 |   11.6   2  5.32e-08  2.28e-07 |     22.1
atanh_       float |  123.9 |   20.4   2  2.27e-08  4.55e-07 |      6.1
atan_        float |  160.7 |   12.7   1  3.55e-08  6.74e-08 |     12.7
bitofsign_   float |    5.1 |    0.8   0  0.00e+00  0.00e+00 |      6.1
boolean_     float |    5.4 |    1.0   0  0.00e+00  0.00e+00 |      5.4
cbrt_        float |  152.5 |   39.7   1  2.76e-08  7.77e-08 |      3.8
ceil_        float |   16.6 |    2.8   0  0.00e+00  0.00e+00 |      5.9
cosh_        float |  211.5 |   19.1   2  4.00e-08  1.83e-07 |     11.1
cos_         float |  112.2 |   14.6   1  2.98e-08  1.11e-07 |      7.7
cospi_       float |  103.6 |   12.1   1  3.43e-08  1.19e-07 |      8.6
cot_         float |  142.8 |   17.8   3  5.54e-08  2.38e-07 |      8.0
cotpi_       float |  142.1 |   17.1   6  9.62e-08  4.08e-07 |      8.3
exp10_       float |  169.3 |   32.1   1  2.88e-08  1.19e-07 |      5.3
exp_         float |  171.3 |   19.3   1  2.60e-08  1.19e-07 |      8.9
expm1_       float |  294.1 |   42.6   3  2.89e-08  1.94e-07 |      6.9
floor_       float |   16.9 |    2.7   0  0.00e+00  0.00e+00 |      6.4
gd_          float |  602.4 |   35.8   3  3.93e-08  2.46e-07 |     16.8
indeg_       float |    2.2 |    0.8   0  2.59e-08  5.94e-08 |      2.7
inrad_       float |    2.2 |    0.8   0  2.53e-08  5.94e-08 |      2.6
iseqz_       float |    5.5 |    0.9   0  0.00e+00  0.00e+00 |      6.3
iseven_      float |   45.0 |    2.3   0  0.00e+00  0.00e+00 |     19.9
isfin_       float |    5.9 |    0.9   0  0.00e+00  0.00e+00 |      6.6
isflint_     float |   38.3 |    1.8   0  0.00e+00  0.00e+00 |     21.4
isgez_       float |    6.0 |    0.8   0  0.00e+00  0.00e+00 |      7.4
isgtz_       float |    6.0 |    0.9   0  0.00e+00  0.00e+00 |      6.5
isinf_       float |    6.0 |    0.9   0  0.00e+00  0.00e+00 |      6.5
islez_       float |    5.0 |    0.9   0  0.00e+00  0.00e+00 |      5.8
isltz_       float |    5.0 |    0.9   0  0.00e+00  0.00e+00 |      5.8
isnan_       float |    3.0 |    0.8   0  0.00e+00  0.00e+00 |      3.6
isnegative_  float |    5.6 |    1.1   0  0.00e+00  0.00e+00 |      5.0
isnez_       float |    5.4 |    0.9   0  0.00e+00  0.00e+00 |      6.2
isnotfinite_ float |    3.0 |    0.8   0  0.00e+00  0.00e+00 |      3.6
isodd_       float |   47.8 |    2.5   0  0.00e+00  0.00e+00 |     18.8
ispositive_  float |    5.6 |    1.3   0  0.00e+00  0.00e+00 |      4.2
log10abs_    float |  107.5 |   17.3   2  6.59e-08  2.12e-07 |      6.2
log10_       float |  105.4 |   16.9   2  6.58e-08  2.12e-07 |      6.2
log1p_       float |  149.7 |   18.8   1  9.16e-09  1.19e-07 |      8.0
log2abs_     float |  107.5 |   17.2   1  1.89e-08  1.19e-07 |      6.3
log2_        float |  105.5 |   17.1   4  1.90e-08  2.51e-07 |      6.2
logabs_      float |  107.5 |   23.6   1  4.36e-08  1.19e-07 |      4.6
log_         float |  108.3 |   15.4   1  9.12e-09  1.19e-07 |      7.0
mantissa_    float |   22.1 |    5.0   0  0.00e+00  0.00e+00 |      4.4
oneminus_    float |    3.3 |    0.8   0  5.16e-10  5.96e-08 |      4.0
oneplus_     float |    3.4 |    0.8   0  2.87e-10  5.74e-08 |      4.1
rec_         float |   37.4 |    4.3   0  2.48e-08  5.92e-08 |      8.8
round_       float |   43.7 |    5.5   0  0.00e+00  0.00e+00 |      8.0
rsqrt_       float |  105.4 |   11.3   1  3.62e-08  8.81e-08 |      9.3
signedbool_  float |    9.3 |    0.9   0       nan  0.00e+00 |     10.6
sign_        float |   12.1 |    2.5   0  0.00e+00  0.00e+00 |      4.9
signnz_      float |   12.1 |    1.4   0  0.00e+00  0.00e+00 |      8.5
sinh_        float |  267.2 |   19.1   3  2.38e-07  3.84e-07 |     14.0
sin_         float |  110.5 |   17.0   1  2.98e-08  1.12e-07 |      6.5
sinpi_       float |  115.8 |   14.4   1  2.98e-08  1.10e-07 |      8.0
sqr_         float |    2.0 |    0.7   0  2.53e-08  5.94e-08 |      2.9
sqrt_        float |   68.3 |    7.1   0  2.62e-08  5.96e-08 |      9.6
sqrtabs_     float |   68.3 |    7.0   0  2.61e-08  5.96e-08 |      9.7
tanh_        float |  206.4 |   20.8 197  1.63e-07  1.77e-05 |      9.9
tan_         float |  153.0 |   17.8   2  4.18e-08  1.45e-07 |      8.6
tanpi_       float |  156.2 |   18.0   2  4.16e-08  1.48e-07 |      8.7
-----------------------------------------------------------------------
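The ulp, rms and peak columns are not defined in the post. A minimal sketch of how figures of this kind can be computed, assuming ulp is the maximum distance in units in the last place from the float nearest to a double-precision reference, and rms and peak are the root-mean-square and maximum relative errors; all names are hypothetical and the table may have been produced differently.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Map a float onto an integer line where adjacent floats differ by 1
// (+0 and -0 both map to 0), so ulp distance is a plain subtraction.
inline std::int64_t ordered_bits(float f)
{
    std::int32_t i;
    std::memcpy(&i, &f, sizeof i);
    return i < 0 ? static_cast<std::int64_t>(INT32_MIN) - i : i;
}

inline std::int64_t ulp_distance(float a, float b)
{
    std::int64_t d = ordered_bits(a) - ordered_bits(b);
    return d < 0 ? -d : d;
}

struct error_stats { std::int64_t max_ulp; double rms; double peak; };

// ASSUMPTION: rms and peak are relative errors against a double reference.
template <class Approx, class Ref>
error_stats measure(Approx approx, Ref ref, const float* x, std::size_t n)
{
    error_stats s{0, 0.0, 0.0};
    double sum_sq = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        float got = approx(x[i]);
        double want = ref(static_cast<double>(x[i]));
        // ulp error is measured against the float nearest to the reference
        s.max_ulp = std::max(s.max_ulp, ulp_distance(got, static_cast<float>(want)));
        double err = std::fabs(static_cast<double>(got) - want);
        double rel = want != 0.0 ? err / std::fabs(want) : err;
        sum_sq += rel * rel;
        s.peak = std::max(s.peak, rel);
    }
    s.rms = n > 0 ? std::sqrt(sum_sq / static_cast<double>(n)) : 0.0;
    return s;
}
```

A correctly rounded function such as std::sqrt on float gives max_ulp == 0 against a double reference, which matches the 0 reported for sqrt_ in the table.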

-- 
___________________________________________
Joel Falcou - Assistant Professor
PARALL Team - LRI - Universite Paris Sud XI
Tel : (+33)1 69 15 66 35
