|
Boost : |
From: Giovanni Piero Deretta (gpderetta_at_[hidden])
Date: 2008-02-26 10:24:02
On Tue, Feb 26, 2008 at 3:04 PM, John Maddock <john_at_[hidden]> wrote:
> John Maddock wrote:
> > 1) Floating point classification routines: these are optimised
> > implementations of the C99 and C++ TR1 functions fpclassify, isinf,
> > isnan, isnormal and isfinite. From Boost-1.35 onwards these are
> > already a part of Boost.Math (see
> > http://svn.boost.org/svn/boost/trunk/libs/math/doc/sf_and_dist/html/math_toolkit/special/fpclass.html)
> > so if accepted the two implementations will get merged.
> >
> > The review here should focus on the implementation used, and testing
> > on whatever platforms you have available - in particular are there any
> > circumstances (compiler optimisation settings etc) where this
> > implementation breaks?
>
> These look fine to me, and as Johan mentioned, any breakages should be fixed
> in endian.hpp.
>
> Sorry to bring this up again (!), but I've been trying to get my head around
> the aliasing rules and what is and isn't allowed, in particular by following
> the rather useful article here:
> http://www.cellperformance.com/mike_acton/2006/06/understanding_strict_aliasing.html
also see: http://mail-index.netbsd.org/tech-kern/2003/08/11/0001.html
>
> >From this it appears that your current code is perfectly fine, however I
> still can't help feeling that we could do better than call an external
> function (memcpy) to move 4 or 8 bytes. The suggestion in the article is to
> use a union to move from one type to another:
>
> union float_to_int32
> {
> float f;
> boost::uint32_t i;
> };
>
> boost::uint32_t get_bits(float f)
> {
> float_to_int32 u;
> u.f = f;
> return u.i;
> }
>
> I can't claim to have tested exhaustively, but modifying get_bits and
> set_bits to use this method seems to work OK for VC8&9 and Minwg GCC.
>
GCC 4.2 under x86_64 produces better code with std::memcpy (which is
treated as an intrinsic)
than with the union trick (compiled with -O3):
uint32_t get_bits(float f) {
float_to_int32 u;
u.f = f;
return u.i;
}
Generates:
_Z8get_bitsf:
movss %xmm0, -4(%rsp)
movl -4(%rsp), %edx
movl %edx, %eax
ret
Which has an useless "movl %edx, %eax". I think that using the union
confuses the optimizer.
This instead:
uint32_t get_bits2(float f) {
uint32_t ret;
std::memcpy(&ret, &f, sizeof(f));
return ret;
}
Generates:
_Z9get_bits3f:
movss %xmm0, -4(%rsp)
movl -4(%rsp), %eax
ret
Which should be optimal (IIRC you can't move from an xmms register to
an integer register without passing through memory).
Note that the (illegal) code:
uint32_t get_bits3(float f) {
uint32_t ret = *reinterpret_cast<uint32_t*>(&f);
return ret;
}
Generates the same code as get_bits2 if compiled with
-fno-strict-aliasing. Without that flag it miscompiles (rightly) the
code. I've tested the code under plain x86 and there is no difference
between all 3 functions.
So the standard compliant code is also optimal, at least with recent GCCs.
-- gpd
Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk