From: Giovanni Piero Deretta (gpderetta_at_[hidden])
Date: 2008-02-26 10:24:02


On Tue, Feb 26, 2008 at 3:04 PM, John Maddock <john_at_[hidden]> wrote:
> John Maddock wrote:
> > 1) Floating point classification routines: these are optimised
> > implementations of the C99 and C++ TR1 functions fpclassify, isinf,
> > isnan, isnormal and isfinite. From Boost-1.35 onwards these are
> > already a part of Boost.Math (see
> > http://svn.boost.org/svn/boost/trunk/libs/math/doc/sf_and_dist/html/math_toolkit/special/fpclass.html)
> > so if accepted the two implementations will get merged.
> >
> > The review here should focus on the implementation used, and testing
> > on whatever platforms you have available - in particular are there any
> > circumstances (compiler optimisation settings etc) where this
> > implementation breaks?
>
> These look fine to me, and as Johan mentioned, any breakages should be fixed
> in endian.hpp.
>
> Sorry to bring this up again (!), but I've been trying to get my head around
> the aliasing rules and what is and isn't allowed, in particular by following
> the rather useful article here:
> http://www.cellperformance.com/mike_acton/2006/06/understanding_strict_aliasing.html

also see: http://mail-index.netbsd.org/tech-kern/2003/08/11/0001.html

>
> From this it appears that your current code is perfectly fine, however I
> still can't help feeling that we could do better than call an external
> function (memcpy) to move 4 or 8 bytes. The suggestion in the article is to
> use a union to move from one type to another:
>
> union float_to_int32
> {
>     float f;
>     boost::uint32_t i;
> };
>
> boost::uint32_t get_bits(float f)
> {
>     float_to_int32 u;
>     u.f = f;
>     return u.i;
> }
>
> I can't claim to have tested exhaustively, but modifying get_bits and
> set_bits to use this method seems to work OK for VC8&9 and MinGW GCC.
>

GCC 4.2 under x86_64 produces better code with std::memcpy (which is
treated as an intrinsic) than with the union trick (compiled with -O3):

  uint32_t get_bits(float f) {
      float_to_int32 u;
      u.f = f;
      return u.i;
  }

Generates:

  _Z8get_bitsf:
        movss %xmm0, -4(%rsp)
        movl -4(%rsp), %edx
        movl %edx, %eax
        ret

Which has a useless "movl %edx, %eax". I think that using the union
confuses the optimizer.
This instead:

  uint32_t get_bits2(float f) {
    uint32_t ret;
    std::memcpy(&ret, &f, sizeof(f));
    return ret;
  }

Generates:

  _Z9get_bits2f:
        movss %xmm0, -4(%rsp)
        movl -4(%rsp), %eax
        ret

Which should be optimal (IIRC you can't move from an XMM register to
an integer register without passing through memory).
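
The set_bits counterpart that John mentions can use the same memcpy
idiom; this is only a sketch with a made-up name and signature, not
the actual Boost.Math code:

  #include <cstring>           // std::memcpy
  #include <boost/cstdint.hpp> // boost::uint32_t

  // Hypothetical counterpart to get_bits2: copy the bit pattern back
  // into the float with std::memcpy instead of going through a union.
  void set_bits2(float& f, boost::uint32_t bits) {
      std::memcpy(&f, &bits, sizeof(f));
  }

I'd expect GCC to expand this call inline too, for the same reason as
above.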

Note that the (illegal) code:

  uint32_t get_bits3(float f) {
    uint32_t ret = *reinterpret_cast<uint32_t*>(&f);
    return ret;
  }

Generates the same code as get_bits2 when compiled with
-fno-strict-aliasing. Without that flag the compiler (rightly)
miscompiles it, since the code breaks the strict aliasing rules. I've
also tested the code under plain x86, and there the three functions
generate identical code.

So the standard-compliant code is also the optimal one, at least with recent GCCs.
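
For reference, here is everything collected into one self-contained
file that should reproduce the comparison (the names and flags are
just my test setup, adjust as needed); compile with "g++ -O3 -S",
adding -fno-strict-aliasing if you want get_bits3 to come out right:

  #include <cstring>           // std::memcpy
  #include <boost/cstdint.hpp> // boost::uint32_t

  using boost::uint32_t;

  union float_to_int32
  {
      float f;
      uint32_t i;
  };

  // union trick (John's suggestion)
  uint32_t get_bits(float f) {
      float_to_int32 u;
      u.f = f;
      return u.i;
  }

  // memcpy of a known, small size: GCC expands it inline
  uint32_t get_bits2(float f) {
      uint32_t ret;
      std::memcpy(&ret, &f, sizeof(f));
      return ret;
  }

  // illegal: breaks the strict aliasing rules, only "works" with
  // -fno-strict-aliasing
  uint32_t get_bits3(float f) {
      return *reinterpret_cast<uint32_t*>(&f);
  }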

-- 
gpd
