From: John Maddock (john_at_[hidden])
Date: 2008-02-26 11:49:51


Giovanni Piero Deretta wrote:
> also see: http://mail-index.netbsd.org/tech-kern/2003/08/11/0001.html

Thanks, will take a look.

>
> GCC 4.2 under x86_64 produces better code with std::memcpy (which is
> treated as an intrinsic) than with the union trick (compiled with -O3):
>
> uint32_t get_bits(float f) {
>     float_to_int32 u;
>     u.f = f;
>     return u.i;
> }
>
> Generates:
>
> _Z8get_bitsf:
>         movss   %xmm0, -4(%rsp)
>         movl    -4(%rsp), %edx
>         movl    %edx, %eax
>         ret
>
> Which has a useless "movl %edx, %eax". I think that using the union
> confuses the optimizer.
> This instead:
>
> uint32_t get_bits2(float f) {
>     uint32_t ret;
>     std::memcpy(&ret, &f, sizeof(f));
>     return ret;
> }
>
> Generates:
>
> _Z9get_bits2f:
>         movss   %xmm0, -4(%rsp)
>         movl    -4(%rsp), %eax
>         ret
>
> Which should be optimal (IIRC you can't move from an XMM register to
> an integer register without passing through memory).
>
> Note that the (illegal) code:
>
> uint32_t get_bits3(float f) {
>     uint32_t ret = *reinterpret_cast<uint32_t*>(&f);
>     return ret;
> }
>
> Generates the same code as get_bits2 if compiled with
> -fno-strict-aliasing. Without that flag GCC (rightly) miscompiles the
> code. I've tested the code under plain x86, and there is no difference
> among the three functions.
>
> So the standard compliant code is also optimal, at least with recent
> GCCs.

Very interesting!

In that case I think we should document what Johan has now, to avoid this
coming up again in the future :-)
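
For reference, here is a minimal sketch of the memcpy approach discussed
above. It is only an illustration and not Johan's actual code: the names
float_bits and bits_to_float are made up for this example, it assumes
that float is 32 bits wide with IEEE 754 layout, and it uses a C++11
static_assert to check the size.

```cpp
#include <cstdint>
#include <cstring>

// Assumption of this sketch: float and uint32_t have the same size.
static_assert(sizeof(float) == sizeof(std::uint32_t),
              "sketch assumes a 32-bit float");

// Hypothetical helper: return the object representation of a float
// as an integer. Copying bytes with std::memcpy does not violate the
// aliasing rules, unlike reading through a reinterpret_cast pointer.
inline std::uint32_t float_bits(float f) {
    std::uint32_t u;
    std::memcpy(&u, &f, sizeof u);
    return u;
}

// Hypothetical helper: the inverse conversion.
inline float bits_to_float(std::uint32_t u) {
    float f;
    std::memcpy(&f, &u, sizeof f);
    return f;
}
```

On an IEEE 754 platform, float_bits(1.0f) should give 0x3F800000, and
bits_to_float(float_bits(x)) should give back x for any non-NaN x.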

Thanks! John.

