Subject: Re: [boost] Boost.Endian comments
From: Beman Dawes (bdawes_at_[hidden])
Date: 2011-09-08 10:36:31


On Thu, Sep 8, 2011 at 4:42 AM, Pyry Jahkola <pyry.jahkola_at_[hidden]> wrote:

> If you're doing a proper benchmark, Beman, I'd add one more trick to compare
> in the test:
>
>   inline void reorder(uint64_t source, uint64_t & target)
>   {
>       uint64_t step32, step16;
>       step32 = source << 32 | source >> 32;
>       step16 = (step32 & 0x0000FFFF0000FFFF) << 16
>              | (step32 & 0xFFFF0000FFFF0000) >> 16;
>       target = (step16 & 0x00FF00FF00FF00FF) << 8
>              | (step16 & 0xFF00FF00FF00FF00) >> 8;
>   }

Nice!

Because my test setup is currently int32_t, I tested that flavor (and
only on VC++10). Your approach is 27% faster, and the generated assembly
is 8 instructions versus 12.

Here is the actual code tested:

  inline int32_t by_return(int32_t x)
  {
    return (static_cast<uint32_t>(x) << 24)
      | ((static_cast<uint32_t>(x) << 8) & 0x00ff0000)
      | ((static_cast<uint32_t>(x) >> 8) & 0x0000ff00)
      | (static_cast<uint32_t>(x) >> 24);
  }

  inline int32_t by_return_pyry(int32_t x)
  {
    uint32_t step16;
    step16 = static_cast<uint32_t>(x) << 16 | static_cast<uint32_t>(x) >> 16;
    return
        ((static_cast<uint32_t>(step16) << 8) & 0xff00ff00)
      | ((static_cast<uint32_t>(step16) >> 8) & 0x00ff00ff);
  }
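
As a quick sanity check, the two variants can be compared against each other. The following is only a sketch, not part of the benchmark: the names swap_shift_mask, swap_rotate and variants_agree are hypothetical, and the functions are restated with a single cast up front rather than copied verbatim.

```cpp
#include <cassert>
#include <cstdint>

// Four-term shift-and-mask byte swap (same logic as by_return).
inline std::int32_t swap_shift_mask(std::int32_t x)
{
    const std::uint32_t u = static_cast<std::uint32_t>(x);
    return static_cast<std::int32_t>(
        (u << 24) | ((u << 8) & 0x00ff0000u)
      | ((u >> 8) & 0x0000ff00u) | (u >> 24));
}

// Swap the 16-bit halves first, then the bytes within each half
// (same logic as by_return_pyry).
inline std::int32_t swap_rotate(std::int32_t x)
{
    const std::uint32_t u = static_cast<std::uint32_t>(x);
    const std::uint32_t step16 = (u << 16) | (u >> 16);
    return static_cast<std::int32_t>(
        ((step16 << 8) & 0xff00ff00u) | ((step16 >> 8) & 0x00ff00ffu));
}

// Hypothetical helper: true if both variants agree on a few sample
// values and swapping twice gives back the original value.
inline bool variants_agree()
{
    const std::int32_t samples[] =
        { 0, 1, -1, 0x11223344, 0x7fffffff, INT32_MIN };
    for (std::int32_t s : samples)
    {
        if (swap_shift_mask(s) != swap_rotate(s)) return false;
        if (swap_rotate(swap_rotate(s)) != s) return false;
    }
    return true;
}
```

Calling variants_agree() from a test driver should return true; it only covers a handful of values, so it is a smoke test rather than a proof of equivalence.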

The static_casts are important: without them the results are wrong
(and fewer instructions are generated). The Microsoft compiler is smart
enough to fold "step16 = static_cast<uint32_t>(x) << 16 |
static_cast<uint32_t>(x) >> 16;" into a single "rol ecx, 16"
instruction.
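
A likely reason the casts matter is sign extension. The sketch below assumes that >> on a negative int32_t is an arithmetic shift (VC++, GCC and Clang behave this way, and C++20 requires it). The two helper names are hypothetical and exist only for this illustration.

```cpp
#include <cassert>
#include <cstdint>

// Shifts the signed value directly: for negative x the vacated high
// bits are filled with copies of the sign bit (assumed arithmetic shift).
inline std::uint32_t top_byte_signed_shift(std::int32_t x)
{
    return static_cast<std::uint32_t>(x >> 24);
}

// Casts to unsigned first: the vacated high bits are filled with zeros,
// which is what a byte swap needs.
inline std::uint32_t top_byte_unsigned_shift(std::int32_t x)
{
    return static_cast<std::uint32_t>(x) >> 24;
}
```

For a non-negative input such as 0x11223344 both helpers give 0x11. For INT32_MIN (bit pattern 0x80000000) the unsigned shift gives 0x80, while the signed shift gives 0xffffff80 under the assumption above, and those stray high bits would corrupt the swapped result once OR-ed with the other terms.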

Thanks,

--Beman

