Subject: Re: [boost] Boost.Endian comments
From: Beman Dawes (bdawes_at_[hidden])
Date: 2011-09-08 10:36:31
On Thu, Sep 8, 2011 at 4:42 AM, Pyry Jahkola <pyry.jahkola_at_[hidden]> wrote:
> If you're doing a proper benchmark, Beman, I'd add one more trick to compare
> in the test:
>
> inline void reorder(uint64_t source, uint64_t & target)
> {
>     uint64_t step32, step16;
>     step32 = source << 32 | source >> 32;
>     step16 = (step32 & 0x0000FFFF0000FFFF) << 16
>            | (step32 & 0xFFFF0000FFFF0000) >> 16;
>     target = (step16 & 0x00FF00FF00FF00FF) << 8
>            | (step16 & 0xFF00FF00FF00FF00) >> 8;
> }
Nice!
Because my test setup is currently int32_t, I tested that flavor (and
only on VC++10). Your approach is 27% faster. The assembly code is 8
instructions versus 12 instructions.
Here is the actual code tested:
inline int32_t by_return(int32_t x)
{
    return (static_cast<uint32_t>(x) << 24)
        | ((static_cast<uint32_t>(x) << 8) & 0x00ff0000)
        | ((static_cast<uint32_t>(x) >> 8) & 0x0000ff00)
        | (static_cast<uint32_t>(x) >> 24);
}

inline int32_t by_return_pyry(int32_t x)
{
    uint32_t step16;
    step16 = static_cast<uint32_t>(x) << 16 | static_cast<uint32_t>(x) >> 16;
    return
        ((static_cast<uint32_t>(step16) << 8) & 0xff00ff00)
        | ((static_cast<uint32_t>(step16) >> 8) & 0x00ff00ff);
}
The static_casts are important; the results are wrong without them,
and fewer instructions are generated. The Microsoft compiler is smart
enough to fold "step16 = static_cast<uint32_t>(x) << 16 |
static_cast<uint32_t>(x) >> 16;" into a single "rol ecx, 16"
instruction.
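To illustrate why the casts matter, here is a minimal sketch, not the
code that was benchmarked. The helper names (top_byte_cast,
top_byte_no_cast, swap_shift_mask, swap_rotate, check_examples) are
made up for this example. It assumes that right-shifting a negative
int32_t is an arithmetic shift, which is required since C++20 and
implementation-defined before that.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: cast first, then shift.
// The shift acts on an unsigned value, so zeros come in from the left.
inline std::uint32_t top_byte_cast(std::int32_t x)
{
    return static_cast<std::uint32_t>(x) >> 24;
}

// Hypothetical helper: shift the signed value first, then convert.
// Assumes an arithmetic right shift, so copies of the sign bit come in.
inline std::uint32_t top_byte_no_cast(std::int32_t x)
{
    return static_cast<std::uint32_t>(x >> 24);
}

// Shift-and-mask byte swap, written on an unsigned value.
inline std::uint32_t swap_shift_mask(std::uint32_t x)
{
    return (x << 24)
        | ((x << 8) & 0x00ff0000u)
        | ((x >> 8) & 0x0000ff00u)
        | (x >> 24);
}

// Rotate-then-swap byte swap, written on an unsigned value.
inline std::uint32_t swap_rotate(std::uint32_t x)
{
    std::uint32_t step16 = x << 16 | x >> 16;   // swap the 16-bit halves
    return ((step16 << 8) & 0xff00ff00u)        // low byte of each half moves up
        | ((step16 >> 8) & 0x00ff00ffu);        // high byte of each half moves down
}

inline void check_examples()
{
    const std::int32_t x = INT32_MIN;            // bit pattern 0x80000000
    assert(top_byte_cast(x) == 0x00000080u);     // only the top byte survives
    assert(top_byte_no_cast(x) == 0xffffff80u);  // sign bits smear into the result
    assert(swap_shift_mask(0x01020304u) == 0x04030201u);
    assert(swap_rotate(0x01020304u) == 0x04030201u);
}
```

Without the cast, the smeared sign bits end up OR'ed into the other
bytes whenever the input is negative, which is one way the results can
come out wrong.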
Thanks,
--Beman