Boost logo

Boost :

Subject: Re: [boost] [beast] Formal review
From: Phil Endecott (spam_from_boost_dev_at_[hidden])
Date: 2017-07-11 16:59:25


Peter Dimov wrote:
> Phil Endecott wrote:
>> > That hurts 32-bit ARM.
>>
>> I think that's an issue with whatever compiler you're using, not the
>> architecture; I've just done a quick test with arm-linux-gnueabihf-g++-6
>> 6.3.0 and I get about a 5% speedup by using memcpy.
>
> No, it's an issue with ARM32 not allowing unaligned loads. The memcpy code,
> at best, uses four byte loads, instead of one 32 bit one.
> __builtin_assume_aligned doesn't help.
>
> https://godbolt.org/g/iC9X35

That link only seems to redirect to about:blank for me, for some reason.

For this code:

unsigned long f(const uint8_t* p)
{
#if 1
   unsigned long w;
   memcpy(&w, p, sizeof(w));
   return w;
#else
   return * reinterpret_cast<const unsigned long*>(p);
#endif
}

For me, with 32-bit ARM g++ 6.3.0, I get the same trivial code for both
cases:

        ldr r0, [r0]
        bx lr

It seems that gcc has options to control this; quoting from
https://gcc.gnu.org/onlinedocs/gcc/ARM-Options.html :

  -munaligned-access
  -mno-unaligned-access

     Enables (or disables) reading and writing of 16- and 32- bit
     values from addresses that are not 16- or 32- bit aligned. By
     default unaligned access is disabled for all pre-ARMv6, all
     ARMv6-M and for ARMv8-M Baseline architectures, and enabled for
     all other architectures. If unaligned access is not enabled then
     words in packed data structures are accessed a byte at a time.

If I use -mno-unaligned-access, I get a call to the library memcpy():

        push {lr}
        sub sp, sp, #12
        mov r1, r0
        movs r2, #4
        add r0, sp, r2
        bl memcpy(PLT)
        ldr r0, [sp, #4]
        add sp, sp, #12
        @ sp needed
        ldr pc, [sp], #4

But for me, __builtin_assume_aligned *does* help:

unsigned long f(const uint8_t* p)
{
   unsigned long w;
   memcpy(&w, __builtin_assume_aligned(p,sizeof(w)), sizeof(w));
   return w;
}

That gives the trivial code even when I use -mno-unaligned-access.

> Myself, I'd go with the reinterpret_cast for the time being. It's indeed
> _technically UB_, but it works.

Well, whatever. The is the "Beast" review. My opinion is still that the
code is over-optimised, and that this has made it more likely to be buggy
and insecure - and the claimed performance benefits may not be as great
as claimed.

Regards, Phil.


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk