|
Boost : |
Subject: Re: [boost] [beast] Formal review
From: Phil Endecott (spam_from_boost_dev_at_[hidden])
Date: 2017-07-11 16:59:25
Peter Dimov wrote:
> Phil Endecott wrote:
>> > That hurts 32-bit ARM.
>>
>> I think that's an issue with whatever compiler you're using, not the
>> architecture; I've just done a quick test with arm-linux-gnueabihf-g++-6
>> 6.3.0 and I get about a 5% speedup by using memcpy.
>
> No, it's an issue with ARM32 not allowing unaligned loads. The memcpy code,
> at best, uses four byte loads, instead of one 32 bit one.
> __builtin_assume_aligned doesn't help.
>
> https://godbolt.org/g/iC9X35
That link only seems to redirect to about:blank for me, for some reason.
For this code:
unsigned long f(const uint8_t* p)
{
#if 1
unsigned long w;
memcpy(&w, p, sizeof(w));
return w;
#else
return * reinterpret_cast<const unsigned long*>(p);
#endif
}
For me, with 32-bit ARM g++ 6.3.0, I get the same trivial code for both
cases:
ldr r0, [r0]
bx lr
It seems that gcc has options to control this; quoting from
https://gcc.gnu.org/onlinedocs/gcc/ARM-Options.html :
-munaligned-access
-mno-unaligned-access
Enables (or disables) reading and writing of 16- and 32- bit
values from addresses that are not 16- or 32- bit aligned. By
default unaligned access is disabled for all pre-ARMv6, all
ARMv6-M and for ARMv8-M Baseline architectures, and enabled for
all other architectures. If unaligned access is not enabled then
words in packed data structures are accessed a byte at a time.
If I use -mno-unaligned-access, I get a call to the library memcpy():
push {lr}
sub sp, sp, #12
mov r1, r0
movs r2, #4
add r0, sp, r2
bl memcpy(PLT)
ldr r0, [sp, #4]
add sp, sp, #12
@ sp needed
ldr pc, [sp], #4
But for me, __builtin_assume_aligned *does* help:
unsigned long f(const uint8_t* p)
{
unsigned long w;
memcpy(&w, __builtin_assume_aligned(p,sizeof(w)), sizeof(w));
return w;
}
That gives the trivial code even when I use -mno-unaligned-access.
> Myself, I'd go with the reinterpret_cast for the time being. It's indeed
> _technically UB_, but it works.
Well, whatever. The is the "Beast" review. My opinion is still that the
code is over-optimised, and that this has made it more likely to be buggy
and insecure - and the claimed performance benefits may not be as great
as claimed.
Regards, Phil.
Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk