From: Ivan Matek (libbooze_at_[hidden])
Date: 2024-12-06 18:52:13
Hi Vinnie,
I did not know that pasting an image would send my message straight to
moderator approval because of its size. Apologies; here is my message with
the image removed.
On Fri, Dec 6, 2024 at 7:48 PM Ivan Matek <libbooze_at_[hidden]> wrote:
>
>
> On Fri, Dec 6, 2024 at 7:19 PM Vinnie Falco <vinnie.falco_at_[hidden]>
> wrote:
>
>>
>>> How?
>>
>> Maybe we are not talking about the same situation, but this is what I meant:
> godbolt link <https://godbolt.org/z/K8xEjEoMK>
>
If you look at
int f_span_static<3ul>(std::__1::span<int, 3ul>):
mov eax,DWORD PTR [rdi+0x4]
add eax,DWORD PTR [rdi]
add eax,DWORD PTR [rdi+0x8]
ret
nop DWORD PTR [rax+0x0]
int f_span_static<4ul>(std::__1::span<int, 4ul>):
movdqu xmm0,XMMWORD PTR [rdi]
pshufd xmm1,xmm0,0xee
paddd xmm1,xmm0
pshufd xmm0,xmm1,0x55
paddd xmm0,xmm1
movd eax,xmm0
ret
> you will see that the compiler generates a "specialized" function for each
> distinct size of a span with non-dynamic extent. Here you can see that it
> implemented the summation of 3 and 4 integers in different ways.
>
> This is great for performance and makes checking easier since, as Peter
> explained, the compiler knows more, but it creates larger binaries (generally
> speaking; I know compilers are smart and can inline, for 2 instantiations it
> does not really matter, etc.).
> A function taking a dynamic span or a runtime-specified n will probably be
> slower because it uses an ordinary loop, but there is only one copy of it in
> the resulting assembly.
>
> There is a real-life example of this in fmt. Note: the link gives a
> certificate error, and I did not manage to find another one:
> https://vitaut.net/posts/2020/reducing-library-size/
>
>
>