From: Joaquin M López Muñoz (joaquinlopezmunoz_at_[hidden])
Date: 2025-05-20 16:31:14


On 20/05/2025 at 18:00, Ivan Matek wrote:
>
>
> On Tue, May 20, 2025 at 5:10 PM Joaquin M López Muñoz via Boost
> <boost_at_[hidden]> wrote:
>
> That's a matter of opinion, I guess, but I'd rather have people not
> wanting the fallback write the compile-time check instead of the
> other way around. Sometimes you're not writing a final application
> but a library (say, on top of candidate Boost.Bloom), and you don't
> control compilation flags or target architecture.
>
>
> I guess my concern is that people will assume, from reading the
> documentation, that if fast_ compiles it uses SIMD. But I see your point.
> To be clear what I mean here:
> /"but uses faster SIMD-based algorithms when SSE2, AVX2 or Neon are
> available".
> /
> A user might think: my CPU supports AVX2, so surely it will use SIMD
> algorithms. But "available" here refers to compiler options (and
> obviously CPU support when the binary is started), not just to CPU
> support. I know I am not telling you anything you do not know, I just
> think a large percentage of users might misunderstand what "available" means.

Yes, you're right, I can rewrite "are available" as "are enabled at compile
time".

>
>
>  I fail to see any run-time table
> initialization in your original snippet at
> https://godbolt.org/z/sYfc7rffa .
>
>
> I am not a SIMD expert, but is this not creating those variables on
> stack?
> gcc asm
> vbroadcastsd ymm1, qword ptr [rip + .LCPI0_1]
> vmovaps xmm3, xmm1
> vmovaps ymmword ptr [rsp + 64], ymm3
> vpmovsxbq ymm4, dword ptr [rip + .LCPI0_4]
> vmovaps ymmword ptr [rsp + 128], ymm4
> vmovaps ymmword ptr [rsp + 192], ymm1
> vmovaps ymmword ptr [rsp + 256], ymm1
> vmovaps ymmword ptr [rsp + 320], ymm1
> vmovaps ymmword ptr [rsp + 384], ymm1
> vmovaps ymmword ptr [rsp + 448], ymm1

Umm, yes, maybe. Anyway, scratch what I said about compilers
not really caring about const vs. static const: adding static to your
snippet severely pessimizes the codegen, with static initialization
guards and all. So there goes your explanation as to why static
was not used :-) For the record, during development I examined
the generated code for all fast_multiblockXX classes with the three
major compilers, Intel and ARM to check that nothing looked bad.
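
As a portable illustration of the const vs. static const difference (this is
not the godbolt snippet and uses no SIMD), the sketch below assumes a
hypothetical make_table whose result is not a constant expression, standing in
for a table built from intrinsics. A function-local static const with such an
initializer is initialized on first call, which since C++11 has to be
thread-safe and is typically implemented with a guard checked on every call;
the plain const local is simply rebuilt each time and is easier for the
optimizer to fold away.

```cpp
#include <array>
#include <cstdint>

// Counts how often the (hypothetical) table initializer actually runs.
int init_calls = 0;

// Stand-in for a non-constexpr initializer, e.g. one built from intrinsics.
std::array<std::uint64_t, 4> make_table(std::uint64_t seed)
{
    ++init_calls;
    return {seed, seed << 1, seed << 2, seed << 3};
}

// Plain const local: the table is (conceptually) rebuilt on every call.
std::uint64_t lookup_local(unsigned i)
{
    const auto table = make_table(1);
    return table[i & 3];
}

// static const local: built once on first use, behind an initialization guard.
std::uint64_t lookup_static(unsigned i)
{
    static const auto table = make_table(1);
    return table[i & 3];
}
```

Both functions return the same values; what differs is when the initializer
runs and what bookkeeping the compiler has to emit around it.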

Joaquin M Lopez Munoz

