From: Ivan Matek (libbooze_at_[hidden])
Date: 2025-05-20 16:00:40


On Tue, May 20, 2025 at 5:10 PM Joaquin M López Muñoz via Boost <boost_at_[hidden]> wrote:

> That's a matter of opinion, I guess, but I'd rather have people not
> wanting the fallback write the compile-time check instead of the
> other way around. Sometimes you're not writing a final application
> but a library (say, on top of candidate Boost.Bloom), and you don't
> control compilation flags or target architecture.
>

I guess my concern is that people reading the documentation will assume that
if fast_ compiles, it uses SIMD. But I see your point.
To be clear about what I mean here:

"but uses faster SIMD-based algorithms when SSE2, AVX2 or Neon are
available".
A user might think: my CPU supports AVX2, so surely it will use the SIMD
algorithms. But "available" here refers to compiler options (and, obviously,
CPU support when the binary is started), not just to CPU support. I know I am
not telling you anything you do not know; I just think a large percentage of
users might misunderstand what "available" means.

> I fail to see any run-time table
> initialization in your original snippet at https://godbolt.org/z/sYfc7rffa.
>

I am not a SIMD expert, but isn't this creating those variables on the stack?
gcc asm:
        vbroadcastsd ymm1, qword ptr [rip + .LCPI0_1]
        vmovaps xmm3, xmm1
        vmovaps ymmword ptr [rsp + 64], ymm3
        vpmovsxbq ymm4, dword ptr [rip + .LCPI0_4]
        vmovaps ymmword ptr [rsp + 128], ymm4
        vmovaps ymmword ptr [rsp + 192], ymm1
        vmovaps ymmword ptr [rsp + 256], ymm1
        vmovaps ymmword ptr [rsp + 320], ymm1
        vmovaps ymmword ptr [rsp + 384], ymm1
        vmovaps ymmword ptr [rsp + 448], ymm1

But again, my question was mostly about how reliable those optimizations are
for Bloom, considering the huge variety of compilers and compiler options, not
to mention future refactoring that might trip up the compiler's
optimizations. I may just be too paranoid, but those variables are not
simple ints, so I suspect that is why compilers have trouble computing
them at compile time in my godbolt example. As you said, though, they do it
successfully for Bloom, and I have verified on my machine that the compiler
optimizes it in my example code.


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk