Re: [boost] [Bloom] Some questions

20 May 2025

On Tue, May 20, 2025 at 5:10 PM Joaquin M López Muñoz via Boost <boost@lists.boost.org> wrote:
...
> That's a matter of opinion, I guess, but I'd rather have people not
> wanting the fallback write the compile-time check instead of the
> other way around. Sometimes you're not writing a final application
> but a library (say, on top of candidate Boost.Bloom), and you don't
> control compilation flags or target architecture.
I guess my concern is that people reading the documentation will assume that
if fast_ compiles, it uses SIMD. But I see your point.
To be clear about what I mean here:

*"but uses faster SIMD-based algorithms when SSE2, AVX2 or Neon are
available".*
A user might think: my CPU supports AVX2, so surely it will use the SIMD
algorithms. But "available" here refers to compiler options (and obviously to
CPU support when the binary is run), not just to CPU support. I know I am not
telling you anything you do not know; I just think a large percentage of
users might misunderstand what "available" means.
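
A minimal sketch of the distinction, assuming only the predefined macros that
GCC, Clang and MSVC set from their target options. The names simd_target,
compiled_simd_target and simd_target_name are hypothetical and are not part
of candidate Boost.Bloom; which macros the library itself tests is not shown
in this thread.

```cpp
// Hypothetical helper, not part of candidate Boost.Bloom.
enum class simd_target { none, sse2, avx2, neon };

// Reports which SIMD instruction set the *compiler* was told to target.
// Only predefined compiler macros are consulted; the CPU that later runs
// the binary plays no role in this decision.
constexpr simd_target compiled_simd_target() {
#if defined(__AVX2__)  // e.g. GCC/Clang -mavx2, MSVC /arch:AVX2
    return simd_target::avx2;
#elif defined(__SSE2__) || defined(_M_X64) || \
      (defined(_M_IX86_FP) && _M_IX86_FP >= 2)
    return simd_target::sse2;
#elif defined(__ARM_NEON) || defined(__ARM_NEON__) || defined(_M_ARM64)
    return simd_target::neon;
#else
    return simd_target::none;
#endif
}

// Human-readable name, usable in logs or diagnostics.
constexpr const char* simd_target_name(simd_target t) {
    return t == simd_target::avx2 ? "AVX2"
         : t == simd_target::sse2 ? "SSE2"
         : t == simd_target::neon ? "Neon"
         : "none (scalar fallback)";
}

// Opt-in check for users who do not want a silent fallback:
// static_assert(compiled_simd_target() != simd_target::none,
//               "SIMD was not enabled at compile time");
```

With default flags on x86-64, GCC and Clang define __SSE2__ but not __AVX2__,
so this reports SSE2 even on a CPU that supports AVX2, unless something like
-mavx2 or -march=native is passed.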
...
> I fail to see any run-time table
> initialization in your original snippet at https://godbolt.org/z/sYfc7rffa .
I am not a SIMD expert, but is this not creating those variables on the stack?
gcc asm
        vbroadcastsd    ymm1, qword ptr [rip + .LCPI0_1]
        vmovaps xmm3, xmm1
        vmovaps ymmword ptr [rsp + 64], ymm3
        vpmovsxbq       ymm4, dword ptr [rip + .LCPI0_4]
        vmovaps ymmword ptr [rsp + 128], ymm4
        vmovaps ymmword ptr [rsp + 192], ymm1
        vmovaps ymmword ptr [rsp + 256], ymm1
        vmovaps ymmword ptr [rsp + 320], ymm1
        vmovaps ymmword ptr [rsp + 384], ymm1
        vmovaps ymmword ptr [rsp + 448], ymm1

But again, my question was mostly about how reliable those optimizations are
for Bloom, considering the huge variety of compilers and compiler options, not
to mention future refactorings that might trip up the compiler's
optimizations. I may just be too paranoid, but those variables are not
simple ints, which I suspect is why compilers have a problem computing
them at compile time in my godbolt example. As you said, though, compilers do
it successfully for Bloom, and I have verified on my machine that the compiler
optimizes it in my example code.
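
One way to avoid depending on the optimizer at all is to make the table a
constexpr object built from plain integers, so the compiler either evaluates
it at compile time or rejects the code. The following is only a sketch under
that assumption: it reproduces neither the godbolt snippet nor Boost.Bloom's
actual code, and mask_table, make_mask_table, k_masks, mask_row and the table
contents are all made up for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical 8x4 table of 64-bit masks (one 256-bit row per entry).
struct mask_table {
    std::uint64_t rows[8][4];
};

// Plain-integer constexpr builder (C++14 or later). No SIMD types are
// involved, so nothing here needs intrinsics or special compiler flags.
constexpr mask_table make_mask_table() {
    mask_table t{};
    for (std::size_t i = 0; i < 8; ++i)
        for (std::size_t j = 0; j < 4; ++j)
            t.rows[i][j] = std::uint64_t(1) << (8 * i + j);
    return t;
}

// A constexpr variable must have a constant-expression initializer, so
// compile-time evaluation is required by the language, not left to the
// optimizer. alignas(32) keeps each 32-byte row suitably aligned for a
// 256-bit aligned load.
alignas(32) constexpr mask_table k_masks = make_mask_table();

// These fail to compile if the table were not a compile-time constant.
static_assert(k_masks.rows[0][0] == 1, "first mask");
static_assert(k_masks.rows[7][3] == (std::uint64_t(1) << 59), "last mask");

// Accessor returning a pointer to one row of the precomputed table.
inline const std::uint64_t* mask_row(std::size_t i) {
    assert(i < 8);
    return k_masks.rows[i];
}
```

A SIMD code path could then load a row with an aligned load instead of
building the vectors on the stack. Whether that is faster than what the
compilers already produce for Bloom would have to be measured.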

Ivan Matek