
From: Ivan Matek (libbooze_at_[hidden])
Date: 2025-05-20 19:07:20


On Tue, May 20, 2025 at 6:31 PM Joaquin M López Muñoz via Boost <
boost_at_[hidden]> wrote:

> > User might think: my CPU supports AVX2, so surely it will use SIMD
> > algorithms. But "available" here refers to compiler options (and
> > obviously CPU support when the binary is started), not just CPU
> > support. I know I am not telling you anything you do not know, I just
> > think a large percentage of users might misunderstand what "available" means.
>
> Yes, you're right, I can rewrite "are available" as "are enabled at compile
> time".
>

Thank you, I believe that is a big improvement.
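
For instance, this is roughly how I read "enabled at compile time" (my own
illustration, not the library's actual detection macros): __AVX2__ is defined
by the compiler flags (-mavx2, /arch:AVX2), not by the CPU the binary later
runs on.

#include <cstdio>

int main()
{
#if defined(__AVX2__)
    // Built with -mavx2 / /arch:AVX2: the AVX2 code paths exist in the binary.
    std::puts("AVX2 code paths compiled in");
#else
    // No such flag: no AVX2 code paths, even on an AVX2-capable CPU.
    std::puts("AVX2 code paths not compiled in, regardless of CPU support");
#endif
}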

>
> Umm, yes, maybe. Anyway, scratch what I said about compilers
> not really caring about const vs. static const: adding static to your
> snippet severely pessimizes the codegen, with static initialization
> guards and all. So there goes your explanation of why static
> was not used :-)

Yes, I have noticed static messes it up, although for ints
<https://godbolt.org/z/oYM15zYoW> the compiler is smart enough not to emit
that guard. That is one of the reasons why I am so paranoid this
optimization might stop working with some future compiler.
SIMD intrinsics may be harder for the compiler to reason about than "just"
ints.
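
Roughly the kind of thing I was comparing (a reduced sketch with made-up
function names; whether the guard actually appears depends on the compiler
and optimization level, so compile with -mavx2 and check the assembly):

#include <immintrin.h>

// Plain const local: compilers typically fold this into constant loads,
// no guard involved.
__m256i broadcast_const()
{
    const __m256i m = _mm256_set1_epi32(42);
    return m;
}

// Function-local static const: _mm256_set1_epi32 is not a constant
// expression, so the compiler may emit a thread-safe initialization guard
// (__cxa_guard_acquire/release) around the first call. With plain ints the
// initializer gets constant-folded and the guard disappears.
__m256i broadcast_static()
{
    static const __m256i m = _mm256_set1_epi32(42);
    return m;
}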

> For the record, during development I examined
> the codegen for all fast_multiblockXX classes with the three
> major compilers, on Intel and ARM, to check that nothing looked bad.
>
I agree that 99% of the time it will never break, since I presume compilers
will rarely regress in this manner... but I still think there is a tiny
chance they might. :)

One more question:
I have some handcrafted tests (where the bloom filter is so small it fits in
L1/L2 cache, and the hit rate of lookups is 0%, aside from false positives),
and the SIMD one is a bit slower than the non-SIMD one for certain values of
K.

constexpr size_t num_inserted = 10'000;
constexpr double fpr          = 1e-5;
constexpr size_t K            = 5;

using vanilla_filter = boost::bloom::filter<
    uint64_t, 1, boost::bloom::multiblock<uint64_t, K>, 1>;
using simd_filter = boost::bloom::filter<
    uint64_t, 1, boost::bloom::fast_multiblock64<K>, 1>;

I presume that is expected, since it is hard to make sure SIMD is always
faster, but I just wanted to double-check with you that this is not an
unexpected result.
So to recap my question: if the bloom filter fits in L1 or L2 cache, is it
best practice to check whether the SIMD or the normal version is faster,
instead of assuming SIMD always wins?

