
On 14/06/2025 at 19:32, Ivan Matek wrote:
One more datapoint that confirms what you suggested (and what intuitively makes sense): for large K, the effect of breaking early on unsuccessful lookups is more impressive (when the filter fits in L2).
[...]
There is also another interesting fact about performance that will take a bit to explain, so I ask for patience: fast_multiblock<8> is much faster than fast_block<11> for unsuccessful_lookup_time. This does not initially make sense, because we would expect to execute the same number of instructions, and the branch is easily predictable since 100% of lookups fail (besides FPR). I am not an asm expert, but it seems clang interleaves the SIMD code in the 11 case, so some work that is not strictly necessary before the branch gets done before the branch. This is great when we do not branch (early return), but is wasted work when close to 100% of lookups are unsuccessful.
Yes, this is odd. If your thesis is correct, the compiler's budget optimizer seems to have decided to go branchless despite what the code says.
This made me investigate a bit, and on my machine I can get really great performance for unsuccessful_lookup_time when I combine a SIMD filter and a scalar filter for the "remainder" (e.g. SIMD 8, scalar 3 for K==11).
This is an idea that deserves some investigation. When K%8 is small, going scalar could actually be faster than SIMD. This is what your experiments seem to suggest anyway.
This combined filter (bobw_filter) has really great performance for unsuccessful lookups[...]
[...]
I understand that mixing two filters, instead of modifying one to have a mix of SIMD and scalar code, is not really the greatest comparison, but it was much easier for me to make a combined filter than to figure out how to mix scalar and SIMD filter code in one filter.
If you want to keep on playing, please find attached a sketch of a bobw_multiblock64_avx2 filter for your enjoyment. (Why the "bobw" prefix, btw?)

Joaquin M Lopez Munoz