Re: [boost] [bloom] Benchmarks with Knuth multiplier-based hash production

11 Jun 2025


      El 10/06/2025 a las 14:52, Ivan Matek escribió:
...
One more thing I noticed/checked and I believe may be of interest, 
again not in particular for this proposed Knuth multiplier change, but 
in general.
Some filters have much weaker performance for mixed lookup.
To be honest I have expected that for some(branch misprediction) but I 
do not know enough about all filters to say if slowdowns I have 
observed are as expected.
Interesting that some fast_ have bad performance for mixed lookup, 
although naively I would assume SIMD code is branchless. I presume the 
issue is that sometimes we operate on more than 256 bits so there is 
branching...
fast_multiblock* has branches (early termination on lookup)
at k = 8, 16, ..., that is, whenever a full subblock (a pair of __m128is,
a __m256i, a pair of __m256is or a uint32x4x2_t, depending on the
subfilter and the SIMD technology) has been processed.
...
What I find interesting beside this slowdown is that 
unordered_flat_set  also has bad performance for mixed lookup so bloom 
filters that do not degrade on mixed lookup look much better than if 
we just compare 100%/0% cases.
For example if we round performance to 1 decimal digit and compare
unordered_flat set
4.7/3.1/*9.4*
vs
filter<int, 1ul, block<unsigned long, 7ul>, 1ul, hash<int>, 
allocator<int>, mcg_and_fastrange>
3.3/3.3/*3.3*
comparison looks much much different if we drop mixed 
performance(bolded part).
Unlike fast_multiblock*, [multi]block is branchless, so your results
make sense. I wonder if multiblock<K> should be made branchful as
the working memory area can get quite large for large values of K.

Joaquin M Lopez Munoz

Re: [boost] [bloom] Benchmarks with Knuth multiplier-based hash production

Joaquin M López Muñoz