
On 8 June 2025 at 17:13, Ivan Matek wrote:
On Sat, Jun 7, 2025 at 8:05 PM Joaquin M López Muñoz via Boost <boost@lists.boost.org> wrote:
Anyway, why don't you run it locally and play with the #pragmas?
Because when I quickly go to benchmark something, 9 hours later I am still "just quickly benchmarking something" :) Also, ensuring reproducibility is a pain: e.g. I do not have an unused machine I can SSH into, so I cannot keep my browser or random background processes from messing with the benchmark, especially considering that bloom uses the L3 cache a lot.
Hey, thanks so much for running the benchmarks! Yes, variance hurts analysis. I'm planning to move my GHA-based benchmarks to dedicated machines so that results are more stable.
Besides, I'm interested in results outside my local machine and GHA. You just have to compile this in release mode (note the repo branch):
https://github.com/joaquintides/bloom/blob/feature/alternative-hash-producti...
Well, it was more complicated, since I already have modular Boost on my machine, so I had to do some hacks to get CMakeLists.txt to work. The benchmark also did not have a CMakeLists.txt, and I used -march=native -mtune=native instead of what your scripts do...
But to quickly recap:
1. There seems to be no unrolling happening unless I force it with pragmas.
2. I have increased the constants to reduce the chance of noise affecting results:
   - static const int num_trials=10;
   - static const milliseconds min_time_per_trial(10);
   + static const int num_trials=20;
   + static const milliseconds min_time_per_trial(50);
3. I did this to make the tables more aligned:
   - "<table>\n"
   + "<table style=\"font-family: monospace\">\n"
4. In terms of benchmark setup, I would add 5% of "opposite" lookups (e.g. successes among the failures), since I presume the current setup does not penalize branchy code as realistic scenarios would (although it is possible that real code might also have close to 100% successes or failures). Just to be clear: I did not make this change.
5. I would suggest considering switching the benchmark repo to use -march=native instead of -mavx2.
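
To make point 4 concrete, here is a minimal sketch of how such a mixed lookup sequence could be built. It is only an illustration under stated assumptions: make_mixed_lookups, its parameters and the std::uint64_t key type are hypothetical and do not come from the benchmark repo.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

// Hypothetical helper (not part of the benchmark repo): builds a sequence of
// n lookup keys in which about opposite_fraction of the entries are taken
// from `minority` and the rest from `majority`. The sequence is shuffled so
// that the minority lookups are spread out rather than clustered.
inline std::vector<std::uint64_t> make_mixed_lookups(
  const std::vector<std::uint64_t>& majority,
  const std::vector<std::uint64_t>& minority,
  std::size_t n, double opposite_fraction, std::uint64_t seed)
{
  assert(!majority.empty() && !minority.empty());
  assert(opposite_fraction >= 0.0 && opposite_fraction <= 1.0);

  // Number of "opposite" lookups, rounded to the nearest integer.
  const std::size_t n_minority = static_cast<std::size_t>(
    std::llround(static_cast<double>(n) * opposite_fraction));

  std::vector<std::uint64_t> lookups;
  lookups.reserve(n);
  for (std::size_t i = 0; i < n; ++i) {
    // The first n_minority slots cycle through the minority keys,
    // the remaining ones cycle through the majority keys.
    const auto& src = i < n_minority ? minority : majority;
    lookups.push_back(src[i % src.size()]);
  }

  // A fixed seed keeps the sequence identical across runs.
  std::mt19937_64 rng(seed);
  std::shuffle(lookups.begin(), lookups.end(), rng);
  return lookups;
}
```

For a "mostly successful" run one would pass the inserted keys as majority, keys known to be absent as minority, and 0.05 as opposite_fraction; swapping the two key sets gives the "mostly unsuccessful" case. Shuffling matters here because a contiguous run of opposite lookups would be easy for the branch predictor to learn.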
So, unrolling does not happen; that is out of the way, thanks for investigating. I'll use -march=native as you suggest.

As for the difference between the original hash production scheme and the one proposed by Kostas (cells marked with *), the numbers are not very conclusive, but it looks like Kostas's approach incurs a slight degradation in execution time. I hope we can see this more clearly with the upcoming GHA benchmarks on dedicated machines.

Joaquin M Lopez Munoz