
On Sat, Jun 7, 2025 at 8:05 PM Joaquin M López Muñoz via Boost <boost@lists.boost.org> wrote:
> Anyway, why don't you run it locally and play with the #pragmas?
Because when I go to "quickly benchmark something", 9 hours later I am still just quickly benchmarking something :) Also, ensuring reproducibility is a pain. For example, I do not have an unused machine I can SSH into, so my browser use or random background processes can mess with the benchmark, which matters here because bloom uses the L3 cache a lot.
> Besides, I'm interested in results outside my local machine and GHA. You just have to compile this in release mode (note the repo branch):
> https://github.com/joaquintides/bloom/blob/feature/alternative-hash-producti...
Well, it was more complicated than that: I already have modular Boost on my machine, so I had to do some hacks to get CMakeLists.txt to work, the benchmark did not have a CMakeLists.txt, and I used -march=native -mtune=native instead of what your scripts do... But to quickly recap:

1. There seems to be no unrolling happening unless I do it with pragmas.

2. I increased the constants to reduce the chance of noise affecting the results:

   - static const int num_trials=10;
   - static const milliseconds min_time_per_trial(10);
   + static const int num_trials=20;
   + static const milliseconds min_time_per_trial(50);

3. I did this to make the tables better aligned:

   - "<table>\n"
   + "<table style=\"font-family: monospace\">\n"

4. In terms of benchmark setup, I would add 5% of "opposite" lookups (e.g. successful lookups mixed into the failed-lookup runs), since I presume the current setup does not penalize branchy code the way realistic scenarios would (although real code might also have close to 100% successes or failures). Just to be clear: I did not make this change.

5. I would suggest considering switching the benchmark repo to -march=native instead of -mavx2.

My tests were of the form:

   taskset --cpu-list 0 {binary} {number} >> {description}.html

The CPU was an i7-13700H. Core speed was not locked and ranged between 3.2 and 3.8 GHz. It is possible that the AVX code was lowering the clock speed, but I did not check; it could also just be accumulated heat.

Flags:

   FLAGS = -O3 -DNDEBUG -fcolor-diagnostics -march=native -mtune=native

I have attached 2 runs so you can see the measurement noise on my machine. I have also attached one unrolled run, just to show it can make a difference, but as I said this does not matter much since by default clang does not unroll.
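Regarding point 1, here is a minimal sketch of what "doing it with pragmas" can look like on a k-probe lookup loop. The probe scheme, next_hash, insert_hash and may_contain are all hypothetical simplifications for illustration, not Boost.Bloom's actual code; the only real pieces are the compiler pragmas (clang's `#pragma unroll N`, GCC 8+'s `#pragma GCC unroll N`).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical probe sequence: K bit positions derived from one 64-bit hash.
inline std::uint64_t next_hash(std::uint64_t h) {
  return h * 0x9E3779B97F4A7C15ull + 1;
}

// Sets the K bits for hash h in a bit array of num_bits bits.
template <std::size_t K>
void insert_hash(std::uint64_t* bits, std::size_t num_bits, std::uint64_t h) {
  assert(num_bits > 0);
  for (std::size_t i = 0; i < K; ++i, h = next_hash(h)) {
    const std::size_t pos = static_cast<std::size_t>(h % num_bits);
    bits[pos / 64] |= std::uint64_t{1} << (pos % 64);
  }
}

// Branchless lookup: every probe is evaluated and AND-ed into the result,
// and the pragma asks the compiler to unroll the K-iteration loop.
template <std::size_t K>
bool may_contain(const std::uint64_t* bits, std::size_t num_bits,
                 std::uint64_t h) {
  assert(num_bits > 0);
  bool result = true;
#if defined(__clang__)
#pragma unroll 8
#elif defined(__GNUC__)
#pragma GCC unroll 8
#endif
  for (std::size_t i = 0; i < K; ++i, h = next_hash(h)) {
    const std::size_t pos = static_cast<std::size_t>(h % num_bits);
    result &= ((bits[pos / 64] >> (pos % 64)) & 1u) != 0;
  }
  return result;
}
```

Comparing the generated assembly of may_contain with and without the pragma block (e.g. via -S) is a quick way to check whether a given compiler and flag set unrolls the loop on its own.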
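To illustrate why the change in point 2 should reduce noise, here is a sketch of a "repeat N trials, each for at least T" timer. I am assuming the harness works roughly like this (repeat f() until min_time_per_trial elapses, record time per call, then average after dropping the extreme trials); the function name and the trimming parameter are hypothetical, and the real benchmark may aggregate differently.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical harness: runs num_trials trials, each repeating f() until at
// least min_time_per_trial has elapsed, then returns the mean seconds per
// call after discarding the `trim` fastest and `trim` slowest trials.
template <typename F>
double measure_seconds_per_call(F&& f, int num_trials,
                                std::chrono::milliseconds min_time_per_trial,
                                int trim) {
  assert(trim >= 0 && num_trials > 2 * trim);
  using clock = std::chrono::steady_clock;
  std::vector<double> trials;
  trials.reserve(static_cast<std::size_t>(num_trials));
  for (int i = 0; i < num_trials; ++i) {
    long runs = 0;
    const auto start = clock::now();
    auto now = start;
    do {
      f();
      ++runs;
      now = clock::now();
    } while (now - start < min_time_per_trial);
    trials.push_back(std::chrono::duration<double>(now - start).count() /
                     static_cast<double>(runs));
  }
  std::sort(trials.begin(), trials.end());
  const double sum =
      std::accumulate(trials.begin() + trim, trials.end() - trim, 0.0);
  return sum / static_cast<double>(num_trials - 2 * trim);
}
```

Under that assumption, the original constants give each data point at least 10 × 10 ms = 100 ms of measured time, while 20 trials of 50 ms give at least 1 s, so a short burst of background activity is spread over a much longer window and is more likely to land in a discarded trial.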
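The setup suggested in point 4 could be built along these lines: take the usual lookup keys and replace a small fraction with keys from the opposite population, then shuffle. The helper name and signature are hypothetical; the key point is that with 100% of one outcome the early-exit branch in a branchy lookup is almost perfectly predicted, while randomly placed opposite lookups make mispredictions show up in the timings.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

// Hypothetical helper: returns present.size() keys in which a fraction
// `opposite_fraction` comes from `absent` (never-inserted keys) and the rest
// from `present`. Swap the two vectors for the mirror case (mostly failed
// lookups with a few successful ones).
std::vector<std::uint64_t> make_mixed_lookups(
    const std::vector<std::uint64_t>& present,
    const std::vector<std::uint64_t>& absent, double opposite_fraction,
    std::uint64_t seed) {
  assert(opposite_fraction >= 0.0 && opposite_fraction <= 1.0);
  const std::size_t n = present.size();
  std::size_t n_opposite = static_cast<std::size_t>(
      std::llround(static_cast<double>(n) * opposite_fraction));
  n_opposite = std::min(n_opposite, absent.size());

  std::vector<std::uint64_t> keys(
      present.begin(),
      present.begin() + static_cast<std::ptrdiff_t>(n - n_opposite));
  keys.insert(keys.end(), absent.begin(),
              absent.begin() + static_cast<std::ptrdiff_t>(n_opposite));

  // Spread the opposite lookups randomly so the branch predictor cannot
  // learn a simple pattern.
  std::mt19937_64 rng(seed);
  std::shuffle(keys.begin(), keys.end(), rng);
  return keys;
}
```

With opposite_fraction = 0.05 and 1000 present keys, 950 lookups should hit and 50 should miss, in random order.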
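Related to point 5 and the flags above: since -march=native and -mavx2 can produce different code, it may help to record which instruction set a benchmark binary was built for next to its results. A small sketch using the standard GCC/Clang predefined macros; the function name is hypothetical.

```cpp
#include <string>

// Reports the highest SIMD level this translation unit was compiled for,
// based on compiler-predefined macros (a compile-time property, not a
// runtime CPU check).
std::string compiled_simd_level() {
#if defined(__AVX512F__)
  return "AVX-512F";
#elif defined(__AVX2__)
  return "AVX2";
#elif defined(__AVX__)
  return "AVX";
#elif defined(__SSE4_2__)
  return "SSE4.2";
#elif defined(__SSE2__)
  return "SSE2";
#elif defined(__ARM_NEON)
  return "NEON";
#else
  return "generic";
#endif
}
```

Note that -march=native also enables other extensions the host CPU supports (for example BMI2 or FMA), which -mavx2 alone does not, so two binaries reporting "AVX2" here can still differ in code generation.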
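The runs were pinned from the outside with taskset --cpu-list 0. If one wanted the benchmark binary to pin itself so every run is pinned the same way, a Linux-only sketch with sched_setaffinity could look like this; pin_to_cpu is a hypothetical name.

```cpp
#include <sched.h>

// Linux-only sketch: restricts the calling thread to a single CPU, roughly
// what `taskset --cpu-list N` does for a process at launch. Returns false
// if the index is out of range or the kernel refuses (e.g. the CPU is not
// in this process's allowed set).
bool pin_to_cpu(int cpu) {
  if (cpu < 0 || cpu >= CPU_SETSIZE) return false;
  cpu_set_t set;
  CPU_ZERO(&set);
  CPU_SET(cpu, &set);
  return sched_setaffinity(0, sizeof(set), &set) == 0;  // 0 = calling thread
}
```

Pinning only constrains where the benchmark runs; it does not keep the browser or other background processes off that core, so it reduces migration noise but not all interference.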