
On 06/06/2025 at 22:06, Ivan Matek wrote:
On Fri, Jun 6, 2025 at 9:02 PM Joaquin M López Muñoz via Boost <boost@lists.boost.org> wrote:
Feedback welcome,
Just a general question about benchmarks, not this one in particular. If you are doing something in a loop (e.g. this one <https://github.com/joaquintides/bloom/blob/1f0f953196c8e16e1c5fd8f62e18f4883dbcd44e/benchmark/comparison_table.cpp#L161>), did you check that the compiler did not unroll it in one case and not in another? You might say this is "natural", in the sense that you did not tell the compiler whether or not to unroll the loop, but I would partially disagree: real code is unlikely to always iterate over an array of contiguous data. It might get some data, parse it, do one lookup, get more data, parse it, do one lookup... I am not saying this is likely to happen, but it may be worth checking, because a long time ago I was playing around with benchmarking some hash map code, and "weird" results that did not make sense turned out to be caused by the compiler unrolling the loop in one case and not in another. After I used a #pragma <https://releases.llvm.org/4.0.0/tools/clang/docs/AttributeReference.html#pragma-unroll-pragma-nounroll> to disable loop unrolling, the results made sense.
I haven't examined the codegen so I can't tell for sure. My intuition is that loop unrolling won't take place because the body of the loop is not trivial and unrolling would add a lot of pressure to the instruction cache and take up too many registers for effective pipelining.

Anyway, why don't you run it locally and play with the #pragmas? Besides, I'm interested in results outside my local machine and GHA. You just have to compile this in release mode (note the repo branch):

https://github.com/joaquintides/bloom/blob/feature/alternative-hash-producti...

and execute with

./a.out 1000000

(for 1M elements, you can try 10M elements or any other value as well).

Thank you!

Joaquin M Lopez Munoz
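
For anyone who wants to try the experiment suggested above, here is a minimal sketch of what "playing with the #pragmas" could look like. It is not taken from the benchmark in the repo: the function names, the type aliases and the use of std::unordered_set as the container under test are all assumptions made for illustration. The only compiler features it relies on are Clang's `#pragma nounroll` and GCC's `#pragma GCC unroll 1` (GCC 8 or later); on other compilers the second function simply falls back to the default behaviour.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_set>
#include <vector>

// Hypothetical aliases for this sketch; the real benchmark uses its own types.
using Keys = std::vector<std::uint64_t>;
using Set  = std::unordered_set<std::uint64_t>;

// Lookup loop with no hint: the compiler is free to unroll it or not.
std::size_t count_hits_default(const Set& s, const Keys& keys)
{
    std::size_t hits = 0;
    for (std::size_t i = 0; i < keys.size(); ++i)
        hits += s.count(keys[i]);
    return hits;
}

// Same loop, but asking the compiler not to unroll it.
// The pragma must sit immediately before the loop it applies to.
std::size_t count_hits_no_unroll(const Set& s, const Keys& keys)
{
    std::size_t hits = 0;
#if defined(__clang__)
#pragma nounroll
#elif defined(__GNUC__) && __GNUC__ >= 8
#pragma GCC unroll 1
#endif
    for (std::size_t i = 0; i < keys.size(); ++i)
        hits += s.count(keys[i]);
    return hits;
}

// Sanity check: the hint may change timing, never the result.
void check_variants_agree(const Set& s, const Keys& keys)
{
    assert(count_hits_default(s, keys) == count_hits_no_unroll(s, keys));
}
```

Timing both functions over the same keys (called from a main of your own) and comparing the numbers, ideally alongside a look at the generated assembly, should show whether unrolling plays any role on a given compiler and machine.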