![]() |
Boost : |
From: Ivan Matek (libbooze_at_[hidden])
Date: 2025-05-12 13:24:09
On Mon, May 12, 2025 at 1:48â¯PM Joaquin M López Muñoz via Boost <
boost_at_[hidden]> wrote:
> El 12/05/2025 a las 12:06, Ivan Matek escribió:
> You're right. We'd have to ask the author, but I guess the "| 1" in
> "hash_mult = uniform_dist(rnd) | 1" is there to avoid the (very
> unlikely) case
> hash_mult = 0. In any case, this small adjustment doesn't change the big
> picture
>
I think reason people pick odd number is because if your multiplier is
power of 2 last bits are guaranteed to be 0. Not sure that matters a lot
here since we only use few top bits.
https://probablydance.com/2018/06/16/fibonacci-hashing-the-optimization-that-the-world-forgot-or-a-better-alternative-to-integer-modulo/
* the number 11400714819323198486 is closer but we donât want multiples of
two because that would throw away one bit)*
> statistically --the output distribution is largely dominated by C/B,
> where C is
> the multiplier and B is 2^(64-m), so the least significant bit of C is
> mostly
> swallowed along the way.
>
>
> This case of 231 type_ids scattered in a range of 3000 positions is
> lower-bounded
> by (has better chances than) the baseline case with 3000 adjacent
> positions, which
> has a probability 0.14 of finding a perfect hash multiplier for 4096
> buckets. Note
> that a "hard" case would involve having your type_ids distributed across
> the
> entire 64-bit space of possible numerical values, and in the example you
> wrote
> the values are confined to a range that is around 10^14 times smaller
> than 2^64
>
Thank you for explanation. For me it is interesting even if they are not
consecutive, but bounded within "small"( compared to 64bit range) interval
perfect hashing still works great.
That was surprising to me, but I guess even noneven distribution among
small range of values can be nicely hashed.
To think a bit more broadly, not just in terms of openmethods... this is
quite interesting to me, e.g. does that mean that randomly picked
short(16bit) numbers can be hashed much better(assuming "proper" 64 bit
hash, even though that is not what many STL implementations do, i.e. they
use identity <https://godbolt.org/z/bYYT3M5j1>) than same number of
randomly picked int(32 bit) numbers...
What about randomly picked int vs randomly picked positive int(not unsigned
int, but int whose value is positive)...
Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk