Boost logo

Boost :

From: Joaquin M López Muñoz (joaquinlopezmunoz_at_[hidden])
Date: 2025-05-12 11:48:11


El 12/05/2025 a las 12:06, Ivan Matek escribió:
>
>
> On Sun, May 11, 2025 at 9:01 PM Joaquin M López Muñoz via Boost
> <boost_at_[hidden]> wrote:
>
>
> https://github.com/joaquintides/perfect_range_hash
>
>
> Thank you for this. Unfortunately even this basic math is a bit too
> complex for me
> so few small comments:
>
> 1. stylistically I would link to particular hash of source code,
> although documentation
> link is nicer. Reason is that documentation link can die

I can relink to a more stable doc if the lib is accepted.

> 2. afaik C is not randomly chosen between 0<= C <= 2^64, i.e. it is
> always an odd number.

You're right. We'd have to ask the author, but I guess the "| 1" in
"hash_mult = uniform_dist(rnd) | 1" is there to avoid the (very
unlikely) case
hash_mult = 0. In any case, this small adjustment doesn't change the big
picture
statistically --the output distribution is largely dominated by C/B,
where C is
the multiplier and B is 2^(64-m), so the least significant bit of C is
mostly
swallowed along the way.

> 3. in my brief experiments I tried to add large gap, and randomly pick
> subset of types,
> and performance was still amazing.
>
> Example for 3.: [...]

This case of 231 type_ids scattered in a range of 3000 positions is
lower-bounded
by (has better chances than) the baseline case with 3000 adjacent
positions, which
has a probability 0.14 of finding a perfect hash multiplier for 4096
buckets. Note
that a "hard" case would involve having your type_ids distributed across the
entire 64-bit space of possible numerical values, and in the example you
wrote
the values are confined to a range that is around 10^14 times smaller
than 2^64
--this is the fact that the algorithm takes an implicit advantage of.

Joaquin M Lopez Munoz


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk