![]() |
Boost : |
From: Joaquin M López Muñoz (joaquinlopezmunoz_at_[hidden])
Date: 2025-05-30 16:54:36
El 30/05/2025 a las 18:27, Peter Dimov via Boost escribió:
> Kostas Savvidis wrote:
>>> On 29 May 2025, at 20:37, Joaquin M López Muñoz via Boost
>> <boost_at_[hidden]> wrote:
>>> given two sequences of hash values hi and gi, we're not interested in
>>> internal correlation of either hi or gi, but cross correlation between
>>> the two sequences.
>> I am actually not sure which is more important for a Bloom filter,
>> autocorrelation or cross-correlation.
>>
>> What is known is that the cross-correlation between two sequences in any
>> MCG or LCG ( x' = a * x mod k) is even worse than the autocorrelation.
>> It goes like this: if in the first bucket h1 = 2*g1, then hi=2*gi for all i or
>> buckets. This is independent of a and k. Same if you replace the "2" with "3"
>> etc.
>> Effectively there is 100% correletion between buckets. One cannot fix that
>> even with a better multiplier.
> It's not clear to me why this would be a problem, but if it is, it can be fixed
> by using an LCG (a*x+b) instead of an MCG (a*x).
Umm, I like the LCG idea as an additional sum is basically free. Kostas,
would a
smart choice of b improve the statistical properties of thhe procedure?
Note that
we can afford determining b as a function of a (here a=m where m is the
capacity
of the array).
Joaquin M Lopez Munoz
Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk