Boost logo

Boost :

Subject: Re: [boost] Interest in a Bloom Filter?
From: Milosz Marian HULBOJ (mhulboj_at_[hidden])
Date: 2009-06-15 07:21:29


Dean Michael Berris wrote:
> Hi Milosz,
>
> On Mon, Jun 15, 2009 at 2:49 PM, Milosz Marian HULBOJ<mhulboj_at_[hidden]> wrote:
>> - It would be nice to have the option of using `double-hashing' schema,
>> where all the hashes are generated according to:
>> hash(i) = h1(x) + i*h2(x)
>> or the squared/cubed variant:
>> hash(i) = h1(x) + i*h2(x) + i*i (*i)
>>
>
> This is interesting. I'd think this would have to be implemented by
> the user of the bloom_filter in the HashFunctions template parameter.
>
>> - one hash value can be potentially used to generate many hash values. For
>> example, if you need 16-bit hashes, then from the hash function producing
>> 32-bit hash-value you could get 2 values. This can be in turn combined with
>> the previous point - double hashing.

Hi Dean,

The bottom line of the two points above is that in order to hash for the
bloom filter (in the double-hashing schema) one only needs to:
- invoke hash function once
- split the bits and do few multiplications and additions
Still I didn't have time to look into yours implementation to see
whether the above can be used seamlessly.

There are also variants of double hashing to be more cache-line
friendly, but personally the idea presented above was sufficient for my
needs (for the hashing functions I've been using the TR1 hash and murmur
hash).

Regards,
Milosz


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk