|
Boost : |
Subject: Re: [boost] Interest in a Bloom Filter?
From: Milosz Marian HULBOJ (mhulboj_at_[hidden])
Date: 2009-06-16 04:26:17
Manuel Fiorelli wrote:
> There is no way to express that the i-th hash depends on the previous
> hashes other than recompute them inside the i-th hash function.
I've got the same impression.
> I suspect that the interface could be changed to be like the following
>
> template<class Input, class HashingPolicy, class Hasher>
> class bloom_filter;
>
> The Hashing policy must compute the K hash functions, using the value produced
> by the hash function described by the hasher.
>
> If we want to accomodate an HashingPolicy using M different hash
> functions, Hasher could be
> a fusion vector, possibly with less elements than K.
Seems sound, but it would be good to list the typical use cases and work
out the best generic interface for them. So far we have got following
basic cases:
Independent case:
A simple case with distinct and independent hash functions that can be
evaluated separately:
h_1(x), h_2(x), h_3(x), ...
Dependent cases:
B case when from one hash value (i.e. 64bit) we can get multiple hash
values (i.e. 4*16bit):
h(x) -> {h_1(x), h_2(x), h_3(x), ...}
C hash_combine - one hash function and the combining method
D double-hashing schema
(http://www.eecs.harvard.edu/~kirsch/pubs/bbbf/esa06.pdf)
h_i(x) = h_1(x) + i*h_2(x) + optional
In this schema one can provide:
- two different hash functions (A)
- one hash function and use approach (B)
- use one hash function and do something along hash_combine (C)
other, less popular (?)
D partitioned bloom filters, combinations of above, other?
E different types of bloom filters:
- bloomier filters
- compressed bloom filter
- ...
> PS: I don't know if this less elegant design could really determine
> any advantage, or even if it is buggy :-)
Neither do I, small brainstorming with use cases is likely to help...
Regards,
Milosz
Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk