Boost logo

Boost :

Subject: Re: [boost] Boost.Bloom_filter now in Sandbox (WAS Re: Interest in a Bloom Filter?)
From: Dean Michael Berris (mikhailberis_at_[hidden])
Date: 2009-06-09 11:57:17


Hi Vicente!

On Tue, Jun 9, 2009 at 9:49 PM, Vicente Botet
Escriba<vicente.botet_at_[hidden]> wrote:
>
>>
>> On Tue, Jun 9, 2009 at 12:52 PM, Dean Michael
>> Berris<mikhailberis_at_[hidden]> wrote:
>>>
>>> Yes, I have but I've put a premium at type safety and putting M and K
>>> as part of the type instead of being a runtime value. The reasons are:
>>>
>>> 1. Specifying a static M allows the use of the standard bitset instead
>>> of Boost's dynamic_bitset.
>>> 2. A static K allows for enforcing the number of hash functions at
>>> compile time instead of at runtime.
>>> 3. Since M and K are not variable and are part of the type, it makes
>>> little sense for me to have either of them set at construction time.
>>>
>>> One thing I might be amenable to is a static K and a runtime M, but
>>> that makes the hash functions unable to determine statically the
>>> bounds of the bitset (M). I can change the concept for the hash
>>> function, but the simplicity of size_t(Input) just makes it easier for
>>> users I think.
>>
>> I thought about it a little more and I don't think I like the idea of
>> a separate static and dynamic bloom_filter. If there are more reasons
>> to allow for dynamically growing bloom filters, then I might support
>> this.
>>
>
> Two bloom_filters with different hash functions are not comparable.
> Don't you think that the hash functions make part of the bloom_filter type?
>
>    bloom_filter<M, hash_1, ... hash_k>bf;
>

Thinking about it a little more, yes this makes sense. However,
pending the formalization of variadic templates, I'm leaning towards
supporting Fusion sequences in the collection of hash functions:

    bloom_filter<Input, M, Sequence> bf;

This way I can deduce K from size<Sequence> and maybe even derive the
bloom_filter linearly from the Sequence elements (to optimize the
storage, using the compressed_pair trick). I'm still mulling it around
in my head, and will be trying to implement this in the morning (PH
time).

My only worry with this is that I'd have to define the Concept for
hash function types. It shouldn't be too hard especially if I'll build
upon the Boost.Hash concepts (if they're applicable).

>>
>> Done, but I've run into the issue of not being able to compare
>> boost::function objects of the same type -- for some reason this has
>> been disabled in Boost Trunk.
>>
>>
>
> If hash functions make part of the bloom_filter type there is no issue. No
> need to compare the hash functions at run-time. The compiler will do it for
> you.
>

Yup, this is true. Now I just have to implement it using Boost.Fusion
-- this I'm worried about because this somewhat makes the bloom_filter
implementation (and usage) a little more complex than I'd like, but if
the requirements dictate it then there would be less reasons for me to
avoid it. ;-)

>>
>>>> 8. bloom_filters are not currently serializable.
>>>
>>> True. I might need to have a file 'boost/bloom_filter/serialize.hpp'
>>> to add serializability non-intrusively.
>>>
>>
>> This should come later... I still need to look into whether
>> serialization should only be done for the bitset state rather than the
>> whole type. I say this because up to now I don't think there's a way
>> of serializing boost::function objects and deserializing them
>> effectively either.
>>
>>
>
> If hash functions make part of the bloom_filter type there is no issue, only
> the data must be serialized.
>

You're correct here. Now it's just a matter of me implementing it then. :-)

>
> HTH,

It definitely does help. Thanks! :-)

-- 
Dean Michael Berris | Software Engineer, Friendster, Inc.
blog.cplusplus-soup.com | twitter.com/mikhailberis |
linkedin.com/in/mikhailberis | profiles.friendster.com/mikhailberis |
deanberris.com

Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk