Boost logo

Boost :

Subject: Re: [boost] Boost.Bloom_filter now in Sandbox (WAS Re: Interest in a Bloom Filter?)
From: Dean Michael Berris (mikhailberis_at_[hidden])
Date: 2009-06-10 05:22:53

On Wed, Jun 10, 2009 at 1:51 AM, Dean Michael
Berris<mikhailberis_at_[hidden]> wrote:
> On Tue, Jun 9, 2009 at 11:58 PM, <joaquin_at_[hidden]> wrote:
>> Umm, fair point :) this raises two points:
>> 2.1. Allocators are still needed if you implement a variant where M (and
>> maybe K)
>> are passed at construction time.
> Indeed. I'm leaning towards making this the default behavior using
> Boost.Dynamic_bitset instead and implementing the
> static_bloom_filter<> template that uses std::bitset internally now.

Now I've decided that boost::bloom_filter<> will take a runtime
parameter which is the size of the internal bitset (a
Boost.Dynamic_bitset) -- and the static version that uses std::bitset
is boost::static_bloom_filter<>.

>> 2.2. Is it wise to use the stack-based std::bitset when M is huge? I think
>> it's not
>> unlikely that M can be very large (say millions) which can easily lead to
>> stack overflow problems. This might bring allocators back into scene.
> You're right, it doesn't make it practical to use huge bitsets that
> sit on the stack. Perhaps Boost.Dynamic_bitset is my friend. ;-)

This has been addressed in revision 53785 of the Boost Sandbox. :-)

>>>> One thing I might be amenable to is a static K and a runtime M, but
>>>> that makes the hash functions unable to determine statically the
>>>> bounds of the bitset (M). I can change the concept for the hash
>>>> function, but the simplicity of size_t(Input) just makes it easier for
>>>> users I think.
>> I'd ask you to think this over before dismissing it: I can envision
>> scenarios where M
>> is not known at compile time (for instance, when dealing with external data
>> whose
>> size is not known beforehand).
>> We can easily enter into overdesign land, but you can always let this be
>> driven by
>> a policy, so that everybody's happy.
> Now that you mention it, I agree. :-) First I'll do two things:
> - Make the runtime-constructed bloom_filter the default implementation
> (uses Boost.Dynamic_bitset and allows for custom allocators)

This is done.

> - Rename the current implementation as static_bloom_filter that takes
> Input, M, and a Fusion Sequence of Hash Function types as arguments.

Also done. :)

>>> Done, but I've run into the issue of not being able to compare
>>> boost::function objects of the same type -- for some reason this has
>>> been disabled in Boost Trunk.
>> Ummm, this is tricky, because of course it does not suffice to test the
>> state only --it
>> critically depends on the hash functions to be used (At first sight I
>> thought you only
>> needed to test state and ignore auxiliary objects like std containers do).
> Yup. But apparently, if I make the hash functions part of the type
> instead of be function objects passed in at runtime, that would solve
> the problem (based on earlier comments in the thread).

Which is also done. :D

>> Intrusiveness is not necessarily related with having a separate
>> serialize.hpp (you can
>> friend declare inside bloom_filter in bloom_filter.hpp and have the
>> implementation
>> elsewhere), but it's a good idea nonetheless.
> The friend declaration might be an option, and if I can't do it
> non-intrusively I'll fall back to this option. Thanks for pointing
> this out. :-)

And I've yet to work on this. ;-)


Dean Michael Berris | Software Engineer, Friendster, Inc. | | | |

Boost list run by bdawes at, gregod at, cpdaniel at, john at