Boost logo

Boost :

Subject: Re: [boost] Boost.Bloom_filter now in Sandbox (WAS Re: Interest in a Bloom Filter?)
From: Dean Michael Berris (mikhailberis_at_[hidden])
Date: 2009-06-09 13:51:25


On Tue, Jun 9, 2009 at 11:58 PM, <joaquin_at_[hidden]> wrote:
> Dean Michael Berris escribió:
>>
>> And apparently I mis-typed -- std::bitset<> doesn't have an allocator
>> parameter. Because of this, I've chosen to not support for the
>> meantime a custom allocator.
>>
>
> Umm, fair point :) this raises two points:
>
> 2.1. Allocators are still needed if you implement a variant where M (and
> maybe K)
> are passed at construction time.

Indeed. I'm leaning towards making this the default behavior using
Boost.Dynamic_bitset instead and implementing the
static_bloom_filter<> template that uses std::bitset internally now.

> 2.2. Is it wise to use the stack-based std::bitset when M is huge? I think
> it's not
> unlikely that M can be very large (say millions) which can easily lead to
> stack overflow problems. This might bring allocators back into scene.
>

You're right, it doesn't make it practical to use huge bitsets that
sit on the stack. Perhaps Boost.Dynamic_bitset is my friend. ;-)

>>>
>>> One thing I might be amenable to is a static K and a runtime M, but
>>> that makes the hash functions unable to determine statically the
>>> bounds of the bitset (M). I can change the concept for the hash
>>> function, but the simplicity of size_t(Input) just makes it easier for
>>> users I think.
>>>
>
> I'd ask you to think this over before dismissing it: I can envision
> scenarios where M
> is not known at compile time (for instance, when dealing with external data
> whose
> size is not known beforehand).
>
> We can easily enter into overdesign land, but you can always let this be
> driven by
> a policy, so that everybody's happy.
>

Now that you mention it, I agree. :-) First I'll do two things:

- Make the runtime-constructed bloom_filter the default implementation
(uses Boost.Dynamic_bitset and allows for custom allocators)
- Rename the current implementation as static_bloom_filter that takes
Input, M, and a Fusion Sequence of Hash Function types as arguments.

>>
>> Done, but I've run into the issue of not being able to compare
>> boost::function objects of the same type -- for some reason this has
>> been disabled in Boost Trunk.
>>
>
> Ummm, this is tricky, because of course it does not suffice to test the
> state only --it
> critically depends on the hash functions to be used (At first sight I
> thought you only
> needed to test state and ignore auxiliary objects like std containers do).
>

Yup. But apparently, if I make the hash functions part of the type
instead of be function objects passed in at runtime, that would solve
the problem (based on earlier comments in the thread).

>>>> 8. bloom_filters are not currently serializable.
>>>>
>>>
>>> True. I might need to have a file 'boost/bloom_filter/serialize.hpp'
>>> to add serializability non-intrusively.
>>>
>
> Intrusiveness is not necessarily related with having a separate
> serialize.hpp (you can
> friend declare inside bloom_filter in bloom_filter.hpp and have the
> implementation
> elsewhere), but it's a good idea nonetheless.
>

The friend declaration might be an option, and if I can't do it
non-intrusively I'll fall back to this option. Thanks for pointing
this out. :-)

-- 
Dean Michael Berris | Software Engineer, Friendster, Inc.
blog.cplusplus-soup.com | twitter.com/mikhailberis |
linkedin.com/in/mikhailberis | profiles.friendster.com/mikhailberis |
deanberris.com

Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk