Subject: Re: [boost] Boost.Bloom_filter now in Sandbox (WAS Re: Interest in a Bloom Filter?)
Date: 2009-06-09 11:58:17
Dean Michael Berris escribió:
> Just an update -- as of Sandbox revision 53772, I've addressed some of
> the issues below:
> On Tue, Jun 9, 2009 at 12:52 PM, Dean Michael
> Berris<mikhailberis_at_[hidden]> wrote:
>>> 2. bloom_filter should have an allocator template
>>> parameter, just like any other container around.
>> Okay, I'll just forward this anyway to the underlying data structure
>> (bitset) so that should be easy to add.
> And apparently I mis-typed -- std::bitset<> doesn't have an allocator
> parameter. Because of this, I've chosen to not support for the
> meantime a custom allocator.
Umm, fair point :) this raises two points:
2.1. Allocators are still needed if you implement a variant where M (and
are passed at construction time.
2.2. Is it wise to use the stack-based std::bitset when M is huge? I
think it's not
unlikely that M can be very large (say millions) which can easily lead to
stack overflow problems. This might bring allocators back into scene.
>>> 3. Have you considered the possibility to let
>>> the user specify M and K at constuction time rather
>>> than compile tiem and the pros and cons of both
>> Yes, I have but I've put a premium at type safety and putting M and K
>> as part of the type instead of being a runtime value. The reasons are:
>> 1. Specifying a static M allows the use of the standard bitset instead
>> of Boost's dynamic_bitset.
>> 2. A static K allows for enforcing the number of hash functions at
>> compile time instead of at runtime.
>> 3. Since M and K are not variable and are part of the type, it makes
>> little sense for me to have either of them set at construction time.
>> One thing I might be amenable to is a static K and a runtime M, but
>> that makes the hash functions unable to determine statically the
>> bounds of the bitset (M). I can change the concept for the hash
>> function, but the simplicity of size_t(Input) just makes it easier for
>> users I think.
I'd ask you to think this over before dismissing it: I can envision
scenarios where M
is not known at compile time (for instance, when dealing with external
size is not known beforehand).
We can easily enter into overdesign land, but you can always let this be
a policy, so that everybody's happy.
>>> 7. bloom_filters are not currently testable for
>> You're correct. I'll add this soon.
> Done, but I've run into the issue of not being able to compare
> boost::function objects of the same type -- for some reason this has
> been disabled in Boost Trunk.
Ummm, this is tricky, because of course it does not suffice to test the
state only --it
critically depends on the hash functions to be used (At first sight I
thought you only
needed to test state and ignore auxiliary objects like std containers do).
>>> 8. bloom_filters are not currently serializable.
>> True. I might need to have a file 'boost/bloom_filter/serialize.hpp'
>> to add serializability non-intrusively.
Intrusiveness is not necessarily related with having a separate
serialize.hpp (you can
friend declare inside bloom_filter in bloom_filter.hpp and have the
elsewhere), but it's a good idea nonetheless.
Joaquín M López Muñoz
Telefónica, Investigación y Desarrollo
Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk