|
Boost : |
Subject: Re: [boost] Boost.Bloom_filter now in Sandbox (WAS Re: Interest in a Bloom Filter?)
From: joaquin_at_[hidden]
Date: 2009-06-09 11:58:17
Dean Michael Berris escribió:
> Just an update -- as of Sandbox revision 53772, I've addressed some of
> the issues below:
>
> On Tue, Jun 9, 2009 at 12:52 PM, Dean Michael
> Berris<mikhailberis_at_[hidden]> wrote:
>
>>> 2. bloom_filter should have an allocator template
>>> parameter, just like any other container around.
>>>
>> Okay, I'll just forward this anyway to the underlying data structure
>> (bitset) so that should be easy to add.
>>
>>
>
> And apparently I mis-typed -- std::bitset<> doesn't have an allocator
> parameter. Because of this, I've chosen to not support for the
> meantime a custom allocator.
>
Umm, fair point :) this raises two points:
2.1. Allocators are still needed if you implement a variant where M (and
maybe K)
are passed at construction time.
2.2. Is it wise to use the stack-based std::bitset when M is huge? I
think it's not
unlikely that M can be very large (say millions) which can easily lead to
stack overflow problems. This might bring allocators back into scene.
>>> 3. Have you considered the possibility to let
>>> the user specify M and K at constuction time rather
>>> than compile tiem and the pros and cons of both
>>> approaches?
>>>
>> Yes, I have but I've put a premium at type safety and putting M and K
>> as part of the type instead of being a runtime value. The reasons are:
>>
>> 1. Specifying a static M allows the use of the standard bitset instead
>> of Boost's dynamic_bitset.
>> 2. A static K allows for enforcing the number of hash functions at
>> compile time instead of at runtime.
>> 3. Since M and K are not variable and are part of the type, it makes
>> little sense for me to have either of them set at construction time.
>>
>> One thing I might be amenable to is a static K and a runtime M, but
>> that makes the hash functions unable to determine statically the
>> bounds of the bitset (M). I can change the concept for the hash
>> function, but the simplicity of size_t(Input) just makes it easier for
>> users I think.
>>
I'd ask you to think this over before dismissing it: I can envision
scenarios where M
is not known at compile time (for instance, when dealing with external
data whose
size is not known beforehand).
We can easily enter into overdesign land, but you can always let this be
driven by
a policy, so that everybody's happy.
>>> 7. bloom_filters are not currently testable for
>>> equality.
>>>
>> You're correct. I'll add this soon.
>>
>>
>
> Done, but I've run into the issue of not being able to compare
> boost::function objects of the same type -- for some reason this has
> been disabled in Boost Trunk.
>
Ummm, this is tricky, because of course it does not suffice to test the
state only --it
critically depends on the hash functions to be used (At first sight I
thought you only
needed to test state and ignore auxiliary objects like std containers do).
>>> 8. bloom_filters are not currently serializable.
>>>
>> True. I might need to have a file 'boost/bloom_filter/serialize.hpp'
>> to add serializability non-intrusively.
>>
Intrusiveness is not necessarily related with having a separate
serialize.hpp (you can
friend declare inside bloom_filter in bloom_filter.hpp and have the
implementation
elsewhere), but it's a good idea nonetheless.
Joaquín M López Muñoz
Telefónica, Investigación y Desarrollo
Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk