Boost logo

Boost :

Subject: Re: [boost] [GSoC] Request for Feedback on Boost.Bloom Filter Project
From: Alejandro Cabrera (cpp.cabrera_at_[hidden])
Date: 2011-06-26 18:41:30


Arash Partow wrote:
>
> Alejandro Cabrera wrote:
>> I'm writing to request feedback, suggestions, and advice from the Boost
>> community. I've been working on a Bloom filter data structure for about a
>> month and a half now. At this point, the following is complete and
>> available in the Boost Sandbox:
>> - working, basic prototype (insert & query operations, union & intersect)
>> - a test suite covering the public interface
>> - five examples
>> - documentation covering many aspects of the project (tested on Chrome 12
>> and 13, Firefox 4 and 6)
>>
>
> Looks ok, but one important question - Why is the BF typed? Its not
> necessary and in fact there are many use-cases where one might want to
> insert and/or test membership for a range of different types using the
> same BF - all that those types require are that they be hashable.
>

Though not necessary, I believe having a typed BF is safer. If there are
users that want to eschew type safety to allow operations on multiple types,
I believe this can be accomplished either by using a Bloom filter that
accepts insertions of type void *.

Arash Partow wrote:
>
> Another issue, along the lines of what Phil mentioned wrt naming of the
> "contains" method. It should really be called "contains" for two reasons,
> firstly its a transitive verb - "a doing" label which is indicative of how
> it would be used in code, and secondly you've provided the method
> false_positive_rate which coupled with the fact that a BF is a
> probabilistic set, indicates to the user that any result will inherently
> have a false positive probability assigned with it, which is naively
> proportional to a function of the bits in the BF and the number of
> elements inserted into the BF (and not necessarily the element's
> bit-length).
>

I agree that the transitive verb property is crucial. It communicates the
intent of the operation. I've opted to go with the name "probably_contains"
for the time being. Even though it is slightly longer to type, it highlights
its probabilistic nature. I want there to be minimal confusion for users new
to Bloom filters about the nature of the data structure.

Arash Partow wrote:
>
> All in all a good start - keep up the good work!
>

Thank you for the feedback!

--
View this message in context: http://boost.2283326.n4.nabble.com/GSoC-Request-for-Feedback-on-Boost-Bloom-Filter-Project-tp3614026p3626643.html
Sent from the Boost - Dev mailing list archive at Nabble.com.

Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk