Boost logo

Boost :

Subject: Re: [boost] [Feedback] Towards a Better Boost.BloomFilter
From: Alejandro Cabrera (cpp.cabrera_at_[hidden])
Date: 2011-08-26 05:02:10


Phil Endecott-48 wrote:
>
> There is still no "adaptor" functionality, so I can't mmap() a file to
> use as the bloom filter's raw content, as I might want to do for e.g. a
> spellcheck, URL blocklist, etc. etc.
>

Could you describe this is more detail? I am familiar with the mmap()
interface, but how would one go about providing an adaptor so that mmap()
could be used as the Bloom filter's raw content? Would this be accomplished
through a constructor, for example:

// dynamic_basic_bloom_filter(void *addr, const size_t len);
dynamic_basic_bloom_filter<std::string> bloom(address, length);

Would you happen to know if there is any work being done on a Boost.Posix or
any similar C++ project?

Phil Endecott-48 wrote:
>
> data() returns a std::bitset, but that doesn't provide access to its
> data in a form that I can write to a file (e.g. in preparing the data
> for the above examples). I consider this a fault of std::bitset. I
> believe you should use a std::vector or array instead.
>

data() returns the underlying type in each case. For the basic Bloom filter,
this is a bitset (std:: or dynamic), and for counting Bloom filters, this is
either a boost::array or an std::vector.

I see the problem with std::bitset now. In order to serialize the bitset, it
would take O(num_bits) operations, rather than the number of blocks (using
operator[]). Using an std::vector, the serialization can be accomplished in
O(num_elements). It also helps that boost.Serialization provides an
implementation for std::vector. Thank you for the insight. I'll work on
converting the underlying storage type next week.

Phil Endecott-48 wrote:
>
> I see that you still have intersection without discussing the error
> rate implications.
>

I will add a section in the tutorial that discusses the implications of
union/intersect operations.

Phil Endecott-48 wrote:
>
> Your dynamic_bitset has a resize() method. Was that there before? I
> don't understand how it can work in any useful way.
>

Yes, this was there initially. The dynamic_basic_bloom_filter is the only
dynamic Bloom filter that has the resize() method. I intend to remove this
method, as users requiring a resize can instantiate a new dynamic Bloom
filter to achieve the same effect. It was a design relic that should've been
discarded much sooner, as it is more dangerous to users than useful.

Phil Endecott-48 wrote:
>
> The final documentation should mention the complexity of each method.
>

I've made sure to include the complexity of all methods. This is located at:
http://svn.boost.org/svn/boost/sandbox/bloom_filter/trunk/libs/bloom_filter/doc/html/reference/functions.html

In the class references, do you think it would be convenient to make each
function name clickable and make them link to the specific entry in the
function reference?

Thank you,
-Alej

--
View this message in context: http://boost.2283326.n4.nabble.com/Feedback-Towards-a-Better-Boost-BloomFilter-tp3766988p3770326.html
Sent from the Boost - Dev mailing list archive at Nabble.com.

Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk