Boost logo

Boost :

Subject: Re: [boost] [Feedback] Towards a Better Boost.BloomFilter
From: Phil Endecott (spam_from_boost_dev_at_[hidden])
Date: 2011-08-30 16:13:21


Alejandro Cabrera wrote:
> Phil,
>
> Phil Endecott-48 wrote:
>>
>> It is normally preferable to pass a begin-end pair rather than address
>> and length, but fundamentally yes I would like to be able to construct
>> a read-only bloom filter from a pair of const_iterators i.e. const
>> pointers in this case.
>>
>
> The input iterator constructor is already available.

No, I don't think so. What you have is a ctor that iterates over a
sequence of values and inserts them:

       template <typename InputIterator>
       basic_bloom_filter(const InputIterator start, const InputIterator
end) {
        for (InputIterator i = start; i != end; ++i)
          this->insert(*i);
       }

What I want is a way to adapt a chunk of raw memory, which I've got
from mmap(), and to treat it as a bloom filter. I haven't yet seen any
"adaptor" functionality in your code.

> All that remains is what you describe below.
>
> Phil Endecott-48 wrote:
>>
>> I don't care about serialisation. I just want to be able to
>>
>> const T* p = &(*(bloom_filter.data().begin()));
>> size_t len = sizeof(T) * bloom_filter.data().size();
>> write(fd,p,len);
>>
>> Regards, Phil.
>>
>
> All this should take is changing the underlying storage type to something
> that supports the operations begin() and size(). The intent is to use
> std::vector, as with the dynamic counting Bloom filter implementations. The
> operations you've described above are serialization. This will be supported
> soon, as soon as I can work on the basic Bloom filters. I'll add a test to
> my suite to make sure there is a way to support this in both directions
> (write/read, store/load).

I really don't want to "load" the filter i.e. to do something that
needs O(N) time and O(N) memory. I just want to mmap the file into
memory and "declare" the bloom filter into existence in O(1) time.
Then when I eventually use it, pages will be faulted-in on demand. The
aim is to ensure that I can have a perhaps very large filter and yet to
perform small numbers of queries quite quickly.

Regards, Phil.


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk