Subject: Re: [boost] [Feedback] Towards a Better Boost.BloomFilter
From: Phil Endecott (spam_from_boost_dev_at_[hidden])
Date: 2011-08-29 18:01:52


Alejandro Cabrera wrote:
> Phil Endecott-48 wrote:
>>
>> There is still no "adaptor" functionality, so I can't mmap() a file to
>> use as the bloom filter's raw content, as I might want to do for e.g. a
>> spellcheck, URL blocklist, etc. etc.
>>
>
> Could you describe this in more detail? I am familiar with the mmap()
> interface, but how would one go about providing an adaptor so that mmap()
> could be used as the Bloom filter's raw content? Would this be accomplished
> through a constructor, for example:
>
> // dynamic_basic_bloom_filter(void *addr, const size_t len);
> dynamic_basic_bloom_filter<std::string> bloom(address, length);

It is normally preferable to pass a begin-end pair rather than an address
and a length, but fundamentally yes: I would like to be able to construct
a read-only bloom filter from a pair of const_iterators, i.e. const
pointers in this case.
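
To make the request concrete, here is a minimal sketch of such a read-only
view. Everything in it is an assumption for illustration: the names
bloom_view, bloom_insert, bit_index and kNumHashes are hypothetical, the
hash is a simple FNV-1a-style function with the seed mixed into the basis,
and none of it is the proposed library's API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical constant: number of hash functions used per key.
const std::uint32_t kNumHashes = 3;

// Hypothetical hash: FNV-1a-style, with the seed XORed into the offset basis.
inline std::size_t bit_index(const std::string& key, std::uint32_t seed,
                             std::size_t bits) {
    std::uint32_t h = 2166136261u ^ seed;
    for (std::string::size_type i = 0; i < key.size(); ++i) {
        h ^= static_cast<unsigned char>(key[i]);
        h *= 16777619u;
    }
    return h % bits;
}

// Writer side: sets the bits for one key in a caller-owned byte range.
inline void bloom_insert(unsigned char* first, unsigned char* last,
                         const std::string& key) {
    assert(first < last);
    const std::size_t bits = static_cast<std::size_t>(last - first) * 8;
    for (std::uint32_t seed = 0; seed < kNumHashes; ++seed) {
        const std::size_t i = bit_index(key, seed, bits);
        first[i / 8] |= static_cast<unsigned char>(1u << (i % 8));
    }
}

// Reader side: a non-owning, read-only filter over [first, last).
// The range could just as well point into an mmap()ed file.
class bloom_view {
public:
    bloom_view(const unsigned char* first, const unsigned char* last)
        : first_(first), bits_(static_cast<std::size_t>(last - first) * 8) {
        assert(first < last);
    }

    // False means "definitely absent"; true means "possibly present".
    bool probably_contains(const std::string& key) const {
        for (std::uint32_t seed = 0; seed < kNumHashes; ++seed) {
            const std::size_t i = bit_index(key, seed, bits_);
            if (((first_[i / 8] >> (i % 8)) & 1u) == 0) return false;
        }
        return true;
    }

private:
    const unsigned char* first_;
    std::size_t bits_;
};
```

With mmap(), the pair would be the returned address cast to
const unsigned char* and that pointer plus the mapped length. The view
copies nothing and owns nothing, so the mapping has to outlive it.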

> Would you happen to know if there is any work being done on a Boost.Posix or
> any similar C++ project?

Not relevant.

> Phil Endecott-48 wrote:
>>
>> data() returns a std::bitset, but that doesn't provide access to its
>> data in a form that I can write to a file (e.g. in preparing the data
>> for the above examples). I consider this a fault of std::bitset. I
>> believe you should use a std::vector or array instead.
>>
>
> data() returns the underlying type in each case. For the basic Bloom filter,
> this is a bitset (std:: or dynamic), and for counting Bloom filters, this is
> either a boost::array or an std::vector.
>
> I see the problem with std::bitset now. Serializing the bitset through
> operator[] would take O(num_bits) operations rather than O(num_blocks).
> With a std::vector, serialization can be accomplished in
> O(num_elements). It also helps that Boost.Serialization provides an
> implementation for std::vector. Thank you for the insight. I'll work on
> converting the underlying storage type next week.

I don't care about serialisation. I just want to be able to

// T is the element type of the filter's underlying storage.
const T* p = &(*(bloom_filter.data().begin()));
size_t len = sizeof(T) * bloom_filter.data().size();
write(fd, p, len);
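
A slightly fuller sketch of the same idea, assuming the storage is a
std::vector<T> of trivially copyable blocks. The helper names dump_blocks
and load_blocks are hypothetical, and the sketch uses std::fwrite and
std::fread rather than POSIX write() only to stay portable.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <vector>

// Hypothetical helper: writes the blocks to `path` verbatim, with no framing.
// Returns true only if every block was written and the file closed cleanly.
template <typename T>
bool dump_blocks(const std::vector<T>& blocks, const char* path) {
    assert(path != 0);
    std::FILE* f = std::fopen(path, "wb");
    if (!f) return false;
    const std::size_t written =
        std::fwrite(blocks.data(), sizeof(T), blocks.size(), f);
    const bool closed = std::fclose(f) == 0;
    return written == blocks.size() && closed;
}

// Hypothetical counterpart: reads whole blocks back until end of file.
// Returns an empty vector if the file cannot be opened.
template <typename T>
std::vector<T> load_blocks(const char* path) {
    assert(path != 0);
    std::vector<T> blocks;
    std::FILE* f = std::fopen(path, "rb");
    if (!f) return blocks;
    T block;
    while (std::fread(&block, sizeof(T), 1, f) == 1) blocks.push_back(block);
    std::fclose(f);
    return blocks;
}
```

The file is a raw memory image, so it is only readable on a platform with
the same sizeof(T) and byte order. That is acceptable for the mmap() use
case, where the same kind of machine writes and maps the file.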

Regards, Phil.

