

Subject: Re: [boost] [GSoC] Request for Feedback on Boost.Bloom Filter Project
From: jakub szymanski (qba.szymanski_at_[hidden])
Date: 2011-06-21 10:06:56

Next message: John Maddock: "[boost] [Phoenix] Two sets of docs?"
Previous message: Beman Dawes: "Re: [boost] [1.47.0] Beta 1 Release candidate files available"
In reply to: Alejandro Cabrera: "[boost] [GSoC] Request for Feedback on Boost.Bloom Filter Project"
Next in thread: Phil Endecott: "Re: [boost] [GSoC] Request for Feedback on Boost.Bloom Filter Project"

Having bloom_filter in boost:: is a great idea! It is super useful; we use one
in production code (our own implementation; actually we use Bloomier filters
with murmur hash perfect hashing, from your wiki reference).
It lets us use a very memory- and lookup-efficient data structure for very
large datasets (think of a 100 GB file of strings on an SSD, indexed by e.g.
a 200 MB Bloomier filter in memory).

Did you consider adding serialization of a bloom_filter to your
implementation?
In general, reconstructing hash-based containers with a series of inserts is
pretty inefficient.
The use case I'm talking about: in your web proxy scenario, the proxy
service keeps running, downloading and caching URLs and adding them to the
bloom filter. Then the process needs to be restarted for some reason. All
documents downloaded and stored on disk will have to be reiterated and their
URLs reinserted into the newly created bloom_filter, which makes the startup
of the process slow.
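
To make the restart scenario concrete, here is a rough sketch of what I mean by
serialization. It is only an illustration under my own assumptions: the class
simple_bloom_filter and its members (insert, probably_contains, save, load) are
hypothetical names, not the API of the proposed library, and the file layout
(two header fields followed by the raw bit array) is made up for this example.
The hash is a hand-written FNV-1a rather than std::hash, because std::hash is
only guaranteed to be consistent within one run of a program, and a saved
filter has to hash the same way after the restart.

```cpp
#include <cstdint>
#include <fstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical minimal Bloom filter; NOT the proposed Boost.Bloom Filter API.
class simple_bloom_filter {
public:
    simple_bloom_filter(std::uint64_t num_bits, std::uint32_t num_hashes)
        : num_bits_(num_bits), num_hashes_(num_hashes),
          words_((num_bits + 63) / 64, 0) {
        if (num_bits == 0 || num_hashes == 0)
            throw std::invalid_argument("num_bits and num_hashes must be > 0");
    }

    void insert(const std::string& key) {
        for (std::uint32_t i = 0; i < num_hashes_; ++i) {
            const std::uint64_t b = bit_index(key, i);
            words_[b / 64] |= std::uint64_t(1) << (b % 64);
        }
    }

    // May return true for a key that was never inserted (false positive),
    // but never returns false for a key that was inserted.
    bool probably_contains(const std::string& key) const {
        for (std::uint32_t i = 0; i < num_hashes_; ++i) {
            const std::uint64_t b = bit_index(key, i);
            if ((words_[b / 64] & (std::uint64_t(1) << (b % 64))) == 0)
                return false;
        }
        return true;
    }

    // Assumed layout: num_bits, num_hashes, then the raw 64-bit words.
    // Raw dump in host byte order, so it is only portable between machines
    // with the same endianness.
    void save(const std::string& path) const {
        std::ofstream out(path.c_str(), std::ios::binary);
        if (!out) throw std::runtime_error("cannot open for writing: " + path);
        out.write(reinterpret_cast<const char*>(&num_bits_), sizeof num_bits_);
        out.write(reinterpret_cast<const char*>(&num_hashes_), sizeof num_hashes_);
        out.write(reinterpret_cast<const char*>(words_.data()),
                  static_cast<std::streamsize>(words_.size() * sizeof(std::uint64_t)));
        if (!out) throw std::runtime_error("write failed: " + path);
    }

    // Rebuilds the filter with one bulk read instead of one insert per key.
    static simple_bloom_filter load(const std::string& path) {
        std::ifstream in(path.c_str(), std::ios::binary);
        if (!in) throw std::runtime_error("cannot open for reading: " + path);
        std::uint64_t num_bits = 0;
        std::uint32_t num_hashes = 0;
        in.read(reinterpret_cast<char*>(&num_bits), sizeof num_bits);
        in.read(reinterpret_cast<char*>(&num_hashes), sizeof num_hashes);
        if (!in) throw std::runtime_error("truncated header: " + path);
        simple_bloom_filter f(num_bits, num_hashes);
        in.read(reinterpret_cast<char*>(f.words_.data()),
                static_cast<std::streamsize>(f.words_.size() * sizeof(std::uint64_t)));
        if (!in) throw std::runtime_error("truncated bit array: " + path);
        return f;
    }

private:
    // 64-bit FNV-1a; deterministic across runs, unlike what std::hash promises.
    static std::uint64_t fnv1a(const std::string& key, std::uint64_t seed) {
        std::uint64_t h = 14695981039346656037ULL ^ seed;
        for (std::string::size_type i = 0; i < key.size(); ++i) {
            h ^= static_cast<unsigned char>(key[i]);
            h *= 1099511628211ULL;
        }
        return h;
    }

    // Double hashing: the i-th probe is h1 + i * h2 (mod num_bits).
    std::uint64_t bit_index(const std::string& key, std::uint32_t i) const {
        const std::uint64_t h1 = fnv1a(key, 0);
        const std::uint64_t h2 = fnv1a(key, 0x9e3779b97f4a7c15ULL) | 1;
        return (h1 + i * h2) % num_bits_;
    }

    std::uint64_t num_bits_;
    std::uint32_t num_hashes_;
    std::vector<std::uint64_t> words_;
};
```

With something like this, the proxy would call save() on shutdown (or
periodically) and load() on startup, so the cost of a restart is one sequential
read of the bit array rather than a walk over every cached document.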
btw I have the same problem with standard containers (except std::vector).
There is no efficient serialization / deserialization for them, which renders
them useless for any larger side project (like an unordered_set of 1m strings).

--
View this message in context: http://boost.2283326.n4.nabble.com/GSoC-Request-for-Feedback-on-Boost-Bloom-Filter-Project-tp3614026p3614200.html
Sent from the Boost - Dev mailing list archive at Nabble.com.

