Subject: [Boost-commit] svn:boost r72515 - sandbox/bloom_filter/trunk/libs/bloom_filter/doc/html
Date: 2011-06-09 12:34:51
Date: 2011-06-09 12:34:50 EDT (Thu, 09 Jun 2011)
New Revision: 72515
Filled in design section details. Consider this commit a rough draft. Completed tutorial section.
Text files modified:
sandbox/bloom_filter/trunk/libs/bloom_filter/doc/html/design.html | 63 +++++++++++++++++++++++++++++++++++++--
sandbox/bloom_filter/trunk/libs/bloom_filter/doc/html/tutorial.html | 2
2 files changed, 60 insertions(+), 5 deletions(-)
--- sandbox/bloom_filter/trunk/libs/bloom_filter/doc/html/design.html (original)
+++ sandbox/bloom_filter/trunk/libs/bloom_filter/doc/html/design.html 2011-06-09 12:34:50 EDT (Thu, 09 Jun 2011)
@@ -36,16 +36,71 @@
+ This Bloom filter implementation is designed to allow compile-time determination of storage space(in bits), insertion/query type, and hash functions. The design goals were:
+ <li>Easy to use correctly</li>
+ <li>Not difficult to customize regarding use of hash functions</li>
+ <li>Compact - the amount of space used by the Bloom filter is capped by the size of the underlying std::bitset</li>
+ <li>Type-safe - its impossible to insert an element of the wrong type into the Bloom filter.</li>
+ <li>Type-safe - its impossible to perform a union or intersect operation between incompatible Bloom filters.</li>
+ To meet these design goals, the Bloom filter is implemented using template parameters. There are three templates:
+ <li>Size - number of bits to use for state representation</li>
+ <li>Type - type of elements to hash</li>
+ <li>HashFunctions - a boost::mpl::vector of hash functions objects</li>
+ Currently, the Bloom filter executes the hash functions one at a time. It is not hard to imagine an implementation that dispatches lightweight threads to perform each of the k hash functions.
<h4>Default Hash Functions</h4>
+ To make the Bloom filter easy to use, the HashFunctions template partameter has a default value. This way, if the user wishes to use a Bloom filter for a quick, non-critical task, the user has the ability to just declare and use.
+ The default hash function set consists of three boost_hash function objects seeded with the values 3, 5, and 7. These are implemented using Boost.Functional hash_range. The boost_hash function object is made available in the file boost/bloom_filter/hash/default.hpp for users that wish to use more or fewer hash functions to use their Bloom filter without having to create their own hash function object.
+ Currently, this implementation uses boost::mpl::vector, boost::mpl::size, and boost::mpl::at_c for managing the hash function set. When C++0x is widely implemented, it may be beneficial to implement hash function set management in terms of std::tuple, std::tuple_size, and std::tuple_element. This will have to be investigated.
+ Alternatively, it may be possible to use a custom, minimal data structure for storing hash functions based on variadic template expansion. This may yield faster compile times. However, this may be entirely unnecessary since, to the best of my knowledge, std::tuple in (at least) GCC is implemented using variadic templates.
+ C++0x also introduces lambdas. It may be possible to define hash function objects in terms of lambdas for small, localized uses of the Bloom filter class. Testing needs to be done to verify that lambdas can be safely used.
+ <h4>Future Directions</h4>
+ <a name="future"></a>
+ There are a few planned features that have not yet made it to the Bloom filter package:
+ <li>[De]Compression policy support via template policy class</li>
+ <li>Bloom filter variations implemented as separate classes:
+ <li>Counting Bloom filter (deletion support)</li>
+ <li>Scaling Bloom filter (constant false positive rate)</li>
+ <li>Two-hash Bloom filter (uses two-hash functions to generate k hash values)</li>
+ <li>Bloomier filters (?)</li>
+ <li>Union/intersection of Bloom filters of the same size and same number/type of hash functions differing only by type (types must be convertible).</li>
+ <li>Parallel execution of hashing, especially useful for large (>1MB, >32 hash functions) Bloom filters. Such a feature should have a smart, defaulting mechanism that enables parallelism once Size/HashFunction parameters exceed a certain threshold, if parallelism is available on the given platform.</li>
+ The design space for Bloom filter-like data structures is immense. Given feedback in the future, this section is likely to grow. Please send suggestions to <a href="mailto:cpp.cabrera_at_[hidden]">Alejandro Cabrera</a>
--- sandbox/bloom_filter/trunk/libs/bloom_filter/doc/html/tutorial.html (original)
+++ sandbox/bloom_filter/trunk/libs/bloom_filter/doc/html/tutorial.html 2011-06-09 12:34:50 EDT (Thu, 09 Jun 2011)
@@ -183,7 +183,7 @@
- Last revised: <time datetime="2011-06-06">June 6, 2011</time>.
+ Last revised: <time datetime="2011-06-08">June 8, 2011</time>.
Boost-Commit list run by bdawes at acm.org, david.abrahams at rcn.com, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk