Boost logo

Boost-Commit :

Subject: [Boost-commit] svn:boost r53745 - sandbox/bloom_filter/trunk/libs/bloom_filter/doc
From: mikhailberis_at_[hidden]
Date: 2009-06-08 07:28:22


Author: mikhailberis
Date: 2009-06-08 07:28:21 EDT (Mon, 08 Jun 2009)
New Revision: 53745
URL: http://svn.boost.org/trac/boost/changeset/53745

Log:
Adding sparse (initial) documentation for boost::bloom_filter.
Added:
   sandbox/bloom_filter/trunk/libs/bloom_filter/doc/bloom_filter.rst (contents, props changed)
   sandbox/bloom_filter/trunk/libs/bloom_filter/doc/index.html (contents, props changed)

Added: sandbox/bloom_filter/trunk/libs/bloom_filter/doc/bloom_filter.rst
==============================================================================
--- (empty file)
+++ sandbox/bloom_filter/trunk/libs/bloom_filter/doc/bloom_filter.rst 2009-06-08 07:28:21 EDT (Mon, 08 Jun 2009)
@@ -0,0 +1,116 @@
+:Authors:
+ Dean Michael Berris <mikhailberis_at_[hidden]>
+
+:Version:
+ 0.1 of 2009/06/08
+
+:License:
+ Distributed under the Boost Software License
+ (See accompanying LICENSE_1_0.txt or copy at
+ http://www.boost.org/LICENSE_1_0.txt)
+
+==================
+Boost.Bloom_filter
+==================
+
+From http://en.wikipedia.org/wiki/Bloom_filter ::
+
+ A Bloom Filter is a space-efficient probabilistic data structure
+ that is used to test whether an element is a member of a set. False
+ positives are possible but false negatives are not. Elements can be
+ added to the set, but not removed. The more that are added to the
+ set, the larger the probability of false positives.
+
+Boost.Bloom_filter is a generic implementation of a Bloom Filter that uses
+a variable number of bits to represent a generic bloom filter that takes a
+configurable input type and a configurable number of hash functions. Bloom
+Filters work by using multiple hash functions to determine the membership
+of an element in the represented set.
+
+For instance, we can use a Bloom Filter to maintain a set of URLs that have
+been cached by a caching web proxy. Instead of storing every possible URL,
+we use a Bloom Filter to mark certain parts of a bit field as '1' for every
+location that a set of hash functions returns. As an example, consider the
+empty 32-bit bitset representing a Bloom Filter below::
+
+ 00000000000000000000000000000000
+
+From here, we then have for instance three (3) hash functions that when we
+feed a URL will return numbers from [0..31]. Let's say when we hash the
+string 'http://www.boost.org' we get the three indexes ``(0,9,15)``. Our
+bitset will then look like::
+
+ 10000000010000010000000000000000
+
+To find out whether we've encountered a certain URL before, we simply pass
+the URL to the same three hash functions. If we see that all the positions
+returned by the three hash functions are all ``1``, then we might have seen
+the URL before -- this is called a false positive. In the case of a caching
+web proxy this shouldn't be a problem because if it really hasn't gotten it
+yet, it's just going to fetch it again accordingly (and update the Bloom
+Filter after that). This matters when you have a hierarchical network of
+web cache proxies that need to know where a page may have been cached before.
+Sharing these bloom filters across the nodes of the network allows for
+optimizing for the case when you may have already cached a page before.
+
+This also applies when writing distributed filesystems where you may have a
+system of multi-layer caches. Sharing the bloom filters of the pages that may
+have already been accessed (read) by nearby storage/retrieval nodes that have
+these pages cached minimizes the number of times a page that has already been
+retrieved before will have to be retrieved again across the network. In these
+situations you may be able to tolerate false positives but would rather not
+have false negatives -- meaning data you haven't ever accessed before would
+not be available in the caches and therefore will need to be fetched again.
+
+Synopsis
+--------
+
+A `bloom_filter` is a template class which takes three parameters:
+
+Input
+ The type of the elements contained in a bloom filter.
+M
+ The number of bits contained by the internal bitset.
+K
+ The number of hash functions used by the bloom filter.
+
+To use a `bloom_filter` that maintains a set of integers using 256 bits and
+two hash functions, we could have the following code listing::
+
+ #include <boost/bloom_filter.hpp>
+
+ ...
+
+ using boost::bloom_filter;
+ using boost::array;
+ using boost::function;
+
+ ...
+
+ array<function<size_t(int)>, 2> hash_array;
+ hash_array[0] = hash1;
+ hash_array[1] = hash2;
+ bloom_filter<int, 256, 2> integers(hash_array);
+
+Here we first create a Boost.Array of two Boost.Functions that we then use
+to construct the `bloom_filter`. In this code listing we are then able to
+insert new integers into the set by simply doing::
+
+ integers.insert(1);
+ integers.insert(7);
+ integers.insert(5);
+
+We are also then able to check whether certain integers are part of the set.
+Remember that in Bloom Filters, there is a possibility of getting false
+positives::
+
+ integers.contains(1); // should be true
+ integers.contains(9); // may be true or false
+ integers[5]; // should be true
+
+Reference
+---------
+
+* Bloom Filter (Wikipedia Entry) http://en.wikipedia.org/wiki/Bloom_filter
+
+

Added: sandbox/bloom_filter/trunk/libs/bloom_filter/doc/index.html
==============================================================================
--- (empty file)
+++ sandbox/bloom_filter/trunk/libs/bloom_filter/doc/index.html 2009-06-08 07:28:21 EDT (Mon, 08 Jun 2009)
@@ -0,0 +1,407 @@
+<?xml version="1.0" encoding="utf-8" ?>
+<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
+<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
+<head>
+<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
+<meta name="generator" content="Docutils 0.5: http://docutils.sourceforge.net/" />
+<title></title>
+<meta name="authors" content="Dean Michael Berris &lt;mikhailberis&#64;gmail.com&gt;" />
+<style type="text/css">
+
+/*
+:Author: David Goodger (goodger_at_[hidden])
+:Id: $Id: html4css1.css 5196 2007-06-03 20:25:28Z wiemann $
+:Copyright: This stylesheet has been placed in the public domain.
+
+Default cascading style sheet for the HTML output of Docutils.
+
+See http://docutils.sf.net/docs/howto/html-stylesheets.html for how to
+customize this style sheet.
+*/
+
+/* used to remove borders from tables and images */
+.borderless, table.borderless td, table.borderless th {
+ border: 0 }
+
+table.borderless td, table.borderless th {
+ /* Override padding for "table.docutils td" with "! important".
+ The right padding separates the table cells. */
+ padding: 0 0.5em 0 0 ! important }
+
+.first {
+ /* Override more specific margin styles with "! important". */
+ margin-top: 0 ! important }
+
+.last, .with-subtitle {
+ margin-bottom: 0 ! important }
+
+.hidden {
+ display: none }
+
+a.toc-backref {
+ text-decoration: none ;
+ color: black }
+
+blockquote.epigraph {
+ margin: 2em 5em ; }
+
+dl.docutils dd {
+ margin-bottom: 0.5em }
+
+/* Uncomment (and remove this text!) to get bold-faced definition list terms
+dl.docutils dt {
+ font-weight: bold }
+*/
+
+div.abstract {
+ margin: 2em 5em }
+
+div.abstract p.topic-title {
+ font-weight: bold ;
+ text-align: center }
+
+div.admonition, div.attention, div.caution, div.danger, div.error,
+div.hint, div.important, div.note, div.tip, div.warning {
+ margin: 2em ;
+ border: medium outset ;
+ padding: 1em }
+
+div.admonition p.admonition-title, div.hint p.admonition-title,
+div.important p.admonition-title, div.note p.admonition-title,
+div.tip p.admonition-title {
+ font-weight: bold ;
+ font-family: sans-serif }
+
+div.attention p.admonition-title, div.caution p.admonition-title,
+div.danger p.admonition-title, div.error p.admonition-title,
+div.warning p.admonition-title {
+ color: red ;
+ font-weight: bold ;
+ font-family: sans-serif }
+
+/* Uncomment (and remove this text!) to get reduced vertical space in
+ compound paragraphs.
+div.compound .compound-first, div.compound .compound-middle {
+ margin-bottom: 0.5em }
+
+div.compound .compound-last, div.compound .compound-middle {
+ margin-top: 0.5em }
+*/
+
+div.dedication {
+ margin: 2em 5em ;
+ text-align: center ;
+ font-style: italic }
+
+div.dedication p.topic-title {
+ font-weight: bold ;
+ font-style: normal }
+
+div.figure {
+ margin-left: 2em ;
+ margin-right: 2em }
+
+div.footer, div.header {
+ clear: both;
+ font-size: smaller }
+
+div.line-block {
+ display: block ;
+ margin-top: 1em ;
+ margin-bottom: 1em }
+
+div.line-block div.line-block {
+ margin-top: 0 ;
+ margin-bottom: 0 ;
+ margin-left: 1.5em }
+
+div.sidebar {
+ margin: 0 0 0.5em 1em ;
+ border: medium outset ;
+ padding: 1em ;
+ background-color: #ffffee ;
+ width: 40% ;
+ float: right ;
+ clear: right }
+
+div.sidebar p.rubric {
+ font-family: sans-serif ;
+ font-size: medium }
+
+div.system-messages {
+ margin: 5em }
+
+div.system-messages h1 {
+ color: red }
+
+div.system-message {
+ border: medium outset ;
+ padding: 1em }
+
+div.system-message p.system-message-title {
+ color: red ;
+ font-weight: bold }
+
+div.topic {
+ margin: 2em }
+
+h1.section-subtitle, h2.section-subtitle, h3.section-subtitle,
+h4.section-subtitle, h5.section-subtitle, h6.section-subtitle {
+ margin-top: 0.4em }
+
+h1.title {
+ text-align: center }
+
+h2.subtitle {
+ text-align: center }
+
+hr.docutils {
+ width: 75% }
+
+img.align-left {
+ clear: left }
+
+img.align-right {
+ clear: right }
+
+ol.simple, ul.simple {
+ margin-bottom: 1em }
+
+ol.arabic {
+ list-style: decimal }
+
+ol.loweralpha {
+ list-style: lower-alpha }
+
+ol.upperalpha {
+ list-style: upper-alpha }
+
+ol.lowerroman {
+ list-style: lower-roman }
+
+ol.upperroman {
+ list-style: upper-roman }
+
+p.attribution {
+ text-align: right ;
+ margin-left: 50% }
+
+p.caption {
+ font-style: italic }
+
+p.credits {
+ font-style: italic ;
+ font-size: smaller }
+
+p.label {
+ white-space: nowrap }
+
+p.rubric {
+ font-weight: bold ;
+ font-size: larger ;
+ color: maroon ;
+ text-align: center }
+
+p.sidebar-title {
+ font-family: sans-serif ;
+ font-weight: bold ;
+ font-size: larger }
+
+p.sidebar-subtitle {
+ font-family: sans-serif ;
+ font-weight: bold }
+
+p.topic-title {
+ font-weight: bold }
+
+pre.address {
+ margin-bottom: 0 ;
+ margin-top: 0 ;
+ font-family: serif ;
+ font-size: 100% }
+
+pre.literal-block, pre.doctest-block {
+ margin-left: 2em ;
+ margin-right: 2em }
+
+span.classifier {
+ font-family: sans-serif ;
+ font-style: oblique }
+
+span.classifier-delimiter {
+ font-family: sans-serif ;
+ font-weight: bold }
+
+span.interpreted {
+ font-family: sans-serif }
+
+span.option {
+ white-space: nowrap }
+
+span.pre {
+ white-space: pre }
+
+span.problematic {
+ color: red }
+
+span.section-subtitle {
+ /* font-size relative to parent (h1..h6 element) */
+ font-size: 80% }
+
+table.citation {
+ border-left: solid 1px gray;
+ margin-left: 1px }
+
+table.docinfo {
+ margin: 2em 4em }
+
+table.docutils {
+ margin-top: 0.5em ;
+ margin-bottom: 0.5em }
+
+table.footnote {
+ border-left: solid 1px black;
+ margin-left: 1px }
+
+table.docutils td, table.docutils th,
+table.docinfo td, table.docinfo th {
+ padding-left: 0.5em ;
+ padding-right: 0.5em ;
+ vertical-align: top }
+
+table.docutils th.field-name, table.docinfo th.docinfo-name {
+ font-weight: bold ;
+ text-align: left ;
+ white-space: nowrap ;
+ padding-left: 0 }
+
+h1 tt.docutils, h2 tt.docutils, h3 tt.docutils,
+h4 tt.docutils, h5 tt.docutils, h6 tt.docutils {
+ font-size: 100% }
+
+ul.auto-toc {
+ list-style-type: none }
+
+</style>
+</head>
+<body>
+<div class="document">
+
+<table class="docinfo" frame="void" rules="none">
+<col class="docinfo-name" />
+<col class="docinfo-content" />
+<tbody valign="top">
+<tr><th class="docinfo-name">Authors:</th>
+<td>Dean Michael Berris &lt;mikhailberis&#64;gmail.com&gt;</td></tr>
+<tr><th class="docinfo-name">Version:</th>
+<td>0.1 of 2009/06/08</td></tr>
+<tr class="field"><th class="docinfo-name">License:</th><td class="field-body">Distributed under the Boost Software License
+(See accompanying LICENSE_1_0.txt or copy at
+<a class="reference external" href="http://www.boost.org/LICENSE_1_0.txt">http://www.boost.org/LICENSE_1_0.txt>)</td>
+</tr>
+</tbody>
+</table>
+<div class="section" id="boost-bloom-filter">
+<h1>Boost.Bloom_filter</h1>
+<p>From <a class="reference external" href="
http://en.wikipedia.org/wiki/Bloom_filter">http://en.wikipedia.org/wiki/Bloom_filter></p>
+<pre class="literal-block">
+A Bloom Filter is a space-efficient probabilistic data structure
+that is used to test whether an element is a member of a set. False
+positives are possible but false negatives are not. Elements can be
+added to the set, but not removed. The more that are added to the
+set, the larger the probability of false positives.
+</pre>
+<p>Boost.Bloom_filter is a generic implementation of a Bloom Filter that uses
+a variable number of bits to represent a generic bloom filter that takes a
+configurable input type and a configurable number of hash functions. Bloom
+Filters work by using multiple hash functions to determine the membership
+of an element in the represented set.</p>
+<p>For instance, we can use a Bloom Filter to maintain a set of URLs that have
+been cached by a caching web proxy. Instead of storing every possible URL,
+we use a Bloom Filter to mark certain parts of a bit field as '1' for every
+location that a set of hash functions returns. As an example, consider the
+empty 32-bit bitset representing a Bloom Filter below:</p>
+<pre class="literal-block">
+00000000000000000000000000000000
+</pre>
+<p>From here, we then have for instance three (3) hash functions that when we
+feed a URL will return numbers from [0..31]. Let's say when we hash the
+string '<a class="reference external" href="
http://www.boost.org">http://www.boost.org>' we get the three indexes <tt class="docutils literal"><span class="pre">(0,9,15)</span></tt>. Our
+bitset will then look like:</p>
+<pre class="literal-block">
+10000000010000010000000000000000
+</pre>
+<p>To find out whether we've encountered a certain URL before, we simply pass
+the URL to the same three hash functions. If we see that all the positions
+returned by the three hash functions are all <tt class="docutils literal"><span class="pre">1</span></tt>, then we might have seen
+the URL before -- this is called a false positive. In the case of a caching
+web proxy this shouldn't be a problem because if it really hasn't gotten it
+yet, it's just going to fetch it again accordingly (and update the Bloom
+Filter after that). This matters when you have a hierarchical network of
+web cache proxies that need to know where a page may have been cached before.
+Sharing these bloom filters across the nodes of the network allows for
+optimizing for the case when you may have already cached a page before.</p>
+<p>This also applies when writing distributed filesystems where you may have a
+system of multi-layer caches. Sharing the bloom filters of the pages that may
+have already been accessed (read) by nearby storage/retrieval nodes that have
+these pages cached minimizes the number of times a page that has already been
+retrieved before will have to be retrieved again across the network. In these
+situations you may be able to tolerate false positives but would rather not
+have false negatives -- meaning data you haven't ever accessed before would
+not be available in the caches and therefore will need to be fetched again.</p>
+<div class="section" id="synopsis">
+<h2>Synopsis</h2>
+<p>A <cite>bloom_filter</cite> is a template class which takes three parameters:</p>
+<dl class="docutils">
+<dt>Input</dt>
+<dd>The type of the elements contained in a bloom filter.</dd>
+<dt>M</dt>
+<dd>The number of bits contained by the internal bitset.</dd>
+<dt>K</dt>
+<dd>The number of hash functions used by the bloom filter.</dd>
+</dl>
+<p>To use a <cite>bloom_filter</cite> that maintains a set of integers using 256 bits and
+two hash functions, we could have the following code listing:</p>
+<pre class="literal-block">
+#include &lt;boost/bloom_filter.hpp&gt;
+
+...
+
+using boost::bloom_filter;
+using boost::array;
+using boost::function;
+
+...
+
+array&lt;function&lt;size_t(int)&gt;, 2&gt; hash_array;
+hash_array[0] = hash1;
+hash_array[1] = hash2;
+bloom_filter&lt;int, 256, 2&gt; integers(hash_array);
+</pre>
+<p>Here we first create a Boost.Array of two Boost.Functions that we then use
+to construct the <cite>bloom_filter</cite>. In this code listing we are then able to
+insert new integers into the set by simply doing:</p>
+<pre class="literal-block">
+integers.insert(1);
+integers.insert(7);
+integers.insert(5);
+</pre>
+<p>We are also then able to check whether certain integers are part of the set.
+Remember that in Bloom Filters, there is a possibility of getting false
+positives:</p>
+<pre class="literal-block">
+integers.contains(1); // should be true
+integers.contains(9); // may be true or false
+integers[5]; // should be true
+</pre>
+</div>
+<div class="section" id="reference">
+<h2>Reference</h2>
+<ul class="simple">
+<li>Bloom Filter (Wikipedia Entry) <a class="reference external" href="
http://en.wikipedia.org/wiki/Bloom_filter">http://en.wikipedia.org/wiki/Bloom_filter></li>
+</ul>
+</div>
+</div>
+</div>
+</body>
+</html>


Boost-Commit list run by bdawes at acm.org, david.abrahams at rcn.com, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk