Boost logo

Boost-Commit :

Subject: [Boost-commit] svn:boost r74856 - trunk/libs/functional/hash/doc
From: dnljms_at_[hidden]
Date: 2011-10-09 14:14:50


Author: danieljames
Date: 2011-10-09 14:14:50 EDT (Sun, 09 Oct 2011)
New Revision: 74856
URL: http://svn.boost.org/trac/boost/changeset/74856

Log:
Hash: Note about the quality of the hash function.

In response to the thread starting at:

http://lists.boost.org/Archives/boost/2011/10/186476.php
Added:
   trunk/libs/functional/hash/doc/rationale.qbk (contents, props changed)
Text files modified:
   trunk/libs/functional/hash/doc/hash.qbk | 5 +++++
   trunk/libs/functional/hash/doc/intro.qbk | 10 +++++++---
   2 files changed, 12 insertions(+), 3 deletions(-)

Modified: trunk/libs/functional/hash/doc/hash.qbk
==============================================================================
--- trunk/libs/functional/hash/doc/hash.qbk (original)
+++ trunk/libs/functional/hash/doc/hash.qbk 2011-10-09 14:14:50 EDT (Sun, 09 Oct 2011)
@@ -14,11 +14,16 @@
     ]
 ]
 
+[def __issues__
+ [@http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2005/n1837.pdf
+ Library Extension Technical Report Issues List]]
+
 [include:hash intro.qbk]
 [include:hash tutorial.qbk]
 [include:hash portability.qbk]
 [include:hash disable.qbk]
 [include:hash changes.qbk]
+[include:hash rationale.qbk]
 [xinclude ref.xml]
 [include:hash links.qbk]
 [include:hash thanks.qbk]

Modified: trunk/libs/functional/hash/doc/intro.qbk
==============================================================================
--- trunk/libs/functional/hash/doc/intro.qbk (original)
+++ trunk/libs/functional/hash/doc/intro.qbk 2011-10-09 14:14:50 EDT (Sun, 09 Oct 2011)
@@ -18,9 +18,6 @@
 [def __multi-index-short__ [@boost:/libs/multi_index/doc/index.html
     Boost.MultiIndex]]
 [def __bimap__ [@boost:/libs/bimap/index.html Boost.Bimap]]
-[def __issues__
- [@http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2005/n1837.pdf
- Library Extension Technical Report Issues List]]
 [def __hash-function__ [@http://en.wikipedia.org/wiki/Hash_function hash function]]
 [def __hash-table__ [@http://en.wikipedia.org/wiki/Hash_table hash table]]
 
@@ -44,5 +41,12 @@
 * the standard containers.
 * extending [classref boost::hash] for custom types.
 
+[note
+This hash function is designed to be used in containers based on
+the STL and is not suitable as a general purpose hash function.
+For more details see the [link hash.rationale rationale].
+]
+
+
 [endsect]
 

Added: trunk/libs/functional/hash/doc/rationale.qbk
==============================================================================
--- (empty file)
+++ trunk/libs/functional/hash/doc/rationale.qbk 2011-10-09 14:14:50 EDT (Sun, 09 Oct 2011)
@@ -0,0 +1,55 @@
+
+[/ Copyright 2011 Daniel James.
+ / Distributed under the Boost Software License, Version 1.0. (See accompanying
+ / file LICENSE_1_0.txt or copy at http://www.boost.org/LICENSE_1_0.txt) ]
+
+[section:rationale Rationale]
+
+The rationale for the design can be found in the original design
+[footnote issue 6.18 of the __issues__ (page 63)], but an issue that
+occasionally comes up is the quality of the hash function, so that
+demands some more attention.
+
+Many hash functions strive to have little correlation between the input
+and output values. They attempt to uniformally distribute the output
+values for very similar inputs. This hash function makes no such
+attempt. In fact, for integers, the result of the hash function is often
+just the input value. So similar but different input values will result
+in similar but different output values.
+
+This means that it is not appropriate as a general hash function. For
+example, a hash table may discard bits from the hash function resulting
+in likely collisions, or might have poor collision resolution when hash
+values are clustered together. In such cases this hash function will
+preform poorly.
+
+So why not implement a higher quality hash function? Well, the standard
+makes no such guarantee, it just requires that the hashes of two
+different values are unlikely to collide. So containers or algorithms
+designed to work with the standard hash function will have to be
+implemented to work well when the hash function's output is correlated
+to its input. Since they are paying that cost it would be wasteful to
+expand the effort to make a higher quality hash function.
+
+If you do need a higher quality hash function, there are several options
+available. One is to use a second hash on the output of this hash
+function, such as [@http://www.concentric.net/~ttwang/tech/inthash.htm
+Thomas Wang's hash function]. But for many types this might not work as
+well as a hash algorithm tailored for the input.
+
+For strings that are several fast, high quality hash functions
+available, such as:
+
+* [@http://burtleburtle.net/bob/hash/index.html Bob Jenkins' hash
+ functions]
+* [@http://www.azillionmonkeys.com/qed/hash.html Paul Hsieh's hash
+ functions]
+* [@http://code.google.com/p/cityhash/ Google's CityHash]
+* [@http://code.google.com/p/smhasher/ MurmurHash3]
+
+These may also be appropriate for hashing a binary representation of
+your data - providing that all equal values have an equal
+representation, which is not always the case (e.g. for floating point
+values).
+
+[endsect]
\ No newline at end of file


Boost-Commit list run by bdawes at acm.org, david.abrahams at rcn.com, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk