Boost logo

Boost Users :

Subject: Re: [Boost-users] boost::hash and string conflicts
From: Ovanes Markarian (om_boost_at_[hidden])
Date: 2011-04-07 14:00:05


On Thu, Apr 7, 2011 at 7:08 PM, Erik Scorelle <escorelle.work_at_[hidden]>wrote:

> Hi,
>
> We have been using boost hash to hash filenames, but have found with some
> of our user data that certain strings will produce the same hash code (
> "0012g6" and "0012fu" for example). Is there a recommended way to predict
> or resolve these sorts of conflicts?
>
> Thanks in advance!
>

Hi!
Hash functions do not guarantee uniqueness. The case when the key is not
unique called collision. Usually hash tables not only compare the hash key,
but also the actual data to be sure that the right record was referenced.
The most primitive way to implement a hash-table is to put all records
hashing to the same key into a linked list and search linear. Hashing is
effecient, if the choosen hash function has a good distribution and results
in not so many collisions, which in turn results in amortized constant time
for the access.

Boost unordered_map is the implementation of a hash table:
http://www.boost.org/doc/libs/1_46_0/doc/html/boost/unordered_map.html

I suggest reading this article from Wikipedia:
http://en.wikipedia.org/wiki/Hash_function

And for further reading refer to Cormen et al: Introduction to algorithms.
They have the whole chapter dedicated to hashing techniques.

With Kind Regards,
Ovanes



Boost-users list run by williamkempf at hotmail.com, kalb at libertysoft.com, bjorn.karlsson at readsoft.com, gregod at cs.rpi.edu, wekempf at cox.net