Boost logo

Boost Users :

Subject: Re: [Boost-users] boost::hash and string conflicts
From: Scott McMurray (me22.ca+boost_at_[hidden])
Date: 2011-04-07 13:56:59


On Thu, Apr 7, 2011 at 10:08, Erik Scorelle <escorelle.work_at_[hidden]> wrote:
> We have been using boost hash to hash filenames, but have found with some of
> our user data that certain strings will produce the same hash code (
> "0012g6" and "0012fu" for example).  Is there a recommended way to predict
> or resolve these sorts of conflicts?
>

Having some collisions is the expected behaviour of a
(non-cryptographic) hash function. With a birthday search you can
easily find thousands of examples.

I'm not sure what you mean by "resolve". Normally, code using hashes
expects collisions, and uses a full equality operator to resolve them.

If you cannot accept any collisions, you could use perfect hashing
(libraries like <http://cmph.sourceforge.net/> or
<http://www.gnu.org/software/gperf/gperf.html>) or use a cryptographic
hash (I have a usable, but WIP library at
<http://svn.boost.org/svn/boost/sandbox/hash/>).

~ Scott


Boost-users list run by williamkempf at hotmail.com, kalb at libertysoft.com, bjorn.karlsson at readsoft.com, gregod at cs.rpi.edu, wekempf at cox.net