On Thu, Apr 7, 2011 at 10:08, Erik Scorelle <escorelle.work@gmail.com> wrote:
> We have been using boost hash to hash filenames, but have found with some of
> our user data that certain strings will produce the same hash code (
> "0012g6" and "0012fu" for example). Is there a recommended way to predict
> or resolve these sorts of conflicts?
>
Having some collisions is the expected behaviour of a (non-cryptographic)
hash function. With a birthday search you can
easily find thousands of examples.
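
To illustrate what a birthday search looks like, here is a minimal sketch. It
does not use boost::hash: toy_hash16 is a hypothetical stand-in (32-bit FNV-1a
folded to 16 bits) chosen so that collisions show up after a few hundred
inputs, and the "file<N>" names and find_collision are made up for the example.
A real 32-bit hash would need on the order of 2^16 inputs before a collision
becomes likely.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical toy hash: 32-bit FNV-1a folded down to 16 bits.
// This is NOT boost::hash; it only keeps the search short.
std::uint16_t toy_hash16(const std::string& s) {
    std::uint32_t h = 2166136261u;      // FNV-1a offset basis
    for (unsigned char c : s) {
        h ^= c;
        h *= 16777619u;                 // FNV-1a prime
    }
    return static_cast<std::uint16_t>(h ^ (h >> 16));
}

// Birthday search: remember the first input seen for each hash value and
// stop as soon as a new input lands on a value that is already taken.
// Returns a pair of empty strings if no collision was found.
std::pair<std::string, std::string> find_collision(std::size_t max_inputs) {
    std::unordered_map<std::uint16_t, std::string> seen;
    for (std::size_t i = 0; i < max_inputs; ++i) {
        const std::string candidate = "file" + std::to_string(i);
        const auto result = seen.emplace(toy_hash16(candidate), candidate);
        if (!result.second) {
            // Insertion failed: an earlier, different input has this hash.
            assert(result.first->second != candidate);
            return std::make_pair(result.first->second, candidate);
        }
    }
    return std::make_pair(std::string(), std::string());
}
```

With a 16-bit output, any run over more than 65536 distinct inputs must hit a
collision, so find_collision(70000) always returns a pair.
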
I'm not sure what you mean by "resolve". Normally, code using hashes
expects collisions, and uses a full equality operator to resolve them.
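
As a rough sketch of that, std::unordered_map hashes a key to pick a bucket and
then compares keys with operator==, so two names with the same hash still get
separate entries. LengthHash below is a hypothetical, deliberately weak hash
(it is not boost::hash) used only to force your two example strings to
collide; build_index is likewise just a name for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>

// Hypothetical, deliberately weak hash: all strings of equal length collide.
// It stands in for any hash function that happens to produce a collision.
struct LengthHash {
    std::size_t operator()(const std::string& s) const { return s.size(); }
};

// Insert both colliding names. The container falls back to the full
// equality comparison after hashing, so each name keeps its own entry.
std::unordered_map<std::string, int, LengthHash> build_index() {
    std::unordered_map<std::string, int, LengthHash> index;
    index["0012g6"] = 1;
    index["0012fu"] = 2;
    // Same hash value, different keys.
    assert(LengthHash()("0012g6") == LengthHash()("0012fu"));
    return index;
}
```

The map returned by build_index() holds two entries, and looking up either
name gives back the value stored for that name only.
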
If you cannot accept any collisions, you could use perfect hashing
(libraries like <http://cmph.sourceforge.net/> or
<http://www.gnu.org/software/gperf/gperf.html>) or use a cryptographic
hash (I have a usable, but WIP library at
<http://svn.boost.org/svn/boost/sandbox/hash/>).
~ Scott