Boost logo

Boost Users :

Subject: Re: [Boost-users] boost::hash and string conflicts
From: Ovanes Markarian (om_boost_at_[hidden])
Date: 2011-04-07 14:08:18


On Thu, Apr 7, 2011 at 7:56 PM, Scott McMurray <me22.ca+boost_at_[hidden]>wrote:

> On Thu, Apr 7, 2011 at 10:08, Erik Scorelle <escorelle.work_at_[hidden]>
> wrote:
> > We have been using boost hash to hash filenames, but have found with some
> of
> > our user data that certain strings will produce the same hash code (
> > "0012g6" and "0012fu" for example). Is there a recommended way to
> predict
> > or resolve these sorts of conflicts?
> >
>
> Having some collisions is the expected behaviour of a
> (non-cryptographic) hash function. With a birthday search you can
> easily find thousands of examples.
>
> I'm not sure what you mean by "resolve". Normally, code using hashes
> expects collisions, and uses a full equality operator to resolve them.
>
> If you cannot accept any collisions, you could use perfect hashing
> (libraries like <http://cmph.sourceforge.net/> or
> <http://www.gnu.org/software/gperf/gperf.html>) or use a cryptographic
> hash (I have a usable, but WIP library at
> <http://svn.boost.org/svn/boost/sandbox/hash/>).
>
> ~ Scott
>

Scott,

AFAIK perfect hashing works only with the known set of keys, otherwise it is
not possible.
Just to make a note: crypto-hashes are much slower as normal ones and use
much more memory. Erik, how many file names are you going to hash? There was
an article on Dr.Dobbs which compared unordered_map (hash table)
implementation with std::map. And it turned out that until some certain
amount of items it makes no sence to use a hash table, because a normal map
is much faster.

Here is the link: http://drdobbs.com/cpp/198800559?pgno=1

With Kind Regards,
Ovanes



Boost-users list run by williamkempf at hotmail.com, kalb at libertysoft.com, bjorn.karlsson at readsoft.com, gregod at cs.rpi.edu, wekempf at cox.net