Boost logo

Boost :

Subject: Re: [boost] [hash] regular behaviour of hash function for double values
From: Andrey Semashev (andrey.semashev_at_[hidden])
Date: 2012-01-31 14:28:07

On Tuesday, January 31, 2012 18:55:21 Daniel James wrote:
> The requirements for the hash function are:
> For two different values
> t1 and t2, the probability that h(t1) and h(t2) compare
> equal should be very small, approaching 1.0 / numeric_-
> limits<size_t>::max().
> There's no requirement about how h(t1) and h(t2) are distributed


> so it
> must be the container's responsibility, since the container must work
> well with any hash function that meets these requirements.

Wrong, there's no such requirement. It must work, not necessarilly well.

> How else can you interpret it?

Simple: it is possible to design a hash function that will cause suboptimal
performance with the container. No matter how you implement the container,
with bit mixing or not, this statement still holds.

I understand your reasoning. You're trying to reduce the probability of
unordered container being inefficient because of an inefficient hash function.
This also approach has the benefit of simplification of hash functions

What I'm saying is that this pessimization is the wrong thing to do because
you make users pay for what they don't need. Most of the time, when you deal
with hash containers, you have a pretty good idea of the key nature and the
better ways to hash key values. There are times when the key values already
have a good distribution (e.g. the keys are GUIDs) and there is no need in
additional bit mixing. For other cases there are known good hashing algorithms
so you often just use the one which better fits your needs. And for standard
types I would expect std::hash to do the right thing. I know, the standard
doesn't spell out that explicitly (which is a bad thing), but this is how it
works in practice (from my experience).

Boost list run by bdawes at, gregod at, cpdaniel at, john at