Boost logo

Boost :

Subject: Re: [boost] [functional/hash] integer hash function
From: Christopher Jefferson (chris_at_[hidden])
Date: 2011-10-05 05:10:20

On 5 Oct 2011, at 05:54, Peter Dimov wrote:

> Mathias Gaunard wrote:
>> On 10/04/2011 10:03 PM, Tim Blechmann wrote:
>> > i was quite amazed that the default hash function for integers seems to > be the
>> > identity function:
>> >
>> > from boost/functional/hash/hash.hpp:
>> > inline std::size_t hash_value(int v)
>> > {
>> > return static_cast<std::size_t>(v);
>> > }
>> >
>> > while this is probably the most efficient hash function, it is also the > worst
>> > possible. is this done intentionally?
>> It's only bad if you know your possible values are not equally distributed over the whole range of "int".
> The distribution of the input is irrelevant. It's always mirrored in the distribution of the output (regardless of the hash function, as long as there are no collisions) because there is 1:1 correspondence between the input and output values. The problem arises when output bits are discarded. In the pathological case, if the input is 0..31 and one discards the bottom 5 bits, the result is always 0. But the default hash function can't know that you're going to discard bits, or how to fix that for you.

A default hash function certainly can shuffle bits, to create a better distribution. It all depends on what we want the hash function to do. Simply reduce the risk of collisions, or also have low correlation between the distribution of input and output data.

At the moment boost has chosen to simply do the first. There are applications where people also want the second, and adding this would add a small cost.

The question is wether boost should do that. I would say not, as it is simple to add once a shuffle/distribution function, which can be applied to the hash, if the user wants one. It might still be worth documenting this weakness however.


Boost list run by bdawes at, gregod at, cpdaniel at, john at