
Boost : 
From: Daniel James (daniel_at_[hidden])
Date: 20050319 06:40:59
Peter Dimov wrote:
> This is not an assumption. It is the specification of hash_value that
> determines how it is to be used. Currently it does not offer any
> guarantees for the entropy of any particular bits, so the client of
> hash_value is expected to not do something stupid, like throw away 28 of
> its 32 bits.
From google's sparse hash table docs:
"(As a side note, using a table size that's a power of two has several
advantages, including the speed of calculating (i % table_size). On the
other hand, poweroftwo tables are not very forgiving of a poor hash
function. Make sure your hash function is a good one! There are plenty
of dos and don'ts on the web (and in Knuth), for writing hash functions.)"
But they seem to be using SGI's hash function, which isn't very
good for power of two hash tables. I tried changing their benchmarks
(which use increasing integers for values, so there are very few
collisions) to only use multiples of 8 and performance dropped as you'd
expect.
The problem here is that a hash function which works well with such a
container would probably punish hash tables which don't discard bits.
A better approach to using a power of 2 might be to apply a further
transformation to the hash value, which might take into account the
current hash table size.
Daniel
Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk