Boost logo

Boost :

From: Dave Harris (brangdon_at_[hidden])
Date: 2005-03-15 11:20:14


In-Reply-To: <002901c527e7$9f9fb620$6501a8c0_at_pdimov2>
pdimov_at_[hidden] (Peter Dimov) wrote (abridged):
> The problem is that hash functions vary wildly in quality. Leaving the
> default function unspecified (or implementation defined) is good if you
> are lucky and the implementation you happen to use is good. It's a
> tradeoff.

I think if we provide a good quality default implementation, and also a
good rational (and the rational is important here), then vendors are
unlikely to change it to something worse. Vendors avoid unnecessary work.

The eventual standard could recommend a specific implementation in a
non-normative Note.

> A fixed and predictable default means that you don't have to measure
> the quality of the hash function for every implementation to which you
> port your program. Your original tests are still valid.

In general, performance varies between platforms. If one implementation
has the fixed string optimisation and the other doesn't, my benchmarks for
one will not apply to the other. I am not yet convinced that hashing
deserves to be a special case. On the other hand, it does seem to be an
area which can benefit from innovation. Eg a platform may have a
super-fast near-cryptographic-strength algorithm implemented in hardware,
and we're saying they can't use it?

> The other side of predictability - that the hash function is not
> affected by types or the particular representation of a sequence of
> values - means that you don't necessarily have to re-run the
> performance benchmark of your unordered_maps when you refactor the key
> to use int[3] instead of struct { short, short, short }, or a wstring
> instead of a string.

I think you need to repeat your benchmarks anyway.

You may find that values which used to fit in cache no longer do so. Or
vice versa. If you are hashing pointer values then the collisions will
vary from one run of the program to the next.

Even if you can get stability for some set of program edits, there will
always be other edits which don't get it. Reordering the hash_combines, or
refactoring an object's state, for example.

-- Dave Harris, Nottingham, UK


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk