Boost logo

Boost :

From: Topher Cooper (topher_at_[hidden])
Date: 2005-03-13 13:27:41


On Mar 13, 2005, at 11:13 AM, Peter Dimov wrote:
> implementation to which you port your program. Your original tests are
> still valid.
>
> The other side of predictability - that the hash function is not
> affected by types or the particular representation of a sequence of
> values - means that you don't necessarily have to re-run the
> performance benchmark of your unordered_maps when you refactor the key
> to use int[3] instead of struct { short, short, short }, or a wstring
> instead of a string.
>

Fundamentally, what makes for a good hash function is that things that
are equal (however you are defining equality) always hash to the same
value, while any small difference between two things should cause them
to hash to completely different values. In an arbitrary program would
I expect an int[3] to be equal to a struct { short, short, short } when
the values happened to be equal? No. What if the change in
representation you spoke of was to rotate the coordinate axes for my
vector, would I expect values with the same coordinates in the two
different coordinate-systems, represented by different structs to be
equal? No. If I have two structs representing complex numbers, one
using a Cartesian representation and the other polar would I expect
them to be treated as equal because their coordinates are equal? No --
I would be appalled if they were.

A general purpose hash function that behaves like this is a weak one at
best (and in some situations it can be a horrendously bad one). Your
argument is that one should allow those developers who are making one
specific change in representation out of the hundreds of possible
reasonable ones should be allowed to be able to get away with being
slipshod (and yes, not re-running the performance benchmark even with
this deliberate weakening of the benchmark would be slipshod, though
justifiably under many conditions). I don't buy it.

Now if you made an argument that the benefits of strengthening it were,
in practice, too minor to consider and the costs (to, e.g., seed the
hash combination sequence with a class dependent value) was too high,
then I might agree. Doing something that is theoretically "wrong"
because it is pragmatically the "right" thing to do, is good
engineering. Arguing that something that is theoretically "wrong" is
in some real sense the right thing to do is bad engineering, bad math
and bad science.

Topher


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk