Boost logo

Boost :

From: Topher Cooper (topher_at_[hidden])
Date: 2005-03-13 16:40:47


On Mar 13, 2005, at 2:04 PM, Peter Dimov wrote:

> Topher Cooper wrote:
>>
>> Fundamentally, what makes for a good hash function is that things that
>> are equal (however you are defining equality) always hash to the same
>> value, while any small difference between two things should cause them
>> to hash to completely different values. In an arbitrary program would
>> I expect an int[3] to be equal to a struct { short, short, short }
>> when the values happened to be equal? No. What if [...]
>
> Since the keys in an unordered_map are of the same type, it is usually
> not possible for one of them to contain int[3] and for another to hold
> struct { short, short, short }.
>
> A discriminated union type (boost::variant) that is able to hold
> either type can easily hash_combine the index of the currently active
> type to avoid these accidental collisions. Similarly, a
> boost::any-alike can hash_combine the address of a structure that
> uniquely identifies the type, or typeid().name(). This mirrors their
> implementation of operator==.
>
> I agree that the current design may not be well suited for a general
> purpose hash function that needs to be able to sort objects of
> arbitrary types into buckets. I'm not sure how often this occurs in
> practice. Do you have examples?
>

Actually, I agree that if the issue is only vectors, pairs and
sequences that the pragmatics argues to this approach. This behavior,
though it is contrary to the spirit of object oriented programming (at
least as I've interpreted that concept for 20+ years) is harmless.
Different types (kinds, classes) of objects are different except as
specified by is-a inheritance. I would still argue that it is not
behavior to be sought, but it is harmless if it is the most convenient
way to code things.

I am made very uncomfortable, however when we put structs (and
therefore, by a primary identity in the C++ language, classes) into the
mix. Here we may have inheritance relationship (not a union) where two
different types with different meanings to the quantities can end up
introducing a tendency for rather different things to hash to the same
values. I presented the example of two different classes representing
polar and Cartesian representation of complex numbers (and, yes, I have
written packages where these two representations co-exist and intermix
to allow optimizations). Of course for this example, you would clearly
want a custom hash, but the point is, subclassing does not make this
situation that bizarre. What about a table containing structs for
Resources, which could represent either employees or pieces of office
equipment where the characteristics of interest in each happen to all
be represented as a bunch of booleans, that happen to be the same
length?

I've perhaps ended up over-stating things. Let me summarize the
nitty-gritty: I see no good reason to promote this as *desirable*
behavior, and I worry about it occasionally producing some odd results.
  Good hashing errs on the side of not building in a bias towards
inappropriate collisions IF it can be conveniently avoided.

Topher


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk