# Boost :

From: Dave Harris (brangdon_at_[hidden])
Date: 2005-03-21 22:35:15

In-Reply-To: <003501c52a54\$a28c4b90\$6601a8c0_at_pdimov>
pdimov_at_[hidden] (Peter Dimov) wrote (abridged):
> > size_t hash_value( int v ) {
> > return size_t(v) + default_seed;
> > }
>
> I wonder what do we gain from this.

As you pointed out when I first suggested adding the constant in
hash_combine:

This would mean N fixes for a sequence of length N, when one
simple fix is enough. The initial value of the seed is arbitrary
so we can vary it at will. hash_combine, on the other hand, is
based on research and has apparently been selected as one of the
best amongst the shift+xor family. This doesn't mean that it's
sacred, just that if we change it we'll no longer be able to
just point at a PDF as a rationale but will need to do our
own research (which might be a good idea anyway, though.)

:-)

> From the point of view of hash_combine the effect is the same

The difference shows when you have something like:

struct A { int x, y; };
struct B { A a, b, c; };

size_t hash_value( const A &a ) {
size_t hash = 0xbead1;
hash_combine( hash, a.x );
hash_combine( hash, a.y );
return hash;
}

size_t hash_value( const B &b ) {
size_t hash = 0xbead2;
hash_combine( hash, b.a );
hash_combine( hash, b.b );
hash_combine( hash, b.c );
return hash;
}

Here hash_combine is called 9 times, but hash_value(int) is only called 6
times, so we get different results depending on where the constant is
added.

> and we now rely on the user overloads of hash_value to not produce
> a zero.

That's surely reasonable. The hash_value of any user-defined class should
be defined in terms of the hash_values of primitives, and boost provides
all of those.

> This reflects their intended use. The two argument overload is used
> when one has a whole range and wants its hash value, as in the
> hash_value overload for std::vector, for example.

I actually think this is going to be fairly rare. This is mainly because
there will usually be a constant thrown in to represent the type of the
object. (I appreciate you won't do that for std containers, but I maintain
its a good idea for user-defined types.)

The 2-argument version is strictly redundant as it must be defined in
terms of the 3-argument version. It's just a convenience function, used
mainly by the std containers.

Admittedly hash_combine is also (nearly) redundant, being definable as:

void hash_combine( size_t &hash, const T &t ) {
hash_range( hash, &t, &t+1 );
}

if T does not overload address-of.

Incidently, do you agree we will sometimes want to pass a hash value as
the second argument to hash_combine? Like:

hash_combine( hash, hash_range( first, last ) );
hash_combine( hash, obj.get_hash( some_arg ) );

If not, then maybe we should have a hash_combine that just combines
hashes. At the moment the combining is mixed in with the getting; it's not
a very orthogonal API.

-- Dave Harris, Nottingham, UK