Boost logo

Boost :

From: Dave Harris (brangdon_at_[hidden])
Date: 2005-03-21 22:35:15


In-Reply-To: <003501c52a54$a28c4b90$6601a8c0_at_pdimov>
pdimov_at_[hidden] (Peter Dimov) wrote (abridged):
> > size_t hash_value( int v ) {
> > return size_t(v) + default_seed;
> > }
>
> I wonder what do we gain from this.

As you pointed out when I first suggested adding the constant in
hash_combine:

    This would mean N fixes for a sequence of length N, when one
    simple fix is enough. The initial value of the seed is arbitrary
    so we can vary it at will. hash_combine, on the other hand, is
    based on research and has apparently been selected as one of the
    best amongst the shift+xor family. This doesn't mean that it's
    sacred, just that if we change it we'll no longer be able to
    just point at a PDF as a rationale but will need to do our
    own research (which might be a good idea anyway, though.)

:-)

> From the point of view of hash_combine the effect is the same

The difference shows when you have something like:

    struct A { int x, y; };
    struct B { A a, b, c; };
    
    size_t hash_value( const A &a ) {
        size_t hash = 0xbead1;
        hash_combine( hash, a.x );
        hash_combine( hash, a.y );
        return hash;
    }
    
    size_t hash_value( const B &b ) {
        size_t hash = 0xbead2;
        hash_combine( hash, b.a );
        hash_combine( hash, b.b );
        hash_combine( hash, b.c );
        return hash;
    }

Here hash_combine is called 9 times, but hash_value(int) is only called 6
times, so we get different results depending on where the constant is
added.

> and we now rely on the user overloads of hash_value to not produce
> a zero.

That's surely reasonable. The hash_value of any user-defined class should
be defined in terms of the hash_values of primitives, and boost provides
all of those.

> This reflects their intended use. The two argument overload is used
> when one has a whole range and wants its hash value, as in the
> hash_value overload for std::vector, for example.

I actually think this is going to be fairly rare. This is mainly because
there will usually be a constant thrown in to represent the type of the
object. (I appreciate you won't do that for std containers, but I maintain
its a good idea for user-defined types.)

The 2-argument version is strictly redundant as it must be defined in
terms of the 3-argument version. It's just a convenience function, used
mainly by the std containers.

Admittedly hash_combine is also (nearly) redundant, being definable as:

    void hash_combine( size_t &hash, const T &t ) {
        hash_range( hash, &t, &t+1 );
    }

if T does not overload address-of.

Incidently, do you agree we will sometimes want to pass a hash value as
the second argument to hash_combine? Like:

    hash_combine( hash, hash_range( first, last ) );
    hash_combine( hash, obj.get_hash( some_arg ) );

If not, then maybe we should have a hash_combine that just combines
hashes. At the moment the combining is mixed in with the getting; it's not
a very orthogonal API.

-- Dave Harris, Nottingham, UK


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk