Boost logo

Boost :

From: Peter Dimov (pdimov_at_[hidden])
Date: 2005-03-10 12:36:25


Dave Harris wrote:
> In-Reply-To: <018901c52569$ea842610$6601a8c0_at_pdimov>
> pdimov_at_[hidden] (Peter Dimov) wrote (abridged):
>> That would be bad because the hash value of the pair (x, y) would no
>> longer match the hash value of the sequence (x, y).
>
> Why is it necessary for pairs and sequences to have the same hash
> value?
> They are different things.

pair<int, int>, int[2], struct { int a; int b; } are different
representations of the same thing; it is not strictly necessary for them to
use the same algorithm, but it's desirable and will eliminate a source of
confusion and mistakes.

> If it is necessary, we can avoid the problem by fixing hash_range:
>
> template <typename It>
> size_t hash_range( It first, It last ) {
> if (first == last)
> return 0x87654321; // Or some other value;
> size_t seed = hash_value(*first);
> for (++first; first != last; ++first)
> seed = hash_combine( seed, *first );
> return seed;
> }

The simpler "initialize seed, modify seed with each element" approach is
much cleaner. It can also be expressed in terms of

    void hash_range( size_t & seed, It first, It last );

should this function be added. You are welcome to spell the above as

    size_t hash_range( size_t seed, It first, It last );

and I understand your reasons for doing so, I just don't see side effects as
so evil. Anyway, the good thing in this formulation is that you can hash a
range in stages, if you only have access to a subrange at a time. The
non-uniform treatment of empty sequences and the first element of the
sequence would force you to keep state. State is even more evil than side
effects. ;-)

> This also gives hash_range( &a, &a+1 ) the same result as
> hash_value(a), which your approach does not.

Yes, you are right, starting with a nonzero seed makes sequences of one
element produce a different hash value than the element itself.

>> Do you have a different hash_combine in mind that doesn't suffer from
>> this problem and is better than the proposed implementation? ;-)
>> (Keeping in mind that even the current hash_combine is perceived as
>> slower than necessary, of course.)
>
> I had in mind something like:
>
> template <typename T>
> size_t hash_combine( size_t seed, const T &v ) {
> const size_t c = 0x12345678; // Or some other value.
> return seed ^ (hash_value(v) + c + (seed<<6) + (seed>>2));
> }
>
> This costs one extra addition, which can probably be scheduled in
> parallel with other operations anyway.

This may be good enough.


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk