Boost logo

Boost :

From: Dave Harris (brangdon_at_[hidden])
Date: 2005-03-13 10:11:13


In-Reply-To: <d0kr8r$gl6$1_at_[hidden]>
nesotto_at_[hidden] (Thorsten Ottosen) wrote (abridged):
> It's a pleasure to announce that the review of Daniel James' hash
> functions library starts today (8th of March) and ends the 12th
> of March.

I see I have missed the deadline. Oh, well.

Several queries have been raised during the review, some of which have
merit. Firstly, I think we have some agreement that the signature:

    size_t hash_range( It first, It last );

is wrong and there should be some way to provide an initial seed. I am not
yet decided on what the best signature should be; I am tempted towards a
3rd argument with a default value, but I can also see some merit in
mimicking hash_combine() by making the seed the first argument.

Secondly, the proposed implementation for hashing pointers produces
undefined behaviour because it subtracts pointers not in the same array.
We have some agreement that using reinterpret_cast would be better.

Thirdly, I think there is also a legitimate query over whether the default
hash_range should provide better quality hashes for arrays of bytes, eg
along the lines discussed in:
   http://www.azillionmonkeys.com/qed/hash.html
   
With boost interface is more important than implementation, but apparently
implementations of the hash library will not be allowed to vary. My
impression is that the proposed implementation of hash_combine:
    seed ^= hash_value(v) + (seed << 6) + (seed >> 2);

was designed to combine 8-bit character values; do we have any evidence it
also performs well to combine 32-bit values? Perhaps hash_value(v) would
be better treated as an array of 4 bytes?

Fourthly, there seems to be a design philosophy behind the current
proposal which is not stated explicitly in the documentation. It
apparently favours some notion of safety over speed and hash quality,
where "safety" means that different objects should have the same hash
values if they look at all alike to someone possibly.

For example, the proposal has no specialisation for chars at all. I can't
tell if that is an oversight, or whether it is a design aim that:

    char chars[] = { 1, 2, 3, 4 };
    int ints[] = { 1, 2, 3, 4 };

    assert( hash_range( chars, chars+4 ) == hash_range( ints, ints+4 ) );

I gather it is a design aim that hash_value( pair<int,int> ) match
hash_range( int[2] ). Presumably if hash_range takes a seed argument, this
means hash_value must too, else how can it match it?

I am not sure about all this philosophy. I had kinda hoped that a boost
hashing library would offer good quality industrial-strength hashing; that
it would save me having to do my own research. Regardless, I think the
aims and guarantees of the proposed library should be documented more
explicitly.

In conclusion, I feel that if we given only 5 days to review a proposal,
that proposal needs to be spot on, uncontroversial, with all issues nailed
accurately in advance. I don't think that is the case here. Although I
think all the problems could be resolved, I don't think we have enough
time to do that in the given review period. If I have to decide now, I'd
say:

   This should not be accepted as a Boost library.

Stock questions: I have not tried to compile the code. I have looked at
the documentation, the source, and followed some of the references. I have
raised design questions here and followed the discussion. Total time is a
few hours.

I am not an expert on hashing, but I have implemented my own hashing
library. I have encountered practical problems due to different objects
having the same hash values.

-- Dave Harris, Nottingham, UK


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk