Boost logo

Boost :

From: Daniel James (daniel_at_[hidden])
Date: 2005-03-13 11:31:33


Dave Harris wrote:

> Firstly, I think we have some agreement that the signature:
>
> size_t hash_range( It first, It last );
>
> is wrong

I didn't get the impression of any agreement about that.

> and there should be some way to provide an initial seed.

Although, I agree with this.

> Secondly, the proposed implementation for hashing pointers produces
> undefined behaviour because it subtracts pointers not in the same array.
> We have some agreement that using reinterpret_cast would be better.

Yes, and this will be fixed.

> Thirdly, I think there is also a legitimate query over whether the default
> hash_range should provide better quality hashes for arrays of bytes, eg
> along the lines discussed in:
> http://www.azillionmonkeys.com/qed/hash.html

Well, I was the one who pointed to that one. I'll keep quiet in the
future ;)

> With boost interface is more important than implementation, but apparently
> implementations of the hash library will not be allowed to vary.

This isn't exactly true. It is proposed that the standard explicitly
specifies the hash functions (and I've been criticized for not doing
that) but part of the purpose of this library is to get real world
experience on what the specification should be. And change it, if necessary.

> My
> impression is that the proposed implementation of hash_combine:
> seed ^= hash_value(v) + (seed << 6) + (seed >> 2);
>
> was designed to combine 8-bit character values; do we have any evidence it
> also performs well to combine 32-bit values? Perhaps hash_value(v) would
> be better treated as an array of 4 bytes?

I do intend to spend more time looking into alternatives in the future
(and will look into adding a constant to hash_combine very soon).

> Fourthly, there seems to be a design philosophy behind the current
> proposal which is not stated explicitly in the documentation. It
> apparently favours some notion of safety over speed and hash quality,
> where "safety" means that different objects should have the same hash
> values if they look at all alike to someone possibly.

Did you see Peter's original proposal? It's issue 6.18 in:

http://www.open-std.org/JTC1/SC22/WG21/docs/papers/2005/n1756.pdf

Looking back at the documentation I should make the link more obvious.
Originally, I had a link in the first sentence, but that slipped while
editing it. Maybe (if it's okay with Peter) I should copy some of it
into the documentation.

> For example, the proposal has no specialisation for chars at all. I can't
> tell if that is an oversight, or whether it is a design aim that:
>
> char chars[] = { 1, 2, 3, 4 };
> int ints[] = { 1, 2, 3, 4 };
>
> assert( hash_range( chars, chars+4 ) == hash_range( ints, ints+4 ) );

I don't think it's either. Peter?

> I gather it is a design aim that hash_value( pair<int,int> ) match
> hash_range( int[2] ). Presumably if hash_range takes a seed argument, this
> means hash_value must too, else how can it match it?

When hash_range is called with a seed argument, it would be equivalent
to calling hash_combine for pair<int, int>. Perhaps it would be better
spelt as:

template <class It>
void hash_combine(std::size_t& seed, It begin, It end);

> I am not sure about all this philosophy. I had kinda hoped that a boost
> hashing library would offer good quality industrial-strength hashing; that
> it would save me having to do my own research. Regardless, I think the
> aims and guarantees of the proposed library should be documented more
> explicitly.
>
> In conclusion, I feel that if we given only 5 days to review a proposal,
> that proposal needs to be spot on, uncontroversial, with all issues nailed
> accurately in advance. I don't think that is the case here. Although I
> think all the problems could be resolved, I don't think we have enough
> time to do that in the given review period.

Well, we did discus the type of review before it started, and nobody
raised any objections. I'm sorry that you didn't feel that it was
enough. This does not have to be the end of the discussion.

On a related note, I am going to be requesting a review for the
unordered associative containers soon. It has been suggested that since
they are based on TR1 they should have a shorter than normal review (but
longer than this one). So I guess you think it should have a full length
review?

Although, it might turn out to be less controversial since it doesn't
add anything to the specification. Which is the reverse of what I thought.

> If I have to decide now, I'd
> say:
>
> This should not be accepted as a Boost library.

I'm sorry to hear that. Thank you for reviewing the library and bringing
up several good points.

> I am not an expert on hashing, but I have implemented my own hashing
> library. I have encountered practical problems due to different objects
> having the same hash values.

Is this in C++? I would have thought that type safety would have helped
avoid such problems. To use the hash function object with a polymorphic
type you'll need to write your own function which should take account of
the type. Such a function could be written for boost::variant and
similar types.

Daniel


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk