From: Daniel James (daniel_at_[hidden])
Date: 2005-03-11 07:56:25


Tobias Schwinger wrote:
> I am missing a way to parametrize how hash values are combined.
> This is especially desirable because the skeleton could then be
> exploited to implement double hashing for compound data (a common
> technique to avoid collisions by varying the hash function) and can be
> useful in other situations (e.g. a small hash table in memory vs. a
> large one on disk).

I think if you want to combine hash values in a different manner, it's
best to just write your own version of hash_combine. It could save a
little code if we supplied a version of hash_range that works with
different combining functions, but I don't think it would be that useful
(and it would require your hash_combine function to have the same
signature as ours, which you might not want).
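
For example, a custom combining step and a hand-written range loop might
look something like this. It's just a sketch: my_hash_combine and
my_hash_range are made-up names, the multiplier is arbitrary (not what
the library uses), and the include path is my guess at where the
proposed header lives.

    #include <cstddef>
    #include <iterator>
    #include <boost/functional/hash.hpp>

    // Hypothetical user-supplied combining step - the multiplier is
    // arbitrary, chosen only to show a different mixing strategy.
    inline void my_hash_combine(std::size_t& seed, std::size_t value)
    {
        seed = seed * 131 + value;
    }

    // Hand-written equivalent of hash_range using the custom step.
    template <class It>
    std::size_t my_hash_range(It first, It last)
    {
        boost::hash<typename std::iterator_traits<It>::value_type> hasher;
        std::size_t seed = 0;
        for (; first != last; ++first)
            my_hash_combine(seed, hasher(*first));
        return seed;
    }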

> - What is your evaluation of the documentation?
>
> Good. Easily understandable.
>
> Nitpicking: a hash table is not required to store its elements
> in buckets in general - this is a property of the most common way to
> implement hash tables.

Good point. I've been spending too much time looking at the draft
standard for hash based containers (which require buckets).

> - What is your evaluation of the implementation?
>
> Here are my notes from reading the source:
>
> The implementation of hash_combine does not need conditional jumps,
> unlike a probably slightly slower variant I have seen quite often. I
> know that variant works well in practice, so I figure this one will,
> too.
>
> However, I believe that hashing speed is generally overrated and that
> there are numerous situations where it's worth trading some of it for
> fewer collisions (as memory access generally gives more room for
> performance optimization than CPU instructions do).

I do need to spend more time looking into alternative hashing algorithms
for a future version.
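
For readers who haven't looked at the code, the branch-free combining
step being discussed is roughly of this shape (reconstructed from
memory, not necessarily the exact code in the proposed library, hence
the different name):

    #include <cstddef>

    // Roughly the shape of the combining step: no branches, just an
    // xor, shifts and adds with a mixing constant.
    inline void hash_combine_step(std::size_t& seed, std::size_t value)
    {
        seed ^= value + 0x9e3779b9 + (seed << 6) + (seed >> 2);
    }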

> One idea that immediately came to my mind was some "syntactic sugar" to
> ease hashing of class members:
>
> std::size_t hash_value(point const& p)
> {
>     return (boost::hash_value = p.x, p.y, p.z);
> }
>
> Note: It's just a spontaneous idea I am not convinced of myself...

A hash function for boost::tuple would be a good way to do this,
although something along those lines might be more efficient.
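
For comparison, the spelled-out version that sugar would abbreviate
looks something like this (a sketch assuming hash_combine keeps the
seed-reference signature from the proposed library; the point members
are placeholders):

    #include <cstddef>
    #include <boost/functional/hash.hpp>

    struct point { int x, y, z; };   // placeholder member types

    std::size_t hash_value(point const& p)
    {
        std::size_t seed = 0;
        boost::hash_combine(seed, p.x);
        boost::hash_combine(seed, p.y);
        boost::hash_combine(seed, p.z);
        return seed;
    }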

> The for loop in hash_float_value is a nice one to be unrolled via
> metaprogramming.

Yes, for anyone who hasn't looked at the code: it loops a fixed number
of times based on the precision of the float type. Compilers could unroll
that loop themselves, but that's probably too much to expect.
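
A sketch of what that unrolling could look like with a recursive
template follows. The per-step body is a stand-in, not the real
hash_float_value arithmetic; the point is that the fixed iteration count
is a compile-time constant, so the recursion expands into straight-line
code.

    #include <cstddef>

    // Stand-in for one iteration of the loop body: fold the integer
    // part of v into the seed, then shift more digits into range.
    // Assumes v is the non-negative mantissa produced by frexp.
    inline void mix_step(std::size_t& seed, long double& v)
    {
        std::size_t chunk = static_cast<std::size_t>(v);
        v = (v - chunk) * 65536;
        seed ^= chunk + (seed << 6) + (seed >> 2);
    }

    // Template recursion unrolls the fixed-count loop at compile time.
    template <unsigned N>
    struct unrolled_mix
    {
        static void apply(std::size_t& seed, long double& v)
        {
            mix_step(seed, v);
            unrolled_mix<N - 1>::apply(seed, v);
        }
    };

    template <>
    struct unrolled_mix<0>
    {
        static void apply(std::size_t&, long double&) {}
    };

    // Usage: unrolled_mix<5>::apply(seed, mantissa);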

> - Do you think the library should be accepted as a Boost library?
>
> Yes. Though there should be some research into whether it is possible
> to add to its genericity, as mentioned in the first paragraph (I just
> saw the latest discussions and it seems we're already at it).

Thanks for your review.

Daniel

