Boost logo

Boost :

From: Daniel James (daniel_at_[hidden])
Date: 2005-03-10 05:35:51


Peter Dimov wrote:
> Thorsten Ottosen wrote:
>> Peter, I don't like that hasing a string is suboptimal. Likewise,
>> I would like hasing of sub_range<string> or sub_range<vector<char>>
>> to be really fast.
>
>
> See my other post. When talking about hash functions, you really need to
> have a clear understanding of "suboptimal" in mind. I don't claim to be
> an expert, but at least I have tried to familiarize myself with the
> research done in this area. My advice is: if you are going to mess with
> a (standard) hash function, you better (a) understand what you are
> doing, (b) have an extensive test suite that benchmarks an unordered_map
> with a wide range of real-world key sets.

Well, these two pages describe algorithms for hashing strings, which
apparently are faster with good (or better?) quality (I haven't tested
them yet):

http://burtleburtle.net/bob/hash/doobs.html
http://www.azillionmonkeys.com/qed/hash.html

They are faster because they process several characters at a time.

The problem is that it while it would be fairly easy to specialise
hash_range for this, it would be hard to get hash_combine to support
them. It would probably have to be replaced with an object that
maintains hashing state. Something like:

struct hash_combiner {
     hash_combiner();
     template <class T> void combine(T const&);
     std::size_t value() const;
};

And the internal state would be complex, especially since it has to deal
with arbitrary types. So it'd probably be slow, and loose its simple
definition, which is an important part of the design.

An alternative would be to drop the requirement that hash_range is
defined as returning the same result as using hash_combine. I think that
would be a bad idea.

A hash function that is optimised for specific types will probably
require reducing the libraries generic functionality, and make it less
transparent. This could be done within the current draft standard, but
not Peter's proposal. And I think that Peter's extensions are more
useful than the extra speed, which in practise might not be that great
(I haven't tested this, so I could be wrong).

I would like to investigate this kind of thing further. But it'll take time.

Daniel


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk