From: Peter Dimov (pdimov_at_[hidden])
Date: 2005-03-10 09:48:16
Thorsten Ottosen wrote:
> "Peter Dimov" <pdimov_at_[hidden]> wrote in message
>> Peter Dimov wrote:
>>> But as I said, benchmarking hash functions is pointless. Real
>>> programs do not compute the hash value of a random 256 byte buffer
>>> five million times.
>> ... or if they do, they use different 256 byte buffers and do not
>> operate from the L1 cache. ;-)
> no, but AFAICT we cannot exclude that a certain hash-function is much
> faster *and* otherwise behaves as well as the other "naive" ones.
Simplistic benchmarks can't be used to evaluate the performance of a
component; it should be benchmarked in its usual role. In most real world
scenarios, the performance of a hash function is memory access bound,
because the hashed data isn't already in the L1 cache. The proper way to
evaluate the performance of a hash function is to benchmark unordered_map.
Which certain hash function is much faster and otherwise has all the
desirable characteristics of the proposed design and implementation?
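To make the "benchmark it in its usual role" point concrete, here is a minimal sketch (using today's std::unordered_map as a stand-in for the proposed container; the key shape and count are illustrative assumptions, not data from this thread):

```cpp
#include <string>
#include <unordered_map>
#include <vector>

// Drive the string hash through an associative container, so each hash
// computation is paired with the bucket lookups and memory traffic it
// triggers in real use, rather than hashing a hot buffer in a tight loop.
long long in_role_checksum(std::size_t n)
{
    std::vector<std::string> keys;
    keys.reserve(n);
    for (std::size_t i = 0; i < n; ++i)
        keys.push_back("key_" + std::to_string(i));

    // Insert phase: the hash runs once per key, interleaved with
    // allocation and bucket writes -- not from the L1 cache.
    std::unordered_map<std::string, long long> m;
    for (std::size_t i = 0; i < keys.size(); ++i)
        m[keys[i]] = static_cast<long long>(i);

    // Lookup phase: sum the values so the work cannot be optimized away.
    long long sum = 0;
    for (std::string const& k : keys)
        sum += m.at(k);
    return sum;
}
```

To compare candidate hash functions, time calls to this whole function (e.g. with a wall-clock timer) while swapping the container's hasher template parameter, instead of timing the hash in isolation.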
> Can anybody answer this question: is it always optimal to use the
> whole range to compute the hash value; for eg. strings, if I know the
> average length of my strings, I can't see that it would make sense to
> process much more than the average length (or perhaps even shorter).
The default hash function must be good enough if no extra information about
the properties of the key set is available. When extra information _is_
available, you shouldn't use the default.
Excluding some parts of the range leads to pathological cases when the key
set only differs in the excluded parts. This leads to orders of magnitude
slowdowns. (This is not speculation, it is a fact.) To add insult to injury,
because CPU reads are typically a cache line wide (16/32 bytes), it can turn
out that the "optimized" hash function isn't even much faster.
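A minimal sketch of that pathological case (the 8-byte limit and the key names are illustrative assumptions):

```cpp
#include <cstddef>
#include <string>

// An "optimized" hash that reads only the first `limit` bytes of the key.
// Any key set that varies only beyond that prefix collapses into a single
// hash value, so every key lands in the same bucket.
std::size_t truncated_hash(std::string const& s, std::size_t limit = 8)
{
    std::size_t h = 0;
    for (std::size_t i = 0; i < s.size() && i < limit; ++i)
        h = h * 31 + static_cast<unsigned char>(s[i]);
    return h;
}

// Keys such as "document_001", "document_002", ... all share the 8-byte
// prefix "document", so truncated_hash gives them identical values and
// lookups degrade from average O(1) to O(n) -- the orders-of-magnitude
// slowdown described above.
```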
Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk