
From: Thorsten Ottosen (nesotto_at_[hidden])
Date: 2005-03-10 10:53:26


"Peter Dimov" <pdimov_at_[hidden]> wrote in message
news:000901c52580$359f96a0$6601a8c0_at_pdimov...
| Thorsten Ottosen wrote:
| > "Peter Dimov" <pdimov_at_[hidden]> wrote in message
| > news:019c01c5256a$561efe40$6601a8c0_at_pdimov...
| >> Peter Dimov wrote:
| >>> But as I said, benchmarking hash functions is pointless. Real
| >>> programs do not compute the hash value of a random 256 byte buffer
| >>> five million times.
| >>
| >> ... or if they do, they use different 256 byte buffers and do not
| >> operate from the L1 cache. ;-)
| >
| > no, but AFAICT we cannot exclude that a certain hash-function is much
| > faster *and* otherwise behaves as good as the other "naive" ones.
|
| Simplistic benchmarks can't be used to evaluate the performance of a
| component; it should be benchmarked in its usual role. In most real world
| scenarios, the performance of a hash function is memory access bound,
| because the hashed data isn't already in the L1 cache. The proper way to
| evaluate the performance of a hash function is to benchmark unordered_map
| lookups.

right; then maybe somebody should run those tests?
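
e.g. something along the lines of the minimal sketch below (std::unordered_map, the key set and the sizes are just placeholders I made up for illustration, not anything measured in this thread):

#include <chrono>
#include <cstdio>
#include <random>
#include <string>
#include <unordered_map>
#include <vector>

// Minimal sketch: time lookups against a populated map, so the hash
// function is exercised on many different keys rather than one buffer
// sitting in the L1 cache.
int main()
{
    std::unordered_map<std::string, int> m;
    std::vector<std::string> keys;

    std::mt19937 rng(12345);
    for (int i = 0; i < 100000; ++i)
    {
        // keys of varying length, so the hash walks different amounts of data
        std::string k = "key-" + std::to_string(rng())
                      + std::string(rng() % 64, 'x');
        m.emplace(k, i);
        keys.push_back(k);
    }

    auto start = std::chrono::steady_clock::now();
    long long sum = 0;
    for (int pass = 0; pass < 10; ++pass)
        for (const std::string& k : keys)
            sum += m.find(k)->second;   // lookup cost = hashing + == comparisons
    auto stop = std::chrono::steady_clock::now();

    std::printf("checksum %lld, lookups took %lld ms\n", sum,
        (long long)std::chrono::duration_cast<std::chrono::milliseconds>(
            stop - start).count());
}

Comparing hash functions would then just mean instantiating the map with each candidate hash and comparing the lookup timings.
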

| Which certain hash function is much faster and otherwise has all the
| desirable characteristics of the proposed design and implementation?

one of the two links that James mentioned said something about the
distribution being as good as normal.

| > Can anybody answer this question: is it always optimal to use the
| > whole range to compute the hash value; for eg. strings, if I know the
| > average length of my strings, I can't see that it would make sense to
| > process much more than the average length (or perhaps even less).
|
| The default hash function must be good enough if no extra information about
| the properties of the key set is available. When extra information _is_
| available, you shouldn't use the default.
|
| Excluding some parts of the range leads to pathological cases when the key
| set only differs in the excluded parts. This leads to orders of magnitude
| slowdowns. (This is not speculation, it is a fact.)

so basically you're saying that extra collisions are much worse than a slower
hash function, because we must use == to walk through the bucket?
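
That pathological case is easy to demonstrate with a sketch like this one
(the prefix-only hash and the key set are made up for illustration): a hash
that only looks at the first 8 characters puts every key sharing that prefix
into the same bucket, so each lookup degenerates into a linear == scan.

#include <cstdio>
#include <string>
#include <unordered_map>

// Hypothetical "truncated" hash that only looks at the first 8 characters.
struct prefix_hash
{
    std::size_t operator()(const std::string& s) const
    {
        return std::hash<std::string>()(s.substr(0, 8));
    }
};

int main()
{
    std::unordered_map<std::string, int, prefix_hash> m;

    // All keys share the first 8 characters, so they all hash identically
    // and land in the same bucket.
    for (int i = 0; i < 10000; ++i)
        m.emplace("commonprefix-" + std::to_string(i), i);

    std::size_t b = m.bucket("commonprefix-0");
    std::printf("%zu of %zu keys landed in one bucket\n",
                m.bucket_size(b), m.size());
}

Once a bucket fills up like that, whatever was saved by the cheaper hash is
dwarfed by the == comparisons needed on every lookup.
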

-Thorsten

