Boost logo

Boost :

From: Peter Dimov (pdimov_at_[hidden])
Date: 2005-03-10 06:48:37


Daniel James wrote:
> Peter Dimov wrote:
>> Thorsten Ottosen wrote:
>>> Peter, I don't like that hasing a string is suboptimal. Likewise,
>>> I would like hasing of sub_range<string> or sub_range<vector<char>>
>>> to be really fast.
>>
>>
>> See my other post. When talking about hash functions, you really
>> need to have a clear understanding of "suboptimal" in mind. I don't
>> claim to be an expert, but at least I have tried to familiarize
>> myself with the research done in this area. My advice is: if you are
>> going to mess with a (standard) hash function, you better (a)
>> understand what you are doing, (b) have an extensive test suite that
>> benchmarks an unordered_map with a wide range of real-world key sets.
>
> Well, these two pages describe algorithms for hashing strings, which
> apparently are faster with good (or better?) quality (I haven't tested
> them yet):
>
> http://burtleburtle.net/bob/hash/doobs.html
> http://www.azillionmonkeys.com/qed/hash.html

These pages are good.

But as I said, benchmarking hash functions is pointless. Real programs do
not compute the hash value of a random 256 byte buffer five million times.
They usually need to do something with the hash value. This something
typically involves a division on a prime to obtain a modulus, several
pointer chasings, and at least one equality comparison (which on a 256 byte
buffer will have considerable impact).


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk