Boost logo

Boost :

From: Wyss, Felix (FelixW_at_[hidden])
Date: 2004-01-04 21:46:24


> > [Wyss, Felix] The only types I can think of, where the comparison is
> > more expensive, are types of the same (or smaller) size than the
> > hash. That usually means integers or raw pointers. For those
> > types, it would incur one additional compare, but they're already
> > cheap. For types that are expensive to compare, the savings can be
> > substantial, in particular as the worst case (lots of collisions) is
> > not as bad. This also means lower quality hash functions can be
> > used without excessive worst-case penalties.
>
> The use of extra storage has a bad effect on cache locality, and thus
> performance.

[Wyss, Felix] I bed to disagree. See below.

>
> > I have a hard time buying the memory argument, as doubling the
> > maximum load factor will only increase the runtime cost by a single
> > hash value compare but keep the memory consumption the same
> > (assuming the hash value uses the same number of bytes as a
> > pointer).
>
> What does load factor have to do with anything?

[Wyss, Felix] The size of hash tables are commonly chosen to make
collisions reasonably unlikely. Let's assume an application expects on
average to store at most 1000 elements in a hashed container while
maintaining a load factor of 0.75 or less. In that case, the hash table
would have to have 1333 elements. As using a cached hash value allows
for efficient resolution of collisions, we can run the table with a
higher load factor without incurring additional expensive element
comparisons.
For example, by doubling the maximum load factor to 1.5, we can reduce
the table size to 667 elements. Assuming that the hash value uses the
same number of bytes as a table slot (int vs. pointer), the memory
overhead of our table with 1000 elements would be the same.
A smaller hash table favorably affects the cache locality, in particular
if the table is only partially filled or the distribution of element
lookups is non-uniform.

There is another advantage of caching the hash value: Lookups that are
misses will not require an element comparison unless two dissimilar
elements have the same hash value.


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk