|
Boost : |
From: Greg Colvin (greg_at_[hidden])
Date: 2001-01-06 20:49:07
From: David Abrahams <abrahams_at_[hidden]>
> From: "Daryle Walker" <darylew_at_[hidden]>
>
> > Since it seems that hash stuff is useful, maybe we could add hash-based
> > classes to Boost.
I've never been big on general-purpose hashtables, as a
tree-based set or map provides the same interface and
more, and reasonable average and worst case performance,
whereas a hashtable can have truly terrible worst case
performance.
> Funny you should mention that. Beman has been looking into that for some
> time, and I've recently had a idea:
>
> There are lots of different ways to formulate a hashtable, but the
> fundamental idea is that there's a randomly-accessible container of hash
> buckets, each of which can contain an arbitrary number of items. You could
> choose vector or deque (or something else?) for the array of buckets. Each
> bucket could be represented by any number of different containers, with
> different trade-offs. Sound familiar? To me it sounds a lot like the BGL
> adjacency list, and a ripe area for the use of selectors.
> Overgeneralization? Quite possibly; I don't know.
I don't think so.
When I've needed a hashtable I've needed it badly, because
I needed absolutely the best possible performance, and
have had to hand-tune every aspect of it to fit the task
at hand. I recall adapting one of Knuth's algorithms for
an LZW encoder, where he used a 'goto' as part of a fast
rehash strategy. I cleaned up the loop to eliminate the
'goto', and measured a 10% performance degradation. So I
put the 'goto' back, with a comment to effect that this
was God's own 'goto', and not to be trifled with.
The point being that really fast hashtables are a dark art,
the support of which requires lots of customizability.
> Also, isn't there a hashtable representation that never keeps more than a
> single item in each top-level array element? I think you can do something on
> insertion where you bump an item down to the next available bucket when you
> find a bucket is full, only rehashing when items get more than N steps from
> their hash bucket... even if we settle on a general-purpose hashtable
> representation, it might be a good idea to support the above.
The table I described above was stored in a single array,
but nothing got bumped. Collisions were resolved by a
rehash until you found an empty slot. The table size and
hash function were carefully tuned to trade off the cost
and probability of a collision and the cache space available
for the table.
Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk