Boost logo

Boost :

From: Howard Hinnant (hinnant_at_[hidden])
Date: 2001-01-09 22:01:36


Lois Goldthwaite wrote on 1/9/2001 11:53 AM
>I like the approach taken in the Dinkumware hash map, where all the
>values are kept in a std::list and the buckets tie hash keys to list
>iterators. This is not only elegant, but it lets you step through the
>values with a bidirectional iterator. Most other implementations provide
>forward iterators only.

You speak of two interesting design decisions, not just one:

     1. Internal structure looks something like:

         Design A:
          list_type<T> list_;
          vector<list_type::iterator> buckets_;

         vs:

         Design B:
          struct node
          {
               T entry_;
               node* next_;
          };
          vector<node*> buckets_;

(*Disclaimer: this is not meant to be an exhaustive list of possible
hash table internal structures)

And,

     2. Whether to support bidirectional iterators or just forward
iterators.

The decision of iterator type can be implemented with either internal
structure. In Design A, you can make list_type either list or slist.
And in Design B you can add (or not) a previous pointer to the node
struct.

Design A (as you point out) favors elegant and efficient iteration over
the hash. But it comes at the cost of relative inefficiencies during
insert/erase. Imagine a buckets_ size of 1000, but an empty list_: All
those buckets_ iterators must be pointing into the list_, presumably at
list_.end(). Now insert an entry at buckets_[500]. Now (I would guess)
the first half of the buckets_ list needs to point at the beginning of
the list. So in a worst case, insert complexity is on the order of
buckets_.size(). Now as buckets_ fills up, this inefficiency disappears.

Design B on the other hand is more efficient at insert/erase, even when
the buckets_ is empty (constant complexity). But iteration can become
expensive on a large but empty buckets_ list (linear complexity). This
is because one may have to iterate over many empty buckets to find the
next element in the hash.

Summary: For well populated buckets, designs A and B have similar
performance. But for sparsely populated buckets, design A favors
iteration at the cost of insert/erase, while design B favors insert/erase
at the cost of iteration.

Depending upon your feelings about favoring design A or B (favoring
iteration vs insert/erase), the second design decision will probably fall
into place more easily: Do you favor iteration (being able to iterate
backward), or do you favor smaller overhead (which may lead to faster
find/insert/erase due to cache hits)?

These are implementation details that the hash author must grapple with.
What may be even harder though is developing the interface for hash which
will allow the developer to implement any of these tradeoffs (and perhaps
more).

-Howard


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk