Boost logo

Boost Users :

From: Bill Somerville (bill_at_[hidden])
Date: 2008-04-03 18:16:59


Robert Dailey wrote:
> On Thu, Apr 3, 2008 at 4:16 PM, Robert Dailey <rcdailey_at_[hidden]
> <mailto:rcdailey_at_[hidden]>> wrote:
>
> On Thu, Apr 3, 2008 at 3:47 PM, Steven Watanabe
> <watanabesj_at_[hidden] <mailto:watanabesj_at_[hidden]>> wrote:
>
[...snip...]
>
> > Also, the definition of 'bucket' eludes me. If someone could
> explain
> > what buckets are in relation to unordered associative
> containers I'd
> > appreciate it.
>
Buckets containers for the elements with equal hash values.
>
>
> Does the following (very naive) example implementation give
> some idea of
> how a hash table works?
>
> const int num_buckets = 167;
> std::vector<int> buckets[numBuckets];
>
> void insert(int x) {
> buckets[hash(x) % numBuckets].push_back(x);
> }
>
Note that insert doesn't overwrite bucket - it adds the new element to
the bucket which is a vector.
>
>
> In Christ,
> Steven Watanabe
>
>
> According to your example, what happens if I do:
>
> insert(167);
> insert(334);
>
> ???
>
>
>
> The above is assuming that the hash() function does:
>
> hash(x) { return x; }
The hash function maps an element key with a number, all equal keys must
produce the same number, hash functions are a many to one mapping usually.

> insert(167);
> insert(334);
>
> the above code (second insert) would cause the same element in the
> vector to be overwritten. The math is a little bit odd too. It doesn't
> really answer a lot of questions. The inserted values may also be
> strings as well, however in your example it appears as if the hash
> function only handles integral values.

Elements are distributed across the buckets, if more than one element
maps to the same bucket then that is termed a collision. If the hash
function is perfect then each bucket will contain 1 element when the
hash map is full and lookup is O(1), too many elements (or too few
buckets) or an imperfect hash function cause multiple elements in one or
more buckets; tending to O(n) lookup (like a vector) in the worst case
where all elements are in one bucket.

Choose a combination of hash function and number of buckets that gives
the required performance.

HTH

-- 
Bill Somerville
Class Design Limited

Boost-users list run by williamkempf at hotmail.com, kalb at libertysoft.com, bjorn.karlsson at readsoft.com, gregod at cs.rpi.edu, wekempf at cox.net