Boost logo

Boost Users :

From: Robert Dailey (rcdailey_at_[hidden])
Date: 2008-04-03 17:18:50


On Thu, Apr 3, 2008 at 4:16 PM, Robert Dailey <rcdailey_at_[hidden]> wrote:

> On Thu, Apr 3, 2008 at 3:47 PM, Steven Watanabe <watanabesj_at_[hidden]>
> wrote:
>
> > AMDG
> >
> > Robert Dailey wrote:
> > > On Thu, Apr 3, 2008 at 3:26 PM, Robert Dailey <rcdailey_at_[hidden]
> > > <mailto:rcdailey_at_[hidden]>> wrote:
> > >
> > > Hey guys,
> > >
> > > Hash tables are new to me since they have never been part of the
> > > C++ standard template library, and having started C++ as my first
> > > language I have never been exposed to them. Ever since
> > > unordered_map appeared in the Boost library, I've been trying to
> > > understand how they function in terms of functionality, memory,
> > > and performance. A major source of information has been from
> > > Wikipedia, specifically this page
> > > <http://en.wikipedia.org/wiki/Hash_table>. Below I have outlined a
> > > few questions concerning boost::unordered_map:
> > >
> > > 1. From what I've read, the performance of a hash table is
> > > O(1), however it mentions that depending on specific
> > > situations it may be O(n). Is this true with unordered map?
> > > If so, what conditions would force it to maintain O(n)
> > > complexity?
> > >
> >
> > lookup is O(n) if the hash function happens to return the same value for
> > all the elements.
> >
> > > 1.
> > >
> > >
> > > 2. What is the memory footprint of an unordered_map? From my
> > > research, hash tables seem to be a simple wrapper over an
> > > array in that the size of the "internal array" in the hash
> > > table will contain as many elements as the largest hash
> > > value generated by one of the key elements. So if I put two
> > > elements in my hash table and the generated keys are 200 and
> > > then 1,000,000, then the hash table would be 1,000,000
> > > elements in size (which is at least a megabyte of memory in
> > > the most optimistic scenario). This logic is rather
> > > ridiculous and impractical, so I am pretty sure I'm
> > > misunderstanding how the memory mechanics work under the
> > > hood for an unordered_map.
> > >
> >
> > unordered_map takes the user's hash function and reduces it mod the
> > current number of buckets.
> >
> > > Those are the biggest questions I have now and if I think of more
> > > I'll most certainly be providing follow-up posts. I thank everyone
> > > in advance for reading and helping.
> > >
> > >
> > >
> > > Also, the definition of 'bucket' eludes me. If someone could explain
> > > what buckets are in relation to unordered associative containers I'd
> > > appreciate it.
> >
> > Does the following (very naive) example implementation give some idea of
> > how a hash table works?
> >
> > const int num_buckets = 167;
> > std::vector<int> buckets[numBuckets];
> >
> > void insert(int x) {
> > buckets[hash(x) % numBuckets].push_back(x);
> > }
> >
> > In Christ,
> > Steven Watanabe
>
>
> According to your example, what happens if I do:
>
> insert(167);
> insert(334);
>
> ???
>

The above is assuming that the hash() function does:

hash(x) { return x; }

insert(167);
insert(334);

the above code (second insert) would cause the same element in the vector to
be overwritten. The math is a little bit odd too. It doesn't really answer a
lot of questions. The inserted values may also be strings as well, however
in your example it appears as if the hash function only handles integral
values.



Boost-users list run by williamkempf at hotmail.com, kalb at libertysoft.com, bjorn.karlsson at readsoft.com, gregod at cs.rpi.edu, wekempf at cox.net