Boost logo

Boost Users :

Subject: Re: [Boost-users] [intrusive] linear hashing
From: Ion Gaztañaga (igaztanaga_at_[hidden])
Date: 2010-12-30 11:23:59


El 29/12/2010 4:48, Ephraim Sinowitz escribió:
> Hi,
> Can anyone explain the performance characteristics of linear hashing. I
> read the Wikipedia article and I'm still not sure. The article mentions
> growing the hash table incrementally by 1, but the instrusive
> unordered_set doesn't grow/rehash unless you tell it to? When does it
> make sense to use the linear hashing option? Why is it not on by
> default? Is it because it slows down the address lookup into the table.
> Is the ideal bucket size still 1 per expected element.

The Wikipedia article explains it quite well: Linear Hashing: The cost
of hash table expansion is spread out across each hash table insertion
operation, as opposed to being incurred all at once. Linear hashing is
therefore well suited for interactive applications.

Linear hashing typically requires power of two length bucket array, but
those buckets are not fully used from the beginning, as it is when using
non-linear hashing (which typically uses prime length bucket array).
Although the bucket array, say, has 16 buckets, the implementation uses
first just one or two, then when after incremental rehashing uses three,
etc.. incrementally, until it fills those 16 buckets. Then for a new
incremental rehash, it allocates a new bucket array with length 32, but
starts using only 17, and continues with incremental hashing until
reaching 32. I think Dinkum STL used this incremental rehashing. The key
is that in each incremental hashing, not all elements are rehashed, but
just elements of a single bucket, distributing hashing impact in all
insertions.

For more information on hashing alternatives see the original standard
hashing container proposal (chapter Control of Hash Resizing):

http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2003/n1456.html

Now, intrusive containers don't allocate memory at all, so incremental
rehashing must be triggered by the user using
"incremental_rehash(bool)" (use an additional bucket, that is,
incremental rehash) and "incremental_rehash(bucket_traits)" (to update
the new bucket array with an array that should be twice/half the size of
the previous one, to allow more incremental rehashings in the future). I
admit that this is not explained at all with an example, so I will note
this issue in my to do list.

As in Intrusive incremental (linear) hashing is triggered by the user, I
think most people should go with non-linear hashing (which is the
default), except those that want to build a non-intrusive container
based on intrusive that should have incremental hashing performance needs.

Best,

Ion


Boost-users list run by williamkempf at hotmail.com, kalb at libertysoft.com, bjorn.karlsson at readsoft.com, gregod at cs.rpi.edu, wekempf at cox.net