Boost logo

Boost :

From: Thorsten Ottosen (thorsten.ottosen_at_[hidden])
Date: 2008-04-21 04:39:48


Daniel James skrev:
> On 19/04/2008, Thorsten Ottosen <thorsten.ottosen_at_[hidden]> wrote:
>> Hi Jeremy,
>
> Jeremy wrote the original version, I'm responsible for the library nowadays.

Of course, sorry about that.

>> I tried expanding your list of primes so that it contains more numbers
>> in the small range and in particulat such that it contains numbers
>> closely above powers of 2. It happens frequently that powers of 2 comes up.
>
> The theory is that you get less collisions when the size is as far
> from a power of 2 as possible. But I haven't looked into the research
> that this is based on, so I'm not sure how trustworthy it is.
>
>> Here's my new list:
>>
>> static const std::size_t prime_list[] = {
>> 5ul, 11ul, 17ul, 29ul, 37ul, 53ul, 67ul, 79ul,
>> 97ul, 131ul, 193ul, 257ul, 389ul, 512ul, 769ul,
>> 1031ul, 1543ul, 2053ul, 3079ul, 6151ul, 12289ul, 24593ul,
>> 49157ul, 98317ul, 196613ul, 393241ul, 786433ul,
>> 1572869ul, 3145739ul, 6291469ul, 12582917ul, 25165843ul,
>> 50331653ul, 100663319ul, 201326611ul, 402653189ul, 805306457ul,
>> 1610612741ul, 3221225473ul, 4294967291ul };
>
> It was requested that I use smaller increments during the review. I
> haven't looked at it yet, but I created a ticket and I should get
> round to it before 1.36:
>
> http://svn.boost.org/trac/boost/ticket/1710
>
>> For the test I conducted, not once did I get a max load factor above 1
>> for containers of size 4, 5, 7, 9 etc up to about 512.
>
> It's written so that it rehashes whenever an insert will cause the
> maximum load factor to be exceeded. So you never will. The main
> performance worry is that it will rehash too often

Right.

> so It might be
> necessary to either let it exceed the maximum load factor a little, or
> ensure that it always grows by at least a certain percentage.
>
> Although for larger sizes you'll generally find that the memory use of
> the nodes (the element size, one or two pointers and the memory
> management overhead for each node) is much larger than the size of the
> bucket array (one pointer per bucket and the overhead for a single
> allocation).

Ok. I assumed wrongly that the buckets where not a pointer to something.

>> Another alternative would be to allow users to control the initial
>> bucket coutn explcitly in the constructor. I think I might like this
>> better. Something like
>>
>> boost::unordered_set<T*> the_set( boost::unordered::bucket_count(32u) );
>> assert( the_set.bucket_count() == 32u );
>
> The problem is that performance will typically degrade when the bucket
> count isn't a prime number. Although if you have a high quality hash
> function this shouldn't be the case.
>
> If you did this, would you want the bucket count to remain fixed,
> regardless of how large the container gets?

My typical use case is that I will fill the container once
(and I know how many elements that will be put in there). Then this
object is used again and again for lookup. In this case, allocating room
for 2-3 times the need seems like overkill, and perhaps only allocating
20%-30% extra would be more fair.

> An alternative that has some appeal is letting the user specify a
> policy class to control the bucket count. Although I'm not sure how
> that would play with the standard.

Poor, probably.

I guess for cases where I know the intergers fall in the range, say,
[0,64], I can just use vector<optional<unsigned>>.

-Thorsten


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk