Subject: Re: [Boost-bugs] [Boost C++ Libraries] #9282: performance regression in boost::unordered on 64-bit platforms
From: Boost C++ Libraries (noreply_at_[hidden])
Date: 2013-10-23 21:20:17
#9282: performance regression in boost::unordered on 64-bit platforms
--------------------------+------------------------------------------------
Reporter: aiapub- | Owner: danieljames
cpp@⦠| Status: assigned
Type: Bugs | Component: unordered
Milestone: To Be | Severity: Regression
Determined | Keywords: unordered mix64_policy performance
Version: Boost |
1.53.0 |
Resolution: |
--------------------------+------------------------------------------------
Comment (by danieljames):
This particular benchmark hits the best case behaviour for the prime
policy, so it should be a bit faster than the mix policy. You're inserting
consecutive values which will result in no collisions when using the prime
policy, but for the mix policy the bucket should be chosen pseudo-randomly
which means there will be some collisions. But in theory that result
shouldn't happen in general.
I wouldn't deny that some may consider that best case desirable, so it
might be worth using the prime policy for integer types. But I first want
to check that is what's happening here, and not something worse.
I'm currently using a 32 bit operating system, which makes running your
test a bit tricky, so can you try running this with the resulting
container to get an idea of how many collisions there are:
{{{
#include <map>
#include <iostream>
#include <boost/unordered_set.hpp>
#include <stdint.h>
void bucket_histogram(boost::unordered_set<uint64_t> const &x) {
typedef std::map<std::size_t, std::size_t> map;
map frequencies;
std::cout << "Bucket count: " << x.bucket_count() << std::endl;
for (std::size_t i = 0; i < x.bucket_count(); ++i) {
++frequencies[x.bucket_size(i)];
}
for (map::const_iterator i = frequencies.begin(); i !=
frequencies.end(); ++i) {
std::cout << i->first << " " << i->second << std::endl;
}
std::cout << std::endl;
}
}}}
The output for the prime policy should be something like:
{{{
Bucket count: 100663319
0 663319
1 100000000
}}}
i.e. no collisions. All buckets either have 1 element, or no elements.
The output for the mix policy should be something like:
{{{
Bucket count: 134217728
0 63716044
1 47468076
2 17685158
3 4390091
4 819351
5 122072
6 15183
7 1564
8 176
9 13
}}}
i.e. up to 9 collisions.
I got those values by running a simulation of the container. If it's
significantly worse than that, then there's definitely a bug.
In theory, the average frequencies should follow the poisson distribution
with lambda = number of elements/number of buckets:
{{{
0 63714059.8155024
1 47470673.7812626
2 17684204.0498788
3 4391919.58553563
4 818058.771181037
5 121900.256154096
6 15137.1280543625
7 1611.14846535692
8 150.049893684399
9 12.4217647383941
10 0.925493593394311
}}}
Which is pretty similar to the results my simulation got.
-- Ticket URL: <https://svn.boost.org/trac/boost/ticket/9282#comment:4> Boost C++ Libraries <http://www.boost.org/> Boost provides free peer-reviewed portable C++ source libraries.
This archive was generated by hypermail 2.1.7 : 2017-02-16 18:50:14 UTC