Re: [Boost-bugs] [Boost C++ Libraries] #9282: performance regression in boost::unordered on 64-bit platforms

Subject: Re: [Boost-bugs] [Boost C++ Libraries] #9282: performance regression in boost::unordered on 64-bit platforms
From: Boost C++ Libraries (noreply_at_[hidden])
Date: 2013-10-23 21:20:17


#9282: performance regression in boost::unordered on 64-bit platforms
--------------------------+------------------------------------------------
  Reporter: aiapub- | Owner: danieljames
  cpp@… | Status: assigned
      Type: Bugs | Component: unordered
 Milestone: To Be | Severity: Regression
  Determined | Keywords: unordered mix64_policy performance
   Version: Boost |
  1.53.0 |
Resolution: |
--------------------------+------------------------------------------------

Comment (by danieljames):

 This particular benchmark hits the best case behaviour for the prime
 policy, so it should be a bit faster than the mix policy. You're inserting
 consecutive values which will result in no collisions when using the prime
 policy, but for the mix policy the bucket should be chosen pseudo-randomly
 which means there will be some collisions. But in theory that result
 shouldn't happen in general.

 I wouldn't deny that some may consider that best case desirable, so it
 might be worth using the prime policy for integer types. But I first want
 to check that is what's happening here, and not something worse.

 I'm currently using a 32 bit operating system, which makes running your
 test a bit tricky, so can you try running this with the resulting
 container to get an idea of how many collisions there are:

 {{{
 #include <map>
 #include <iostream>
 #include <boost/unordered_set.hpp>
 #include <stdint.h>

 void bucket_histogram(boost::unordered_set<uint64_t> const &x) {
     typedef std::map<std::size_t, std::size_t> map;
     map frequencies;

     std::cout << "Bucket count: " << x.bucket_count() << std::endl;

     for (std::size_t i = 0; i < x.bucket_count(); ++i) {
         ++frequencies[x.bucket_size(i)];
     }

     for (map::const_iterator i = frequencies.begin(); i !=
 frequencies.end(); ++i) {
         std::cout << i->first << " " << i->second << std::endl;
     }

     std::cout << std::endl;
 }
 }}}

 The output for the prime policy should be something like:

 {{{
 Bucket count: 100663319
 0 663319
 1 100000000
 }}}

 i.e. no collisions. All buckets either have 1 element, or no elements.

 The output for the mix policy should be something like:

 {{{
 Bucket count: 134217728
 0 63716044
 1 47468076
 2 17685158
 3 4390091
 4 819351
 5 122072
 6 15183
 7 1564
 8 176
 9 13
 }}}

 i.e. up to 9 collisions.

 I got those values by running a simulation of the container. If it's
 significantly worse than that, then there's definitely a bug.

 In theory, the average frequencies should follow the poisson distribution
 with lambda = number of elements/number of buckets:

 {{{
                      0 63714059.8155024
                      1 47470673.7812626
                      2 17684204.0498788
                      3 4391919.58553563
                      4 818058.771181037
                      5 121900.256154096
                      6 15137.1280543625
                      7 1611.14846535692
                      8 150.049893684399
                      9 12.4217647383941
                     10 0.925493593394311
 }}}

 Which is pretty similar to the results my simulation got.

-- 
Ticket URL: <https://svn.boost.org/trac/boost/ticket/9282#comment:4>
Boost C++ Libraries <http://www.boost.org/>
Boost provides free peer-reviewed portable C++ source libraries.

This archive was generated by hypermail 2.1.7 : 2017-02-16 18:50:14 UTC