|
Boost : |
Subject: [boost] [unordered] equal_range is O(bucket_count()), not O(size())
From: Brad Higgins (bhiggins_at_[hidden])
Date: 2011-04-07 11:52:59
TR1 specifies that unordered_multimap::equal_range(k) worst case complexity is O(size()). However, in practice, I find that the worst case performance of boost unordred_multimap's equal_range is O(bucket_count()). Is this known?
I believe the issue is with the second iterator returned by equal_range(). Specifically, if there are sparse elements in the hash table, the implementation may have to traverse many buckets to find the second iterator.
I have a test that demonstrates the issue. The test makes 2 runs over an unordered_multimap, each using the same number of elements. In the second run, it calls rehash() in order to grow the bucket count by a large amount. Notice that the first run is quite fast, while the second run takes more than 10 seconds. Here are the results from my test:
---- $ ./main rehash(0) insert 1000 elements built hash, with 1543 buckets and 1000 elements. Now, get some ranges time to obtain 1000 elements: 00:00:00.001366 rehash(5000000) insert 1000 elements built hash, with 6291469 buckets and 1000 elements. Now, get some ranges time to obtain 1000 elements: 00:00:10.818002 ---- Here is my code for the test: ---- #include <iostream> #include <boost/unordered_map.hpp> #include <boost/date_time/posix_time/posix_time.hpp> typedef boost::unordered_multimap<size_t, size_t> hash_int_type; typedef hash_int_type::iterator hash_int_itr; typedef std::pair<size_t, size_t> hash_insert_type; void run_test(const size_t data_size, const size_t rehash_size) { hash_int_type hash; std::cout << "rehash(" << rehash_size << ")\n"; hash.rehash(rehash_size); std::cout << "insert " << data_size << " elements\n"; for (size_t i=0; i<data_size; i++) { hash_insert_type ins(i,i); hash.insert(ins); } std::cout << "built hash, with " << hash.bucket_count() << " buckets and " << hash.size() << " elements. Now, get some ranges\n"; boost::posix_time::ptime start(boost::posix_time::microsec_clock::local_time()); for (size_t i=data_size-1; i!=0; i--) { typedef std::pair<hash_int_itr, hash_int_itr> ret_pair; ret_pair ret = hash.equal_range(i); if (ret.first != ret.second) { hash.quick_erase(ret.first); } } boost::posix_time::ptime stop(boost::posix_time::microsec_clock::local_time()); boost::posix_time::time_duration diff_time(stop-start); std::cout << "time to obtain " << data_size << " elements: " << diff_time << "\n"; } int main(int argc, char* argv[]) { run_test(1000,0); run_test(1000,5000000); return 0; } ----- Thanks, Brad
Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk