Boost logo

Boost :

Subject: Re: [boost] [unordered] unordered_set::erase() complexity bug?
From: Howard Hinnant (howard.hinnant_at_[hidden])
Date: 2010-03-17 09:54:42


On Mar 16, 2010, at 2:27 PM, Brad Higgins wrote:

>
> On Mar 15, 2010, at 4:11 PM, Howard Hinnant wrote:
>> I'm reviving an old thread with new information:
>>
>> The LWG looked at this issue last week:
>>
>> http://home.roadrunner.com/~hinnant/issue_review/lwg-active.html#579
>>
>> We were all over the map on it. We finally left it in Open status in hopes of learning more in the next few months.
>>
>> One solution I've been thinking of is a variant of what is currently proposed in the issue:
>>
>> void quick_erase(const_iterator);
>
> I have also run into this issue in production code, but in the multi_index implementation of the hash_table (which seems to adhere to the unordered_map standard). In my case, I seem to have hit the absolute worst-case, which results in the erase of about 600,000 elements (in a hash with 5 million buckets) to take hours. When updated to return void, the same routine runs in seconds. Here is a simplified example which closely replicates my use case, using the unordered_map:
>
> --------------------------
> #include <iostream>
> #include <boost/unordered_map.hpp>
>
> int
> main(int argc, char* argv[])
> {
> typedef boost::unordered_map<int, int> hash_map;
> hash_map hm;
> hm.rehash(1000000);
>
> int x = 500000;
> std::cout << "Add " << x << " elements...";
>
> for (int i=0; i<x; i++) {
> hm[i] = i*2;
> }
> std::cout << "done!\n";
>
> std::cout << "\nremove the elements...";
> for (int i=x-1; i>=0; i--) {
> hash_map::iterator itr = hm.find(i);
> if (itr != hm.end()) {
> hm.erase(itr);
> }
> }
> std::cout << "done!\n";
>
> return 0;
> }
> ----------------------------
>
> I don't understand why the erase(iterator) routine needs to return anything other than void. In the case of a hash table, the next node ought to be some random node, which makes it somewhat useless. However, if a client really wants an iterator to the next valid element, they can post-increment the argument iterator themselves. This idiom would suffice for other containers, as well.
>
> With the worst-case erase(iterator) performance being O(bucket_count), this effectively makes erase(iterator) unusable. Seeing the performance of the above program, it's clear to me that the implementation of erase(iterator) is broken, and it seems to me that it is broken due to this return-type requirement. A differently-named erase(iterator) routine would solve the problem, but I don't understand why the standard would leave the erase() booby-trap to ensnare others. Howard, what is the rationale for keeping the return type of the erase routine as iterator?

Thanks for sharing your real-world experience Brad. I've passed your note on to the C++ committee. To answer your question, one vendor has already shipped these containers and with a design that reportedly does not suffer from this quadratic behavior, and is very reluctant to have this signature change.

-Howard


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk