Boost logo

Boost :

Subject: Re: [boost] [unordered] unordered_set::erase() complexity bug?
From: Brad Higgins (bhiggins_at_[hidden])
Date: 2010-03-16 14:27:46


On Mar 15, 2010, at 4:11 PM, Howard Hinnant wrote:
> I'm reviving an old thread with new information:
>
> The LWG looked at this issue last week:
>
> http://home.roadrunner.com/~hinnant/issue_review/lwg-active.html#579
>
> We were all over the map on it. We finally left it in Open status
> in hopes of learning more in the next few months.
>
> One solution I've been thinking of is a variant of what is currently
> proposed in the issue:
>
> void quick_erase(const_iterator);

   I have also run into this issue in production code, but in the
multi_index implementation of the hash_table (which seems to adhere to
the unordered_map standard). In my case, I seem to have hit the
absolute worst-case, which results in the erase of about 600,000
elements (in a hash with 5 million buckets) to take hours. When
updated to return void, the same routine runs in seconds. Here is a
simplified example which closely replicates my use case, using the
unordered_map:

--------------------------
#include <iostream>
#include <boost/unordered_map.hpp>

int
main(int argc, char* argv[])
{
     typedef boost::unordered_map<int, int> hash_map;
     hash_map hm;
     hm.rehash(1000000);

     int x = 500000;
     std::cout << "Add " << x << " elements...";

     for (int i=0; i<x; i++) {
         hm[i] = i*2;
     }
     std::cout << "done!\n";

     std::cout << "\nremove the elements...";
     for (int i=x-1; i>=0; i--) {
         hash_map::iterator itr = hm.find(i);
         if (itr != hm.end()) {
             hm.erase(itr);
         }
     }
     std::cout << "done!\n";

     return 0;
}
----------------------------

   I don't understand why the erase(iterator) routine needs to return
anything other than void. In the case of a hash table, the next node
ought to be some random node, which makes it somewhat useless.
However, if a client really wants an iterator to the next valid
element, they can post-increment the argument iterator themselves.
This idiom would suffice for other containers, as well.

   With the worst-case erase(iterator) performance being
O(bucket_count), this effectively makes erase(iterator) unusable.
Seeing the performance of the above program, it's clear to me that the
implementation of erase(iterator) is broken, and it seems to me that
it is broken due to this return-type requirement. A differently-named
erase(iterator) routine would solve the problem, but I don't
understand why the standard would leave the erase() booby-trap to
ensnare others. Howard, what is the rationale for keeping the return
type of the erase routine as iterator?

Thanks,
Brad


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk