Boost logo

Boost :

Subject: Re: [boost] [unordered] unordered_set::erase() complexity bug?
From: Howard Hinnant (howard.hinnant_at_[hidden])
Date: 2010-03-15 16:11:01


On Nov 23, 2009, at 6:23 PM, Stefan Strasser wrote:

>
> I think there is something very wrong with boost's implementation of
> unordered_set/map.
>
> 1 unordered_set<int> s;
> 2 for(int c=0;c<nr;++c) s.insert(c);
> 3 for(int c=nr-1;c>=0;--c) s.erase(s.find(c));
>
> results:
>
> nr: time
> 20000: 0m0.244s
> 40000: 0m0.940s
> 60000: 0m5.584s
> 80000: 0m4.232s
> 100000: 0m34.814s
> 120000: 0m33.486s
>
>
> the function that's causing this is erase(), which is specified to be O(1)
> average, O(n) worst.
> I can't think of a reason why it should be O(n) in this case (perfect hash,
> erase doesn't rehash).
> but even if it was, the results should be at worst quadratic.
>
> load_factor() is below 1, bucket_count(x) is 1.
>
> if line 3 is changed to:
>
> 3 for(int c=0;c<nr;++c) s.erase(s.find(c));
>
> 20000: 0m0.008s
> 40000: 0m0.016s
> 60000: 0m0.028s
> 80000: 0m0.032s
> 100000: 0m0.040s
> 120000: 0m0.044s
>
> as you would expect.
>
> even if I do the very same thing as above using Boost.Intrusive (using a load
> factor of 1), I get expected results:
>
> struct element : intrusive::unordered_set_base_hook<>{
> ...
>
> for(int c=0;c<nr;++c) s.insert(*new element(c));
> for(int c=nr-1;c>=0;--c) s.erase(s.find(element(c)));
>
> 20000 0m0.008s
> 40000 0m0.016s
> 60000 0m0.024s
> 80000 0m0.024s
> 100000 0m0.032s
> 120000 0m0.040s
>
> but surprisingly enough, the TR1 implementation shipped with libstdc++ also
> has a problem like that:
>
> 20000: 0m0.368s
> 40000: 0m1.448s
> 60000: 0m1.668s
> 80000: 0m7.160s
> 100000: 0m7.808s
> 120000: 0m13.349s
>
> am I missing something or is there really the same bug in 2 major
> implementations of unordered_set?
> it effectively caused a deadlock in some of my code.

I'm reviving an old thread with new information:

The LWG looked at this issue last week:

http://home.roadrunner.com/~hinnant/issue_review/lwg-active.html#579

We were all over the map on it. We finally left it in Open status in hopes of learning more in the next few months.

One solution I've been thinking of is a variant of what is currently proposed in the issue:

    void quick_erase(const_iterator);

The solution I've thought about was actually invented to solve another problem. It is described here:

http://home.roadrunner.com/~hinnant/issue_review/lwg-closed.html#839

(see the comment dated 2009-09-19). Here is a brief synopsis:

    node_ptr remove(const_iterator p);
    pair<iterator, bool> insert(node_ptr&& nd);

where node_ptr is a nested typedef in the container to unique_ptr<__node, __node_deleter<__node_allocator>>. With this tool one can remove a node from a container, change its value, and insert it back into the container (or another which has an equal allocator). E.g.:

   auto p = m.remove(i); // no node deallocation
   p->first = new_key; // modify node external to the container
   m.insert(std::move(p)); // no node allocation

It solves the quadratic behavior by:

    for(int c=nr-1;c>=0;--c) s.remove(s.find(element(c)));

At each remove() the node_ptr is ignored/destructed, thus deallocating the node. There is no iterator increment.

This is a performance-tested solution with your test program:

20000: 0m0.012s // your time is 0m0.244s
40000: 0m0.017s // your time is 0m0.940s
60000: 0m0.024s // your time is 0m5.584s
80000: 0m0.028s // your time is 0m4.232s
100000: 0m0.037s // your time is 0m34.814s
120000: 0m0.041s // your time is 0m33.486s
140000: 0m0.052s
160000: 0m0.057s
180000: 0m0.064s
200000: 0m0.069s
220000: 0m0.074s
240000: 0m0.080s
260000: 0m0.086s
280000: 0m0.091s
300000: 0m0.097s

The bad news is that the LWG will consider this much too inventive at this late date. That being said, I believe it solves several real world performance problems (this one and others listed in LWG 839). I offer it here in case someone wants to implement it for boost containers.

In case you don't like the "too inventive" response from the C++ committee, I'm the one to yell at. I will happily forward your response to the committee.

-Howard


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk