Subject: Re: [boost] [unordered] unordered_set::erase() complexity bug?
From: Christopher Jefferson (chris_at_[hidden])
Date: 2009-11-23 18:14:55
On 23 Nov 2009, at 23:23, Stefan Strasser wrote:
> I think there is something very wrong with boost's implementation of
> 1 unordered_set<int> s;
> 2 for(int c=0;c<nr;++c) s.insert(c);
> 3 for(int c=nr-1;c>=0;--c) s.erase(s.find(c));
> nr: time
> 20000: 0m0.244s
> 40000: 0m0.940s
> 60000: 0m5.584s
> 80000: 0m4.232s
> 100000: 0m34.814s
> 120000: 0m33.486s
> the function that's causing this is erase(), which is specified to be O(1)
> average, O(n) worst.
> I can't think of a reason why it should be O(n) in this case (perfect hash,
> erase doesn't rehash).
> but even if it was, the results should be at worst quadratic.
I suspect the problem you are seeing is that erase returns an iterator to the next member of the unordered_set, which is a possibly O(n) search if the set is sparsely populated, and in particular if, as you get here, you erase the elements in a particularly bad order. I'm not sure why this doesn't occur with the intrusive containers.
One way to significantly reduce this problem is to use a different hash function, which shuffles the values more.
I don't know if there is an algorithmic improvement which can fix this. there certainly isn't an obvious one.
Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk