Subject: Re: [boost] [unordered] unordered_set::erase() complexity bug?
From: Thorsten Ottosen (thorsten.ottosen_at_[hidden])
Date: 2009-11-23 18:40:57
Daniel James wrote:
> 2009/11/23 Christopher Jefferson <chris_at_[hidden]>:
>> On 23 Nov 2009, at 23:23, Stefan Strasser wrote:
>>
>>> I can't think of a reason why it should be O(n) in this case (perfect hash,
>>> erase doesn't rehash).
>>> but even if it was, the results should be at worst quadratic.
>> I suspect the problem you are seeing is that erase returns an iterator to the next member of the unordered_set, which is a possibly O(n) search if the set is sparsely populated, and in particular if, as you get here, you erase the elements in a particularly bad order. I'm not sure why this doesn't occur with the intrusive containers.
>
> That's right, I profiled it and it spends all the time searching for
> the next bucket.
>
>> One way to significantly reduce this problem is to use a different hash function, which shuffles the values more.
>
> I could apply another hash function to the hash value to make this
> less likely, or perhaps store a pointer to the last nonempty bucket.
> I'll think about it.
>
> A workaround is to erase by key or iterator range if possible.
Our very own Joaquín anticipated this problem very early:
http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2006/n2023.pdf
"In pathological cases, h(n) can be as bad as Â½ n(n + 1): this happens
when the elements
are erased in reverse order as they are arranged in the hash table."
It would be interesting to know
1) how the intrusive containers avoid this, and
2) whether this is a real problem in real code.
At any rate, Joaquín shows that the *lower bound* complexity of erase is
actually lg n, and not O(1).
Does anyone know how the WG21 LWG responded to his paper?
Thorsten
Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk