Subject: [boost] [multi_index] hashed_index erase(iterator) performance issue
From: Brad Higgins (bhiggins_at_[hidden])
Date: 2010-03-16 10:46:35


Hello,
   I have encountered a performance issue with multi_index's
hashed_index erase(iterator) routine. The erase(iterator) call
returns an iterator to the next element in the container. When
this routine is used on a large, sparsely populated hash table, it
takes a very long time to complete. For example, in my program I
rehash the table to 5 million buckets and add 600,000 elements.
Adding the elements takes a matter of seconds; deleting them, one
at a time, takes hours.
   I have profiled the code, and it looks to me like the performance
hit is rooted in the increment of the iterator performed by the
erase(iterator) call. Since the table is sparsely populated, each
erase must advance over many empty buckets in order to reach the
next valid element.
   My question is: why does the erase(iterator) function return an
iterator at all? (By comparison, SGI STL's hash_map::erase() returns
void.) The container is unordered, so the returned iterator points
to an effectively arbitrary element, which I think limits its
usefulness (I expect most clients discard the returned iterator).
Besides, clients who do want that behavior can get it by
post-incrementing their iterator argument themselves. That is:

// After this line, the element cur_itr originally pointed to has
// been erased, and cur_itr refers to the next element in the table.
hash_table.erase(cur_itr++);

   Is there some way to remove elements from the hash table without
paying this iterator-increment cost? Would it be possible to add an
erase routine (under some other name) which accepts an iterator
argument but does not compute and return its successor?

Thanks,
Brad


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk