|
Boost : |
Subject: Re: [boost] Please Help: Experienced very bad performance withMulti-index
From: Jarolin, Robert (Robert.Jarolin_at_[hidden])
Date: 2009-07-31 08:47:23
Joaquin,
Thank you for the help you are providing.
To answer your first question, I am still uncertain how to put the boost library into a release mode for the usage, though I no longer have those #define statements that caused the excessive debug overhead.
I happened to go a route similar to what you stated in 2 below. I implemented a static counter that is incremented then used to be put in place for the empty fields, so that they can be more uniquely hashed. This has changed the performance from taking around 93 seconds to remove 100 entries from a list of 20,000 to around 1-2 milliseconds. I only wish there was a way to tell the boost multiindex to not create a hash on the item if it is a certain defined default value (ex. Empty string, -1, ...)
As for choice 3, I do not want to modify the boost library myself, due to the overhead of having to maintain those changes for future upgrades.
Again, thanks for your help.
-----Original Message-----
From: boost-bounces_at_[hidden] [mailto:boost-bounces_at_[hidden]] On Behalf Of Joaquin M Lopez Munoz
Sent: Thursday, July 30, 2009 4:06 PM
To: boost_at_[hidden]
Subject: Re: [boost] Please Help: Experienced very bad performance withMulti-index
Jarolin, Robert <Robert.Jarolin <at> trueposition.com> writes:
>
> Actually, I found the issue. I will try to explain it as follows:
>
> Assume that I have data that needs to be hashed by 4 different keys
> (key1, key2, key3, key4). If there are situations where key3 and key4
> are non unique and I make them the same value for every entry, when I
> remove an entry from my table that has a lot of entries (ex. 10000),
> it may take seconds to for the removal to take place.
Hashed non-unique indices have this problem, when deleting an entry with a much repeated key performance can be linear.
On the other hand, several *seconds* to delete an entry on a container with tens of thousands of entries is *way* too long even taking this problem into account. Which compiler/ environment are you using? Did you manage to find how to activate release settings?
> Is there a way to improve on this? Something like be able to tell the
> multiindex to not always hash a key if the value is not set (example,
> if integer, value of -1 means not set)?
I see a number of possibilities (once you've ruled out the debug vs. release issue, which I strongly suggest you take a look at before going any further):
1. Replace the indices for key3 and key4 with ordered indices. This is a trade-off, since some oeprations will get slower (lookup), so you'll have to measure and decide.
2. If I understood you right, you don't always care about the key3 or key4 value of a given element. When this is the case, set them to a random value to avoid repetitions.
3. This is drastic, but a colleague recently modified the lib source code so as to turn singly-linked hashed indices into doubly-linked indices, which don't have the deletion problem described above. Details can be found at http://lists.boost.org/Archives/boost/2009/04/151184.php
(read the whole thread).
HTH,
Joaquín M López Muñoz
Telefónica, Investigación y Desarrollo
_______________________________________________
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk