Boost logo

Boost Users :

From: Mark Westcott (mark-boost_at_[hidden])
Date: 2008-04-28 06:02:26


Hi Joaquin.
 
> As you have found out, loading involves some amount of reindexing,
> though the algorithm is designed to minimize this, let me summarize
> how serialization of multi_index_containers is performed:
>
> Saving a multi_index_container internally consists in:
>
> 1. Saving each of the elements, in the order they are traversed
> with index #0.
> 2. For each index other than #0, some "readjusting" info is saved if
> necessary. More on this later.
>
> Loading on the other hand reduces to:
>
> 1. Loading each element and inserting them with hinted insertion
> at the end of the container: for index #0, hinting the end() position
> reduces the nomber of key comparisons performed: for the rest
> of key based indices, hinting end() usually does not have any impact
> and insertion takes as many comparison ops as a regular insertion.
> 2. After step 1 the elements in index #0 are listed in the exact
> order they were originally saved, as it should be. As for the
> rest of the indices, they can be differences in traversal order
> for non key-based indices (sequenced and random_access)
> and for ordered non-unique indices (equivalent elements will be
> adjacent as ordered non-unique indices guarantee, but they might
> be shuffled with respect to the original situation). The "adjusting"
> info stored in saving step 2 is used here to restore the exact
> original traversing order for every index of the container.
>
> So, to try to minimize your loading time you'd need to first make
> a guess about where the most overhead lies: is it in step 1
> or step 2? Step 2 can only be significant if any of your indices
> *other than #0* is of type ordered_non_unique, sequenced
> or random_access. Assuming you indeed have a situation where
> step 2 causes some impact, you can:
>
> a) if affordable, move the problematic index to position #0, so that it
> won't get any adjustment later.
> b) replace Boost.MultiIndex built-in serialization with the a handmade
> method so as to avoid step 2 altogether. Please find attached a sketchy
> implementation example of this idea. Note that this method does
> not guarantee exact traversal order replication for indices other
> than #0.
>
> In any case, do not expect spectacular improvements after applying
> any of the approaches above, as more likely most of the overhead
> comes from step 1 anyway. A more radical solution, but also one
> with a very high impact in your application code, is to put the
> multi_index_container in a memory-mapped file, so that serialization
> reduces to dumping the memory segment. You can find an example
> of this technique at:
>
> http://www.boost.org/doc/libs/1_35_0/libs/multi_index/doc/examples.html#example12

Thanks for the explanation and ideas.

Based on the information you provided, I guessed that moving from hashed
indices (where the hash function is relatively expensive) to
sorted_non_unique (with a cheap comparison function) would improve
things; and indeed that change has got the load hugely quicker.

I then tried the 'poor man serialization' example that you attached. As
you suspected, the performance effect of this was pretty small and so I
guess it is indeed step 1 that is causing the issue (I have several
ordered_non_uniques). I do like the memory mapped file idea, but as you
say, the impact on the code is large; I think too large for the time
being, though it may be sometime I come back to if needs be in the
future.

I think, for the time being at least, the performance is ok now (using
ordered rather than hashed indices).
Is there any chance that future versions of MI might be able to store
the hashes/complete ordering whilst serializing, or is that Just Not
Possible (TM)?

Thanks again,
Mark


Boost-users list run by williamkempf at hotmail.com, kalb at libertysoft.com, bjorn.karlsson at readsoft.com, gregod at cs.rpi.edu, wekempf at cox.net