Boost logo

Boost Users :

From: JOAQUIN M. LOPEZ MUÑOZ (joaquin_at_[hidden])
Date: 2008-04-25 15:58:59


Hello Mark,

________________________________________
De: boost-users-bounces_at_[hidden] [boost-users-bounces_at_[hidden]] En nombre de Mark Westcott [mark-boost_at_[hidden]]
Enviado el: viernes, 25 de abril de 2008 18:29
Para: boost-users_at_[hidden]
Asunto: [Boost-users] [MultiIndex] re-indexing on load

> Hi,
> Loading a multi_index_container from an archive seems to be involving a
> re-index of that archive. I'm setting breakpoints on my key extract
> functions, and they're getting called when i load the container from the
> archive.
>
> I would have thought that the serialization would mimic the copy
> constructor in this respect, and not do any re-indexing?
>
> This is causing me significant performance problems at the moment. Is
> there any workaround?

As you have found out, loading involves some amount of reindexing,
though the algorithm is designed to minimize this, let me summarize
how serialization of multi_index_containers is performed:

Saving a multi_index_container internally consists in:

1. Saving each of the elements, in the order they are traversed
with index #0.
2. For each index other than #0, some "readjusting" info is saved if
necessary. More on this later.

Loading on the other hand reduces to:

1. Loading each element and inserting them with hinted insertion
at the end of the container: for index #0, hinting the end() position
reduces the nomber of key comparisons performed: for the rest
of key based indices, hinting end() usually does not have any impact
and insertion takes as many comparison ops as a regular insertion.
2. After step 1 the elements in index #0 are listed in the exact
order they were originally saved, as it should be. As for the
rest of the indices, they can be differences in traversal order
for non key-based indices (sequenced and random_access)
and for ordered non-unique indices (equivalent elements will be
adjacent as ordered non-unique indices guarantee, but they might
be shuffled with respect to the original situation). The "adjusting"
info stored in saving step 2 is used here to restore the exact
original traversing order for every index of the container.

So, to try to minimize your loading time you'd need to first make
a guess about where the most overhead lies: is it in step 1
or step 2? Step 2 can only be significant if any of your indices
*other than #0* is of type ordered_non_unique, sequenced
or random_access. Assuming you indeed have a situation where
step 2 causes some impact, you can:

a) if affordable, move the problematic index to position #0, so that it
won't get any adjustment later.
b) replace Boost.MultiIndex built-in serialization with the a handmade
method so as to avoid step 2 altogether. Please find attached a sketchy
implementation example of this idea. Note that this method does
not guarantee exact traversal order replication for indices other
than #0.

In any case, do not expect spectacular improvements after applying
any of the approaches above, as more likely most of the overhead
comes from step 1 anyway. A more radical solution, but also one
with a very high impact in your application code, is to put the
multi_index_container in a memory-mapped file, so that serialization
reduces to dumping the memory segment. You can find an example
of this technique at:

http://www.boost.org/doc/libs/1_35_0/libs/multi_index/doc/examples.html#example12

I'm sorry I can't be more helpful. Please report back whether any of
this can be of some use to you. Thanks for using Boost.MultiIndex,

Joaquín M López Muñoz
Telefónica, Investigación y Desarrollo




Boost-users list run by williamkempf at hotmail.com, kalb at libertysoft.com, bjorn.karlsson at readsoft.com, gregod at cs.rpi.edu, wekempf at cox.net