Boost logo

Boost Users :

From: Julien Pervillé (julien.perville_at_[hidden])
Date: 2006-05-06 15:15:53


Hello boost-users.

I am writing to report an annoying serialization bug encountered while
developing a project with Boost.multi_index, which is a wonderfully
useful data structure. Thank you Joaquín!

First a fast overview of my project:

I am developing a virtual file system for my internship. In this file
system, I have to store user paths which I call "entries". "entries"
are associated with another "internal entry" which is the path to the
physical target inside the file system. "entries" are also contained in
a concept of "directory". To ease up things I store entries in a
boost::multi_index with at least 3 indexes:
- an unique index to look up user entries.
- a non-unique index to do reverse look up (by internal targets).
- a non-unique index to do directory listings.
The latter is the one that interests us. The "entry" class splits the
entry into the "dirname" and "file" parts. A getter function returns a
const-reference to "dirname", let's call it "entry::dirname()".

To do directory listings efficiently, I define an ordered-non-unique
index pointinh to the const-reference returned by the entry.dirname()
thanks to a wrapper "mem_fun" named getdirname().

Now comes the bug.

To test my file system, I import some existing directories into my file
system. When I unload it, I expect that the internal boost::multi_index
is serialized and restored properly when I reload the file system
program. When working on huge batches of small files, like untaring a
linux kernel or copying the thousands of files from /usr/share/doc, I
noticed that the import works but reloading the serialized mappings
fails with a (for now unexplained but reproducable)
boost::serialization::other_exception.

After hunting through the headers, I located the triggered exception to
line 1080 of boost/multi_index/ordered_index.hpp (line number from
boost 1.33)

Here is the context from the "rearranger" method where the exception is
thrown:

> 1080 if(position!=x){
> 1081 /* check the rearrangement is consistent */
> 1082 if(!in_place(x->value,position,Category())){
> 1083 throw_exception(
> 1084 archive::archive_exception(
> 1085 archive::archive_exception::other_exception));
> 1086 }

Since my multi_index works perfectly until I save, I checked that my
data is properly serialized. Since I saw no obvious errors, I modified
the header to swallow the exception and see what happens.

> 1080 if(position!=x){
> 1081 /* check the rearrangement is consistent */
> 1082 if(!in_place(x->value,position,Category())){
> 1083 return; throw_exception(
> 1084 archive::archive_exception(
> 1085 archive::archive_exception::other_exception));
> 1086 }

After this small hack, everything "seems" to work fine but if I do a
"find" on my file system some entries appear "missing". They are
actually not missing but they were attributed to the wrong directory!
And in the wrong parent directory, a lookup says "file not found" when
find tries to access it. However, I can still access the entries by
naming them explicitely (using the main index, not the dirname one). I
investigated and found out that the "dirname" ordered_index (const
mem_fun) is not restored properly, even through it is saved correctly.

The "entry" and the "target" ordered indexes (member) are restored as
intended. However look what happens to the mem_fun "dirname" index: its
values are restored in positions that are not a valid binary tree.
Actually it is "almost" a binary tree, just some values are wrongly
restored. Like a dozen out of 1000 entries, but always the same ones.
As a consequence even if my entry::operator<() predicate is proved
correct, lookups through the "dirname" index give byzantine results.

I tracked down the problem and found a small set of "entries" that are
not restored properly and wrote a small example file. Please have a
look at the attached source "seri-bug.cpp".

The example dumps the contents of a 5 entries multi_index, first by
entry and then by "dirname". The displaying is done before serializing
to disk and after reloading the mappings. The corruption can then be
see by checking the different position inside the dirname index for the
"/doc/binutils-doc" entry before and after.

Sample run:

Original structure dump by dirname, pre serialization.

> entry `/' with dirname `'.
> entry `/doc' with dirname `'.
> entry `/doc/binutils-doc' with dirname `/doc'.
> entry `/doc/binutils-doc/ld' with dirname `/doc/binutils-doc'.
> entry `/doc/binutils-doc/gas' with dirname `/doc/binutils-doc'.

Post serialized loading dump. Notice that the "/doc/binutils-doc" entry
was not restored at the correct position. Since the dirnames are not
properly sorted in memory, doing a find() by dirname obtains byzantine
results (doing a find of "/doc" dirname returns the
`/doc/binutils-doc/ld' result instead of the expected
`/doc/binutils-doc' entry).

> entry `/' with dirname `'.
> entry `/doc' with dirname `'.
> entry `/doc/binutils-doc/ld' with dirname `/doc/binutils-doc'.
> entry `/doc/binutils-doc' with dirname `/doc'.
> entry `/doc/binutils-doc/gas' with dirname `/doc/binutils-doc'.

Even if this sample triggers the restore anomaly, to see a
boost::archive::other_exception one needs more data inside the
multi_index, like a few thousand entries.

I would really love to see a this bug fixed since it stalls my
projects. My c++ template metaprogramming skills aren't very sharp but
I think that this bug is a good opportunity to learn some about this
subject.

I am looking forward to test patches and offer my help testing, so that
this little quirk gets solved.

Sincerely,

Julien Perville
Intern, Global Core, Amadeus IT Group.




Boost-users list run by williamkempf at hotmail.com, kalb at libertysoft.com, bjorn.karlsson at readsoft.com, gregod at cs.rpi.edu, wekempf at cox.net