Boost logo

Boost Users :

Subject: Re: [Boost-users] boost serialize - stack overflow for large problems
From: Stefan Strasser (strasser_at_[hidden])
Date: 2010-09-29 12:52:44


Zitat von "Jorg F. Unger" <j-unger_at_[hidden]>:

> Robert Ramey <ramey <at> rrsd.com> writes:
>
>>
>> Jorg F. Unger wrote:
>> > Robert Ramey <ramey <at> rrsd.com> writes:
>> >
>> >>
>> >> "Jörg F. Unger" wrote:
>> >>> I have implemented the serialization routine for a rather complex
>> >>> structure with many nested classes. The serialization seems to work
>> >>> for small examples, however, for bigger data sets, I get a stack
>> >>> overflow error. If I increase the stack size from 8MB to 20MB, the
>> >>> examples seems to work. However, I'd like to optimize the code the
>> >>> way that I do not need to increase the stack size (especially, since
>> >>> not all users can do that themselves)
>> >>
>> >> Are you referring to compile time or runtime?
>> >>
>> >>
>> >> If the source is that there is VERY deep nesting of data structures,
>> >> you'll just have to refactor the data. This isn't a hardship since
>> >> you would likely have other issues generated by this besides
>> >> serialization.
>> >>
>> >> Robert Ramey
>> >>
>> >>
>> >>
>> >> _______________________________________________
>> >> Boost-users mailing list
>> >> Boost-users <at> lists.boost.org
>> >> http://lists.boost.org/mailman/listinfo.cgi/boost-users
>> >
>> > The problem occurs at runtime. I 've figured out that the problem is
>> > due to the nested data structure. I have kind of a graph, where every
>> > node stores a pointer to the neighboring node. This means, once I
>> > start to serialize the first node, all the neighboring nodes have to
>> > be serialized as well, which finally ends up with at many recursive
>> > function calls as I have nodes. This is probably similar to a linked
>> > list, where once the first entry is serialized, a pointer to the next
>> > element is serialized and so on, which finally ends up with as many
>> > recursive function calls as there are elements in the list. Is there a
>> > clever way to restructure the serialization (e.g. first serializing
>> > all the nodes, and in a second step, when the nodes have been created,
>> > serialize there connectivity without introducing two separate
>> > serialize routines?)
>>
>> Generally the library "tracks" those things serialized by pointers
>> so that there is not problem with cycles etc. If you can list all your
>> nodes ahead of time you can serialize them. Then when you
>> serialize your graph, there will be no recursion.
>>
>> Robert Ramey
>>
>>
>>
>> _______________________________________________
>> Boost-users mailing list
>> Boost-users <at> lists.boost.org
>> http://lists.boost.org/mailman/listinfo.cgi/boost-users
>
> The problem is that the data structure is as follows
>
> class container
> {
> std::vector<object*> myObjects;
> };
>
> class object
> {
> std::vector<object*> neighborObjects;
> }
>
> The problem is, how to split the node information from the graph information.
> I've implemented a serialize routine for the container, which then calls the
> serialize routine for the object, which recursively calls the
> serialize routine
> of the neighboring objects. That means, the first function call to
> the serialize
> routine in the object class is only finished, if all the neighbors have been
> serialized. Similarly, the serialize routine of the first neighbor
> requires all
> other neighbors to be serialized. Although the serialize routine can
> track the
> pointers, this leads to a stack problem, since the number of
> recursive serialize
> routines to be called (which are all stored on the stack) is
> equivalent to the
> number of objects. Is there a clever why to serialize first all the
> objects with
> its data (these have been omitted in the class definition for
> simplicity) and
> then serialize the neighbors information without changing the data structure?

without changing the data structure, yes, but not without changing
your serialization code, since boost.serialization doesn`t make a
difference between "owning pointers" and references.
one solution is to make your class "container" a container that is
serialized by value, similar to how std::list is serialized:
std::list internally is a linked list, and if the internal nodes were
passed to boost.serialization, the same stack overflow problem would
occur.
but it is exposed to boost.serialization as a series of values
(through the STL container interface).
the topology of the internal graph of std::list is clear just by the
order of the values, in your case you`d have to save references to the
neighbouring nodes in a second pass.
so the basic idea is that only "class container" has a serialize()
function (which saves all information necessary to reconstruct the
entire graph), and your "class object" does not.

ideally, boost.serialization would always save a reference to tracked
objects and save referenced objects at the end of the archive without
recursion.


Boost-users list run by williamkempf at hotmail.com, kalb at libertysoft.com, bjorn.karlsson at readsoft.com, gregod at cs.rpi.edu, wekempf at cox.net