
From: Robert Ramey (ramey_at_[hidden])
Date: 2006-07-04 11:16:23


SeskaPeel wrote:
> Hi Robert,
>
>>> 1/ First, we are using multiple files, which can be created at
>>> different moments. The main reason is that we want to be able to
>>> share resources contained in one file across multiple projects. An
>>> alarm rang in my head telling me that serialize won't be able to
>>> handle this case,
>>
>> why not?
>
> Because I saw nowhere in the docs how serialize would restore
> pointers when loading from multiple files. Suppose I load file1 that
> contains a resource named "r1". After file1 is fully loaded, and say
> 5 minutes later, I load file2. Inside that file, there is a resource
> "r2" that needs a link to "r1", how will this case be handled?

It's not a case I ever considered, so offhand I'm not sure how to answer.
But I can speculate a little. Take two cases:

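#include <boost/serialization/access.hpp>

class x;  // some user type, defined elsewhere

class a {
    x * ptr;  // a new x is created each time an a is loaded
    friend class boost::serialization::access;
    template<class Archive>
    void serialize(Archive & ar, const unsigned int version){
        ar & ptr;
    }
};

class b {
    x & ref;  // refers to an x that must already exist
    friend class boost::serialization::access;
public:
    explicit b(x & r) : ref(r) {}
    template<class Archive>
    void serialize(Archive & ar, const unsigned int version){
        ar & ref;  // data is loaded into the existing x
    }
};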

Both cases are quite similar: one object refers to some external
object. In the first case, a new object of type x is created each
time an instance of a is loaded. In the second case, the object
is presumed to exist and the data is just loaded into it. So
the serialization library presumes that one is using pointers
and references in this common way. This is just a decision I
made. Most people never notice, because it turns out this is the
way people use pointers vs. references, so things "just work" as
one expects. By making the serialization "fancier", one could
change the behavior for pointers to match the current behavior
for references, which sounds like what you want for your
situation. Or you could just change the pointers to references
in your own code and get everything to "just work". It's up to you.

> I suppose I'll have to manually iterate over the freshly loaded
> resources and check if they need to be "post-loading associated". As
> I can't know whether these resources need this last step, I'll have
> to check each time I load a file, and thus I could handle the
> internal linking as well in this step, though it would be easier
> (and more efficient) if it were done by the loading lib.

Of course you could do it that way as well.
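The manual "post-load association" approach described in the question can be sketched as a name-keyed registry that outlives any single file. This is an assumption about how such a pass might look, not serialization library functionality; `Resource`, `ResourceRegistry`, `add_file` and `demo_cross_file_link` are hypothetical names. Links are stored by name in the file and resolved to pointers after each file is loaded, so "r2" in a later file can find "r1" from an earlier one.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical resource: links are stored by name in the file and
// resolved to pointers in a separate post-load pass.
struct Resource {
    std::string name;
    std::vector<std::string> link_names;  // as read from the file
    std::vector<Resource *> links;        // filled in by post_load()
};

// Hypothetical registry that outlives any single file.
class ResourceRegistry {
public:
    // Takes ownership of one freshly loaded file's resources.
    void add_file(std::vector<std::unique_ptr<Resource>> loaded) {
        std::vector<Resource *> fresh;
        for (auto & r : loaded) {
            fresh.push_back(r.get());
            by_name_[r->name] = r.get();
            owned_.push_back(std::move(r));
        }
        post_load(fresh);
    }

    Resource * find(const std::string & name) const {
        auto it = by_name_.find(name);
        return it == by_name_.end() ? nullptr : it->second;
    }

private:
    // Resolves names for the new resources only; targets may come from
    // this file or from any file loaded earlier.
    void post_load(const std::vector<Resource *> & fresh) {
        for (Resource * r : fresh) {
            r->links.clear();
            for (const auto & n : r->link_names) {
                r->links.push_back(find(n));  // nullptr if not loaded yet
            }
        }
    }

    std::map<std::string, Resource *> by_name_;
    std::vector<std::unique_ptr<Resource>> owned_;
};

// Loads "file1" (r1), then later "file2" (r2 -> r1); returns true if
// r2's link points at the r1 loaded earlier.
bool demo_cross_file_link() {
    ResourceRegistry registry;

    std::vector<std::unique_ptr<Resource>> file1;
    file1.push_back(std::make_unique<Resource>(Resource{"r1", {}, {}}));
    registry.add_file(std::move(file1));

    std::vector<std::unique_ptr<Resource>> file2;
    file2.push_back(std::make_unique<Resource>(Resource{"r2", {"r1"}, {}}));
    registry.add_file(std::move(file2));

    Resource * r2 = registry.find("r2");
    return r2 && r2->links.size() == 1 && r2->links[0] == registry.find("r1");
}
```

The cost is the extra pass per file, which is exactly the bookkeeping the question hoped the loading library would absorb.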

> So, is there something I misunderstood, or some feature I overlooked?

Hmmm - I suspect that since serialization works so painlessly ALMOST
all the time, one doesn't have much occasion to look under the hood
until things get more complex and specialized. At that point I think
it's valuable to take another look at what one is trying to serialize
and ask: hmmmm, why is this hard? Going through each variable and
considering whether it should be a pointer, a reference, a pointer to
a "const" object, a reference to a "const" object, a "const" pointer
to a non-const object, or just a normal member variable (const or not)
will often lead to re-characterizing some variables, and then things
will again "just work". I also believe it improves the rest of one's
code, as it forces one to think about why each variable is used the
way it is. The exercise also gives the compiler more information
about how each variable is used, which can (at least theoretically)
let it generate better optimized code. FWIW, I believe that "const"
is generally under-appreciated and under-used.

>>> 2/ And secondly, we want to be able to load files progressively,
>>> say 100KB by 100KB. Once the file is completely loaded, the pointer
>>> restoration can happen. Does serialize support such a feature, or
>>> plan to? If this is not a huge amount of work, is there a way to
>>> provide help to get this feature quickly?

If you're in a hurry - consider the prescription above. It will require
changing your own classes - for the better in my view - but you'll
be done with it.

> Not necessarily from a thread, but the aim is to suspend the loading
> of a file, and then resume it some time later. What would be even
> better would be to specify how long or how many bytes should be read
> before the loading function suspends and returns.
> I read about your custom archives (some time ago, I have to admit),
> and it wasn't evident to me that I could implement this feature.
> Could you provide some more hints?

The deserialization process uses the stack to store its state. Hence,
the only simple and practical way to do this is to invoke loading on
a separate fiber, coroutine or thread.
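A minimal sketch of the thread variant, assuming a simple blocking byte channel in place of a real streambuf. `ChunkedSource`, `Node`, `load_node` and the one-character-per-field format are hypothetical, and this is not the serialization library's API. The recursive loader keeps all of its state on the loader thread's stack, so "suspending" it is just letting it block in `read()` until the caller feeds the next chunk (for example 100KB per frame).

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <string>
#include <thread>
#include <vector>

// Hypothetical byte channel: the main thread appends chunks, and the
// loader thread blocks in read() until enough bytes are available.
class ChunkedSource {
public:
    void feed(const std::string & chunk) {
        {
            std::lock_guard<std::mutex> lock(m_);
            buf_ += chunk;
        }
        cv_.notify_one();
    }

    // Blocks until n bytes are available, then consumes and returns them.
    std::string read(std::size_t n) {
        std::unique_lock<std::mutex> lock(m_);
        cv_.wait(lock, [&] { return buf_.size() - pos_ >= n; });
        std::string out = buf_.substr(pos_, n);
        pos_ += n;
        return out;
    }

private:
    std::mutex m_;
    std::condition_variable cv_;
    std::string buf_;
    std::size_t pos_ = 0;
};

// Hypothetical format: one char for the name, one digit for the number
// of children, then the children themselves.
struct Node {
    char name = 0;
    std::vector<Node> children;
};

// Recursive load: its state lives on the loader thread's stack, which
// is why it can be paused simply by blocking.
Node load_node(ChunkedSource & src) {
    Node n;
    n.name = src.read(1)[0];
    int count = src.read(1)[0] - '0';
    for (int i = 0; i < count; ++i) {
        n.children.push_back(load_node(src));
    }
    return n;
}

void collect_names(const Node & n, std::string & out) {
    out += n.name;
    for (const Node & c : n.children) collect_names(c, out);
}

// Feeds the tree "a2b0c1d0" two bytes at a time; the loader resumes
// each time more data arrives. Returns the node names in load order.
std::string demo_threaded_load() {
    ChunkedSource src;
    Node root;
    std::thread loader([&] { root = load_node(src); });

    const std::string data = "a2b0c1d0";
    for (std::size_t i = 0; i < data.size(); i += 2) {
        src.feed(data.substr(i, 2));  // caller decides when to feed more
    }
    loader.join();

    std::string names;
    collect_names(root, names);
    return names;
}
```

The caller controls pacing by deciding when to call `feed`; a fiber or coroutine would achieve the same without a second OS thread.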

If you want to hack your own code some, you could do something like
having a top-level array of serializable objects and serializing each
of them independently, so you could do the process piece by piece. The
serialization library does permit the same streambuf to be used
and passed around so that serialization can be "embedded" inside
of some other streambuf operations, etc. But by far the easiest
way would be to use the coroutine approach above.
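The piece-by-piece alternative can be sketched with a hypothetical file layout: a count followed by independently written records. `Record`, `PieceLoader` and `step` are illustrative names, not library API. Each `step` call loads a bounded number of records and returns, so the caller can interleave other work. If each piece were loaded through its own Boost archive, object tracking would presumably not span pieces, so pointers shared between pieces would need separate handling.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical self-contained record.
struct Record {
    int id = 0;
    std::string label;
};

// Hypothetical layout: a count, then that many records ("id label"),
// each written independently of the others.
class PieceLoader {
public:
    explicit PieceLoader(std::istream & is) : is_(is) { is_ >> remaining_; }

    // Loads at most max_records, then returns control to the caller.
    // Returns true once every record has been loaded.
    bool step(std::size_t max_records) {
        for (std::size_t i = 0; i < max_records && remaining_ > 0; ++i) {
            Record r;
            is_ >> r.id >> r.label;
            loaded_.push_back(r);
            --remaining_;
        }
        return remaining_ == 0;
    }

    const std::vector<Record> & loaded() const { return loaded_; }

private:
    std::istream & is_;
    std::size_t remaining_ = 0;
    std::vector<Record> loaded_;
};

// Loads 5 records, 2 per call; returns the number of calls needed.
int demo_piecewise(std::vector<Record> & out) {
    std::istringstream data("5 1 r1 2 r2 3 r3 4 r4 5 r5");
    PieceLoader loader(data);
    int calls = 0;
    bool done = false;
    while (!done) {
        done = loader.step(2);  // other work could happen between steps
        ++calls;
    }
    out = loader.loaded();
    return calls;
}
```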

>>> Today, we have a manual phase of association that is called after
>>> all files are loaded. Each time we load a new file, we "post-load"
>>> all the resources contained in it, and pointer restoration is
>>> achieved this way.

Using my suggestion above, this wouldn't be necessary. The commonly
referenced objects are "pre-created and registered" by the top-level
object constructors, and everything is kept in sync automatically
throughout the serialization process.
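One way to read the "pre-created and registered" suggestion, sketched in plain C++ under stated assumptions: `SharedResource`, `Dependent`, `Project` and the two string "files" are hypothetical. The top-level object constructs the commonly referenced object first, and dependents bind references to it in their constructors. Loading the second file later then only fills in data; there is no link left to restore.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical shared resource (the "r1" of the question).
struct SharedResource {
    std::string data;
};

// Hypothetical dependent (the "r2"): its link is a reference bound at
// construction, so loading only fills in its own data.
struct Dependent {
    explicit Dependent(SharedResource & r) : shared(r) {}
    SharedResource & shared;
    std::string own_data;
};

// Hypothetical top-level object: creates the commonly referenced object
// first, then the dependents that refer to it.
struct Project {
    SharedResource common;
    Dependent r2{common};  // declared after `common`, so built after it
};

// Loads "file1" into the shared object and, at a later time, "file2"
// into the dependent; returns what r2 sees through its link.
std::string demo_preregistered() {
    Project p;
    std::istringstream file1("textures");
    file1 >> p.common.data;   // first load

    std::istringstream file2("level1");
    file2 >> p.r2.own_data;   // later load: no pointer to restore
    return p.r2.shared.data + "/" + p.r2.own_data;
}
```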

>> In the serialization library, pointers are restored "on the fly"
>> (depth first).
>
> Yes, that's why I'm considering porting to it :) What does "depth
> first" mean?

a contains a pointer to b, which contains a pointer to c, etc., so the
sequence of operations is a, b, c, .... That is, a is complete only
when all its components have been loaded.
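The depth-first order can be illustrated with a small recursive loader over a hypothetical chain a -> b -> c; `Link` and `load_chain` are illustrative names, not library code. Each object starts loading, recurses into its pointee, and only then counts as complete, so the event log reads: begin a, begin b, begin c, complete c, complete b, complete a.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <vector>

// Hypothetical chain: each object holds a pointer to the next one.
struct Link {
    std::string name;
    std::unique_ptr<Link> next;
};

// Builds the chain from a list of names, recursing into the pointee
// before the current object is complete; records the order of events.
std::unique_ptr<Link> load_chain(const std::vector<std::string> & names,
                                 std::size_t i,
                                 std::vector<std::string> & events) {
    if (i == names.size()) return nullptr;
    auto obj = std::make_unique<Link>();
    obj->name = names[i];
    events.push_back("begin " + obj->name);
    obj->next = load_chain(names, i + 1, events);  // depth first
    events.push_back("complete " + obj->name);
    return obj;
}
```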

Robert Ramey


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk