On 7/8/07, Robert Ramey <ramey@rrsd.com> wrote:
Sean Cavanaugh wrote:
> Ok I've spent a good chunk of my day making a set of custom archive
> classes, based on the portable_binary_iarchive and
> portable_binary_oarchive.
>
>
> I have a non-virtual class hierarchy named Asset, in memory this is a
> cyclic graph, but my requirements are that this structure cannot be
> serialized all at once, it has to be broken up at boundaries.   So
> the archiver serializes the first Asset* (which is always the root
> object passed into operator&).  This object serializes normally, as
> per the boost serialization code, except that when subsequent
> Asset*'s are encountered
>
> ?which point to a previously serialized object? or any subsequent
> Asset *?
>
> I send the pointer into an AssetManager class and translate the
> Asset* into a ResourceID, and serialize the ResourceID instead of the
> object.  This is done by directly calling operator& inside
> save_override.
>
*******
Hmmm this sounds to me exactly equivalent to what the serialization
system does by default for tracked objects.  Objects serialized
through pointers are tracked by default.  Your "ResourceID" seems a
re-implementation of the "object id" used by the serialization library
to track

This is true, but the serialization library's object ids only have archive scope, which means the same ids are used for totally different things when you use multiple archives.  My ResourceIDs have to be globally unique, in the sense that they are GUIDs or relative pathnames, which can then be mapped to a fully qualified pathname and loaded on demand.  As far as the archive class is concerned, it's a user-defined translation (proxy-to-real and real-to-proxy) that exists for certain types.

So in this case it's conceptually an archive of archives, with the outermost archive being the filesystem and the innermost being a single object (a file).  The file only contains one object, and all links to other objects are handles (a filename).  So when the innermost code is loading from the filesystem, it knows that it wants a pointer to another object, but it only has its name.  It has to ask the filesystem class to translate the name into an object, which it can do because it can look up whether the object is already loaded and return that, or literally open another file-based archive, read it in on demand, and return that.  The archive class doesn't really care about the specifics; it just needs the means to achieve the result.
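
To make the name-to-object lookup concrete, here is a minimal sketch of the idea, not my actual code. The names Asset, AssetManager, resolve and the loader callback are all made up for illustration, and the callback merely stands in for "open another file-based archive and read it in":

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <memory>
#include <string>
#include <utility>

// Hypothetical asset type; only the name matters for this sketch.
struct Asset {
    std::string name;
};

// Hypothetical manager: maps a ResourceID (here a relative pathname) to a
// live object, loading it on first request and reusing it afterwards.
class AssetManager {
public:
    using ResourceID = std::string;
    using Loader = std::function<std::unique_ptr<Asset>(const ResourceID&)>;

    explicit AssetManager(Loader loader) : loader_(std::move(loader)) {}

    // Translate a handle into a pointer to the real object.
    Asset* resolve(const ResourceID& id) {
        auto it = loaded_.find(id);
        if (it != loaded_.end()) {
            return it->second.get();  // already loaded: return the cached object
        }
        // Not loaded yet: the loader stands in for reading another archive.
        std::unique_ptr<Asset> fresh = loader_(id);
        Asset* raw = fresh.get();
        loaded_.emplace(id, std::move(fresh));
        return raw;
    }

    std::size_t loaded_count() const { return loaded_.size(); }

private:
    Loader loader_;
    std::map<ResourceID, std::unique_ptr<Asset>> loaded_;
};
```

The archive would only ever call resolve(); whether that hits the cache or triggers a nested load is invisible to it.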

So the archive's base class code should be doing this in a conceptual way:

for_each type X, if has_user_defined_translation<X*> implement :
on_save -> translate type X* to type ProxyX via a function :  ProxyX RealToProxy(X* x), serialize the ProxyX
on_load -> read ProxyX translate to X*, via a function :  X* ProxyToReal(ProxyX& px), return the X*

Except that this on_save and on_load example needs yet another template argument, controlling whether the first occurrence of X should be literally saved or not, since we might be saving a hierarchy of Bar's that always saves handles to Foo's.
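
As a rough illustration of the dispatch described above, here is a self-contained sketch that does not use Boost at all. ToySaver, dump, registry, Plain and the string tokens are invented for the example; the trait is keyed on X rather than X* for brevity, and ProxyToReal takes a const reference instead of the non-const one in the pseudocode:

```cpp
#include <cassert>
#include <map>
#include <string>
#include <type_traits>
#include <vector>

struct Foo   { std::string path; int value; };  // translated type (hypothetical)
struct Plain { int value; };                    // ordinary type (hypothetical)

// Trait: no user-defined translation unless a type opts in.
template <class T> struct has_user_defined_translation : std::false_type {};
template <> struct has_user_defined_translation<Foo> : std::true_type {};

using ProxyFoo = std::string;  // the handle; here a relative pathname

// Stand-in for the asset manager's table of live objects.
inline std::map<ProxyFoo, Foo*>& registry() {
    static std::map<ProxyFoo, Foo*> r;
    return r;
}

inline ProxyFoo RealToProxy(Foo* x) { return x->path; }
inline Foo* ProxyToReal(const ProxyFoo& px) {
    auto it = registry().find(px);
    return it == registry().end() ? nullptr : it->second;
}

// Stand-ins for "serialize the object itself".
inline std::string dump(const Foo& f)   { return f.path + "=" + std::to_string(f.value); }
inline std::string dump(const Plain& p) { return std::to_string(p.value); }

// SaveFirstLiterally is the extra template argument: true for a graph of
// Foo (save the root, then handles), false when Foo is always a handle.
template <bool SaveFirstLiterally>
class ToySaver {
public:
    std::vector<std::string> out;  // stands in for the byte stream

    template <class T>
    void save(T* p) { save_impl(p, has_user_defined_translation<T>{}); }

private:
    bool first_saved_ = false;  // per archive here; a real one might track per type

    // Translated types: first occurrence optionally literal, the rest as handles.
    template <class T>
    void save_impl(T* p, std::true_type) {
        if (SaveFirstLiterally && !first_saved_) {
            first_saved_ = true;
            out.push_back("object:" + dump(*p));
        } else {
            out.push_back("handle:" + RealToProxy(p));
        }
    }

    // Everything else is always saved literally.
    template <class T>
    void save_impl(T* p, std::false_type) {
        out.push_back("object:" + dump(*p));
    }
};
```

With this shape the serialize methods never mention handles; only the archive decides whether a Foo* becomes an object or a ProxyFoo.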
 

>
> ...
>
> I also have to hard-code the full list of derived Asset types and
> manually provide specializations for all of them in save_override and
> load_override.
>
****
Well, since they're different - I would expect each of them to have a
different serialize function.  If all the serialize functions are the
same, it would seem that something should be moved from the derived
class to the base class.
>
> If I use the base class, I end up slicing my class down to its base,
> and cannot serialize it.
>
****
serializing through a base class pointer solves this problem as well.


In my currently kind-of-working, hacked-up version of the archive classes, the load_override methods are nearly identical when specializing for AssetModel, AssetTexture, etc.  The behavior is constant (proxy-to-real translation or vice versa) but the type is not.  I can slice them safely on saving, but not on loading, since the C++ code in the serialize method ('ar & foo') is expecting a more-derived type to be filled in.


> I can't make serialize virtual, since the intrusive serialize methods
> are templates, but it certainly would solve the problem if it were
> possible.

***
I suspect that if the other changes suggested were implemented this
would disappear as a problem.  I don't think I've tried it, but
rather than including boiler plate code in each derived class, one
might try adding a "mix-in" base class which contains the serialize
function.


I'll play around with alternatives.  I basically spent the day learning the archive templates by watching the code flow.
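
For reference, one possible reading of the mix-in suggestion is a CRTP base that owns the templated serialize function and forwards to the derived class for its own fields. This is only a sketch of that interpretation and has not been tried against the library; SerializeMixin, serialize_fields, RecordingArchive and the member names are all invented here:

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy archive that records what it is given; it stands in for a real archive.
struct RecordingArchive {
    std::vector<std::string> out;
    RecordingArchive& operator&(const std::string& s) { out.push_back(s); return *this; }
    RecordingArchive& operator&(int v) { out.push_back(std::to_string(v)); return *this; }
};

// CRTP mix-in: the shared serialize body lives here once, and stays a
// template on the archive type, so no virtual function is needed.
template <class Derived>
struct SerializeMixin {
    template <class Archive>
    void serialize(Archive& ar, const unsigned int /*version*/) {
        Derived& self = static_cast<Derived&>(*this);
        ar & self.id;               // boilerplate common to every asset
        self.serialize_fields(ar);  // the part that actually differs
    }
};

struct AssetModel : SerializeMixin<AssetModel> {
    std::string id;
    int vertex_count = 0;
    template <class Archive>
    void serialize_fields(Archive& ar) { ar & vertex_count; }
};

struct AssetTexture : SerializeMixin<AssetTexture> {
    std::string id;
    int width = 0;
    template <class Archive>
    void serialize_fields(Archive& ar) { ar & width; }
};
```

Because the mix-in knows the derived type statically, nothing is sliced, and each derived class only lists its own fields.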

>
> In addition the bodies of all of my overrides are completely
> identical except for the classname (AssetModel, AssetTexture, etc).
>
> Which means I'll be wrapping the bodies of a generic save_override
> and load_override in a macro, and have to manually add all Asset
> derived classes to a list of classes inside my archive class.  Which
> means that my archiver cannot be generic, even though I have managed
> to make it a template in the sense that the passed in asset manager
> and base asset types are template parameters.
>
***
looks to me that you've gotten off on the wrong foot and stuck with it.


That isn't possible with learning new code :)  This is what the path of least resistance yielded, with the docs and examples provided by boost.  Basically this is as far as I got without having to directly hack on the existing boost code, and having to deal with a crash course on the code flow and internal data structure of everything.

 
>
***
To me this is exactly the wrong approach.  Now you've coupled your classes
to be serialized to a specific archive.  This means you won't be able to use
any other archive type and you've defeated one of the main benefits to the
serialization library.  Perhaps it wasn't a suitable library for your task.

The library can do what I want, because I have the source code :)   Anyway, the classes aren't coupled to the archive with what I've come up with so far; it's the other way around.  I definitely do not want my classes to understand archiving beyond a very basic sense of having to call operator& on most of their fields, since I plan on having several wildly different archive classes calling the serialize methods on my classes.
 

So I have a working implementation, how do I make it better?

****
Maybe you might try doing it in the simplest way.

I can't see how what you want to do is different than what everyone else
uses the library for.  And I can't see how what you want to do is different
than what the examples do.


I could get the behavior I want by altering the serialize methods, but then it would be ill-formed for other archives.  I could also specialize the serialize methods for the archive in question, but then I would have to write more than one.  It's the archive's job to interpret what to do when you call ar & foo.

I anticipate having more data than I can load at once, so I need to load and save at an object level.  But I still need to write the serialize methods as if everything could fit in memory, since I plan on having other archive classes that do operate on the graph of what is loaded at runtime (i.e. to compute garbage collection).

In essence the archive classes need to be made to be programmable for these behaviors to work:

Graph of Foo:
Saving: save the first Foo*, translate all further Foo*'s into a user-defined handle with a user-defined function and save that instead.
Loading: load the first Foo*, assume all further Foo*'s are saved with a user-defined handle, translate them back into live objects on demand, and also use the existing caching scheme to avoid translating the same user-defined handle over and over.

Graph of Bar:
Saving: save all Bar's, but save all occurrences of Foo*'s as handles
Loading: load all Bar's, but load all occurrences of Foo*'s from handles

Garbage Collecting Foo:
'Saving' : archive an array of live root level Foo objects, build a list of all Foo* that are reachable through serialization.   Compare this list to the full list of Foo objects, and unload the ones that are missing.
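
The garbage collecting case can be sketched as an "archive" that writes nothing and only remembers which objects the serialize calls reach. Again this is a simplified illustration without Boost; Foo's links member, ReachabilityArchive and find_unreachable are made-up names:

```cpp
#include <cassert>
#include <set>
#include <vector>

// Hypothetical node type: serialize just hands every outgoing pointer
// to whatever archive it is given.
struct Foo {
    std::vector<Foo*> links;

    template <class Archive>
    void serialize(Archive& ar) {
        for (Foo* next : links) {
            ar & next;
        }
    }
};

// "Archive" that stores no data; it follows each pointer once and
// records it, so cycles in the graph terminate.
class ReachabilityArchive {
public:
    ReachabilityArchive& operator&(Foo* p) {
        if (p != nullptr && reached_.insert(p).second) {
            p->serialize(*this);  // first visit: recurse into its links
        }
        return *this;
    }

    bool reached(const Foo* p) const { return reached_.count(p) != 0; }

private:
    std::set<const Foo*> reached_;
};

// Compare the reachable set against the full list of loaded objects and
// return the ones nothing points to; those are the candidates to unload.
inline std::vector<Foo*> find_unreachable(const std::vector<Foo*>& roots,
                                          const std::vector<Foo*>& all) {
    ReachabilityArchive ar;
    for (Foo* root : roots) {
        ar & root;
    }
    std::vector<Foo*> dead;
    for (Foo* f : all) {
        if (!ar.reached(f)) {
            dead.push_back(f);
        }
    }
    return dead;
}
```

The recursion depth follows the depth of the graph, so a very deep graph would want an explicit work list instead.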