
From: Robert Ramey (ramey_at_[hidden])
Date: 2004-03-31 12:19:31


Anatoli Tubman wrote:

>My boss just gave me OK to discuss this, so here goes.

>A common drawback of various serialization libraries is the need to
>provide versioning code for each data structure to be serialized. This
>need not be the case for XML archives (and other formats that store
>NVPs). Because the program knows which data members it needs, and
>the names are stored in the archive, the program can leave any missing
>members in their current state (typically just after construction) and
>ignore any extra ones. After all, this is one of the ideas behind XML!
>More often than not this tactic is sufficient to handle versioning. When
>it's not, there's always the old way of explicit version checks.

The included XML archive presumes that saving and loading of class variables
are synchronized. This is consistent with all the other archives, so that
once the serialization for a class is specified, it is guaranteed to work
with all archives with no code changes.
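The ordering constraint is easier to see in a small example. This is a minimal sketch in plain C++ and does not use Boost. In Boost.Serialization, a class supplies a single serialize() template, and for XML archives its members are wrapped with BOOST_SERIALIZATION_NVP. The hypothetical XmlOut/XmlIn classes and the Point struct below only imitate that pattern. One function drives both directions, so the loader can expect elements in exactly the order they were written and reject anything else.

```cpp
#include <cassert>
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical order-synchronized archives (illustration only, not Boost's API).
// Each member is written as "<name>value</name>" on its own line.
class XmlOut {
public:
    template <class T>
    XmlOut& nvp(const std::string& name, const T& value) {
        os_ << '<' << name << '>' << value << "</" << name << ">\n";
        return *this;
    }
    std::string str() const { return os_.str(); }

private:
    std::ostringstream os_;
};

// Reads members back in exactly the order they were written; any mismatch is an error.
class XmlIn {
public:
    explicit XmlIn(const std::string& text) : is_(text) {}

    template <class T>
    XmlIn& nvp(const std::string& name, T& value) {
        std::string line;
        if (!std::getline(is_, line)) {
            throw std::runtime_error("archive ended before <" + name + ">");
        }
        const std::string open = "<" + name + ">";
        const std::string close = "</" + name + ">";
        // Check the length first so the closing-tag comparison cannot underflow.
        const bool matches =
            line.size() >= open.size() + close.size() &&
            line.compare(0, open.size(), open) == 0 &&
            line.compare(line.size() - close.size(), close.size(), close) == 0;
        if (!matches) {
            throw std::runtime_error("expected <" + name + ">, found: " + line);
        }
        std::istringstream field(
            line.substr(open.size(), line.size() - open.size() - close.size()));
        field >> value;
        return *this;
    }

private:
    std::istringstream is_;
};

// One serialize() function drives both saving and loading, so the member order
// in the stream always matches the order in the code.
struct Point {
    int x = 0;
    int y = 0;

    template <class Archive>
    void serialize(Archive& ar) {
        ar.nvp("x", x).nvp("y", y);
    }
};
```

Saving a Point with x = 3 and y = 4 produces `<x>3</x>` followed by `<y>4</y>`. If the same elements are fed back in the opposite order, XmlIn throws. That strictness is exactly what an NVP-tolerant archive would have to relax.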

>It seems that this should be easy to implement for boost::serialization.

It might be possible to make an archive which implements XML archives in a
different way, so that data members would be optional and could appear in an
arbitrary sequence. Clearly it would be less efficient than the current
version, and it's not clear whether it would be easy or hard to do. It's
probably harder than it would first appear (as most things are).

>Just make two passes through the deserialization function. In the first
>pass, just collect information on data members to be deserialized into
>some kind of map keyed on tag. In the second pass, perform actual
>deserialization, looking at the map built in the first pass. Any tags
>in the archive but not in the map are ignored together with their
>contents.
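The proposed two-pass scheme can be sketched as follows. The sketch assumes the archive has already been parsed into (tag, text) pairs. TagCollector, load_by_tag, Elements and Config are hypothetical names, not part of any library. The class's serialize() is unchanged, and all tag matching lives in the loader. A dedicated archive class would have to hold that logic in the same way.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical sketch of the proposed two-pass load (names are illustrative only).
using Setter = std::function<void(const std::string&)>;

// Pass 1: run the class's serialize() against a collector that reads nothing and
// just records, for each tag, how to assign the corresponding member.
class TagCollector {
public:
    template <class T>
    TagCollector& nvp(const std::string& name, T& member) {
        setters_[name] = [&member](const std::string& text) {
            std::istringstream field(text);
            field >> member;
        };
        return *this;
    }
    const std::map<std::string, Setter>& setters() const { return setters_; }

private:
    std::map<std::string, Setter> setters_;
};

// Archive contents, already parsed into (tag, text) pairs in stored order.
using Elements = std::vector<std::pair<std::string, std::string>>;

// Pass 2: walk the archive. Known tags are assigned, unknown tags are skipped,
// and members missing from the archive keep their current values.
template <class T>
void load_by_tag(T& obj, const Elements& archive) {
    TagCollector collector;
    obj.serialize(collector);
    const std::map<std::string, Setter>& setters = collector.setters();
    for (const auto& element : archive) {
        auto it = setters.find(element.first);
        if (it != setters.end()) {
            it->second(element.second);
        }
    }
}

struct Config {
    int width = 640;
    int height = 480;
    int depth = 24;  // suppose this member was added in a later program version

    template <class Archive>
    void serialize(Archive& ar) {
        ar.nvp("width", width).nvp("height", height).nvp("depth", depth);
    }
};
```

Consider an older archive that stores height, then width, then an unrecognized title. Loading it assigns width and height, leaves depth at its constructed value of 24, and silently drops title. This tolerance is the behaviour the proposal is after.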

This method would destroy the independence between the class serialization
specification and the archive format. If I were going to do such a thing
(which I'm not), I would build this logic into a new XML archive class.
You're free to do this (if your boss will give you permission).

Of course, if such an archive can't be made, there would be no motivation to
use the serialization library at all for this purpose.

Actually this question touches upon a central issue about what serialization
is all about and how it conflicts with what XML is all about.

Serialization is about making an arbitrary set of data structures moveable
from one context to another. The serialized data stream is a function of
the data structures to be moved. That is, program code => XML definition.
The main attraction of serialization is that the code which saves/loads data
streams is automatically generated and kept in sync with program data
structures. In the XML archive, XML data definition is driven by the
program code and data structures.

XML is about rendering data in a program-independent manner. Program code
is synchronized to match the XML structure. That is, XML definition =>
program code. There are packages that, given an XML definition, will
generate C++ code to save/load data as an XML structure. In essence, these
packages are the mirror image of serialization.

So the question really is: which is the independent variable, the program
code and data structures or the XML definition? If it's the former,
serialization to XML archives is a good choice. If it's the latter, a
different approach would probably be better.

As long as the code and the XML definition don't change, it doesn't make
much difference. When something has to change, the question arises: do we
change the XML and adjust the program to match, or vice versa?
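The "explicit version checks" mentioned in the quoted message are the program-side answer to that question. The code changes, and the load path is taught to read streams written by older versions. Boost.Serialization supports this by passing a class version number to serialize(). The sketch below imitates the idea with plain iostreams. Account, kCurrentVersion and the whitespace-separated format are hypothetical.

```cpp
#include <cassert>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical sketch of an explicit version check (not Boost's API). The stream
// starts with the version of the class that wrote it, and load() branches on it.
struct Account {
    static constexpr unsigned kCurrentVersion = 2;

    long balance = 0;
    std::string currency = "USD";  // member added in version 2

    void save(std::ostream& os) const {
        os << kCurrentVersion << ' ' << balance << ' ' << currency << '\n';
    }

    void load(std::istream& is) {
        unsigned version = 0;
        is >> version >> balance;
        if (version >= 2) {
            is >> currency;  // only version 2+ streams carry this field
        }
        // Version 1 streams: currency keeps its constructed default.
    }
};
```

Here the program stays in charge of the format. A version 1 stream such as `1 100` still loads with the default currency, and every new save is written in the version 2 layout.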

This question also touches on another topic that comes up regularly. There
is often a desire to use the serialization library to generate a specific
data format. If this format is a meta-data specification such as XML, it's
possible. If it's a more specific format, serialization is probably not
the right approach. Again the question comes down to who's boss: is the
data stream format driving the program design, or vice versa? With the
serialization library, the data stream format is driven by the program data
structures. Attempts to mandate an overly specific data stream format may be
possible but are ultimately not worthwhile.

Robert Ramey


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk