From: Robert Ramey (ramey_at_[hidden])
Date: 2002-09-13 00:29:25

>Date: Thu, 12 Sep 2002 11:37:23 +0400
>From: Vladimir Prus <ghost_at_[hidden]>


As I understand it, your objections to the proposed serialization library can be summarized as follows

1. The "sequential registration" method :
        a) will lead to incorrect behavior because registration sequence cannot be guaranteed
        b) because it requires explicit registration of derived classes serialized through pointers
        it will not address certain real world situations
        c) It is possible to create a better system based on typeid()

2. "Describe"
        a) is a facility that needs to be in the serialization library
        b) is implementable in a useful way in C++

Please feel free to correct me if I have mis-stated your concerns. I will address the above
in sequence.

I will argue that each of the above statements is exactly wrong. I will state the opposite
and make the corresponding argument.

1 a) The "sequential registration" method will not lead to incorrect behavior because registration sequence is guaranteed.

All objects are saved and loaded in exactly the same sequence. Each time an object
is saved or loaded, a table of class types is checked to see if an object of the same class has
been used before. If not, it is added to the table. The index in the table becomes the
archive-specific registration key. If corresponding save/load functions save/load class
members in the same sequence, the table created on load MUST correspond to the
table used on save. The same holds true for explicitly registered classes. That is,
if they are registered in the same place on save and load, the table created on load
MUST still correspond to the table used by save. In some previous post, you suggested
that the order would not be guaranteed when performed during the construction of static
objects. Note that registration is an archive-specific concept rather than a global one
and is in no way related to the construction of static objects. I've tried to make this
explanation as clear as I can and I hope we can agree on this much.
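
To make the argument concrete, here is a minimal sketch of such a per-archive table. It is
not the library's code: registration_table, key_for, visit_in_fixed_order and tables_agree
are names invented for this illustration, and typeid is used only as an in-process identity
for the lookup, never written to the archive. It assumes a C++11 compiler.

```cpp
#include <cassert>
#include <cstddef>
#include <typeindex>
#include <typeinfo>
#include <vector>

// Hypothetical per-archive table: a class receives the next free index the
// first time this archive sees it. Nothing here is global or static.
class registration_table {
    std::vector<std::type_index> seen_;
public:
    // Returns the archive-specific key for T, adding T on first use.
    template <class T>
    std::size_t key_for() {
        const std::type_index id(typeid(T));
        for (std::size_t i = 0; i < seen_.size(); ++i)
            if (seen_[i] == id) return i;
        seen_.push_back(id);
        return seen_.size() - 1;
    }
};

struct A {};
struct B {};

// Stand-in for a save or a load routine: both visit A, B, A in that order.
inline std::vector<std::size_t> visit_in_fixed_order(registration_table& t) {
    return { t.key_for<A>(), t.key_for<B>(), t.key_for<A>() };
}

// The saving archive and the loading archive each build their own table,
// and the keys agree because the visiting sequence is the same.
inline bool tables_agree() {
    registration_table on_save, on_load;
    return visit_in_fixed_order(on_save) == visit_in_fixed_order(on_load);
}
```

Calling assert(tables_agree()) from any function passes: the keys depend only on the order
in which each archive meets the classes, not on when any static object was constructed.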

1 b) In those few cases where it is required, explicit registration creates no significant burden
on the program and in no way inhibits portability of code or archives.

I went back and looked at your email
and retrieved the example that you used to illustrate your argument. I regret that I failed to
give it the attention that I now realize it deserved.

>class Path_estimation {}; // polymorphic

>class Estimations {
> vector<Path_estimation*> path_estimations;

>int main()
> Estimations e;
> for .....
> e.path_estimations.push_back(compute_estimation(.....)) ;
> oarchive a;
> a << e;

>Here, only 'compute_estimation' knows the exact type of Path_estimation
>derived class which it creates and returns. But it has no idea that "main"
>saves anything. So, how "main" can register classes derived from

Of course, to use my system the example would have to be recast as the following:

#include "derived_path_estimation.hpp"

class Path_estimation {
        void save(basic_oarchive &ar);
        void load(basic_iarchive &ar, int version);
}; // polymorphic

class Estimations {
    vector<Path_estimation*> path_estimations;

int main()
        Estimations e;
        for .....
                e.path_estimations.push_back(compute_estimation(.....)) ;
        oarchive a;
        a << e;

I argue that this is not a significant burden.

Note that the declaration/definition of the class Path_estimation makes no reference to serialization
of anything derived from it. That is, Path_estimation can be in its own module or in a library, and
can contain its own serialization. It doesn't even have to be recompiled when a derivation is created
and serialized. So the requirement to pre-register most derived types doesn't compromise
portability of any other modules.

I can hear you saying - But suppose the program that reads the archive doesn't have
#include "derived_path_estimation.hpp". Any program that reads an archive that contains
a Derived_Path_estimation must by necessity have (at least) code to construct
a new instance of Derived_Path_estimation as well as load functions. This code is
found in "derived_path_estimation.hpp". So any program which reads the archive must
#include "derived_path_estimation.hpp" somewhere.

Requiring it at the point where an archive is created imposes no significant extra burden.
In fact it provides a concise and useful summary of the types used by the archive.

1 c) A system based on typeid() would not be as good

The above example can be used to explore this question. The current typeid() creates
a non-portable string that uniquely corresponds to each class declaration. When
a new class is serialized, this string can be written to the archive. Upon loading,
the string is read, and then what? We need to create a new object. This presupposes
that we have somehow added to the table a pointer to a class factory of some sort. Now
how does that get in there? Someone has to "register" the class factory that corresponds
to each string. This is not archive specific so it need only be done once. But what is added?

Well we can write code such as:

int main()
        Estimations e;
        for .....
                e.path_estimations.push_back(compute_estimation(.....)) ;
        iarchive a;
        a >> e;

without having to use any #includes for most derived classes. The program will build, but
what happens when we run it? It will have to throw an exception when it encounters
an unknown type. So we have in fact gained nothing by being able to compile such a thing.
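
A minimal sketch of what such a global, string-keyed scheme has to look like. This is not
anyone's proposed implementation: factories, create and register_derived are names invented
here, and the key is a hand-written class name standing in for whatever string typeid()
would produce. It assumes a C++11 compiler.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <memory>
#include <stdexcept>
#include <string>

struct Path_estimation { virtual ~Path_estimation() = default; };
struct Derived_Path_estimation : Path_estimation {};

using factory = std::function<std::unique_ptr<Path_estimation>()>;

// Hypothetical global registry mapping a class-name string to a factory.
inline std::map<std::string, factory>& factories() {
    static std::map<std::string, factory> table;
    return table;
}

// What a loader must do after reading a class-name string from the archive.
inline std::unique_ptr<Path_estimation> create(const std::string& name) {
    const auto it = factories().find(name);
    if (it == factories().end())
        throw std::runtime_error("unregistered class: " + name);
    return it->second();
}

// Someone still has to run this, and it needs the derived class definition.
inline void register_derived() {
    factories()["Derived_Path_estimation"] = [] {
        return std::unique_ptr<Path_estimation>(new Derived_Path_estimation);
    };
}
```

Until register_derived() has run, create("Derived_Path_estimation") throws. The program
compiles without the derived header, but the failure has only moved from build time to run time.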

The question really is one of global static registration vs archive specific registration. Archive
specific registration is better.

Of course this is really a moot discussion as typeid() is not even close to portable.

2 a). A "Describe" facility does not belong in the serialization library

My personal view is that implementing describe adds more complexity than it saves.
I think the recent posts support this view, but I'm not going to argue that point here.

I am going to argue that it doesn't have to be in this library. Jens Maurer's original serialization
library proposal included "Describe". It was implemented in terms of reader and writer functions.
This is very analogous to the save/load functions used here. I spent considerable time
trying to implement this in a manner consistent with all the objectives for the library
(see documentation) and came to the view that I now hold. I also realized that it
really wasn't an issue. There is no obstacle to anyone implementing a describe facility
using the save/load functions in this library. This is what Jens did (he called his functions reader/writer).
It made sense then and it still does. "Describe" is really an attempt to address the lack of reflection in C++.
Serialization is orthogonal to this.

As an aside - this is the way that the dispute about what file format to use was "resolved". It
eventually became apparent that the archive format could be separated from the code that
handled serialization itself. This allows anyone to use the serialization system with his
own preferred file format (Jens's system did something similar). Separating this out in this way
leaves open other interesting possibilities such as an XML format (Please don't ask about this).
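
A rough sketch of that separation, under assumptions of my own: out_archive, plain_text_out,
tagged_text_out and Point are invented for this illustration and are not the interfaces of
the library under discussion. The class writes its members once, against an abstract archive,
and each format decides how a value appears in the output.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical archive interface: formats differ only in how they write a value.
struct out_archive {
    virtual ~out_archive() = default;
    virtual void put(const std::string& name, int value) = 0;
};

// Format 1: values only, separated by spaces.
struct plain_text_out : out_archive {
    std::ostringstream os;
    void put(const std::string&, int value) override { os << value << ' '; }
};

// Format 2: name=value pairs, separated by semicolons.
struct tagged_text_out : out_archive {
    std::ostringstream os;
    void put(const std::string& name, int value) override {
        os << name << '=' << value << ';';
    }
};

struct Point {
    int x, y;
    // Written once; knows nothing about the file format.
    void save(out_archive& ar) const { ar.put("x", x); ar.put("y", y); }
};
```

Saving Point{3, 4} through plain_text_out yields "3 4 " and through tagged_text_out yields
"x=3;y=4;", with no change to Point::save.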

2 b). Describe cannot be implemented in a useful way in C++

Well, I've stated my opinion - but I changed my mind about arguing the point. I've made my arguments
in previous posts. But the real reason is that, given the above argument, it is really not relevant
to whether or not this library should be included in boost.

Well, those are my arguments. I was reluctant to spend the time making them as I really don't
believe I can convince you. However, I concluded that it might be of value to all you
"serialization lurkers" out there who are enjoying the entertainment.

I encourage anyone interested in the subject to download the "" from the files
section and try it out on your own classes. It is a large package by boost standards
(7500 lines including html documentation) and addresses an incredible range of issues from runtime
efficiency to portability aspects of template meta-programming. The nature of posting is
such that discussions tend toward more and more narrow, arcane and tangential issues.
Only by reviewing the package and its documentation - and hopefully trying it out - can
one gain an accurate picture of the facility and its usage. I would very much like to hear from users
as well as developers.

Robert Ramey

