From: Vladimir Prus (ghost_at_[hidden])
Date: 2002-09-13 05:55:48


Robert,

First of all, thanks for bringing this discussion to a clear set of
issues. This will help a lot.

> As I understand it, your objections to the proposed serialization library can be summarized as follows
>
> 1. The "sequential registration" method :
> a) will lead to incorrect behavior because registration sequence cannot be guaranteed
> b) because it requires explicit registration of derived classes serialized through pointers
        
Not exactly. Explicit registration of derived classes is needed. What I
don't like is registration on a per-archive basis.

> it will not address certain real world situations
> c) It is possible to create better system based on typeid()

It is possible to create a system which will always work on a given
platform using typeid(), and a better system using a better typeid().

> 2. "Describe"
> a) is a facility that needs to be in the serialization library
> b) is implementable in a useful way in C++

> //////////////////////////////////////
> 1 a ) The "sequential registration" method will not lead to incorrect behavior because registration sequence is guaranteed.
>
> All objects are saved and loaded in exactly the same sequence. Each time an object
> is saved/loaded, a table of class types is checked to see if an object of the same class has
> been used before. If not it is added to the table. The index in the table becomes the
> archive-specific registration key. If corresponding save/load functions save/load class
> members in the same sequence, the table created on load MUST correspond to the
> table used on save. The same holds true for explicitly registered classes. That is
> if they are registered in the same place on save and load, the table created on load
> MUST still correspond to the table used by save. In some previous post, you suggested
> that the order would not be guaranteed when performed during the construction of static
> objects. Note that registration is an archive specific concept rather than a global one
> and is in no way related to the construction of static objects. I've tried to make this
> explanation as clear as I can and I hope we can agree on this much.

I've looked quite carefully through your code and the scheme it uses.
It works precisely as you describe, provided you register all
polymorphic types explicitly before loading, so I'll move on to the
next point.
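
To make the scheme concrete, here is a minimal sketch of an
archive-specific class table. It is not the library's actual code:
`ClassTable`, `key_for`, and the `Base`/`DerivedA`/`DerivedB` types are
hypothetical names chosen for illustration. It assumes only what is
described above: a class gets the next free index the first time it is
registered or encountered in a given archive.

```cpp
#include <cstddef>
#include <typeindex>
#include <typeinfo>
#include <unordered_map>

// Hypothetical per-archive table (illustration only, not the library's API).
// std::type_index is used purely in-process; only the integer key would be
// written to the archive.
class ClassTable {
public:
    // Returns the archive-specific key for T, assigning the next free
    // index the first time T is seen in this archive.
    template <class T>
    std::size_t key_for() {
        const std::type_index id(typeid(T));
        auto it = keys_.find(id);
        if (it != keys_.end()) return it->second;
        const std::size_t key = keys_.size();
        keys_.emplace(id, key);
        return key;
    }

private:
    std::unordered_map<std::type_index, std::size_t> keys_;
};

struct Base { virtual ~Base() = default; };
struct DerivedA : Base {};
struct DerivedB : Base {};

// If save and load touch the classes in the same order, both tables
// assign the same keys.
inline bool same_order_gives_same_keys() {
    ClassTable on_save, on_load;
    const std::size_t a_saved = on_save.key_for<DerivedA>();
    const std::size_t b_saved = on_save.key_for<DerivedB>();
    const std::size_t a_loaded = on_load.key_for<DerivedA>();
    const std::size_t b_loaded = on_load.key_for<DerivedB>();
    return a_saved == a_loaded && b_saved == b_loaded;
}
```

The key depends only on the order of first use within one archive, so
nothing global is involved. The cost is that a loader which reads a key
for a polymorphic pointer must already have that class in its table,
which is why explicit registration before loading is needed.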

> //////////////////////////////////////
> 1 b) In those few cases where it is required, explicit registration creates no significant burden
> on the program and in no way inhibits portability of code or archives.
>
> I went back and looked at your email http://aspn.activestate.com/ASPN/Mail/Message/1265629
> and retrieved the example that you used to illustrate your argument. I regret that I failed to
> give it the attention that I now realize it deserved.
>
>
>>class Path_estimation {}; // polymorphic
>
>
>>class Estimations {
>> vector<Path_estimation*> path_estimations;
>>};
>
>
>>int main()
>>{
>> Estimations e;
>> for .....
>> e.path_estimations.push_back(compute_estimation(.....)) ;
>> oarchive a;
>> a << e;
>>}
>
>
>>Here, only 'compute_estimation' knows the exact type of Path_estimation
>>derived class which it creates and returns. But it has no idea that "main"
>>saves anything. So, how "main" can register classes derived from
>>Path_estimation?
>
>
> Of course, to use my system the example would have to be recast as the following:
>
> #include "derived_path_estimation.hpp"
>
> class Path_estimation {
> void save(basic_oarchive &ar);
> void load(basic_iarchive &ar, int version);
> }; // polymorphic
>
> class Estimations {
> vector<Path_estimation*> path_estimations;
> };
>
> int main()
> {
> Estimations e;
> for .....
> e.path_estimations.push_back(compute_estimation(.....)) ;
> oarchive a;
> a.register_type<Derrived_Path_estimation>();
> a << e;
> }
>
> I argue that this is not a significant burden.
>
> Note that the declaration/definition of the class Path_estimation makes no reference to serialization
> of anything derived from it. That is, Path_estimation can be in its own module or in a library, and
> can contain its own serialization. It doesn't even have to be recompiled when a derivation is created
> and serialized. So the requirement to pre-register most derived types doesn't compromise
> portability of any other modules.

I think I understand you (in the last paragraph), and agree.

> I can hear you saying - But suppose the program that reads the archive doesn't have
> #include "derived_path_estimation.hpp". Any program that reads an archive that contains
> a Derrived_Path_estimation must by necessity have (at least) code to construct
> a new instance of Derrived_Path_estimation as well as load functions. This code is
> found in "derived_path_estimation.hpp". So any program which reads the archive must
> #include "derived_path_estimation.hpp" somewhere.

Disagree. There is code in "derived_path_estimation.cpp" which loads the
data, and it should be linked with the application. However, the module
which loads/saves Estimations (say main.cpp) need not include
"derived_path_estimation.hpp". Moreover, imagine that
Derived_Path_estimation is defined in a plug-in (i.e. a shared library)
which did not even exist when main.cpp was first written.

> /////////////////////////////////////
> 1 c) A system based on typeid() would not be as good
>
> The above example can be used to explore this question. The current typeid() creates
> a non-portable string that uniquely corresponds to each class declaration. When
> a new class is serialized, this string can be written to the archive. Upon loading
> the string is read and then ? We need to create a new object. This presupposes
> that we have somehow added to the table a pointer to a class factory of some sort. Now
> how does that get in there? Someone has to "register" the class factory that corresponds
> to each string. This is not archive specific so it need only be done once. But what is added?
>
> Well we can write code such as:
>
> int main()
> {
> Estimations e;
> for .....
> e.path_estimations.push_back(compute_estimation(.....)) ;
> iarchive a;
> a >> e;
> }
>
> without having to use any #includes for most derived classes. The program will build but
> what happens when we run it? It will have to throw an exception when it encounters
> an unknown type. So we have in fact gained nothing by being able to compile such a thing.

We've gained a greater degree of isolation between modules. I can load
and save types which are unknown to main.cpp, or declared in plug-ins.
Yes, it's possible that you've saved an archive with plug-ins that are
not available when loading. Hmm... in that case you have no choice:
nobody can read that data. Your approach prevents plug-ins.
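
As a rough illustration of what I mean by global registration, here is
a minimal sketch. Every name in it (`registry`, `register_class`,
`create`, the string key) is hypothetical and not part of any existing
library. It assumes each class is registered once, under an explicitly
chosen name, from its own translation unit or plug-in, so main.cpp
never sees the derived header.

```cpp
#include <functional>
#include <map>
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>

struct Path_estimation {
    virtual ~Path_estimation() = default;
    virtual std::string kind() const = 0;
};

using Factory = std::function<std::unique_ptr<Path_estimation>()>;

// Global name -> factory table. A function-local static is used so the
// table exists before any registration runs, whatever the order in which
// static objects are constructed.
inline std::map<std::string, Factory>& registry() {
    static std::map<std::string, Factory> table;
    return table;
}

// Registers a factory under an explicit name; returns false if the name
// is already taken.
inline bool register_class(const std::string& name, Factory make) {
    return registry().emplace(name, std::move(make)).second;
}

// Called by the loader after it reads a class name from the archive.
inline std::unique_ptr<Path_estimation> create(const std::string& name) {
    auto it = registry().find(name);
    if (it == registry().end())
        throw std::runtime_error("unknown class in archive: " + name);
    return it->second();
}

// What follows would live in derived_path_estimation.cpp or in a plug-in.
namespace {
struct Derived_Path_estimation : Path_estimation {
    std::string kind() const override { return "derived"; }
};

// Runs during static initialization of this translation unit.
const bool registered = register_class(
    "Derived_Path_estimation",
    [] { return std::unique_ptr<Path_estimation>(new Derived_Path_estimation); });
}  // namespace
```

An archive that names a class whose plug-in is not loaded still fails,
here with an exception from `create`, but the failure is limited to
that archive and main.cpp needs no changes when a new plug-in appears.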
>
> The question really is one of global static registration vs archive specific registration. Archive
> specific registration is better.
>
> Of course this is really a moot discussion as typeid() is not even close to portable.

But a portable alternative *is* possible.

> ///////////////////////////////////////////////
> 2 a). A "Describe" does not belong in the serialization library
>
> My personal view is that implementing describe adds more complexity than it saves.
> I think the recent posts support this view. - But I'm not going to argue that point here.
>
> I am going to argue that it doesn't have to be in this library. Jens Maurer's original serialization
> library proposal included "Describe". It was implemented in terms of reader and writer functions.
> This is very analogous to the save/load functions used here. I spent considerable time
> trying to implement this in a manner consistent with all the objectives for the library
> (see documentation) and came to the view that I now hold. I also realized that it
> really wasn't an issue. There is no obstacle to anyone implementing a describe facility
> using the save/load functions in this library. This is what Jens did (he called his reader/writer).
> It made sense then and it still does. "Describe" is really an attempt to address lack of reflection in C++.
> Serialization is orthogonal to this.

That's a strong point. The questions are:
1. Would you mind if somebody writes a 'describe'-based wrapper and
submits it?
2. Would you be willing to tweak your library if a 'describe'
implementation turns out not to be 100% orthogonal? I think there might
be some problems with making both describe and save/load work.

If you are fine with both, then I withdraw the 'describe' issue.
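
For reference, the kind of wrapper I have in mind looks roughly like
the sketch below. It is only an illustration under my own assumptions:
`Writer`, `Reader`, `Point`, and the free `save`/`load` functions are
hypothetical stand-ins, not the interfaces of your library or of Jens's
proposal. The class lists its members once in `describe`, and both
directions are derived from that single list.

```cpp
#include <sstream>
#include <string>

// Hypothetical stand-in for an output archive: writes each member as text.
struct Writer {
    std::ostringstream out;
    template <class T>
    void operator()(const T& value) { out << value << ' '; }
};

// Hypothetical stand-in for an input archive: reads members back in order.
struct Reader {
    std::istringstream in;
    explicit Reader(const std::string& data) : in(data) {}
    template <class T>
    void operator()(T& value) { in >> value; }
};

struct Point {
    int x = 0;
    double y = 0.0;

    // The single description of the members, used for saving and loading.
    template <class Visitor>
    void describe(Visitor& visit) {
        visit(x);
        visit(y);
    }
};

// save and load are thin wrappers over describe.
template <class T>
std::string save(T& object) {
    Writer writer;
    object.describe(writer);
    return writer.out.str();
}

template <class T>
void load(T& object, const std::string& data) {
    Reader reader(data);
    object.describe(reader);
}
```

The possible friction is visible even here: `describe` has to be
non-const to serve the reader, so saving a const object needs extra
care in a real implementation.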

> As an aside - this is the way that the dispute about what file format to use was "resolved". It
> eventually became apparent that the archive format could be separated from the code that
> handled serialization itself. This allows anyone to use the serialization system with his
> own preferred file format (Jens's system did something similar). Separating this out in this way
> leaves open other interesting possibilities such as an XML format (Please don't ask about this)

An XML format would require member names, which is another issue that
will require support from your library.
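
A minimal sketch of why the names matter, under my own assumptions:
`Named`, `named`, `TextOut`, `XmlOut`, and `Estimate` are hypothetical
and only illustrate the idea. If the save function passes a name along
with each value, a plain text archive can ignore the name while an XML
archive uses it as the element tag. Without the name, the XML archive
has nothing to write between the angle brackets.

```cpp
#include <sstream>
#include <string>

// Hypothetical name/value pair handed to the archive by a save function.
template <class T>
struct Named {
    const char* name;
    const T& value;
};

template <class T>
Named<T> named(const char* name, const T& value) {
    return Named<T>{name, value};
}

// A plain text archive ignores the member name.
struct TextOut {
    std::ostringstream out;
    template <class T>
    TextOut& operator<<(const Named<T>& member) {
        out << member.value << ' ';
        return *this;
    }
};

// An XML-style archive needs the member name to form the element tag.
struct XmlOut {
    std::ostringstream out;
    template <class T>
    XmlOut& operator<<(const Named<T>& member) {
        out << '<' << member.name << '>' << member.value
            << "</" << member.name << '>';
        return *this;
    }
};

struct Estimate {
    int id;
    double cost;
};

// One save function serves both archive formats.
template <class Archive>
void save(Archive& ar, const Estimate& e) {
    ar << named("id", e.id) << named("cost", e.cost);
}
```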

- Volodya

