From: Robert Ramey (ramey_at_[hidden])
Date: 2002-12-17 12:58:17


From: Matthias Troyer <troyer_at_[hidden]>

>1) "Definition of serialization": ...
agree

>2) "Serialization engine".
agree - except on whether arrays should be primitive types. We differ on this,
but I believe that this is actually a small point that would ultimately be
resolved by running some tests on a given implementation, so it is way
premature to try to agree on it now.

>3) "Archive preamble": ..
agree - I believe that the archive preamble, and maybe a "postamble",
are useful and almost necessary for robust systems. However, I
think the library should encourage rather than require this. In any
case this is local to the "Serialization Engine" ("archive" in the
submitted system).

>4) "Serialization of UDT (user defined types)": is the next level up.
agree

>5) "Versioning": The next level for me is versioning support. We have
>discussed versioning support on a per-archive and a per-class level. I
>would like to see both variants supported. Per-class versioning is more
>flexible, but has two disadvantages: i) it introduces overhead and ii)
>it writes extra information into the stream, which might make the
>output incompatible with some applications.
>Regarding i: we have to write the version number for each UDT
>encountered, but want to write it only once per UDT. We thus have to
>keep track of which UDTs have been serialized so far, and whenever a
>new UDT is encountered, its version number must be written to the
>archive. This introduces overhead, especially if many small objects
>have to be serialized.

The overhead for the version number is 1 or 2 bytes per class definition.
Tracking the classes serialized so far is not expensive. What
is expensive is tracking all the objects serialized so that pointers
can be handled correctly.
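
The cost difference can be sketched as follows. This is a minimal
illustration in present-day C++, not code from the submitted library:
sketch_oarchive, point, and the one-byte version field are hypothetical
names and choices. The set of seen classes holds one entry per class,
so its size does not grow with the number of objects saved.

```cpp
#include <cstddef>
#include <cstdint>
#include <set>
#include <typeindex>
#include <typeinfo>
#include <vector>

// Hypothetical output archive: writes a class's version number the first
// time an object of that class is saved, and never again.
class sketch_oarchive {
public:
    template <class T>
    void save(const T& obj, std::uint8_t version) {
        // insert() reports whether the class was new to the set.
        if (seen_.insert(std::type_index(typeid(T))).second) {
            bytes_.push_back(version);  // one byte, once per class
            ++version_bytes_;
        }
        obj.save(bytes_);  // the object's own data
    }
    std::size_t version_bytes() const { return version_bytes_; }
    std::size_t size() const { return bytes_.size(); }

private:
    std::set<std::type_index> seen_;   // one entry per class, not per object
    std::vector<std::uint8_t> bytes_;  // stand-in for the output stream
    std::size_t version_bytes_ = 0;
};

// Hypothetical small user-defined type with two one-byte members.
struct point {
    std::uint8_t x, y;
    void save(std::vector<std::uint8_t>& out) const {
        out.push_back(x);
        out.push_back(y);
    }
};
```

Saving 1000 point objects through this sketch adds a single version
byte to the 2000 data bytes. Tracking objects for pointer handling
would instead need one entry per object.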

>I see a two-pronged approach as the best solution:
>a) both per-archive versioning, per-class versioning and no versioning
>should be supported for compatibility with other formats (issue ii)
>above)

per-archive versioning can easily be handled by appending to a
default preamble.
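
One way this could look, as a sketch under assumptions: the signature
string, the library version value, and all function names below are
invented for illustration and are not taken from the submitted library.
The application writes its own archive version right after the default
preamble and reads it back in the same order.

```cpp
#include <istream>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical default preamble: a signature and the archive library version.
inline void write_default_preamble(std::ostream& os) {
    os << "sketch_archive " << 1 << '\n';
}

// The application appends its own per-archive version to the default preamble.
inline void write_preamble(std::ostream& os, unsigned app_version) {
    write_default_preamble(os);
    os << app_version << '\n';
}

struct preamble {
    std::string signature;
    unsigned library_version;
    unsigned app_version;
};

// Reads the fields back in the order they were written.
// Returns false if the stream does not contain a complete preamble.
inline bool read_preamble(std::istream& is, preamble& p) {
    return static_cast<bool>(
        is >> p.signature >> p.library_version >> p.app_version);
}
```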

>b) if per-class versioning is used, it should be possible to turn it
>off for some classes by a traits class - this will get rid of the
>overhead (issue i) above) when versioning is turned off for a UDT.

hmmm - I will have to think about this. Using MFC one has to take
extra steps to include versioning. On my last commercial
project using MFC serialization I didn't do this on some
classes because I was assured that "that class will never change".
Of course it did after the first version of the application shipped
and ended up creating a lot of extra work. So I resolved that
I would just "spend" the one byte per class definition and be done
with it.

Similar logic applies to the archive preamble. My original motivation
was the concern that existing archives should never become obsolete
because of improvements in code - including the archiving system. So
I needed a version for the archive system itself - hence the preamble.

>6) "Advanced functionality": Robert's serialization library includes
>further functionality, such as the serialization of pointers and of
>polymorphic types. Here I want to focus on serialization of pointers. I
>have not checked the implementation of Robert's library in detail, and
>thus please correct me if I view this wrongly. Serialization of
>pointers requires the conversion of a pointer to an integer. When
>serializing objects, the archive thus has to keep track of the
>addresses of objects, in order to later convert pointers into numbers.
>This again introduces overhead. Robert addresses this partially in his
>library by showing how to bypass this system for a UDT. His approach
>however requires that if I want to bypass the pointer serialization
>mechanism for a type T, then I have to re-implement serialization of
>all standard containers of type T, such as std::vector<T>,
>std::list<T>, std::stack<T>, ... for my type T. My proposal that I have
>mentioned before is, to just add another traits type, which specifies
>whether for a type T the pointer serialization scheme can be bypassed
>(like versioning above) and a faster, optimized serialization used.

This analysis is in general correct. Bookkeeping for objects that may
be serialized as pointers is inherently expensive. And the current
system doesn't provide a clean way to skip this bookkeeping for
objects that are known never to be serialized as pointers.

Lately, I have been cleaning up the implementation along the lines
suggested by G. Rozenthal. My intention was to make the library
more "provably correct" and "logically transparent". I didn't foresee
any change of functionality. However, as things get moved around
to a more logical organization, certain things sort of mysteriously
appear. In particular, the current library skips pointer bookkeeping
for fundamental types. In the future, the set of types for which the
bookkeeping is skipped will be alterable by the user, in a manner similar
to the one you suggest. I believe that you will find that this addresses
your concern in a natural and complete way.
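
A user-alterable rule of this kind could be expressed with a traits
class, roughly as sketched below. This is an assumption about the
general shape, not the library's interface: track_pointers,
tracking_oarchive, pixel, and node are hypothetical names. The default
skips fundamental types, and a user specialization opts a type out.

```cpp
#include <cstddef>
#include <set>
#include <type_traits>

// Hypothetical trait: should the archive record object addresses for T?
// Default: track everything except fundamental types.
template <class T>
struct track_pointers
    : std::integral_constant<bool, !std::is_fundamental<T>::value> {};

struct pixel { unsigned char r, g, b; };
struct node { int value; };

// User opt-out: pixel objects are never serialized through a pointer.
template <>
struct track_pointers<pixel> : std::false_type {};

class tracking_oarchive {
public:
    template <class T>
    void save(const T& obj) {
        // The overload is selected at compile time from the trait.
        note_address(&obj, track_pointers<T>());
        // ... the object's data would be written here ...
    }
    std::size_t tracked() const { return addresses_.size(); }

private:
    void note_address(const void* p, std::true_type) { addresses_.insert(p); }
    void note_address(const void*, std::false_type) {}  // no bookkeeping
    std::set<const void*> addresses_;
};
```

With this sketch, saving an int, a pixel, and two node objects records
only the two node addresses. Containers of pixel need no special
treatment, because the decision is made per element type.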

A really, really fundamental issue in the submitted library is
the usage of "Archive" as a virtual base class. This is the
traditional way of separating interface from implementation.

Advantages
========
a) we're used to it
b) it permits total separation of UDT serialization specification
from archive implementation. UDT serialization specifications don't
even have to be recompiled for different archives.
c) logically decouples UDT serialization concept from archive
implementation concept.
d) permits any UDT serialization implementation to work
with any archive implementation
e) less compile time dependency - implies simpler code and
faster compilations.

Disadvantages
===========
a) Does not permit archive implementation and UDT serialization
to be coupled. This is the fundamental obstacle to serialization
in XML format.
b) virtual functions incur some extra overhead in calling
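
The virtual base class arrangement can be sketched as follows. The
class and member names are hypothetical and simplified, not the
submitted library's declarations. The UDT's save function sees only the
abstract interface, so it is compiled once and works with any archive.
The archive, in turn, receives bare values with no knowledge of the
type they came from.

```cpp
#include <cstddef>
#include <sstream>
#include <string>

// Hypothetical abstract interface: one virtual function per primitive type.
class archive {
public:
    virtual ~archive() {}
    virtual void save(int v) = 0;
    virtual void save(const std::string& v) = 0;
};

// UDT serialization is written against the interface only.
struct employee {
    int id;
    std::string name;
    void save(archive& ar) const {
        ar.save(id);    // virtual call
        ar.save(name);  // virtual call
    }
};

// One concrete archive: space-separated text, strings prefixed by length.
class text_archive : public archive {
public:
    void save(int v) override { out_ << v << ' '; }
    void save(const std::string& v) override {
        out_ << v.size() << ' ' << v << ' ';
    }
    std::string str() const { return out_.str(); }

private:
    std::ostringstream out_;
};

// Another concrete archive: only counts the bytes of data it is given.
class counting_archive : public archive {
public:
    void save(int) override { bytes_ += sizeof(int); }
    void save(const std::string& v) override { bytes_ += v.size(); }
    std::size_t bytes() const { return bytes_; }

private:
    std::size_t bytes_ = 0;
};
```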

A newer way would be to use template specialization rather than a
virtual base class to implement the interface / implementation
paradigm.

Advantages
========
a) Permits archive implementation and UDT serialization to be coupled
thereby permitting archives to be "smarter" and facilitating implementation
of something like XML.
b) no virtual function call overhead

Disadvantages
==========
a) we're really not used to it yet
b) requires coupling of archive and UDT specification. This can make the
system harder to understand and use in simple cases. The system
requires recompilation of everything for every combination
of UDT and archive used in a program.
c) significantly larger executables
d) much longer compile/build times
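
For comparison, a sketch of the template-based arrangement, again with
hypothetical names. The UDT's save function is a template on the
archive type, so the archive's own save can be a member template that
accepts any type together with a member name. A member template cannot
be virtual, which is one reason this interface does not fit the base
class design. The price is that employee::save is instantiated again
for each archive type it is used with.

```cpp
#include <sstream>
#include <string>

// UDT serialization is a template: instantiated per archive type.
struct employee {
    int id;
    std::string name;
    template <class Archive>
    void save(Archive& ar) const {
        ar.save("id", id);
        ar.save("name", name);
    }
};

// Hypothetical XML-like archive: uses the member name as the element tag.
// No escaping is done; this only illustrates the coupling.
class xml_archive {
public:
    template <class T>
    void save(const char* tag, const T& v) {
        out_ << '<' << tag << '>' << v << "</" << tag << '>';
    }
    std::string str() const { return out_.str(); }

private:
    std::ostringstream out_;
};

// Hypothetical plain text archive: ignores the member name.
class text_archive {
public:
    template <class T>
    void save(const char*, const T& v) { out_ << v << ' '; }
    std::string str() const { return out_.str(); }

private:
    std::ostringstream out_;
};
```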

In the submitted library, I chose option 1 (the virtual base class)
primarily because of a): we're used to it. Whether or not this is the
best choice really depends on the other factors mentioned above, so I
don't see an obvious answer here. In fact, for most situations either
would work just as well.

Robert Ramey

