Boost logo

Boost :

From: JOAQUIN LOPEZ MU?Z (joaquin_at_[hidden])
Date: 2005-09-12 17:08:24


----- Mensaje original -----
De: David Abrahams <dave_at_[hidden]>
Fecha: Lunes, Septiembre 12, 2005 11:06 pm
Asunto: Re: [boost] [serialization] No Reply?

> * You should find some way to describe what it means for an input
> and an output archive to be compatible. Try to do that as
> formally as possible. You may have to rely on the fuzzy C++
> notion of "equivalence," but I wouldn't blame you for that.

IMHO, there's no need to go fuzzy or rely on an undefined
notion of "equivalence" between a serialized object and
its restored copy. One can define these concepts with much
precision in an "operational semantics" way, much as, for
instance, semantics of logic languages is formally defined:

* A type T is serializable if the expressions
  oar<<ct;
  iar>>t;
are valid, where ct is of type const T& and t is of type T&,
oar is an output archive and iar is an input archive.

* The storage sequence, or simply storage, of an output
archive oar is the ordered sequence of *all* calls s(i) of
the form:
  oar<<out(i);
where out(i) is of type const T(i)&.

* Given an output archive oar with storage s(i) such
that every T(i) is serializable, we say that an
input archive iar is associated to oar if the sequence
of expressions
  iar>>in(i);
where in(i) is of type T(i)&, does not throw.
We assume the input sequence ranges over the entire
lifetime of iar.

* Under the assumptions above, we say that in(i)
is a restored copy of out(i).

This defines association between archives in an
abstract way. We can go concrete on a per archive
basis, for instance:

* Given an text output archive oar dumping to a
file ofl, and an text input archive iar reading from
a file if, iar is associated to oar if the
contents of ifl coincide with those of ofl.

Now that we have defined what it means for
an input archive to be associated to an output archive,
we can define the operational semantics of serialization
from the concept "being a restored copy".
In what follows we use the previous notation. The idea
is to begin with primitive types and build on those.

* If in(i) is of type bool&, in(i)==true if and only
if out(i)==true.
* If in(i) is of type char&, the numerical value of
in(i) is that of out(i).
* Similarly for other integral types. Different
word sizes should be taken into account.
* If in(i) is of type T*&, and T is serializable, *in(i)
is a restored copy of *out(i).

We can extend this schema inductively as we
progress to more complex types. For example:

* If in(i) is of type std::vector<T>&, and T is
serializable, then in(i).size()==out(i).size() and
each in(i)[j] is a restored copy of out(i)[j].

IMHO, this is sufficiently formal and solves two
theoretical issues:

* We have defined association between output and
input archives without mentioning
extralinguistical devices (files, network streams,
and so on.)
* We have a satisfactory definition of what
serialization is supposed to do without relying
on a naive "equivalence" relationship between
objects and their restored copies: the main
difficulty of such a naive approach like for instance
"the restored copy has the same value as
the original object", is that it does not mean much
when these objects belong to different programs,
running on different timeframes and perhaphs
on different platforms.

What do you think? Admittedly, what I've written
still has some edges to rough, but I think it is
sound enough.

Joaquín M López Muñoz
Telefónica, Investigación y Desarrollo


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk