From: Yitzhak Sapir (yitzhaks_at_[hidden])
Date: 2002-12-12 05:32:54


On Wed, 11 Dec 2002, David Abrahams wrote:

> brangdon_at_[hidden] (Dave Harris) writes:
>
> >> 1. Agreement on terms. In particular, I strongly suggest beginning
> >> with the definitions of serialization and persistence outlined by
> >> Augustus Saunders in
> >> http://lists.boost.org/MailArchives/boost/msg39598.php. I realize
> >> that Robert didn't like those definitions, but they resonated for
> >> most people (including me), and seem to provide an excellent
> >> starting point.
> >
> > For what it's worth, I didn't like those definitions. In my view
> > serialisation is the right name for what the submitted library did.
> > Persistence is just serialisation to a persistent
> > medium. Persistence is a property of media rather than formats. It
> > all but comes for free once you have decent serialisation.
>
> The term "persistence", as I've heard it used for years, is used to
> denote the ability to use "the same" complex data structure across
> successive invocations of the same program. Thus, the data
> "persists". Serialization to a persistent medium is one way to
> implement it, and serialization is almost always a component of the
> system. However, there are other approaches. For example, the system
> might leave parts of the data structure on disk until such time as
> they're accessed.
>
> I still think the other definitions are more useful. In your terms,
> "persistence" slices off a tiny fraction of the space of useful
> functionality, and everything we care about lies in the domain of
> "serialization". I'm willing to use any terms that everyone will
> agree to (including yours), but whichever terms we use should be at
> least as clearly defined as what Augustus wrote. So far, you haven't
> provided a clear definition of serialization.

I would like to offer the following definition (based on the previous
definition given by Augustus Saunders):

Serialization is the process of breaking up various application-defined
containers of data into their components and serializing each component
one by one into a stream, in an agreed-upon intermediate exchange format
that enables an appropriate parser to reconstruct ("deserialize")
necessary information by reading the stream. The containers may contain
components of (1) a fixed quantity and type (arrays/bit arrays and
integral types), (2) varying quantity and fixed type (lists/vectors), (3)
fixed quantity and varying types (structs and classes), or (4) unit
quantity and discriminated type (unions). Each component of the container
may be a container in itself, which is why the definition is necessarily
recursive. Necessary information is defined by the needs of the parser.
Because it is not presumed that the parser (which may be a human being)
shares a priori knowledge, it may be necessary to include metadata
regarding data types. Various structuring mechanisms may be used, coming
in various flavours: header, pre/post tags, post (i.e., terminated),
length-prepended, packeted, etc. Metadata may be independent or mixed with
structure.
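
To make the four kinds of container concrete, here is a minimal sketch in
C++. Every name in it (put_u32, Value, Record, roundtrip_demo and so on) is
made up for illustration and is not the interface of the submitted library
or of any other. It assumes a length-prepended binary format with a one-byte
tag for the discriminated case, and it omits error handling for truncated
streams.

```cpp
#include <cassert>
#include <cstdint>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// (1) Fixed quantity and type: one 32-bit integer as 4 big-endian bytes.
void put_u32(std::ostream& out, std::uint32_t v) {
    for (int shift = 24; shift >= 0; shift -= 8)
        out.put(static_cast<char>((v >> shift) & 0xFF));
}
std::uint32_t get_u32(std::istream& in) {
    std::uint32_t v = 0;
    for (int i = 0; i < 4; ++i)
        v = (v << 8) | static_cast<unsigned char>(in.get());
    return v;
}

// (2) Varying quantity, fixed type: the element count is prepended.
void put_string(std::ostream& out, const std::string& s) {
    put_u32(out, static_cast<std::uint32_t>(s.size()));
    out.write(s.data(), static_cast<std::streamsize>(s.size()));
}
std::string get_string(std::istream& in) {
    std::string s(get_u32(in), '\0');
    if (!s.empty()) in.read(&s[0], static_cast<std::streamsize>(s.size()));
    return s;
}
void put_vec(std::ostream& out, const std::vector<std::uint32_t>& xs) {
    put_u32(out, static_cast<std::uint32_t>(xs.size()));
    for (std::uint32_t x : xs) put_u32(out, x);
}
std::vector<std::uint32_t> get_vec(std::istream& in) {
    std::uint32_t n = get_u32(in);
    std::vector<std::uint32_t> xs;
    for (std::uint32_t i = 0; i < n; ++i) xs.push_back(get_u32(in));
    return xs;
}

// (4) Unit quantity, discriminated type: a tag byte says which member follows.
struct Value {
    enum Kind : std::uint8_t { Int = 0, Text = 1 } kind;
    std::uint32_t number;
    std::string text;
};
void put_value(std::ostream& out, const Value& v) {
    out.put(static_cast<char>(v.kind));
    if (v.kind == Value::Int) put_u32(out, v.number);
    else put_string(out, v.text);
}
Value get_value(std::istream& in) {
    int tag = in.get();
    assert(tag == Value::Int || tag == Value::Text);
    Value v{};
    v.kind = static_cast<Value::Kind>(tag);
    if (v.kind == Value::Int) v.number = get_u32(in);
    else v.text = get_string(in);
    return v;
}

// (3) Fixed quantity, varying types: each member is itself a container,
// so writing a Record recurses into the functions above.
struct Record {
    std::uint32_t id;
    std::string name;
    std::vector<std::uint32_t> scores;
    Value extra;
};
void put_record(std::ostream& out, const Record& r) {
    put_u32(out, r.id);
    put_string(out, r.name);
    put_vec(out, r.scores);
    put_value(out, r.extra);
}
Record get_record(std::istream& in) {
    Record r{};
    r.id = get_u32(in);
    r.name = get_string(in);
    r.scores = get_vec(in);
    r.extra = get_value(in);
    return r;
}

// Write a Record into an in-memory stream and read it back.
bool roundtrip_demo() {
    Record before{7, "ada", {1, 2, 3}, Value{Value::Text, 0, "note"}};
    std::stringstream stream(std::ios::in | std::ios::out | std::ios::binary);
    put_record(stream, before);
    Record after = get_record(stream);
    return after.id == 7 && after.name == "ada" &&
           after.scores == std::vector<std::uint32_t>{1, 2, 3} &&
           after.extra.kind == Value::Text && after.extra.text == "note";
}
```

In this sketch the reader shares full a priori knowledge of the layout, so
the only metadata in the stream are the length prefixes and the tag byte. A
parser with less shared knowledge would need type information written
alongside the data.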

I disagree that serialization is by necessity lossy, or that serialization
by necessity performs transformations and persistence does not.
Persistence may perform non-lossy transformations. If the data is to
persist in a file, pointers may have to be appropriately transformed.
(Maybe I don't understand the meaning of "transformation" in the given
paragraph.) Serialization may perform lossy transformations, or it may
not. It may be symmetric, or it may not. But serialization always
involves data fed serially into a stream (a stream being defined as a
medium that maintains serial data). Serialization always involves a
format in which that serial data represents the original data. And
serialization is the process of connecting the data (and maybe
metadata) itself, the format, and the stream.
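
As an illustration of a non-lossy pointer transformation, here is a rough
sketch in C++. The Node type, the save and load functions, and the text
format ("value next_index" per line) are all hypothetical. It assumes every
node reachable through a next pointer is also in the list being saved.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <map>
#include <memory>
#include <ostream>
#include <sstream>
#include <vector>

struct Node {
    int value;
    Node* next;  // a raw address, meaningless in a later run of the program
};

// Write the node count, then one "value next_index" line per node.
// next_index is the position of the pointed-to node in `nodes`, or -1
// for a null pointer. Addresses never reach the stream.
void save(std::ostream& out, const std::vector<Node*>& nodes) {
    std::map<const Node*, int> index;
    for (std::size_t i = 0; i < nodes.size(); ++i)
        index[nodes[i]] = static_cast<int>(i);
    out << nodes.size() << '\n';
    for (const Node* n : nodes) {
        int next = n->next ? index.at(n->next) : -1;
        out << n->value << ' ' << next << '\n';
    }
}

// Rebuild the nodes first, then turn each index back into a pointer.
std::vector<std::unique_ptr<Node>> load(std::istream& in) {
    std::size_t count = 0;
    in >> count;
    std::vector<std::unique_ptr<Node>> nodes;
    std::vector<int> next(count, -1);
    for (std::size_t i = 0; i < count; ++i) {
        int value = 0;
        in >> value >> next[i];
        nodes.emplace_back(new Node{value, nullptr});
    }
    assert(!in.fail());
    for (std::size_t i = 0; i < count; ++i)
        if (next[i] >= 0) nodes[i]->next = nodes[next[i]].get();
    return nodes;
}

// Save a three-node cycle and check that the links survive the round trip.
bool pointer_roundtrip_demo() {
    Node c{3, nullptr};
    Node b{2, &c};
    Node a{1, &b};
    c.next = &a;  // close the cycle
    std::stringstream stream;
    save(stream, std::vector<Node*>{&a, &b, &c});
    std::vector<std::unique_ptr<Node>> copy = load(stream);
    return copy.size() == 3 && copy[0]->value == 1 &&
           copy[0]->next == copy[1].get() &&
           copy[1]->next == copy[2].get() &&
           copy[2]->next == copy[0].get();
}
```

The addresses in the reloaded structure differ from the originals, but the
shape of the graph is preserved, which is the sense in which the
transformation loses nothing.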


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk