From: Augustus Saunders (infinite_8_monkey_at_[hidden])
Date: 2002-11-22 19:34:46


I have been following the discussion thread for the serialization
library review with some interest, as I think the topic is of extreme
importance. Right up there with smart pointers and threading, it's
something that would be used by many people for many different
things. I want to thank Robert for the obviously extensive amount of
thought and effort that he has put into this. I have this nagging
feeling, though, that we all need to step back for a moment and
re-examine our first principles.

Based on reading the documentation, carefully reading the discussion
from Unicode support to XML archive formats to registration issues,
etc., and based on my own thoughts on the subject, I get the distinct
impression that we're not at all in agreement about what exactly a
serialization library is supposed to do. From the documentation, it
is clear that Robert is most familiar with MFC's serialization
mechanism, and the library follows in that general mold. He also
states that this is a "serialization" library, and not a
"persistence" library (like what you find with an OO database). I
think that this is an extremely important point that I don't
remember really being discussed at all. The discussion about
alternate archive formats, especially XML, CSV, and other formats for
interchange with other systems, makes sense in the context of
"serialization." A discussion questioning whether these formats
could truly represent all C++ structures (especially diamond
inheritance) ensued, but this point is only relevant to persistence,
not to serialization. Because I think that a lot of the discussion
hinges around this point, I'm going to venture to make a distinction,
and I'd like to know if people agree.

Persistence: A transformation-less transfer of application native
data to an alternate storage medium. Only useful and only intended
to be useful to applications that agree a priori on object type and
layout, presumably by sharing headers. May optionally account for
differences in architecture or compiler. Must be symmetric--support
both store and load. Alternate storage formats would only differ for
efficiency reasons, perhaps at the cost of dropping support for
constructs that a given application does not need.
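
As an illustration of the persistence definition above, here is a
minimal C++ sketch of a symmetric store/load pair that fixes the byte
order so the stored bytes do not depend on the machine's architecture.
The names Record, put_u32, get_u32, store, and load are hypothetical
and are not taken from the library under review; both sides are
assumed to share the Record definition, as they would by sharing a
header.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical application type; both store() and load() know its
// layout a priori.
struct Record {
    std::uint32_t id;
    std::uint32_t count;
};

// Append a 32-bit value as four bytes, least significant byte first,
// regardless of the host's native byte order.
void put_u32(std::string& buf, std::uint32_t v) {
    for (int i = 0; i < 4; ++i) {
        buf += static_cast<char>((v >> (8 * i)) & 0xFF);
    }
}

// Read back four bytes written by put_u32, starting at pos.
std::uint32_t get_u32(const std::string& buf, std::size_t pos) {
    std::uint32_t v = 0;
    for (int i = 0; i < 4; ++i) {
        unsigned char byte = static_cast<unsigned char>(buf[pos + i]);
        v |= static_cast<std::uint32_t>(byte) << (8 * i);
    }
    return v;
}

// Store: no tags and no metadata, just the fields in declaration order.
std::string store(const Record& r) {
    std::string buf;
    put_u32(buf, r.id);
    put_u32(buf, r.count);
    return buf;
}

// Load: the exact mirror of store(). Returns false if the buffer does
// not have the size that store() would have produced.
bool load(const std::string& buf, Record& r) {
    if (buf.size() != 8) return false;
    r.id = get_u32(buf, 0);
    r.count = get_u32(buf, 4);
    return true;
}
```

Nothing in the stored bytes describes the data, so a reader that does
not already know Record cannot interpret them. That is the sense in
which this is persistence rather than interchange.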

Serialization: A transformation of application native data into a
serial intermediate exchange format specified by the application
writer. Whether objects can be read back in a different order than
they were stored, and whether there is any object identification of
any kind, is up to each individual format. Because it is not presumed
that applications share a priori knowledge, it may be necessary to
include metadata regarding data types. Various structuring
mechanisms may be used, coming in various flavours--header, pre/post
tags, post (i.e., terminated), length prepended, packetized, etc.
Metadata may be independent or mixed with structure. Also, it is not
presumed that all expressible object layouts or relationships can be
serialized to all possible formats; it is the responsibility of the
format writer to account for this. Often, only store or load will be
supported--the symmetric operation is performed by the application
one is exchanging data with (really, the point of the exercise).
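
To make the structuring mechanisms above concrete, here is a small
C++ sketch that frames the same list of fields three ways: pre/post
tags, terminated, and length prepended. The function names and the
"<f>" tag are invented for illustration only, and the sketch assumes
the field text never contains the tag or terminator characters.

```cpp
#include <string>
#include <vector>

// Pre/post tags: each field is wrapped in an opening and closing marker.
std::string frame_tagged(const std::vector<std::string>& fields) {
    std::string out;
    for (const std::string& f : fields) {
        out += "<f>" + f + "</f>";
    }
    return out;
}

// Terminated: each field is followed by a terminator character.
std::string frame_terminated(const std::vector<std::string>& fields,
                             char terminator = ',') {
    std::string out;
    for (const std::string& f : fields) {
        out += f;
        out += terminator;
    }
    return out;
}

// Length prepended: decimal byte count, a colon, then the payload.
std::string frame_length_prefixed(const std::vector<std::string>& fields) {
    std::string out;
    for (const std::string& f : fields) {
        out += std::to_string(f.size()) + ":" + f;
    }
    return out;
}
```

For the fields "ab" and "c" these produce "<f>ab</f><f>c</f>",
"ab,c," and "2:ab1:c" respectively. Only the length prepended form
can carry arbitrary payload bytes without any escaping.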

The library up for review strikes me as a serialization library
intended to function as a persistence library, and I think this
sparked everyone to ask for different things in the confusion. To
me, a persistence library must take into account object factories,
object lifetime management, versioning, and should be fairly
transparent (praying for MPL magic here), while a serialization
library must deal with the archive format issues and explicit
conversion logic. While either task is huge, I really think we need
to clarify the purpose and scope of a boost serialization library
that we would accept--that's only fair to Robert. So here are my
thoughts:

1) We explicitly acknowledge that persistence is a separate topic
requiring a separate, mostly unrelated, library. It is very
important in its own right, but we should start a different thread
later. That
thread should talk about factories and lifetime management, and not
get confused about whether or not XML is pertinent (it's not, keep it
on this thread).

2) Assuming I can now focus on serialization, the most important
requirement relates to what styles of archive can be generated. In
other words, defining the set of hooks in the serialization process
to insert tags and/or metadata. For starters, we need very simple
examples demonstrating JPEG style headers, XML style pre/post tags,
SWF style length prepended tags, CSV style terminated lists, and
perhaps protocol style packets.

3) We understand that the library still requires that archives be
written. A couple of the examples should probably be useful, but
other boost members should be encouraged to write and submit archives
for their own pet format.

4) Having introduced tags and metadata issues, escaping schemes need
to be introduced. URL-encoding, C-style backslashing, etc. I don't
remember seeing anything that indicated this was already present;
excuse me if it is.

5) The archive writer also needs a way to write something to verify
archive files. Spirit would probably come in handy here. I haven't
thought through the details of what I would want here; perhaps the
serialization library can't really help with this task.

6) Versioning matters less at the class level than at the level of
the entire archive format.
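
As a rough sketch of how points 2, 4, and 6 could fit together, the
C++ below defines a set of hooks that the serialization process
calls, two archive styles that answer those hooks differently, an
escaping routine per style, and a single format version written once
per archive. Every name here (ArchiveHooks, XmlStyleArchive,
CsvStyleArchive, Note, save, the "#version=" line) is hypothetical
and is not the interface of the library under review. URL-encoding
would slot in the same way as the two escaping routines shown.

```cpp
#include <string>

// Point 4: C-style backslashing for a delimiter-terminated format.
// Escapes the backslash itself and the delimiter; nothing else.
std::string escape_backslash(const std::string& in, char delimiter) {
    std::string out;
    for (char c : in) {
        if (c == '\\' || c == delimiter) out += '\\';
        out += c;
    }
    return out;
}

// Point 4: entity escaping for element content in an XML-style format.
std::string escape_xml(const std::string& in) {
    std::string out;
    for (char c : in) {
        switch (c) {
            case '<': out += "&lt;";  break;
            case '>': out += "&gt;";  break;
            case '&': out += "&amp;"; break;
            default:  out += c;       break;
        }
    }
    return out;
}

// Point 2: hypothetical hooks. Each archive format decides what, if
// anything, to emit at each step.
struct ArchiveHooks {
    virtual ~ArchiveHooks() {}
    virtual void begin_archive(int format_version) = 0;  // point 6
    virtual void begin_object(const std::string& name) = 0;
    virtual void field(const std::string& name, const std::string& value) = 0;
    virtual void end_object(const std::string& name) = 0;
    virtual void end_archive() = 0;
};

// XML style: pre/post tags, version carried as an attribute.
struct XmlStyleArchive : ArchiveHooks {
    std::string out;
    void begin_archive(int v) override {
        out += "<archive version=\"" + std::to_string(v) + "\">";
    }
    void begin_object(const std::string& name) override { out += "<" + name + ">"; }
    void field(const std::string& name, const std::string& value) override {
        out += "<" + name + ">" + escape_xml(value) + "</" + name + ">";
    }
    void end_object(const std::string& name) override { out += "</" + name + ">"; }
    void end_archive() override { out += "</archive>"; }
};

// CSV style: no tags, comma-terminated fields, one object per line,
// version carried in a leading header line.
struct CsvStyleArchive : ArchiveHooks {
    std::string out;
    void begin_archive(int v) override { out += "#version=" + std::to_string(v) + "\n"; }
    void begin_object(const std::string&) override {}
    void field(const std::string&, const std::string& value) override {
        out += escape_backslash(value, ',') + ",";
    }
    void end_object(const std::string&) override { out += "\n"; }
    void end_archive() override {}
};

// Hypothetical application type with one format-independent save routine.
struct Note {
    int id;
    std::string text;
};

void save(ArchiveHooks& ar, const Note& n) {
    ar.begin_object("note");
    ar.field("id", std::to_string(n.id));
    ar.field("text", n.text);
    ar.end_object("note");
}
```

The application writes save() once; choosing XmlStyleArchive or
CsvStyleArchive changes only the tags, the escaping, and where the
format version lives. The CSV sketch does not escape embedded
newlines, which a real archive writer would have to handle.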

Perhaps with the metaprogramming panacea in hand after some language
revisions, we will be able to unify all of the storage and
translation mechanisms under one framework. Until then, I vote that
we keep serialization serial and don't worry about persistence until
we're ready to take the bull by the horns. One last thought on the
multiple format problem: it may be wiser to standardize all
serialization on XML and rely on XSLT to transform into all the
different formats we might want. Again, I want to give a huge thanks
to Robert, you've been wonderfully patient with all of us, even
though everybody keeps voting "no." Thanks, too, to all the
reviewers and their insights.

Cheers-
Augustus


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk