From: David Abrahams (dave_at_[hidden])
Date: 2002-12-09 16:05:53
The Serialization library submission by Robert Ramey is not
accepted into Boost at this time.
First of all, I'd like to acknowledge that this was a *very* difficult
review for all concerned. It was tough for the reviewers, for me as
review manager, and especially for Robert Ramey, the library author.
Rendering a decision on the library was correspondingly difficult.
I thank Robert for his work and his patience with the review process,
and I hope that he finds the energy to follow through until we have a
Boost library.
At one point during the review process, Robert wrote to me privately,
expressing the opinion that
After spending the better part of a weekend looking over the library
documentation and re-reading all of the review commentary, I can
understand why Robert might be tempted to conclude that no single
serialization library design would satisfy Boost because there were
just too many conflicting desires on the part of reviewers. However,
I hope he doesn't. I believe that "no serialization library designed
by just one person is likely to satisfy Boost" is much closer to the
truth.
Fortunately, there was great interest in this library (which is why
the scrutiny was so intense) and Robert received many enthusiastic
offers of collaboration from reviewers. I believe the best path
for Boost and for this library is as follows:
0. Reconsider the problem domain in a collaborative environment. If
there are enough participants, a mailing list would be a good start
(I can set up a SourceForge mailing list upon request), and adding
a Wiki Page is easy enough. This process should give strong
consideration to problem domains other than ones originally
envisioned for the library. It should also reflect a reluctance to
begin writing code too early.
1. Agreement on terms. In particular, I strongly suggest beginning
with the definitions of serialization and persistence outlined by
Augustus Saunders in
http://lists.boost.org/MailArchives/boost/msg39598.php. I realize
that Robert didn't like those definitions, but they resonated for
most people (including me), and seem to provide an excellent
starting point.
Robert said "I didn't try to define Persistence as I see it as a
more general notion". Distinctions are useful to the extent that
they partition the space of things actually being considered. If
persistence is defined to be even more general than everything
we're talking about, it's not useful to us. Since we get to choose
the definition, let's choose one we can apply ;-)
2. Careful description of scope. Answer questions like:
* Is this a persistence or serialization library?
* Is it important to be able to plug in arbitrary archive
formats?
* Is it important to be able to use the same UDT serialization
code to write several different archive formats?
* What kinds of applications are we intending to serve?
* What kinds of applications are we explicitly NOT intending to
serve?
3. Careful consideration of the appropriate interface for describing
the serialization of UDTs on a conforming compiler. In particular,
consider the lexical cost of requiring users to specialize library
templates. Also consider that the use of operator<< is going to
invoke ADL anyway, so maybe the interface should just use
that. Serialization of class template specializations and other
classes should use the same mechanisms.
Subsequent consideration of how close the interface can come on
broken compilers, should the participants decide they wish to serve
that user base.
4. Once coding begins, it should go quickly, and proceed in the boost
sandbox.
5. Well, Item 3 drifted a bit into technical issues, so here's a
more-comprehensive list of technical issues I'd like to see
considered carefully and collaboratively. I'm sorry that I didn't
take the time to bring some of these up during the review period,
which was a bit overwhelming just to watch ;-).
* Dave Harris suggested several times that integers should be
written in the binary archive in a variable-length format. This
echoes a philosophy on serialization which I've had for years,
provides many benefits and would seem to allow drastic
simplification of the library if it is decided that the current
scope will be retained, since it entirely obviates the need for a
text archive format (the same could be done for floating point
numbers). The only application I can imagine this approach being
unsuitable for would be extremely fast, relatively small
in-memory archives... and I'd have to see benchmarks and a real
use-case to be convinced of that.
* Boost already has a mechanism for exploring the internal
structure of UDTs. It's called visit_each, and it's used by the
signals library to discover bound signal collaborators within
function objects. Could this be exploited for serialization of
composite types?
* Boost already has a mechanism for registering inheritance
relationships and convertibility among classes. It's not part of
the public interface, but is an implementation detail of
Boost.Python. Should this be exploited for serialization?
* Objects without default constructors really should be
deserializable. One possible approach is offered by Python's
serialization mechanism ("pickler"). A class' __getinitargs__
function (if defined) will be called to get the arguments that
should be passed to the class' constructor to reconstitute an
instance of that class. It should be possible to build a similar
mechanism around boost::tuple.
* Is it important to allow all UDTs to be separately versioned?
Every time I have implemented serialization and started with such
a system, I eventually dropped it in favor of a whole-archive
version number. Changing the format of a single class always
creates a backward compatibility problem for new archives anyway.
Allowing the archive to carry the version number also simplifies
the [de]serialization interface. If separate versioning is in
fact important and useful, a rationale should be provided.
* Registration of participating classes must not be required to be
monolithic. More generally, the library must support users who
use polymorphism to insulate themselves from compilation
dependencies.
* Strong consideration should be given to a "you don't pay for what
you don't use" approach. As Ralf Grosse-Kunstleve pointed out to
me, C++ is not really good at serialization, natively. One of
the only reasons to use it instead of a language with stronger
reflection capabilities has got to be that it is fast. Avoiding
virtual function calls for serializing large arrays of small
objects (e.g. complex or rational numbers) must be possible.
* I would like to see the requirement to use *only* ANSI/ISO C++
loosened. Serialization is one of those areas which is simply
not well-supported by standard C++, IMO. Part of what we're
doing here at Boost is expanding the scope of C++ by providing
support for things like threading and the filesystem. Much may
be gained by allowing some components to use extra-legal
constructs that can be easily ported to a majority of platforms.
Two areas that spring to mind are pointer comparisons outside a
single array for unserializing internal object pointers, and the
use of type_info::name() for type identification. Even if these
were optional components to the library, they could provide
enormous benefit for some applications.
[BTW, since the review I have discovered some issues with
type_info::name() and EDG compilers which may make it unsuitable
for type identification in that context, depending on the
application].
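To make the variable-length integer suggestion concrete, here is a minimal sketch of one common encoding: 7 bits of payload per byte, low-order group first, with the high bit of each byte marking that more bytes follow. The names write_varint and read_varint are hypothetical and are not part of the reviewed library; the review discussion does not say which variable-length scheme Dave Harris had in mind, so this is only one plausible choice.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: append v to out, 7 bits per byte, low-order group
// first. The high bit of a byte is set when more bytes follow.
inline void write_varint(std::vector<std::uint8_t>& out, std::uint64_t v)
{
    while (v >= 0x80) {
        out.push_back(static_cast<std::uint8_t>((v & 0x7F) | 0x80));
        v >>= 7;
    }
    out.push_back(static_cast<std::uint8_t>(v));
}

// Hypothetical helper: decode one value starting at pos and advance pos
// past it. Returns false if the input is truncated or longer than a
// 64-bit value can need (10 bytes).
inline bool read_varint(const std::vector<std::uint8_t>& in,
                        std::size_t& pos, std::uint64_t& v)
{
    v = 0;
    for (unsigned shift = 0; shift < 64 && pos < in.size(); shift += 7) {
        const std::uint8_t byte = in[pos++];
        v |= static_cast<std::uint64_t>(byte & 0x7F) << shift;
        if ((byte & 0x80) == 0)
            return true;   // last byte of this value
    }
    return false;
}
```

Because the byte sequence is produced by shifts and masks on the value, it does not depend on the host's byte order or on the width of the integer type used in memory, which is the property that would let a binary archive stand in for a text archive. Small values occupy a single byte.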
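The __getinitargs__ idea for objects without default constructors can also be sketched in a few lines. This is an illustration under stated assumptions, not a proposed interface: it uses std::tuple and std::make_from_tuple (C++17) in place of boost::tuple, and the class Account, the member get_init_args, and the function reconstruct are all hypothetical names invented for the example.

```cpp
#include <string>
#include <tuple>
#include <utility>

// Example class with no default constructor.
class Account {
public:
    Account(std::string owner, int balance)
        : owner_(std::move(owner)), balance_(balance) {}

    // Hypothetical hook, modelled on Python's __getinitargs__: returns the
    // constructor arguments needed to rebuild an equivalent object.
    std::tuple<std::string, int> get_init_args() const
    {
        return std::make_tuple(owner_, balance_);
    }

    const std::string& owner() const { return owner_; }
    int balance() const { return balance_; }

private:
    std::string owner_;
    int balance_;
};

// Hypothetical load step: construct a T directly from a saved argument
// tuple, so T never needs a default constructor.
template <class T, class Tuple>
T reconstruct(Tuple&& args)
{
    return std::make_from_tuple<T>(std::forward<Tuple>(args));
}
```

In a real archive the tuple's elements would be written on save and read back on load before calling something like reconstruct<Account>(args); only the tuple's element types need to be serializable, not the class itself.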
Given the enormous interest in addressing this problem domain (or
domains) shown by Boost members, and the many offers of participation,
it would be a real shame if this review didn't ultimately produce a
Boost library that we can all stand behind. Broader collaboration in
the Boost tradition seems like the best way to get there.
Thanks to everyone for their participation in this review.
Special, extra thanks to Robert Ramey for bringing forward his
submission which stirred up this discussion and, I hope, gave us a
start in the right direction.
--
David Abrahams
dave_at_[hidden] * http://www.boost-consulting.com
Boost support, enhancements, training, and commercial distribution