
From: David Abrahams (dave_at_[hidden])
Date: 2002-12-16 10:15:03


Robert Ramey <ramey_at_[hidden]> writes:

> The fundamental problem that Serialization addresses is the
> reversible deconstruction of an arbitrary set of C++ objects to a
> stream of bytes. That's all it is. We know that it is useful for
> certain things, but that doesn't mean it's going to be useful for
> everything.
>
> If that's not what you want to do, then such a library is not going
> to be of help and some other method or tool will have to be used.

I believe that everyone who participated in the discussion wanted
to do that, but had additional requirements as well.

> This whole discussion is not about serialization but about a
> completely different topic.

Which topic?

> It seems that there is a strong belief that there can be created a
> generic system which will map an arbitrary set of C++ objects in a
> meaningful way to another format.

I think that if you have reversible "deconstruction of an arbitrary
set of C++ objects to a stream of bytes", then you have mapped the
objects to another format in a meaningful way. If it were meaningless,
it wouldn't be reversible. In fact, there are many meaningful ways to
do this mapping that /aren't/ reversible, so if you have a reversible
deconstruction, you've achieved something even stronger.

Reading your next paragraph, maybe what you mean by "in a meaningful
way" is something like, "in such a way that the meaning of the
original data might be deduced without referring to the source code
that serialized it". That appears to be part of what the XML crowd is
after when we start talking about serializing field labels and type
names.

However, I think we need to be careful not to be too absolute about
the requirement for meaningfulness here. Even if it is possible to
fully recover the structure of the data, unless all the classes in
question are "just-a-struct", the serializing program will still
contain other information about the data abstraction which will never
appear in the archive.

Finally, if we're going to discuss this kind of mapping, it behooves
us to be up front about whether the archived representation has to
reflect the physical layout of the data or whether it can be an
abstracted representation.

> Such a system will necessarily have to start with a system of
> reflection that describes all aspects of data structures that the
> system aspires to support. For example, if XML is to be supported,
> items will have to have external names. Once this system of
> reflection (or meta-data) were specified, then importers/exporters
> from/to each file format would be created. There is at least one
> commercial product that does this (Data Junction). This is a hugely
> ambitious undertaking that is only tenuously related to
> serialization.

I think that if we ignore (for the moment) that XML happens to be a
text-based format, the connection to serialization would be that any
format which uses this kind of tagged metadata needs to archive that
data somehow. If a serialization library provides some layer of
abstraction for formatting, once an application begins writing a
format with metadata it can take advantage of the serialization
library to plug in different formats.

If you are asserting that archiving with metadata is a different
problem from archiving without metadata, I think I agree with you.
I'm not ready to say that they're unrelated, though. It seems that
the metadata archiving problem could be seen as a generalization of
what you're calling "serialization". For example, if you write
generic code to archive your data with metadata, I can imagine using
the same generic archive writing code to write a format without
metadata. Since reading/writing multiple formats appears to be
important for some people, this could allow them to write one body of
code to do the whole job.

> I was unprepared for the response to the submission of the
> serialization library for formal review. I had expected some flak
> over implementation issues (indeed I got my share). I had a lot of
> difficulty getting this implemented. The wide variety of compiler
> issues made things much more difficult. I got some flak over
> interface issues - basically "registration" and "describe".

Your description of the review feedback as "flak" reflects, IMO, a big
part of the reason that the submission hasn't been accepted yet.
Boost libraries are in many ways about clear communication. The
formal review process is designed to be a process of deepening
understanding both for the reviewers and the library authors. A
productive relationship to questions and negative feedback results in
increased clarity for one or both sides of the conversation, about the
scope of the domain, the needs of potential users, insurmountable
implementation limitations, trade-offs that were considered, etc. At
the end of the review, I wasn't comfortable that this communication
had been achieved.

The library reviewers ought to keep their review comments on a
productive level, but ultimately the "burden of success" for this
communication rests with the submitter. To be successful, the
submitter has to embrace the process, and in the majority of cases,
when libraries are not accepted (or don't even make it to the review
stage) it is because of the submitter's relationship to the review
feedback.

> I was sensitive to these issues but hadn't been able to come up with
> good implementations of these features. (BTW, the review process
> has given me new ideas and information regarding these issues and I
> believe they can be addressed in a satisfactory way.)

That's great news!

> Someone always complains that the generic solution is "not
> efficient"

I think that's relatively rare, actually, because "generic" these days
usually means "templated and inlined".

> and wants it to have more specific features. I am generally
> skeptical of these unless accompanied by hard data which is almost
> never submitted.

If you're referring to Matthias' claim that a virtual function call
for each element of an array of 10M complex numbers will have a
noticeable impact, I guess it's easy enough to test. Did you ever
request a test?

> What really threw me for a loop was that most of the objections to
> the submission were that it wasn't something else. It seems that
> there was a fundamental difference between what users expected a
> "serialization" library to do and what serialization libraries
> generally do (Common C++, and MFC in particular as well as
> references
> http://www.riehle.org/computer-science-research/1996/plop-1996-serializer.html
> )

I suppose different people want to define serialization differently.
Getting back to the issue of communication, one thing I'd like to have
seen out of the review, if you were covering a more limited scope, was
a clear understanding on all sides of the relationship between the
"something else" that others wanted and what you're implementing.

> In this aspect I believe that the review results fairly reflected
> the views of those who commented on the issue. However, I believe
> the rejection was for the wrong reasons. This has led to the
> current discussion which I believe is really not about serialization
> at all.

What is it about?

-- 
                       David Abrahams
   dave_at_[hidden] * http://www.boost-consulting.com
Boost support, enhancements, training, and commercial distribution

Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk