From: Matthias Troyer (troyer_at_[hidden])
Date: 2005-10-16 10:57:27


On Oct 15, 2005, at 10:33 PM, Robert Ramey wrote:

> Matthias Troyer wrote:
>
>
>> For a portable binary archive this solution is perfect. For fast
>> array serialization a similar approach has problems, as I outlined
>> under point 3 and 4 in my response to Robert Ramey: you need to
>> specifically overload _in the archive_ for all classes that want to
>> make use of fast array serialization, thus introducing a tight
>> coupling between archive and class to be serialized, as well as
>> making it hard to extend.
>>
>
> Here is where we're on different pages. You only have to override
> serialization of vector etc. in one place: in the
> fast_archive_adaptor class. Then the behavior is available to any
> class that the adaptor is applied to.

There are two severe problems with this approach:

1. Most of my archives using fast array serialization would not be
written as archive adaptors, since for MPI archives, PVM archives,
and many others, it does not make any sense to write an archive
without fast array serialization. These archives have to support fast
array serialization from the start.

2. And here is the main problem: you propose that all serialization
code (not only for std::vector but for all future classes, such as
valarray, multi_array, ublas and MTL matrices) be written without
concern for fast array serialization, and that I then provide
overloads for all these classes in an adaptor. There are a number of
reasons why this is either not good, or will not work at all:

a) it leads to a tight coupling of the archive classes to
implementation details of all these libraries. The code to serialize
a boost::multi_array should be with the multi_array library and not
in my archive class.

b) the user of my library will have to include hundreds of lines of
serialization code for all these classes, even if he never needs
them. Contrast that with the inclusion of a few lines for save_array
and load_array.

c) even worse: in the cases I referred to, this usually cannot be
implemented without being intrusive on the library whose datatype is
being serialized. For example, Boost.multi_array, MTL or Blitz will
have to be modified to allow serialization, since serialization is
intrusive for most classes. The "adaptor" you are proposing will then
also have to be intrusive!

Please note that the "non intrusive" serialization you show in
the tutorial is still intrusive: you had to make the data members
public to be able to implement it. For classes that have getter and
setter functions for all members, or where I can extract the
constructor arguments from the class in a non-intrusive way, it is
possible to write non-intrusive serialization. But for all other
cases, there is no such thing as non-intrusive serialization. It is
even worse in the case of Blitz arrays, which have their own built-in
reference-counted memory allocation scheme. Views in multi_arrays are
similar. There is no non-intrusive way to serialize these data
structures!
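
To make the distinction concrete, here is a minimal sketch of the two
cases. It does not use the Boost.Serialization API: recording_oarchive,
interval, opaque_array and the free save functions are all hypothetical
names invented for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical archive (not a Boost class): records the doubles it is given.
struct recording_oarchive {
    std::vector<double> values;
    void save(double x) { values.push_back(x); }
};

// Case 1: all state is reachable through the public interface.
class interval {
public:
    interval(double lo, double hi) : lo_(lo), hi_(hi) {}
    double lower() const { return lo_; }
    double upper() const { return hi_; }
private:
    double lo_, hi_;
};

// Truly non-intrusive: written outside the class, using only accessors.
template <class Archive>
void save(Archive& ar, const interval& x) {
    ar.save(x.lower());
    ar.save(x.upper());
}

// Case 2: the storage is private and has no accessors.
class opaque_array {
public:
    opaque_array(std::size_t n, double fill) : data_(n, fill) {}
private:
    std::vector<double> data_;
    // This declaration is the intrusion: the class itself must be edited
    // before any outside function can reach data_.
    template <class Archive>
    friend void save(Archive& ar, const opaque_array& a);
};

template <class Archive>
void save(Archive& ar, const opaque_array& a) {
    ar.save(static_cast<double>(a.data_.size()));  // element count first
    for (double x : a.data_) ar.save(x);           // then each element
}
```

Without the friend declaration inside opaque_array, the second save
function would not compile, which is the sense in which serialization
of such classes cannot avoid touching the library that owns them.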

Since Boost.Serialization support has to be intrusive for these data
structures, I believe that the intrusiveness should be kept to a
minimum and only one serialization function be provided.

Thus your statement

> In general, I want no more coupling than is absolutely necessary. I
> don't think it's necessary here. You can get everything you want and
> more by using an archive adaptor.

is clearly incorrect. If the serialization library documentation
tells, e.g., the MTL authors to serialize their arrays by looping over
all elements, then after they implement their version I will have to
be intrusive on the MTL library to get direct array serialization in.
Better to have them support it directly!

And to answer:

>> Actually the cost is minimal if the archive does not support fast
>> save/load_array. The hasfast_array_serialization.hpp header only
>> consists of the default traits:
>>
>
> One still has to include the header. This violates the boost
> principle - don't pay for what you don't use. Actually the boost
> principle would be - don't even know about what you don't use -
> which is better anyway.

You already violate this principle much more severely in the
serialization library. If I do not want object tracking and
versioning for a text_oarchive of some objects, the code for tracking
and versioning is still included by the serialization library.

Robert, to focus the discussion and not get stuck in details, let me
stress a point that I had previously made at both reviews of your
library, and that I still believe in:

* A serialization library without built-in support for serialization
of arrays is fundamentally flawed *

I believe that this is the main issue we need to get sorted out
first, since it is the fundamental point of disagreement.

You write (I quote from your other e-mail):

> default serialization of C array .... does have a universal default
> implementation.

and your default implementation of saving is to save the size as an
unsigned int and then loop over all the elements, saving each one.

My opinion is that this is the wrong approach! Instead a save_array
function should be called, for which the default implementation would
be just what you describe above, but the archive can overload it.
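
As a rough sketch of what I mean (not the actual library code): the
only name below taken from this discussion is save_array. The archive
types loop_oarchive and fast_oarchive, the member save_block and the
helper save_vector are hypothetical, and the block write assumes
trivially copyable elements.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <type_traits>
#include <vector>

// Hypothetical archive with no array support: one write per value.
struct loop_oarchive {
    std::string bytes;
    std::size_t write_calls = 0;

    template <class T>
    void save(const T& value) {
        bytes.append(reinterpret_cast<const char*>(&value), sizeof(T));
        ++write_calls;
    }
};

// Hypothetical archive that can also write a contiguous block in one call.
struct fast_oarchive {
    std::string bytes;
    std::size_t write_calls = 0;

    template <class T>
    void save(const T& value) { save_block(&value, sizeof(T)); }

    void save_block(const void* data, std::size_t size) {
        bytes.append(static_cast<const char*>(data), size);
        ++write_calls;
    }
};

// Default save_array: loop over the elements. Works for any archive.
template <class Archive, class T>
void save_array(Archive& ar, const T* data, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) ar.save(data[i]);
}

// Overload supplied with the archive: a single block write.
template <class T>
void save_array(fast_oarchive& ar, const T* data, std::size_t n) {
    static_assert(std::is_trivially_copyable<T>::value,
                  "block write assumes trivially copyable elements");
    ar.save_block(data, n * sizeof(T));
}

// Written once by the container's author, with no knowledge of archives:
// save the size as an unsigned int, then hand the elements to save_array.
template <class Archive, class T>
void save_vector(Archive& ar, const std::vector<T>& v) {
    ar.save(static_cast<unsigned int>(v.size()));
    save_array(ar, v.data(), v.size());
}
```

Both archives produce the same bytes for a std::vector<double>; the
difference is that the looping archive issues one write per element
while the other issues one write for the whole array. The container
code is identical in both cases, so the class being serialized never
needs to know which archives support the fast path.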

Here are the reasons:

1.) a speedup of 10x or more

2.) no need to provide intrusive overloads for all classes that want
to use save_array

3.) prior art. There is a reason why

   - MPI, PVM and other message passing libraries support array
     serialization
   - XDR, used for remote procedure calls under Unix, has special
     support for arrays
   - HDF5, a standard for large scientific data sets, operates
     directly on large arrays

   To interface to all these libraries, and to achieve reasonable
   performance with them and with binary archives, direct support
   for array serialization by the serialization library is essential.

In the past you have claimed that there is no need for special
array serialization and wanted to see benchmarks. We now have
benchmark numbers (not only from me) that show a penalty of roughly
10x or more without it. Furthermore, we have cases where
serialization is impossible without it.

Matthias

