From: Robert Ramey (ramey_at_[hidden])
Date: 2005-10-19 01:26:12


Matthias Troyer wrote:
> On Oct 15, 2005, at 10:33 PM, Robert Ramey wrote:
>
>> Matthias Troyer wrote:
>>
>>
>>> For a portable binary archive this solution is perfect. For fast
>>> array serialization a similar approach has problems, as I outlined
>>> under point 3 and 4 in my response to Robert Ramey: you need to
>>> specifically overload _in the archive_ for all classes that want to
>>> make use of fast array serialization, thus introducing a tight
>>> coupling between archive and class to be serialized, as well as
>>> making it hard to extend.
>>>
>>
>> Here is where we're on different pages. You only have to override
>> serialization of vector etc. in one place - in the
>> fast_archive_adaptor class.
>> Then the behavior is available to any class that the adaptor is
>> applied to.
>
> There are two severe problems with this approach:
>
> 1. Most of my archives using fast array serialization would not be
> written as archive adaptors, since for MPI archives, PVM archives,
> and many others, it does not make any sense to write an archive
> without fast array serialization. These archives have to support fast
> array serialization from the start.

> 2. and here is the main problem, you propose that all serialization
> code (not only for std::vector but for all future classes, such as
> valarray,multi_array, ublas and MTL matrices) be written without
> concern for fast array serialization, and that I then provide
> overload for all these classes in an adaptor. There are a number of
> reasons why this is either not good, or will not work at all:

> a) it leads to a tight coupling of the archive classes to
> implementation details of all these libraries. The code to serialize
> a boost::multi_array should be with the multi_array library and not
> in my archive class.

> b) the user of my library will have to include hundreds of lines of
> serialization code for all these classes, even if he never needs
> them. Contrast that with the inclusion of a few lines for save_array
> and load_array.
>
> c) even worse: in the cases I referred to this usually cannot be
> implemented without being intrusive on the library whose datatype is
> being serialized. E.g. the Boost.multi_array, MTL or Blitz will have
> to be modified to allow serialization, since serialization is
> intrusive for most classes. The "adaptor" you are proposing will then
> also have to be intrusive!

I don't really agree with this, but it doesn't really matter. In this case
just write your own fast_binary archive and derive variations from it. You
can skip the binary_archive and basic_binary archive altogether. This would
be very easy.
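
A minimal sketch of what such a stand-alone archive might look like, with the array fast path living inside the archive itself. Every name here (fast_binary_oarchive_sketch, tagged_oarchive_sketch, sketch_size) is hypothetical and none of it is Boost.Serialization code; it assumes trivially copyable element types and ignores endianness, versioning and tracking.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>
#include <vector>

// Hypothetical stand-alone output archive; it derives from no library class.
class fast_binary_oarchive_sketch {
public:
    // General case: append the raw bytes of one trivially copyable value.
    template <class T>
    fast_binary_oarchive_sketch& operator<<(const T& x) {
        static_assert(std::is_trivially_copyable<T>::value,
                      "sketch handles trivially copyable types only");
        append(&x, sizeof(T));
        return *this;
    }

    // Archive-side special case: a whole vector is written as a count
    // followed by one contiguous block. Partial ordering prefers this
    // overload over the general one for std::vector arguments.
    template <class T>
    fast_binary_oarchive_sketch& operator<<(const std::vector<T>& v) {
        static_assert(std::is_trivially_copyable<T>::value,
                      "block copy needs trivially copyable elements");
        const unsigned int count = static_cast<unsigned int>(v.size());
        append(&count, sizeof count);
        append(v.data(), v.size() * sizeof(T));
        return *this;
    }

    const std::vector<unsigned char>& bytes() const { return bytes_; }

private:
    void append(const void* p, std::size_t n) {
        const unsigned char* b = static_cast<const unsigned char*>(p);
        bytes_.insert(bytes_.end(), b, b + n);
    }
    std::vector<unsigned char> bytes_;
};

// A "variation" derived from it: writes a one-byte tag up front.
struct tagged_oarchive_sketch : fast_binary_oarchive_sketch {
    explicit tagged_oarchive_sketch(unsigned char tag) { *this << tag; }
};

// Usage: tag, then a vector of three ints, then a single int.
inline std::size_t sketch_size() {
    tagged_oarchive_sketch ar(0x01);
    std::vector<int> v{1, 2, 3};
    ar << v << 7;
    return ar.bytes().size();
}
```

Note that in this arrangement the archive has to name std::vector itself, which is the coupling the quoted points a) to c) object to.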

> Please note here, that the "non intrusive" serialization you show in
> the tutorial is still intrusive. You had to make the data members
> public to be able to implement the "non intrusive" serialization. For
> classes that have getter and setter functions for all members, or
> where I can extract the constructor arguments from the class in a non-
> intrusive way it is possible to write non-intrusive serialization.
> But for all other cases, there is no such thing as non-intrusive
> serialization. It is even worse in the case of Blitz arrays, which
> have their own built-in reference counted memory allocation scheme.
> Views in multi_arrays are similar. There is no non-intrusive way to
> serialize these data structures!

I've come to realize that some classes do not provide an interface sufficient
to support serialization. shared_ptr has this problem, as does boost::any.
I'm sure there are others as well. It's out of my hands.

> Since Boost.Serialization support has to be intrusive for these data
> structures, I believe that the intrusiveness should be kept to a
> minimum and only one serialization function be provided.
>
> If the serialization library documentation
> tells, e.g. the MTL authors to serialize their arrays by looping over
> all elements, I will have to, after they implement their version, be
> intrusive on the MTL library to get direct array serialization in.
> Better to have them support it directly!

Fine - just make your own archive. I'm perfectly happy with this. The
documentation can easily be changed so that for the archives included
with the package the default serialization of arrays is ...

>> One still has to include the header. This violates the boost
>> principle - don't pay for what you don't use. Actually the boost
>> principle would be - don't even know about what you don't use -
>> which is better anyway.

> You already violate this principle much more severely in the
> serialization library. If I do not want object tracking and
> versioning for a text_oarchive of some objects, the code for tracking
> and versioning is still included by the serialization library.

The headers are included but the code isn't instantiated.
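
To illustrate the language rule behind that statement: a member function of a class template is instantiated only if it is actually used. The sketch below uses hypothetical names (tracker, mini_saver, use_plain_only) and is not taken from the serialization library.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for an optional feature such as tracking. The body
// only compiles for types that have a data member named track_id.
template <class T>
struct tracker {
    static int id_of(const T& t) { return t.track_id; }
};

// Hypothetical archive-like class template. Both members are visible to
// anyone who includes the header, but each member function definition is
// instantiated only when that member is used.
template <class T>
struct mini_saver {
    std::string save_plain(const T& t) const { return std::to_string(t); }
    int save_tracked(const T& t) const { return tracker<T>::id_of(t); }
};

// Uses save_plain only. The definition of mini_saver<int>::save_tracked is
// never instantiated, so the expression t.track_id is never compiled for
// int, even though it would be ill-formed.
inline std::string use_plain_only() {
    mini_saver<int> s;
    return s.save_plain(42);
}
```

This shows only that unused template code is not instantiated; it says nothing about compile time spent parsing the included headers.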

> Robert, to focus the discussion and not get stuck in details let me
> stress a point that I had previously made at both reviews of your
> library, and that I still believe in:
>
> * A serialization library without built-in support for serialization
> of arrays is fundamentally flawed *

> I believe that this is the main issue we need to get sorted out
> first, since it is the fundamental point of disagreement.
>
> You write (I quote from your other e-mail):
>
>> default serialization of C array .... does have a universal default
>> implementation.
>
> and your default implementation of saving is to save the size as an
> unsigned int and then loop over all the elements, saving each one.
>
> My opinion is that this is the wrong approach! Instead a save_array
> function should be called, for which the default implementation would
> be just what you describe above, but the archive can overload it.
>
> Here are the reasons:
>
> 1.) 10x speedup, or more
>
> 2.) no need to provide intrusive overloads for all classes that want
> to use save_array
>
> 3.) prior art. There is a reason why
>
> - MPI, PVM and other message passing libraries support array
> serialization
> - XDR, used for remote procedure calls under Unix, has special
> support for arrays
> - HDF5, a standard for large scientific data sets operates
> directly on large arrays
>
> To interface to all these libraries, and to achieve reasonable
> performance with them and with binary archives, the direct support
> for array serialization by the serialization library is essential.
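
For readers following the thread, the save_array proposal quoted above might be sketched roughly as follows. This is an illustration under stated assumptions, not code from either author: byte_sink, loop_archive, block_archive, save_vector and same_bytes_fewer_writes are hypothetical names, and only trivially copyable element types are considered.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>
#include <vector>

// Hypothetical byte buffer that also counts how many writes it received.
struct byte_sink {
    std::vector<unsigned char> bytes;
    std::size_t writes = 0;
    void write(const void* p, std::size_t n) {
        const unsigned char* b = static_cast<const unsigned char*>(p);
        bytes.insert(bytes.end(), b, b + n);
        ++writes;
    }
};

struct loop_archive : byte_sink {};   // relies on the default save_array
struct block_archive : byte_sink {};  // supplies its own save_array

// Default hook: save the elements one by one.
template <class Archive, class T>
void save_array(Archive& ar, const T* data, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) ar.write(&data[i], sizeof(T));
}

// Overload for one archive: a single block write. It is more specialized
// than the default, so overload resolution picks it for block_archive.
template <class T>
void save_array(block_archive& ar, const T* data, std::size_t n) {
    static_assert(std::is_trivially_copyable<T>::value,
                  "block copy needs trivially copyable elements");
    ar.write(data, n * sizeof(T));
}

// Container-side code, written once: it knows about save_array but
// nothing about any particular archive.
template <class Archive, class T>
void save_vector(Archive& ar, const std::vector<T>& v) {
    const unsigned int count = static_cast<unsigned int>(v.size());
    ar.write(&count, sizeof count);
    save_array(ar, v.data(), v.size());
}

// Both archives produce identical bytes; only the number of writes differs.
inline bool same_bytes_fewer_writes() {
    std::vector<double> v(1000, 1.5);
    loop_archive a;
    block_archive b;
    save_vector(a, v);
    save_vector(b, v);
    return a.bytes == b.bytes && a.writes == 1001 && b.writes == 2;
}
```

The sketch shows the mechanism only. Any actual speedup depends on the archive and the transport, and is not measured here.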

This is not an issue of efficiency. The instantiated code is the same
regardless of where you put it.

I have the general case in the core library, and anyone is free to make
his own archive for more specific cases - which this is. The native binary
archive is actually quite small. It's only as big as it is because it
supports a wide character interface. The idea of a wide char interface for
the binary archive is dubious anyway. So it's simple just to make your own
version of binary archives. The library supports and encourages that, and
you don't have to change anything in the core to do that. I'm looking
forward to seeing the final result.

Robert Ramey

