From: Matthias Troyer (troyer_at_[hidden])
Date: 2006-09-18 17:01:10


On Sep 18, 2006, at 8:50 PM, Robert Ramey wrote:

>> The set of types for which an array optimization can be done is
>> different for binary, MPI, XDR, ... archives, but a common dispatch
>> mechanism is possible, which is what we have implemented in the
>> array::[io]archive classes.
>
> And I think that is what I have a problem with. The "common" dispatch
> as I see it implemented presumes the known optimizable types. When
> other optimizable types are added, this will have to grow. It seems to
> me that it is fundamentally not scalable. So personally, I would prefer
> to add the code to the derived types - but I understand this is my
> preference.

No, the "optimizable types" are not the types (like std::vector,
std::valarray) for which an array optimization exists, but rather the
value_types of the array for which the storage can be optimized. This
set depends only on the archive itself, and not on the types, and
each archive can have its own lambda expression to determine whether
the value_type is optimizable. Adding optimized serialization to e.g.
multi_array will only mean that multi_array should use the array
wrapper to serialize its data instead of writing a loop over all
elements. This simplifies the serialization implementation for this
class, and automatically provides optimized serialization for all the
types, without any change in the serialization library, nor any
change in an archive.
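
To make the division of labour concrete, here is a minimal sketch of the idea. It is not the actual code of the serialization library: array_ref, toy_binary_oarchive, is_optimizable, save_array and save are hypothetical names invented for this illustration, and std::is_arithmetic merely stands in for whatever predicate a real archive would choose. The container side hands its contiguous data to a wrapper, and the archive alone decides, from the value_type, whether to write one block or fall back to an element-wise loop.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <type_traits>
#include <vector>

// Hypothetical stand-in for the array wrapper: a pointer plus an element count.
template <class T>
struct array_ref {
    const T* data;
    std::size_t count;
};

// Hypothetical archive. It only counts how data was written.
struct toy_binary_oarchive {
    // This archive's own rule for which value_types it can write as one block.
    // (Assumption for the sketch: arithmetic types only.)
    template <class T>
    using is_optimizable = std::is_arithmetic<T>;

    std::size_t block_writes = 0;
    std::size_t element_writes = 0;

    template <class T>
    void save_element(const T&) { ++element_writes; }

    // Single entry point for every array-like container.
    template <class T>
    void save_array(const array_ref<T>& a) {
        save_array_impl(a, typename is_optimizable<T>::type());
    }

private:
    // Optimized path: one block write for the whole array.
    template <class T>
    void save_array_impl(const array_ref<T>&, std::true_type) { ++block_writes; }

    // Default path: element-wise loop.
    template <class T>
    void save_array_impl(const array_ref<T>& a, std::false_type) {
        for (std::size_t i = 0; i < a.count; ++i) save_element(a.data[i]);
    }
};

// Container-side serialization: written once, with no knowledge of any archive.
template <class Archive, class T>
void save(Archive& ar, const std::vector<T>& v) {
    ar.save_array(array_ref<T>{v.data(), v.size()});
}
```

In this sketch, saving a std::vector<double> goes through the block path and saving a std::vector<std::string> goes through the loop, yet the save function for std::vector never mentions either case.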

This is perfectly scalable, in contrast to your idea of having each
archive class re-implement the serialization of all optimizable
containers. I am a bit confused about your arguments above, since it
was actually you who suggested the array wrapper as the least
intrusive and most scalable solution.

> I would to find this code as part of the array_wrapper for std::vector
> rather than as part of the archive class.

Again, there is no array_wrapper for std::vector; rather, the
std::vector<T> serialization serializes its data through an array<T>
wrapper, as you had proposed.

> This would entail making the above somewhat more elaborate:
>
> class array<std::vector> {
>     template<class Archive>
>     void binary_serialize(...){...}
>
>     template<class Archive>
>     void mpi_serialize(...){...}
>
>     template<class Archive>
>     void serialize(Archive &ar, const unsigned int version) const {
>         // if Archive is derived from base_binary
>         binary_serialize(ar, version);
>         // else
>         // if Archive is derived from base_mpi_archive
>         mpi_serialize(...)
>         // else
>         array_default<T>::serialize(*this, version)
>     }
> };
>

Ouch! This is just what I mean by not scalable. We already have
five cases now (plain, binary, packed MPI, MPI datatype, skeleton)
with two more coming soon (XDR, HDF5). Do you really want each
author of a serialization function for an array-like data structure
to reimplement an optimization for all these archives?

>
>> If by three or four logically distinct things you mean
>>
>> 1. the array optimization
>> 2. the skeleton&content archive wrappers
>> 3. the MPI archives
>> 4. the MPI library
>>
>> then my comments are:
>>
>> 1. is already factored out and in the serialization library. If
>> anything should be done to it, there was the desire to extend array
>> wrappers to strided arrays, which can easily be done without touching
>> anything in the serialization library.
>
> Hmmm - what about MTL or ublas - don't these have their own special
> types for collections? I know boost::multi_array does. Wouldn't these
> have to be added to the std::valarray and std::vector already in the
> binary archive?

I skipped most of the above because it seems there is a fundamental
misunderstanding regarding the role of the array wrapper. The array
wrapper, which you had suggested yourself, was introduced to
completely decouple array optimizations from specific datatypes. When
implementing serialization for MTL, ublas, Blitz, or other libraries,
one just uses an array wrapper to serialize contiguous arrays. An
archive can then either use the element-wise default serialization of
the array wrapper, or decide to overload it and implement an optimized
way -- independent of which class the array wrapper came from.
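
A rough sketch of that choice, again with purely hypothetical names (array_ref, toy_oarchive, toy_text_oarchive, toy_block_oarchive, toy_matrix) rather than the library's real classes: the base archive supplies the element-wise default, one derived archive keeps it, another overloads it, and two unrelated containers each need only a single serialization function.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the array wrapper.
template <class T>
struct array_ref {
    const T* data;
    std::size_t count;
};

// Hypothetical base archive: counts low-level write calls and provides
// the element-wise default for arrays.
struct toy_oarchive {
    std::size_t calls = 0;

    template <class T>
    void save_element(const T&) { ++calls; }

    template <class T>
    void save_array(const array_ref<T>& a) {
        for (std::size_t i = 0; i < a.count; ++i) save_element(a.data[i]);
    }
};

// Keeps the default: one call per element.
struct toy_text_oarchive : toy_oarchive {};

// Overloads the default: one call per array, whatever container it came from.
struct toy_block_oarchive : toy_oarchive {
    template <class T>
    void save_array(const array_ref<T>&) { ++calls; }
};

// A second, made-up array-like container next to std::vector.
struct toy_matrix {
    std::size_t rows;
    std::size_t cols;
    std::vector<double> cells;
};

// One serialization function per container (N of them) ...
template <class Archive, class T>
void save(Archive& ar, const std::vector<T>& v) {
    ar.save_array(array_ref<T>{v.data(), v.size()});
}

template <class Archive>
void save(Archive& ar, const toy_matrix& m) {
    ar.save_array(array_ref<double>{m.cells.data(), m.cells.size()});
}
// ... and at most one save_array overload per archive (M of them).
```

With two containers and two archives there are four combinations, but only two save functions and one overload had to be written; adding a third container or a third archive adds one piece of code, not a whole row or column.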

Thus, there is no std::vector, std::valarray, ... overload in any of
the archives - not in the binary archive nor anywhere else. What you
seem to propose, both above and in the longer text I cut, is to
instead re-implement the optimized serialization for all these N
classes in the M different archive types that can use it (we have M=4
now with the binary, packed MPI, MPI datatype, and skeleton archives,
and soon we'll do M+=2 by adding XDR and HDF5 archives). Besides
leading to an M*N problem, which the array wrapper was designed to
solve, this leads to intrusion problems into all classes that need to
be serialized (including multi_array and all others), which is not
feasible, as we discussed last year.

> I am intrigued by the skeleton - again the documentation doesn't
> really give a good idea of what it does and what else it might be
> used for.

The skeleton is just all the types that you treat in the archive
classes and not in the primitives, while the content is everything you
treat in the primitives. It is just a formalization of your
serialization library implementation details.
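
One way to read that split, as a toy sketch only: all names here (toy_skeleton_oarchive, toy_content_oarchive, save_size, save_primitive) are invented, and the assumption that a container's size belongs to the archive-level part while the element values belong to the primitives is mine, made for the sake of the example. The same save function is run through two archives, each keeping only its half.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical archive that keeps only structural information (here: sizes).
struct toy_skeleton_oarchive {
    std::vector<std::size_t> sizes;
    void save_size(std::size_t n) { sizes.push_back(n); }
    void save_primitive(double) {}  // values are not part of the skeleton
};

// Hypothetical archive that keeps only the primitive values.
struct toy_content_oarchive {
    std::vector<double> values;
    void save_size(std::size_t) {}  // structure is not part of the content
    void save_primitive(double x) { values.push_back(x); }
};

// The same serialization code feeds both archives.
template <class Archive>
void save(Archive& ar, const std::vector<double>& v) {
    ar.save_size(v.size());
    for (double x : v) ar.save_primitive(x);
}
```

Running a two-element vector through both archives leaves one size in the first and two values in the second.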
>
> So my complaints really come down to two issues.
>
> a) I'm still not convinced you've factored optimizations which
> can be applied to certain pairs of types and archives in the best
> way.

That's a separate discussion which we seem to be repeating every few
months now. It seems to me from today's discussion that there is a
confusion now about the use of the array wrapper, which we use in
just the way you originally proposed.

> b) The MPI documentation doesn't make very clear the organization
> of the disparate pieces. It's a user manual "cookbook" which is
> fine as far as it goes. But I think it's going to need more
> explanation of the design itself.

Most of the issues you are interested in, such as the use of
serialization for the skeleton&content, are implementation details,
the important points of which will be explained in a paper that is
currently being written.

Matthias

