
From: Robert Ramey (ramey_at_[hidden])
Date: 2006-09-18 14:50:37


Matthias Troyer wrote:
> I am a bit perplexed by your mail, since it is an identical copy of a
> private e-mail you sent me two weeks ago, even before the review
> started.

I realize this - it's just that I thought that someone else might
have some other observations to add on the subject.

> The comments of several reviewers, which were initially skeptical
> about our use of the serialization library in a high performance
> context, but whose concerns vanished when they saw the array
> optimizations, should show you that it was not only me who needs
> these optimizations.

I don't object to the array optimizations per se; I'm interested
in seeing whether there's a way to do it that doesn't hard-code coupling
between particular pairs of archives and datatypes into the
original archive classes. Actually this question applies to the
modifications in binary_?archive, so it's a little off topic - but
still related.

> Watch out that there are more such types: multi_array, ublas and MTL
> vectors and matrices, ... With the array wrapper we have an elegant
> solution to handle also these other types. Since we have discussed
> this topic many times on the list over the past year I will not
> comment further for now.

I think this is the part I'm still not seeing. The changes to
binary_?archive include specializations for std::valarray, std::vector,
and native C++ arrays. This pattern suggests that as other data types
for which an optimization might exist come along, more and more
specializations will have to be added to the binary archive. And all
programs will have to include them even if they don't use them.
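
To illustrate the pattern I'm objecting to, the archive header itself ends
up enumerating every optimizable type - roughly like this (a paraphrase
with illustrative names, not the actual header):

template<class Archive>
class binary_iarchive_impl /* ... */ {
    // one overload per optimizable container, baked into the archive:
    template<class T>
    void load_override(serialization::array<T> & a, int);  // native C++ arrays
    template<class T>
    void load_override(std::vector<T> & v, int);           // std::vector
    template<class T>
    void load_override(std::valarray<T> & v, int);         // std::valarray
    // multi_array? ublas? MTL? - each means another entry here,
    // compiled into every program that uses the archive
};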

When I originally suggested the idea of an array wrapper
(admittedly not thought out in detail) I envisioned that array.hpp
would have the "default" serialization - the lowest common denominator -
which is what is there now. So far so good.
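
Something like the following is what I have in mind for the default - a
sketch only, with member names of my own invention:

#include <cstddef>

// default array wrapper: serializes element by element, so it
// works with every archive whether or not it can be optimized
template<class T>
class array {
public:
    array(T * t, std::size_t s) : m_address(t), m_count(s) {}
    template<class Archive>
    void serialize(Archive & ar, const unsigned int /*version*/) const {
        for(std::size_t i = 0; i < m_count; ++i)
            ar & m_address[i];
    }
private:
    T * const m_address;
    const std::size_t m_count;
};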

So then, in say boost/serialization/vector.hpp, I would expect to see a
specialization for array like:

template<class T>
void array<std::vector<T> >::serialize(
    binary_iarchive & ar, const unsigned int /*version*/) const
{
    // special stuff for loading data into binary vectors
}
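
For context, the serialization of std::vector itself would then just hand
its buffer to the wrapper - a sketch, assuming a make_array helper taking
an address and a count (the real interface may differ):

template<class Archive, class T>
void load(Archive & ar, std::vector<T> & v, const unsigned int /*version*/)
{
    std::size_t count;
    ar >> count;
    v.resize(count);
    if(count > 0)
        // whichever array serialization matches this archive type
        // does the real work - bulk read or element by element
        ar >> make_array(&v[0], count);
}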

So now programs only have to compile and be aware of the specializations
that they are in fact going to use. And each optimization can be
compiled, tested, etc. independently. This adds the same three
"special cases", just in a different place - so the total work is the same.

The only remaining problem is to figure out a way to do this through the base
class so the optimization gets transmitted to any derivations. I'm not sure
this is a huge deal. So far we only have two archives which can
exploit these optimizations (binary and now MPI). If it becomes really
bothersome, these specializations can forward to a common implementation.
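
For instance - a sketch, assuming both archives expose a load_binary()-style
bulk read and that the two serialize overloads are declared in the wrapper
(mpi_packed_iarchive is a stand-in name):

// the shared implementation, written once
template<class Archive, class T>
void bitwise_load(Archive & ar, T * address, std::size_t count) {
    ar.load_binary(address, count * sizeof(T));
}

// the binary specialization forwards to it...
template<class T>
void array<std::vector<T> >::serialize(
    binary_iarchive & ar, const unsigned int /*version*/) const {
    bitwise_load(ar, m_address, m_count);
}

// ...and so would the MPI one
template<class T>
void array<std::vector<T> >::serialize(
    mpi_packed_iarchive & ar, const unsigned int /*version*/) const {
    bitwise_load(ar, m_address, m_count);
}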

With some work, it might be possible to avoid even this minimal duplication.

This would entail making the above somewhat more elaborate:

template<class T>
class array<std::vector<T> > {
    template<class Archive>
    void binary_serialize(Archive & ar, const unsigned int version){...}

    template<class Archive>
    void mpi_serialize(Archive & ar, const unsigned int version){...}

    template<class Archive>
    void serialize(Archive & ar, const unsigned int version) const {
        // if Archive is derived from base_binary
        //     binary_serialize(ar, version);
        // else if Archive is derived from base_mpi_archive
        //     mpi_serialize(ar, version);
        // else
        //     array_default<T>::serialize(ar, version);
    }
};
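
The commented-out dispatch can actually be written with a little
metaprogramming - for example, tag dispatching (a sketch; base_binary and
base_mpi_archive are the assumed base class names from above):

#include <boost/mpl/if.hpp>
#include <boost/type_traits/is_base_and_derived.hpp>

struct binary_tag {};
struct mpi_tag {};
struct default_tag {};

// member of array<std::vector<T> >, continuing the sketch above
template<class Archive>
void serialize(Archive & ar, const unsigned int version) const {
    // select a tag type at compile time from the archive's base class
    typedef typename boost::mpl::if_<
        boost::is_base_and_derived<base_binary, Archive>,
        binary_tag,
        typename boost::mpl::if_<
            boost::is_base_and_derived<base_mpi_archive, Archive>,
            mpi_tag,
            default_tag
        >::type
    >::type tag_type;
    // overload resolution on the tag then picks binary_serialize,
    // mpi_serialize, or the element-wise default
    serialize_impl(ar, version, tag_type());
}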

So - one still gets exactly what you want without forcing all users to
include and compile specializations/optimizations for all the types you
want to add in the future.

Note that there could be one set of array wrappers for binary-serializable
optimizations and a different set for MPI-optimizable ones.

This is the motivation for using the array wrapper - to permit the
specializations for different types to remain orthogonal to the archives
that benefit from special treatment.

I think it's really just the same code you already have - it's just
shuffled around so that it doesn't have to be included unless you need it.

> As Doug Gregor pointed out this is not possible since the format is
> implementation-defined, and can change from one execution to another.

OK - I just assumed (wrongly, apparently) that an MPI protocol presumed
a heterogeneous environment.

> The set of types for which an array optimization can be done is
> different for binary, MPI, XDR, ... archives, but a common dispatch
> mechanism is possible, which is what we have implemented in the
> array::[io]archive classes.

And I think that is what I have a problem with. The "common" dispatch,
as I see it implemented, presumes that the set of optimizable types is
known in advance. When other optimizable types are added, this will have
to grow. It seems to me that it is fundamentally not scalable. So
personally, I would prefer to add the code to the derived types - but I
understand this is my preference.

Your "magic" idea (which you have not
> described to the list yet since it was only in private e-mails) can
> easily be incorporated into that. Just replace
>
> typedef is_fundamental<mpl::_1> use_array_optimization;
>
> by
>
> typedef is_bitwise_serializable<mpl::_1> use_array_optimization;
>
> or
>
> typedef is_magic<mpl::_1> use_array_optimization;
>
> and you have upgraded to your magic optimization!

I would prefer to find this code as part of the array wrapper for
std::vector rather than as part of the archive class.
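
That is, rather than the archive declaring the trait, I'd like to see
something in the spirit of the following next to the std::vector
serialization code (purely illustrative - the point is the placement, not
the exact interface):

namespace boost { namespace serialization {

// whether std::vector<T> can be array-optimized is decided here,
// next to the type's serialization, not inside any archive class
template<class T>
struct use_array_optimization<std::vector<T> >
    : public is_bitwise_serializable<T>
{};

}} // namespace boost::serialization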

> If by three or four logically distinct things you mean
>
> 1. the array optimization
> 2. the skeleton&content archive wrappers
> 3. the MPI archives
> 4. the MPI library
>
> then my comments are:
>
> 1. is already factored out and in the serialization library. If
> anything should be done to it, there was the desire to extend array
> wrappers to strided arrays, which can easily be done without touching
> anything in the serialization library.

Hmmm - what about MTL or ublas - don't these have their own special
types for collections? I know boost::multi_array does. Wouldn't these
have to be added alongside the std::valarray and std::vector
specializations already in the binary archive?

> 2. is independent of the rest of the proposed Boost.MPI library but
> we keep it in detail since we do not see any other use for this at
> the moment. Once someone could use it we can move it immediately to
> the serialization library.

OK - the information on the skeleton was a little - uhh - skeletal. I
really didn't understand how it's implemented. The relationship - if any -
to Boost Serialization isn't clear from the documentation. I suspect that
this will be resolved by amplification in the documentation.

> 3. and 4. are tightly coupled since the MPI archives do not make any
> sense outside the Boost.MPI context and I do not see that splitting
> this into two separate libraries makes any sense at all. The code
> itself is written cleanly though, with no part of the MPI archive
> types depending on any of the communication functions.

This may be true - it wasn't obvious to me. By MPI archives I meant
your packed_archive, and that seemed to me a thin wrapper around
base_binary_archive - which is fine with me. So I suspect my
complaint is that the documentation seems to suggest that it's something
more elaborate than that.

> Thus I see absolutely no reason at all to shuffle the code around
> anymore, unless you can come up with a reason to move the
> implementation details of skeleton&content to a public place in the
> serialization library.

I am intrigued by the skeleton - again, the documentation doesn't really
give a good idea of what it does and what else it might be used for.

So my complaints really come down to two issues.

a) I'm still not convinced you've factored the optimizations which
can be applied to certain pairs of types and archives in the best
way.

b) The MPI documentation doesn't make very clear the organization
of the disparate pieces. It's a user manual "cookbook", which is
fine as far as it goes. But I think it's going to need more explanation
of the design itself.

As an aside, I'm amazed you haven't gotten any flak for not including more
formal documentation - including concepts for the template parameters.

Robert Ramey

