From: Matthias Troyer (troyer_at_[hidden])
Date: 2006-09-18 05:53:11


Hi Robert
On Sep 17, 2006, at 7:56 PM, Robert Ramey wrote:

> This isn't a full review - I've just read the documentation and
> perused the code. It's more like random observations. A lot of this
> is very particular to the way that the MPI library uses/builds upon
> serialization, so it may not be of interest to many.

I am a bit perplexed by your mail, since it is an identical copy of a
private e-mail you sent me two weeks ago, even before the review
started. I had replied to your e-mail but never heard back from you.
I am thus confused about the purpose of your message, but I assume
that you want me to reply in public now. I will essentially follow
what I wrote in my reply two weeks ago, modified a bit so that list
readers who did not see our private e-mail exchange can follow along.

> I've spent some more time reviewing the serialization currently
> checked into the head. It's quite hard to follow.
> Among other things, the MPI library includes the following:
>
> a) an optimization for serialization of std::vector, std::valarray
> and native C++ arrays in binary archives.

I assume you mean the array wrapper here? If so, it is not restricted
to the above types but is also needed for other array-like data
structures. Furthermore, this is not part of the MPI library but a
general extension to the serialization library that we implemented
there last spring, with your consent.
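
To make the wrapper idea concrete, here is a minimal sketch of how a
user-defined array-like structure routes its data through
boost::serialization::make_array; the small_buffer type is purely
hypothetical and not taken from any library source. Archives that
understand the wrapper can then serialize the whole block at once,
while all others fall back to element-wise serialization:

     #include <boost/serialization/array.hpp>
     #include <cstddef>

     // Hypothetical fixed-size container, used only for illustration.
     template<class T, std::size_t N>
     struct small_buffer {
         T data[N];

         template<class Archive>
         void serialize(Archive& ar, const unsigned int /*version*/) {
             // An archive implementing the array optimization handles the
             // wrapper with a single save_array/load_array call; any other
             // archive serializes the elements one by one.
             ar & boost::serialization::make_array(data, N);
         }
     };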

The comments of several reviewers who were initially skeptical about
our use of the serialization library in a high-performance context,
but whose concerns vanished once they saw the array optimizations,
should show you that I am not the only one who needs these
optimizations.

> b) A new type of archive (which should be called mpi_?archive) which
> serializes C++ structures in terms of MPI datatypes. This would
> complement the archive types that are already included in the package.

Do you mean the mpi::packed_archive? We prefer to call it
mpi::packed_archive, since it is actually a slow path and not the
preferred way of sending via MPI. It does not make use of MPI data
types, and is in fact - as you saw yourself - a variant of the
binary_archive, with the main distinction that save_binary cannot be
used for fast serialization of array wrappers - a different function
is needed, hence the abstraction using save_array.
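
For readers who have not looked at the code, a rough self-contained
sketch of such a packing primitive follows. It is not the actual
Boost.MPI implementation and all names are illustrative; in
particular, everything is packed as MPI_BYTE here for brevity,
whereas a real implementation would pass the matching MPI datatype
for each primitive type. The point is that both scalars and whole
arrays end up in MPI_Pack, which is why the archive needs a
save_array hook rather than relying on save_binary:

     #include <mpi.h>
     #include <cstddef>
     #include <vector>

     // Illustrative packing primitive: every value is appended to a byte
     // buffer with MPI_Pack.
     class packed_oprimitive_sketch {
     public:
         explicit packed_oprimitive_sketch(MPI_Comm comm)
           : comm_(comm), position_(0) {}

         // One call packs a whole contiguous array - the operation exposed
         // to the archive layer through the save_array hook.
         template<class T>
         void save_array(const T* address, std::size_t count) {
             pack(address, static_cast<int>(count * sizeof(T)));
         }

         // Scalar primitives reuse the same packing routine.
         template<class T>
         void save(const T& t) {
             pack(&t, static_cast<int>(sizeof(T)));
         }

     private:
         void pack(const void* data, int bytes) {
             if (bytes == 0) return;
             int needed = 0;
             MPI_Pack_size(bytes, MPI_BYTE, comm_, &needed);
             buffer_.resize(buffer_.size() + needed);
             MPI_Pack(const_cast<void*>(data), bytes, MPI_BYTE,
                      &buffer_[0], static_cast<int>(buffer_.size()),
                      &position_, comm_);
         }

         MPI_Comm comm_;
         std::vector<char> buffer_;
         int position_;
     };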

> i) text - renders C++ structures in terms of a long string of
> characters - the simplest portable method.
> ii) binary - renders C++ structures as native binary data. The
> fastest method - but non-portable.
> iii) renders ... as xml elements - a special case of i) above.
>
> So we would end up with an mpi_archive and optionally mpi_primitive.
> In the other archives, I separated ?_primitive so this could be
> shared by both text and xml. In your case it isn't necessary to make
> an mpi_primitive class - though it might be helpful and it would
> certainly be convenient to leverage the established pattern to ease
> understanding for the casual reader.
>
Actually we already have an MPI primitive class, and the packed
archive is - as you saw - just a binary archive with special
primitives and a special way of dealing with arrays. If we keep a
save_array instead of just using save_binary in the binary archives,
then indeed just by specifying the MPI primitives we create an MPI
archive - I liked that design of yours once I grasped how you split
off the primitives.
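
As an aside for readers unfamiliar with that split, here is a tiny
self-contained illustration of the pattern; it uses no Boost code and
all names are hypothetical. The archive layer walks the data and
delegates raw output to whichever primitive class it is combined
with, so exchanging the primitives yields a new archive format:

     #include <iostream>

     // Primitive layer: knows only how to emit raw values.
     struct text_oprimitive_sketch {
         template<class T>
         void save(const T& t) { std::cout << t << ' '; }
     };

     // Archive layer: combines structure handling with the primitives.
     template<class Primitive>
     struct oarchive_sketch : Primitive {
         template<class T>
         oarchive_sketch& operator<<(const T& t) {
             this->save(t);   // swapping Primitive changes only the format
             return *this;
         }
     };

     int main() {
         oarchive_sketch<text_oprimitive_sketch> ar;
         ar << 42 << 3.14;    // prints "42 3.14 "
         return 0;
     }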

Note, however, as Doug Gregor pointed out, that the mpi_primitives
are not very useful outside an MPI message passing context.

> c) the "skeleton" idea - which I still haven't totally figured out
> yet. I
> believe I would characterize this as an "archive adaptor" which
> changes the
> behavior of any the archive class to which it is applied. In this
> way it is
> similar to the "polymorphic_?archive" .

Indeed, these are similar wrappers.
>
> In my view these enhancements are each independent of one another.
> This is not reflected in the current implementation.

Have you looked at the code? Actually the skeleton "archive adaptors"
do not depend at all on the rest of the MPI library, and they can
easily be factored out in case they are useful in another context as
well. For now, since they are an implementation detail, never occur
in the public API, and we do not see a use for them in another
context at the moment, we have left them in the detail namespace.
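
For context, the public face of these adaptors is the
skeleton/content interface of the proposed library. A usage sketch
along the lines of the Boost.MPI documentation follows; the
surrounding program is illustrative rather than taken from the
library's examples. The structure of the vector is transmitted once,
after which only the data is exchanged:

     #include <boost/mpi.hpp>
     #include <boost/serialization/vector.hpp>
     #include <vector>

     int main(int argc, char* argv[])
     {
         boost::mpi::environment env(argc, argv);
         boost::mpi::communicator world;

         std::vector<double> values(100);
         if (world.rank() == 0) {
             // Send the shape of the data structure once ...
             world.send(1, 0, boost::mpi::skeleton(values));
             // ... then send only the contents on each further exchange.
             for (int step = 0; step < 10; ++step)
                 world.send(1, 1, boost::mpi::get_content(values));
         } else if (world.rank() == 1) {
             world.recv(0, 0, boost::mpi::skeleton(values));
             for (int step = 0; step < 10; ++step)
                 world.recv(0, 1, boost::mpi::get_content(values));
         }
         return 0;
     }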

> I would suggest the following:
>
> a) enhancements to the binary archive be handled as such. We're only
> talking about specializations for three templates - std::vector,
> std::valarray and native C++ arrays. I know these same three are
> also handled specially for mpi_?archives, but it's still a mistake
> to combine them. In binary_?archive they are handled one way
> (load_binary) while in mpi_archive they are handled another way
> (load_array). I still think this would be best implemented as
> "enhanced_binary_?archive".

Watch out that there are more such types: multi_array, ublas and MTL
vectors and matrices, ... With the array wrapper we have an elegant
solution that also handles these other types. Since we have discussed
this topic many times on the list over the past year, I will not
comment further for now.

If you do not like the way we have implemented the array
optimizations in the binary archive, then we can just roll back the
CVS state to the version at the end of May, where we had implemented
a separate array-optimized binary archive and the MPI archives did
not need to change any of your archives.

> b) mpi_?archive should derive directly from common_?archive like
> basic_binary_?archive does. The reason I have basic_... is that for
> xml and text there are separate wide-character versions, so I wanted
> to factor out the commonality. In your case, I don't think that's
> necessary, so I would expect your hierarchy would look like
>
> class mpi_archive :
> public common_archive,
> public interface_archive
> ...

Do you mean the packed archive? This is actually a binary archive -
do you really mean that we should reimplement the functionality of
the binary archive and not reuse what is there?

> Note that you've used packed_archive - I would use mpi_archive
> instead. I think this is a better description of what it is.

I still prefer mpi::packed_archive, since there can also be other MPI
archives. One possible addition to speed things up on homogeneous
machines might be just an mpi::binary_archive, using a binary buffer.

> Really it's only a name change - and "packed archive" is already
> inside an mpi namespace so it's not a huge issue. BUT I'm wondering
> if the idea of rendering C++ data structures as MPI primitives
> should be more orthogonal to the MPI protocol itself. That is, might
> it not sometimes be convenient to save such serializations to disk?
> Wouldn't this provide a portable binary format for free? (Lots of
> people have asked for this but no one has been sufficiently
> interested to actually invest the required effort.)

As Doug Gregor pointed out, this is not possible since the format is
implementation-defined and can change from one execution to another.

> 4) Shouldn't there be a logical place for other archive types for
> message passing - how about XDR? I would think it would be a close
> cousin to MPI archives.

XDR might be used by an implementation or not - these are
implementation details and a good MPI implementation is supposed to
pick the best format.

>
> c) The skeleton idea would be
> template<class BaseArchive>
> class skeleton_archive
> ....???
> (I concede I haven't studied this enough).

Indeed, the skeleton archives could be factored out if anybody sees
another use for them. This is an orthogonal piece of code, and we
should discuss where it can be useful. One possible application is to
visualize data structures without caring about the content, looking
only at types and pointers. But I don't know if anyone needs this or
if there is another use for these pieces of code. If there is, then
we can factor them out of the mpi detail namespace and put them into
the archive library with no essential changes to the code.

> The only "repeated" or shared code might be that which determines when
> either a binary or mpi optimization can be applied. It's not clear
> to me
> whether this criteria applies to both kinds of archives ore each
> one has its
> own separate criteria. If it's the latter - there's no shared code
> and we're
> done. If it's the former, the a separate free standing concept has
> to be
> invented. In the past I've called this "binary serializable" and
> more lately
> "magic". ( a concession to physicist's fondness for whimsical names).

The set of types for which an array optimization can be done is
different for binary, MPI, XDR, ... archives, but a common dispatch
mechanism is possible, which is what we have implemented in the
array::[io]archive classes. Your "magic" idea (which you have not
described to the list yet since it was only in private e-mails) can
easily be incorporated into that. Just replace

     typedef is_fundamental<mpl::_1> use_array_optimization;

by

     typedef is_bitwise_serializable<mpl::_1> use_array_optimization;

or

     typedef is_magic<mpl::_1> use_array_optimization;

and you have upgraded to your magic optimization!
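
For readers who want to see how such a trait drives the dispatch,
here is a simplified, self-contained sketch; the real
array::[io]archive code is more involved and the function names here
are made up. The archive's use_array_optimization predicate is
applied to the element type and selects one of two paths at compile
time:

     #include <boost/mpl/apply.hpp>
     #include <boost/mpl/bool.hpp>
     #include <cstddef>

     // Optimized path: the whole block goes through the archive's
     // save_array primitive in a single call.
     template<class Archive, class T>
     void save_array_sketch(Archive& ar, const T* ptr, std::size_t count,
                            boost::mpl::true_)
     {
         ar.save_array(ptr, count);
     }

     // Fallback path: serialize element by element.
     template<class Archive, class T>
     void save_array_sketch(Archive& ar, const T* ptr, std::size_t count,
                            boost::mpl::false_)
     {
         for (std::size_t i = 0; i != count; ++i)
             ar << ptr[i];
     }

     // Apply the archive's predicate (e.g. is_fundamental<mpl::_1>) to
     // the element type T and pick one of the two paths above.
     template<class Archive, class T>
     void save_array_sketch(Archive& ar, const T* ptr, std::size_t count)
     {
         typedef boost::mpl::bool_<
             boost::mpl::apply1<
                 typename Archive::use_array_optimization, T
             >::type::value
         > use_optimization;
         save_array_sketch(ar, ptr, count, use_optimization());
     }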

> So rather than, or in addition to, an MPI library you would end up
> with three logically distinct things. Each one can stand on its own.

> So depending on this last, the serialization part of the MPI library
> falls into 3 or 4 independent pieces. If the code were shuffled
> around to reflect this, it would be much easier to use, test,
> verify, enhance and understand. Also the skeleton concept might then
> be applicable to other types of archives. Also the "magic" concept
> really is a feature of the type and is really part of the ad hoc C++
> type reflection which is what serialization traits are.

If by three or four logically distinct things you mean

1. the array optimization
2. the skeleton&content archive wrappers
3. the MPI archives
4. the MPI library

then my comments are:

1. is already factored out and is in the serialization library. If
anything remains to be done, there was a desire to extend the array
wrappers to strided arrays, which can easily be done without touching
anything in the serialization library.

2. is independent of the rest of the proposed Boost.MPI library, but
we keep it in the detail namespace since we do not see any other use
for it at the moment. Once someone has a use for it, we can move it
immediately to the serialization library.

3. and 4. are tightly coupled since the MPI archives do not make any
sense outside the Boost.MPI context and I do not see that splitting
this into two separate libraries makes any sense at all. The code
itself is written cleanly though, with no part of the MPI archive
types depending on any of the communication functions.

Thus I see absolutely no reason at all to shuffle the code around
anymore, unless you can come up with a reason to move the
implementation details of skeleton&content to a public place in the
serialization library.

Matthias

