Subject: Re: [Boost-users] [Boost.MPI] question about serializing custom object
From: David Abrahams (dave_at_[hidden])
Date: 2010-08-27 03:22:18


At Thu, 26 Aug 2010 16:05:41 -0800,
Robert Ramey wrote:
>
> Dave Abrahams wrote:
> > BoostPro Computing, http://boostpro.com
> > Sent from coveted but awkward mobile device
> >
> >> If one were using heterogeneous machines, I could understand the
> >> usage of MPI types. But as I understand it, the MPI serialization
> >> presumes that the machines are binary compatible.
> >
> > You're mistaken.
> >
> >> So I'm just not seeing this.
> >
> > Start reading here:
> > http://www.boost.org/doc/libs/1_39_0/doc/html/mpi/tutorial.html#mpi.skeleton_and_content
> > and all will be revealed
>
> lol - I've read that several times.

I always wonder, when you write that, whether you're physically
laughing out loud. That's OK; don't spoil the mystique ;-)

> I just never found it to be very revealing. The word skeleton
> seemed pretty suggestive. It's still not clear to me how such a
> thing can work between heterogeneous machines. For example, if I
> have an array of 2 byte integers and they each need to get
> transformed one by one into a 4 byte integer because that's the
> closest MPI data type,

I think you don't understand what MPI datatypes do. They are not like
an API that you have to conform to.

> I don't see how the fact that the "shape" doesn't change helps you.

Well, let me try to explain all this; I think it is important stuff
and not generally as well-understood as it should be. And since
Boost.MPI and Boost.Serialization are so closely related, I think it's
especially important that *you* understand. I hope Matthias will
correct me if I get anything wrong.

The first thing to understand is at the hardware level. Network cards
have a fixed-size buffer. Sending anything over the network involves
getting it into the buffer. If the buffer fills up, packets are
waiting to go out, and you can't put anything more in the buffer
until they do.

One mission of MPI (the non-boost variety) is to provide a portable
high-level API for sending out these messages. Therefore, MPI deals
with the low-level stuff and has/needs direct access to the network
buffer *but code written on top of MPI does not* <== note
emphasis. <<<=== No, really, note it!

Now let's assume a heterogeneous environment. In that case, you can't
just “blit the bytes;” somebody needs to figure out how to encode data
for transmission and decode it upon receipt so that it has the same
meaning on both ends. In particular, bytes may need to be permuted
to account for endian-ness. Let's further assume a system that
transmits in little-endian, so before sending from a big-endian
machine one needs to swap bytes (all the other possible schemes have
the same consequences---by choosing one we can work with specific
examples).

The code doing the permutation has to know about the data structure,
rather than operating on the data as raw bytes only. For example, if
the data structure is a sequence of 32-bit integers, you need to
reverse each group of 4 consecutive bytes in-place. If it's a
sequence of 16-bit integers, you need to reverse pairs of bytes
in-place, and if it's a sequence of chars, you don't need to do
anything.
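
To make that concrete, here's a little sketch of my own (not MPI code):
the swapping routine has to be told the element width, because the same
raw bytes are permuted differently depending on the declared type.

  #include <algorithm>
  #include <cstddef>

  // A sketch, not MPI code: the permutation depends on the element width.
  void swap_in_place(unsigned char* buf, std::size_t nbytes,
                     std::size_t elem_size)
  {
      // elem_size == 1 (chars):        nothing to do
      // elem_size == 2 (16-bit ints):  reverse each pair of bytes
      // elem_size == 4 (32-bit ints):  reverse each group of 4 bytes
      for (std::size_t i = 0; i + elem_size <= nbytes; i += elem_size)
          std::reverse(buf + i, buf + i + elem_size);
  }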

Now suppose MPI only knew about byte sequences --- sort of like files
only know about byte sequences. What would we do? We'd use something
like Boost.Serialization to handle the translation. We'd serialize
our source data into a portable representation that could be passed to
MPI, which would then copy the bytes into the network buffer as
needed. That's two copies of every byte. One copy to serialize, and
another copy into the network buffer.
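
Concretely, that path would look something like this sketch (the helper
name is mine, and text_oarchive is just one portable archive you could
use):

  #include <sstream>
  #include <string>
  #include <mpi.h>
  #include <boost/archive/text_oarchive.hpp>

  // Hypothetical "MPI only knows bytes" path: serialize first (copy #1),
  // then let MPI copy the bytes into the network buffer (copy #2).
  template <class T>
  void send_serialized(const T& value, int dest, int tag, MPI_Comm comm)
  {
      std::ostringstream os;
      boost::archive::text_oarchive oa(os);
      oa << value;                              // copy #1: into the archive
      std::string buf = os.str();
      MPI_Send(const_cast<char*>(buf.data()),
               static_cast<int>(buf.size()),
               MPI_CHAR, dest, tag, comm);      // copy #2: into the buffer
  }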

However, we have MPI datatypes and type maps, which describe the
structure of data to be transmitted. Take a quick look at
http://ww2.cs.mu.oz.au/498/notes/node20.html, and note that an MPI
type map is a sequence of (type, byteoffset) pairs. If there's
padding in your data structure, the type map captures that fact so
padding bytes aren't sent. So we can tell MPI about the data
structure and let MPI put the serialized representation *directly*
into the network buffer. That's one copy of every byte.
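
For a single struct, building such a type map looks roughly like this
(Sample and make_sample_type are just illustrations of mine; the MPI
calls are the standard ones):

  #include <cstddef>   // offsetof
  #include <mpi.h>

  // An illustrative struct; on most ABIs there is padding after 'c',
  // and the type map simply skips it.
  struct Sample {
      char   c;
      double x;
      int    n;
  };

  MPI_Datatype make_sample_type()
  {
      int          blocklens[3] = { 1, 1, 1 };
      MPI_Aint     displs[3]    = { (MPI_Aint) offsetof(Sample, c),
                                    (MPI_Aint) offsetof(Sample, x),
                                    (MPI_Aint) offsetof(Sample, n) };
      MPI_Datatype types[3]     = { MPI_CHAR, MPI_DOUBLE, MPI_INT };

      MPI_Datatype sample_type;
      MPI_Type_create_struct(3, blocklens, displs, types, &sample_type);
      MPI_Type_commit(&sample_type);
      return sample_type;   // usable as MPI_Send(&s, 1, sample_type, ...)
  }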

Of course, making this efficient depends on sending the same data
structure multiple times. MPI type maps are on the order of the same
size as the data structure they describe, so creating one might be
roughly the same cost as a copy. So using the same "shape" over and
over again is important.

Unfortunately, an MPI type map is a pain to create, and is even more
painful to maintain as data structures change. So, typically, type
maps get created for just a few structs, and more complex structures
are sent with a series of MPI calls (some of which use type maps).

The genius of Boost.MPI is in three realizations:

1. You can represent all the values in an arbitrarily complex
   non-contiguous data structure with a single type map. It's like a
   struct that extends from the data's minimum address to its maximum,
   probably with *lots* of padding.

2. When you serialize a complex data structure, the archive sees the
   type and address of every datum.

3. You can treat addresses as byte offsets (from address zero).
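
Points 1 and 3 together mean you can use the data's actual addresses as
the displacements and then send "from address zero" (MPI_BOTTOM). In
plain MPI that looks something like this sketch (send_two_ints and its
arguments are my illustration):

  #include <mpi.h>

  // Two separately allocated ints described by one type map whose
  // displacements are their absolute addresses, then sent as the
  // "giant struct starting at address zero" via MPI_BOTTOM.
  void send_two_ints(int* a, int* b, int dest, int tag, MPI_Comm comm)
  {
      MPI_Aint addrs[2];
      MPI_Get_address(a, &addrs[0]);
      MPI_Get_address(b, &addrs[1]);

      int          blocklens[2] = { 1, 1 };
      MPI_Datatype types[2]     = { MPI_INT, MPI_INT };

      MPI_Datatype map;
      MPI_Type_create_struct(2, blocklens, addrs, types, &map);
      MPI_Type_commit(&map);

      // Addresses serve as byte offsets from "address zero" (MPI_BOTTOM),
      // so MPI copies the values straight out of their original locations.
      MPI_Send(MPI_BOTTOM, 1, map, dest, tag, comm);

      MPI_Type_free(&map);
  }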

So Boost.MPI has an Archive type, one that creates an MPI type map by
treating addresses as offsets and translating fundamental C++ types
into MPI datatypes. This involves no actual data copying. Then it
uses the type map to send the "giant struct beginning at address
zero." This avoids an expensive intermediate serialization phase
before MPI actually gets its paws on your data.
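
In user code that's the skeleton-and-content interface from the
tutorial link above; a minimal sketch of the sending side (the loop and
the names are mine) looks like:

  #include <boost/mpi.hpp>
  #include <boost/serialization/list.hpp>
  #include <list>

  namespace mpi = boost::mpi;

  // Send the shape ("skeleton") once, then the values ("content") as
  // often as needed; the content transfer uses the MPI datatype built
  // from the data's addresses, with no intermediate serialization copy.
  void sender(mpi::communicator& world, std::list<int>& data)
  {
      world.send(1, 0, mpi::skeleton(data));   // shape, sent once
      mpi::content c = mpi::get_content(data);
      for (int step = 0; step < 10; ++step) {
          // ... update the values held in 'data' in place ...
          world.send(1, 1, c);                 // values only, every step
      }
  }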

HTH, and I hope I got all that right :-)

-- 
Dave Abrahams
BoostPro Computing
http://www.boostpro.com
