From: Geoffrey Irving (irving_at_[hidden])
Date: 2006-09-16 03:21:54


On Sat, Sep 16, 2006 at 08:41:53AM +0200, Matthias Troyer wrote:
> Hi,
>
> in light of the performance questions let me summarize some details
> of how the proposed Boost.MPI library sends data:
>
> If an object is sent for which an MPI_Datatype exists, then
> communication is done using that MPI_Datatype. This applies both to
> the built-in primitive types and to "POD-like" types for which a
> custom MPI_Datatype can be constructed. For each
> such type, an MPI_Datatype is built *once* during execution of the
> program by using the serialization library.
>
> If a fixed-size array is sent for which an MPI_Datatype exists, then
> again the send is done using that MPI_Datatype.
>
> Thus for these two cases we get optimal performance and much simpler
> usage, since the creation of MPI_Datatypes is made much easier than in
> plain MPI.
>
> For all other types (variable-sized vectors, linked lists, trees) the
> data structure is serialized into a buffer by the serialization
> library using MPI_Pack. Again MPI_Datatypes are used wherever they
> exist, and contiguous arrays of homogeneous types for which
> MPI_Datatypes exist are serialized using a single MPI_Pack call
> (using a new optimization in Boost.Serialization). At the end, the
> buffer is sent using MPI_Send. Note here that while MPI_Pack calls do
> incur an overhead, we are talking about sending complex data
> structures for which no corresponding MPI call exists, and any
> program directly written using MPI would also need to first serialize
> the data structure into a buffer.
>
> To counter this overhead, there is the "skeleton&content" mechanism
> for cases where a data structure needs to be sent multiple times with
> different "contents", while the "skeleton" (the sizes of arrays, the
> values of pointers, ...) of the data structure remains unchanged. In
> that case only the structural information (sizes, types, pointers) is
> serialized using MPI_Pack and sent once so that
> the receiving side can create an identical data structure to receive
> the data. Afterwards an MPI_Datatype for the "contents" (the data
> members) of the data structure is created, and sending the content is
> done using this custom MPI_Datatype, which again gives optimal
> performance.
>
> It seems to me that the simplicity of the interface does a good job
> at hiding these optimizations from the user. If anyone knows of a
> further optimization trick that could be used then please post it to
> the list.
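
For concreteness, here is a rough sketch of what the two fast paths
described above might look like from the user's side. The names below
(boost::mpi::communicator, is_mpi_datatype, send/recv) are my reading of
the proposed interface, so treat the details as assumptions rather than
as the reviewed API:

#include <boost/mpi.hpp>
#include <boost/serialization/vector.hpp>
#include <boost/mpl/bool.hpp>
#include <vector>

namespace mpi = boost::mpi;

// A "POD-like" type: fixed layout, no pointers, no variable-size members.
struct particle {
    double x, y, z;
    int id;

    template <class Archive>
    void serialize(Archive& ar, const unsigned int /*version*/) {
        ar & x & y & z & id;
    }
};

// Mark the type as mappable to a single MPI_Datatype; the datatype is
// then built once, driven by the serialize() function above.
namespace boost { namespace mpi {
    template <>
    struct is_mpi_datatype<particle> : mpl::true_ {};
} }

int main(int argc, char* argv[]) {
    mpi::environment env(argc, argv);
    mpi::communicator world;

    if (world.rank() == 0) {
        particle p = { 1.0, 2.0, 3.0, 42 };
        world.send(1, 0, p);          // goes out as a custom MPI_Datatype

        std::vector<double> v(1000, 3.14);
        world.send(1, 1, v);          // variable-sized: serialized with
                                      // MPI_Pack, then sent with MPI_Send
    } else if (world.rank() == 1) {
        particle p;
        std::vector<double> v;
        world.recv(0, 0, p);
        world.recv(0, 1, v);
    }
    return 0;
}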
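
And a similarly hedged sketch of the skeleton & content mechanism; the
names skeleton(), get_content() and content are again assumptions about
the proposed interface:

#include <boost/mpi.hpp>
#include <boost/serialization/list.hpp>
#include <list>

namespace mpi = boost::mpi;

int main(int argc, char* argv[]) {
    mpi::environment env(argc, argv);
    mpi::communicator world;

    std::list<int> data;                 // structure lives on both sides

    if (world.rank() == 0) {
        data.assign(100, 1);             // sender knows the real structure
        // Send the structure (sizes, layout) once...
        world.send(1, 0, mpi::skeleton(data));
        // ...then send only the values, as often as needed, using the
        // custom MPI_Datatype built from that structure.
        for (int step = 0; step < 10; ++step) {
            // (update the values in 'data' here)
            world.send(1, 1, mpi::get_content(data));
        }
    } else if (world.rank() == 1) {
        // Rebuild an identical structure from the skeleton...
        world.recv(0, 0, mpi::skeleton(data));
        // ...then receive only the values, repeatedly, via the content.
        mpi::content c = mpi::get_content(data);
        for (int step = 0; step < 10; ++step)
            world.recv(0, 1, c);
    }
    return 0;
}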

As far as I can tell, there are three choices when sending a message:

    1. Send it as MPI_PACKED, in which case structure and content can be
       encoded together arbitrarily.
    2. Send it as a datatype with the entire structure known beforehand.
       This allows easy use of nonblocking sends and receives.
    3. Send it as a datatype with the structure determined up to the
       total size of the message. This requires getting the size with
       MPI_Probe, then building a datatype, then receiving it.

The third option allows you to send a variable-size vector with no
extra explicit buffer. The same applies to a vector plus a constant
amount of data (such as pair<int,vector<int> >). That would be quite
useful, but probably difficult to work out automatically.
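
In plain MPI, the third option for a variable-size vector would look
something like the following (the MPI calls are standard; the wrappers
around them are just for illustration):

#include <mpi.h>
#include <vector>

// Sender side: an ordinary contiguous send, no extra buffer.
void send_vector(const std::vector<int>& v, int dest, int tag,
                 MPI_Comm comm) {
    int count = static_cast<int>(v.size());
    MPI_Send(count ? const_cast<int*>(&v[0]) : 0, count, MPI_INT,
             dest, tag, comm);
}

// Receiver side: probe for the size, allocate, then receive in place.
std::vector<int> recv_vector(int source, int tag, MPI_Comm comm) {
    MPI_Status status;
    MPI_Probe(source, tag, comm, &status);     // block until a match arrives
    int count = 0;
    MPI_Get_count(&status, MPI_INT, &count);   // number of ints coming
    std::vector<int> v(count);
    MPI_Recv(count ? &v[0] : 0, count, MPI_INT, status.MPI_SOURCE,
             status.MPI_TAG, comm, MPI_STATUS_IGNORE);
    return v;
}

// For pair<int, vector<int> >, the fixed int could travel in the same
// message (say, as element 0) and the receiver resizes to count - 1.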

Unfortunately, this trick interacts very poorly with multiple
nonblocking messages, since the only ways to wait for one of several
messages sent this way are to either busy wait or use the same tag
for all messages. This restriction probably makes it impossible to
hide this inside the implementation of a general nonblocking send
function.
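
To spell out the busy-wait case: without a request object to hand to
MPI_Waitany, waiting for the first of several probe-based messages
degenerates into a polling loop along these lines (illustrative only):

#include <mpi.h>

// Returns the status of the first message that shows up on any of the
// given tags (hypothetical helper, for illustration only).
MPI_Status wait_for_any(const int* tags, int ntags, MPI_Comm comm) {
    MPI_Status status;
    for (;;) {                                  // busy wait
        for (int i = 0; i < ntags; ++i) {
            int flag = 0;
            MPI_Iprobe(MPI_ANY_SOURCE, tags[i], comm, &flag, &status);
            if (flag)
                return status;                  // a message is available
        }
    }
}

If all such messages share one tag, this collapses to a single blocking
MPI_Probe(MPI_ANY_SOURCE, tag, ...), which is the "same tag for all
messages" alternative above.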

Also, if my classification of message strategies is wrong, I would
love to know!

Geoffrey

