
From: Douglas Gregor (doug.gregor_at_[hidden])
Date: 2006-09-16 09:43:09


On Sep 16, 2006, at 5:05 AM, Matthias Troyer wrote:
> On Sep 16, 2006, at 10:22 AM, Markus Blatt wrote:
>> The question came up when I looked into mpi/collectives/
>> broadcast.hpp:
>>
>> // We're sending a type that does not have an associated MPI
>> // datatype, so we'll need to serialize it. Unfortunately, this
>> // means that we cannot use MPI_Bcast, so we'll just send from the
>> // root to everyone else.
>> template<typename T>
>> void
>> broadcast_impl(const communicator& comm, T& value, int root,
>> mpl::false_)
>>
>> If this function gets called, the performance will definitely be
>> suboptimal, as the root will send to all others. Is this just the
>> case if no MPI_Datatype was constructed (like for the linked list),
>> or is it called whenever the boost serialization is used?
>
> OK, I see your concern. This is actually only used when no
> MPI_Datatype can be constructed, that is, when no MPI_Datatype is
> possible, such as for a linked list, and when you do not use the
> skeleton&content mechanism either.

Right. From a code standpoint, in addition to the broadcast_impl
shown above, there is one that looks like this:

     // We're sending a type that has an associated MPI datatype, so
     // we'll use MPI_Bcast to do all of the work.
     template<typename T>
     void
     broadcast_impl(const communicator& comm, T& value, int root,
                    mpl::true_)

That last parameter decides which implementation to use, based on
whether we have or can create an MPI_Datatype for the type T.
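
In other words, the public broadcast() entry point just dispatches on
the is_mpi_datatype trait. Roughly, it does something like the
following (a simplified sketch, not the exact code in the headers):

     #include <boost/mpi/datatype.hpp>  // is_mpi_datatype<T>

     template<typename T>
     void broadcast(const communicator& comm, T& value, int root)
     {
       // is_mpi_datatype<T> derives from mpl::true_ or mpl::false_, so
       // passing a default-constructed instance selects either the
       // MPI_Bcast overload or the serializing fallback at compile time.
       detail::broadcast_impl(comm, value, root, is_mpi_datatype<T>());
     }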

> Since this part of the code was written by Doug Gregor, I ask him to
> correct me if I say something wrong now or if I miss something. When
> no MPI datatype exists, we need to pack the object into a buffer
> using MPI_Pack, and the buffer needs to be broadcast. So far we all
> seem to agree. The problem now is that the receiving side needs to
> know the size of the buffer to allocate enough memory, but there is
> no MPI_Probe for collectives that could be used to inquire about the
> message size. I believe that this was the reason for implementing the
> broadcast as a sequence of nonblocking sends and receives (Doug?).

Yes, this was the reason for the sequence of nonblocking sends and
receives.
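
For reference, that point-to-point fallback boils down to something
like the following sketch in terms of the raw MPI calls (the real code
serializes into a packed archive first; 'buffer', 'size', 'tag',
'rank' and 'nprocs' are placeholders here):

     // Rough sketch of the serializing fallback, assuming 'comm' is the
     // raw MPI_Comm and the root has already packed 'value' into
     // (buffer, size) with MPI_Pack.
     if (rank == root) {
       std::vector<MPI_Request> requests;
       for (int dest = 0; dest < nprocs; ++dest) {
         if (dest == root) continue;
         MPI_Request req;
         MPI_Isend(buffer, size, MPI_PACKED, dest, tag, comm, &req);
         requests.push_back(req);
       }
       MPI_Waitall(static_cast<int>(requests.size()), &requests[0],
                   MPI_STATUSES_IGNORE);
     } else {
       // A point-to-point message *can* be probed for its size...
       MPI_Status status;
       MPI_Probe(root, tag, comm, &status);
       int size;
       MPI_Get_count(&status, MPI_PACKED, &size);
       std::vector<char> incoming(size);
       MPI_Recv(&incoming[0], size, MPI_PACKED, root, tag, comm, &status);
       // ...which is exactly what MPI_Bcast does not offer, hence this
       // detour; 'incoming' is then unpacked back into 'value'.
     }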

> Thinking about it, I realize that one could instead do two
> consecutive broadcasts: one to send the size of the buffer and then
> another one to send the buffer. This will definitely be faster on
> machines with special hardware for collectives. On Beowulf clusters
> on the other hand the current version is faster since most MPI
> implementations just perform the broadcast as a sequence of N-1 send/
> receive operations from the root instead of optimizing it.

Right. I guess we could provide some kind of run-time configuration
switch that decides between the two implementations, if someone runs
into a case where it matters.
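
For concreteness, the two-broadcast variant Matthias describes would
look roughly like this against the raw MPI interface (a sketch, not
code from Boost.MPI; 'rank' and 'comm' are assumed to be the caller's
rank and the raw MPI_Comm):

     // Two consecutive MPI_Bcast calls: first the buffer size, then the
     // packed payload itself.
     int size = 0;
     std::vector<char> buffer;
     if (rank == root) {
       // ... serialize 'value' into 'buffer' (e.g. via MPI_Pack) ...
       size = static_cast<int>(buffer.size());
     }
     MPI_Bcast(&size, 1, MPI_INT, root, comm);             // size first
     if (rank != root)
       buffer.resize(size);
     MPI_Bcast(&buffer[0], size, MPI_PACKED, root, comm);  // then the data
     if (rank != root) {
       // ... unpack 'buffer' back into 'value' ...
     }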

        Doug

