

From: Matthias Troyer (troyer_at_[hidden])
Date: 2006-09-16 05:05:12


On Sep 16, 2006, at 10:22 AM, Markus Blatt wrote:

>
> Just forget about it. I was missing the tags in the collective
> communication, where there definitely are none in the MPI
> standard. Probably I should have gotten more sleep. Sorry.

I would actually also love to have tags there :-)
>>
>> I hope these answers address the issues you had in mind. I can
>> elaborate if you want.
>>
>
> The question came up when I looked into mpi/collectives/broadcast.hpp:
>
> // We're sending a type that does not have an associated MPI
> // datatype, so we'll need to serialize it. Unfortunately, this
> // means that we cannot use MPI_Bcast, so we'll just send from the
> // root to everyone else.
> template<typename T>
> void
> broadcast_impl(const communicator& comm, T& value, int root,
>                mpl::false_)
>
> If this function gets called, the performance will definitely be
> suboptimal, as the root will send to all others. Is this just the
> case if no MPI_Datatype was constructed (like for the linked list),
> or is it called whenever Boost serialization is used?

OK, I see your concern. This is actually only used when no
MPI_Datatype can be constructed, that is, when no MPI_Datatype is
possible for the type, such as for a linked list, and when you do not
use the skeleton&content mechanism either.
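
For context, this is roughly how I understand the dispatch to work,
sketched from the excerpt you quoted (is_mpi_datatype is the Boost.MPI
trait; the exact details may differ from the actual sources):

template<typename T>
void broadcast(const communicator& comm, T& value, int root)
{
  // Picks the plain MPI_Bcast path when T maps to an MPI datatype,
  // and falls back to the serialize-and-send version otherwise.
  broadcast_impl(comm, value, root, is_mpi_datatype<T>());
}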

Since this part of the code was written by Doug Gregor, I ask him to
correct me if I get something wrong or miss something here. When no
MPI datatype exists, we need to pack the object into a buffer using
MPI_Pack, and the buffer needs to be broadcast. So far we all
seem to agree. The problem now is that the receiving side needs to
know the size of the buffer to allocate enough memory, but there is
no MPI_Probe for collectives that could be used to inquire about the
message size. I believe that this was the reason for implementing the
broadcast as a sequence of nonblocking sends and receives (Doug?).
Thinking about it, I realize that one could instead do two
consecutive broadcasts: one to send the size of the buffer and then
another one to send the buffer itself. This will definitely be faster
on machines with special hardware for collectives. On Beowulf
clusters, on the other hand, the current version is faster, since
most MPI implementations just perform the broadcast as a sequence of
N-1 send/receive operations from the root instead of optimizing it.
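
As a rough sketch of the two-broadcast idea, using the plain MPI C API
(pack_into_buffer and unpack_from_buffer are placeholders for the
serialization step, not real Boost.MPI calls):

#include <mpi.h>
#include <vector>

template<typename T>
void broadcast_serialized(MPI_Comm comm, T& value, int root)
{
  int rank;
  MPI_Comm_rank(comm, &rank);

  std::vector<char> buffer;
  int size = 0;
  if (rank == root) {
    buffer = pack_into_buffer(value);   // placeholder: MPI_Pack the object
    size = static_cast<int>(buffer.size());
  }

  // First broadcast: the buffer size, so receivers can allocate memory.
  MPI_Bcast(&size, 1, MPI_INT, root, comm);

  if (rank != root)
    buffer.resize(size);

  // Second broadcast: the packed bytes themselves.
  MPI_Bcast(buffer.data(), size, MPI_PACKED, root, comm);

  if (rank != root)
    unpack_from_buffer(buffer, value);  // placeholder: MPI_Unpack the object
}

On hardware with optimized collectives both MPI_Bcast calls would use
the fast path, at the cost of one extra small broadcast for the size.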

Matthias

