Subject: Re: [Boost-users] [mpi] irecv / send problem
From: Matthias Troyer (troyer_at_[hidden])
Date: 2009-08-05 17:59:29


On Jun 12, 2009, at 1:16 PM, Nick Collier wrote:

> I am running into an issue where an irecv followed by a send results in
> deadlock. A simple test case:
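>
> #include <boost/mpi.hpp>
> #include <boost/serialization/access.hpp>
> #include <boost/serialization/vector.hpp>
> #include <iostream>
> #include <vector>
>
> namespace mpi = boost::mpi;
> using namespace std;
>
> class Item {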
> private:
>
> friend class boost::serialization::access;
>
> template<class Archive>
> void serialize(Archive& ar, const unsigned int version) {
> ar & val;
> }
>
> public:
> int val;
> Item() : val(1) {
> }
>
> };
>
> struct Receipt {
>
> boost::mpi::request request;
> std::vector<Item> items;
> };
>
> int main(int argc, char **argv) {
>
> mpi::environment env(argc, argv);
> mpi::communicator world;
> Receipt receipt;
>
> vector<Item> msg(100000);
>
> int other = world.rank() == 0 ? 1 : 0;
> cout << world.rank() << " irecv from " << other << endl;
> receipt.request = world.irecv(other, 0, receipt.items);
> cout << world.rank() << " sending to " << other << endl;
> world.send(other, 0, msg);
>
> receipt.request.wait();
>
> cout << "Done" << endl;
> }
>
> Run with mpirun -np 2, this never completes. It does complete with
> vector<Item> msg(10) however.
>
> Nick

Looking at this issue, the reason is probably that for a general
Item type Boost.MPI uses Boost.Serialization to send and receive
serialized data. For that, the receiving side has to resize its receive
buffer after it has received the size of the serialized message.
Boost.MPI currently sends the size of that buffer first and then the
data in a second message. The irecv call only posts a receive for the
first (size) message, since it cannot receive the data until the buffer
has been resized. The receive for the data is posted only in the
request.wait() function, which we never get to because we are still
stuck in the send call.
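
The sequence described above can be modeled without MPI. The following is a minimal sketch, not Boost.MPI code: the names Rank, Phase, send_can_complete and exchange_completes are hypothetical. It assumes that a blocking send returns either when the message is small enough to be buffered eagerly (eager_limit is an assumed, implementation-dependent threshold) or when the peer has already posted the matching receive. Under that assumption it would also explain why msg(10) completes while msg(100000) does not.

```cpp
#include <cstddef>

// Hypothetical model of the exchange; none of these names are Boost.MPI APIs.
enum class Phase { InSend, InWait, Done };

struct Rank {
    Phase phase = Phase::InSend;    // irecv() has already posted the size receive
    bool data_recv_posted = false;  // the data receive is posted only inside wait()
};

// Assumption: a blocking send returns once the message has been buffered
// eagerly or the peer has posted the matching receive.
bool send_can_complete(std::size_t bytes, std::size_t eager_limit,
                       const Rank& peer) {
    return bytes <= eager_limit || peer.data_recv_posted;
}

// Returns true if both ranks get through send() and wait().
bool exchange_completes(std::size_t payload_bytes, std::size_t eager_limit) {
    Rank ranks[2];
    bool progress = true;
    while (progress) {
        progress = false;
        for (int i = 0; i < 2; ++i) {
            Rank& me = ranks[i];
            const Rank& peer = ranks[1 - i];
            if (me.phase == Phase::InSend &&
                send_can_complete(payload_bytes, eager_limit, peer)) {
                // send() returned; wait() now posts the data receive.
                me.phase = Phase::InWait;
                me.data_recv_posted = true;
                progress = true;
            } else if (me.phase == Phase::InWait &&
                       peer.phase != Phase::InSend) {
                // The peer's send has completed, so the data can arrive.
                me.phase = Phase::Done;
                progress = true;
            }
        }
    }
    return ranks[0].phase == Phase::Done && ranks[1].phase == Phase::Done;
}
```

With an assumed eager limit of 65536 bytes, exchange_completes(40, 65536) returns true, while exchange_completes(400000, 65536) returns false because both ranks stay inside send() and neither reaches wait().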

This is an unfortunate design problem of Boost.MPI, and there are two
ways around it:

1) Use the skeleton/content mechanism, or send fixed-size arrays with an
MPI datatype for Item.

2) One could change irecv to use a single receive call, but then we
need to give irecv an upper bound for the buffer needed to receive the
serialized data, and the receive will fail if that bound is too small.
Would that behavior be preferred?
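
The behavior proposed in option 2 can be sketched in isolation. This is a hypothetical model, not an existing Boost.MPI interface: BoundedBuffer and receive_into are made-up names, and the wire vector stands in for the serialized message. In MPI itself, a receive into a buffer that is too small is reported as a truncation error (MPI_ERR_TRUNCATE).

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical receive buffer whose capacity is fixed when the receive is posted.
struct BoundedBuffer {
    std::vector<char> storage;  // preallocated to the caller's upper bound
    std::size_t used = 0;       // bytes actually received

    explicit BoundedBuffer(std::size_t upper_bound) : storage(upper_bound) {}
};

// Models a single receive call: no size message is exchanged first, so the
// receive can only succeed if the serialized data fits the preallocated buffer.
bool receive_into(BoundedBuffer& buf, const std::vector<char>& wire) {
    if (wire.size() > buf.storage.size()) {
        return false;  // analogous to a truncation error; buffer left unchanged
    }
    std::copy(wire.begin(), wire.end(), buf.storage.begin());
    buf.used = wire.size();
    return true;
}
```

The trade-off is visible in the signature: the caller has to choose the bound before knowing the message size, so a bound that is too generous wastes memory and one that is too small turns into a failed receive.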

Matthias

