Boost-MPI :
Subject: [Boost-mpi] deadlock (kind of) in nonblocking_test with wait_any on Intel MPI
From: Alain Miniussi (alain.miniussi_at_[hidden])
Date: 2014-09-08 12:14:55
Hi,
I have an issue with this test: it goes into an infinite loop.
I am using Intel MPI 4.1.3 on a Linux box.
I ran nonblocking_test.cpp (the regression test from the Boost
distribution, develop branch) on a single process, and stepping through
with the debugger I see the following issue:
At line 85:
// Send a one-element list. This generates two MPI_Isend calls: one for
// the archive's size, one for the archive itself.
// Now, with Intel's MPI, it seems that the second one won't be sent
// until the first one is received.
// Good or not, I suspect this behavior is legal.
S: reqs.push_back(comm.isend((comm.rank() + 1) % comm.size(), 0,
                             values[0]));
// Receive a one-element list. This generates only ONE MPI_Irecv, for
// the size. A handler is set on the request object to retrieve the
// second message. The second MPI_Request is set to MPI_REQUEST_NULL.
R: reqs.push_back(comm.irecv((comm.rank() + comm.size() - 1) %
                             comm.size(),
Later on:
// reqs[0] contains two MPI_Request objects; only one is complete.
// reqs[1] contains only one MPI_Request object, plus a handler.
if (wait_any(reqs.begin(), reqs.end()).second == reqs.begin())
  reqs[1].wait();
else
  reqs[0].wait();
Let's look at wait_any:
if (current->m_requests[0] != MPI_REQUEST_NULL &&
    current->m_requests[1] != MPI_REQUEST_NULL)
  if (optional<status> result = current->test())
    return std::make_pair(*result, current);

// A: Only the send request will ever reach test(), since
// current->m_requests[1] == MPI_REQUEST_NULL for the recv request.
// B: For the send request, current->test() basically calls
// MPI_Testall, which will never report completion, since the second
// MPI_Request seems to wait for the first one to be consumed, which
// won't happen, because of A.

But we have non-trivial requests, so:

// There are some nontrivial requests, so we must continue our
// busy waiting loop.
n = 0;
current = first;

And so we get a deadlock.
Does anyone have any idea how to fix this?
I suspect that the:
if (current->m_requests[0] != MPI_REQUEST_NULL &&
    current->m_requests[1] != MPI_REQUEST_NULL)
  if (optional<status> result = current->test())
    return std::make_pair(*result, current);
should be more subtle and also allow current->m_requests[1] ==
MPI_REQUEST_NULL when current->m_handler is not null...
Regards
Alain
Boost-Commit list run by troyer at boostpro.com