Boost-MPI :

Subject: Re: [Boost-mpi] deadlock (kindof) in nonbloking_test with wait_any on Intel MPI
From: Alain Miniussi (alain.miniussi_at_[hidden])
Date: 2014-09-10 07:49:19


I have 3 pull requests related to these issues.
With them applied, the MPI tests no longer hang with Intel MPI.

Thanks,

Alain

On 08/09/2014 18:14, Alain Miniussi wrote:
>
> Hi,
>
> I have an issue with that test: it goes into an infinite loop.
> I am using Intel MPI 4.1.3 on a Linux box.
>
> I ran nonblocking_test.cpp (the regression test from the Boost
> distribution, develop branch) on a single process and, following it in
> the debugger, I found the following issue:
>
> line 85:
> // Send a one-element list. This generates 2 MPI_Isend calls: one for
> // the archive's size, one for the archive.
> // Now, with Intel MPI, it seems that the second one won't be sent
> // until the first one is received.
> // Good or not, I suspect this behavior is legal.
> S: reqs.push_back(comm.isend((comm.rank() + 1) % comm.size(), 0,
>                              values[0]));
> // Receive a one-element list. This generates only ONE MPI_Irecv, for
> // the size. A handle is set on the request object to retrieve the
> // second message. The second request is set to null.
> R: reqs.push_back(comm.irecv((comm.rank() + comm.size() - 1) %
> comm.size(),
>
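To make the two-message transfer described above concrete, here is a minimal in-memory sketch. It does not use MPI or Boost.MPI at all: `ToyChannel`, `toy_isend` and `ToyRecv` are hypothetical names invented for illustration, and the real library's internals may differ. It only shows the shape of the protocol: the sender emits a size message followed by the archive, and the receiver first consumes the size and only then, in a second step, picks up the archive.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <string>

// Hypothetical in-memory channel standing in for MPI point-to-point messages.
using ToyChannel = std::deque<std::string>;

// Send side: two messages, the archive's size and then the archive itself.
void toy_isend(ToyChannel& ch, const std::string& archive) {
    ch.push_back(std::to_string(archive.size()));
    ch.push_back(archive);
}

// Receive side: one receive for the size, then a follow-up step (the role
// played by the handle in the message above) that retrieves the archive.
struct ToyRecv {
    bool have_size = false;
    std::size_t size = 0;
    std::string archive;
    bool done = false;

    // Advances the receive as far as the channel allows.
    // Returns true once the archive has been received.
    bool progress(ToyChannel& ch) {
        if (done) return true;
        if (!have_size) {
            if (ch.empty()) return false;
            size = std::stoul(ch.front());
            ch.pop_front();
            have_size = true;
        }
        if (ch.empty()) return false;
        archive = ch.front();
        ch.pop_front();
        assert(archive.size() == size);  // the size message announced this length
        done = true;
        return true;
    }
};
```

The point of the sketch is that the receive cannot finish unless something keeps calling `progress` on it; a receive that is never polled never consumes the size message.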
> later on:
> // reqs[0] contains 2 MPI_Request objects, only one is complete
> // reqs[1] contains only one MPI_Request object, and a handle
> if (wait_any(reqs.begin(), reqs.end()).second == reqs.begin())
>   reqs[1].wait();
> else
>   reqs[0].wait();
>
> Let's look at wait_any:
> if (current->m_requests[0] != MPI_REQUEST_NULL &&
>     current->m_requests[1] != MPI_REQUEST_NULL)
>   if (optional<status> result = current->test())
>     return std::make_pair(*result, current);
> // A: Only the first request will call test(), since
> //    current->m_requests[1] == MPI_REQUEST_NULL for the recv request.
> // B: For the first one, current->test() will basically call
> //    MPI_Waitall, which will fail since the second MPI_Request seems to
> //    wait for the first one to be consumed, which won't happen, because
> //    of A.
>
> But we have nontrivial requests, so:
> // There are some nontrivial requests, so we must continue our
> // busy waiting loop.
> n = 0;
> current = first;
> And so we get a deadlock.
>
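The stall described above can be reproduced with a small toy model that needs no MPI installation. Everything in it is hypothetical (`ToyRequest`, `ToyWorld`, `toy_test`, `toy_wait_any`, `demo_stall`); it is a sketch of the reasoning, not the Boost.MPI implementation. It bakes in the assumption from the message that the send's second message cannot complete until the receiver has consumed the first one, and it bounds the busy-wait loop so that the program terminates.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

constexpr int kNull = -1;    // stands in for MPI_REQUEST_NULL
constexpr int kPending = 0;  // stands in for a live MPI_Request

struct ToyRequest {
    std::array<int, 2> slots{kNull, kNull};
    bool has_handler = false;  // true for the serialized receive
};

struct ToyWorld {
    bool size_consumed = false;  // set once the receiver took the size message
};

// Toy completion test. Assumption taken from the message: the send's second
// message cannot complete until the receiver has consumed the first one.
bool toy_test(const ToyRequest& r, ToyWorld& w) {
    if (r.has_handler) {
        w.size_consumed = true;  // testing the receive consumes the size
        return false;            // the archive has not arrived yet in this model
    }
    return w.size_consumed;
}

// Bounded version of the busy-wait loop, using the condition quoted above.
// Returns the index of the first completed request, or -1 if nothing
// completes within max_sweeps passes over the requests.
int toy_wait_any(const std::vector<ToyRequest>& reqs, ToyWorld& w,
                 int max_sweeps) {
    assert(max_sweeps >= 0);
    for (int sweep = 0; sweep < max_sweeps; ++sweep) {
        for (std::size_t i = 0; i < reqs.size(); ++i) {
            const ToyRequest& r = reqs[i];
            if (r.slots[0] != kNull && r.slots[1] != kNull) {
                if (toy_test(r, w)) return static_cast<int>(i);
            }
        }
    }
    return -1;
}

// One serialized send (two live slots) and one serialized receive
// (one live slot plus a handler), as in the test discussed above.
int demo_stall() {
    ToyRequest send;
    send.slots = {kPending, kPending};
    ToyRequest recv;
    recv.slots = {kPending, kNull};
    recv.has_handler = true;
    ToyWorld world;
    return toy_wait_any({send, recv}, world, 1000);
}
```

In this model `demo_stall()` returns -1: the receive is skipped on every sweep because its second slot is null, so the size is never consumed and the send can never complete.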
> Does anyone have any idea how to fix this?
> I suspect the:
> if (current->m_requests[0] != MPI_REQUEST_NULL &&
>     current->m_requests[1] != MPI_REQUEST_NULL)
>   if (optional<status> result = current->test())
>     return std::make_pair(*result, current);
>
> should be more subtle and allow current->m_requests[1] ==
> MPI_REQUEST_NULL when current->m_handle is not null....
>
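One possible reading of that suggestion, written as a standalone sketch rather than as a patch to Boost.MPI: the names `ToyRequest`, `testable_as_quoted`, `testable_relaxed` and `demo` are hypothetical, and `has_handler` merely stands in for "the handle is not null". Whether the pull requests mentioned at the top of this thread take this form is not stated here.

```cpp
#include <cassert>

constexpr int kNull = -1;    // stands in for MPI_REQUEST_NULL
constexpr int kPending = 0;  // stands in for a live MPI_Request

struct ToyRequest {
    int slots[2] = {kNull, kNull};
    bool has_handler = false;  // stands in for a non-null handle
};

// The condition as quoted: both slots must be live.
bool testable_as_quoted(const ToyRequest& r) {
    return r.slots[0] != kNull && r.slots[1] != kNull;
}

// Relaxed variant: a null second slot is accepted when a handler is set.
bool testable_relaxed(const ToyRequest& r) {
    return r.slots[0] != kNull && (r.slots[1] != kNull || r.has_handler);
}

// A serialized receive has one live slot plus a handler: it is skipped by
// the quoted condition and picked up by the relaxed one.
void demo() {
    ToyRequest recv;
    recv.slots[0] = kPending;
    recv.has_handler = true;
    assert(!testable_as_quoted(recv));
    assert(testable_relaxed(recv));
}
```

A plain single-slot request without a handler is still excluded by both predicates, so the relaxed form only changes the behaviour for requests that carry a handler.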
> Regards
>
> Alain

-- 
---
Alain
