Subject: [Boost-users] [MPI] request::test() crashes or returns garbage
From: Patrik Jonsson (patrik-web_at_[hidden])
Date: 2011-11-16 18:09:06
Hi,
I'm having a problem with nonblocking point-to-point communication. This is
with boost 1.47 and openmpi-1.5.3 compiled with icpc 12.0.3. Here's an
example that exhibits the problem:
#include <boost/mpi.hpp>
#include <cstdio>
#include <iostream>
#include <vector>
using namespace boost::mpi;
using namespace std;
int main(int argc, char* argv[])
{
  environment env(argc, argv);
  communicator world;
  // One pending receive per peer task
  vector<request> handshake_reqs_;
  for(int i=0; i<world.size(); ++i) {
    if(i!=world.rank()) {
      printf("Task %d posting recv from %d\n",world.rank(), i);
      handshake_reqs_.push_back(world.irecv(i, 13));
      printf("Task %d sending to %d\n",world.rank(), i);
      // Empty message with tag 13; the returned send request is discarded
      world.isend(i, 13);
    }
  }
  // Test each receive once and report the status of those that completed
  for(size_t i=0; i<handshake_reqs_.size(); ++i) {
    boost::optional<status> s=handshake_reqs_[i].test();
    if(s.is_initialized()) {
      const int source_task = s.get().source();
      const int tag = s.get().tag();
      printf("Task %d received message tag %d from task %d\n",world.rank(), tag, source_task);cout.flush();
    }
  }
}
So, essentially, every task sends a message with tag 13 to every other
task, each of which has posted a nonblocking receive for it. The
output from this program is something like:
[pjonsson_at_sunrise03 ~]$ mpirun -np 3 ./a.out
Task 0 posting recv from 1
Task 0 sending to 1
Task 1 posting recv from 0
Task 1 sending to 0
Task 1 posting recv from 2
Task 0 posting recv from 2
Task 0 sending to 2
Task 1 sending to 2
Task 1 received message tag 0 from task 0
[sunrise03:06504] *** Process received signal ***
[sunrise03:06504] Signal: Segmentation fault (11)
[sunrise03:06504] Signal code: Address not mapped (1)
[sunrise03:06504] Failing at address: 0x100000037
Task 2 posting recv from 0
Task 2 sending to 0
Task 2 posting recv from 1
Task 0 received message tag 1 from task 0
[sunrise03:06504] [ 0] /lib64/libpthread.so.0 [0x3d6c20eb10]
[sunrise03:06504] [ 1] /n/home00/pjonsson/lib/libboost_mpi.so.1.47.0(_ZN5boost3mpi7request4testEv+0xc) [0x2b2fe9863a3c]
[sunrise03:06504] [ 2] ./a.out(main+0x35d) [0x40ab37]
[sunrise03:06504] [ 3] /lib64/libc.so.6(__libc_start_main+0xf4) [0x3d6b61d994]
[sunrise03:06504] [ 4] ./a.out(_ZNSt8ios_base4InitD1Ev+0x41) [0x407039]
[sunrise03:06504] *** End of error message ***
[sunrise03:06503] *** Process received signal ***
[sunrise03:06503] Signal: Segmentation fault (11)
[sunrise03:06503] Signal code: Address not mapped (1)
[sunrise03:06503] Failing at address: 0x100000037
[sunrise03:06503] [ 0] /lib64/libpthread.so.0 [0x3d6c20eb10]
[sunrise03:06503] [ 1] /n/home00/pjonsson/lib/libboost_mpi.so.1.47.0(_ZN5boost3mpi7request4testEv+0xc) [0x2b44a0822a3c]
[sunrise03:06503] [ 2] ./a.out(main+0x35d) [0x40ab37]
[sunrise03:06503] [ 3] /lib64/libc.so.6(__libc_start_main+0xf4) [0x3d6b61d994]
[sunrise03:06503] [ 4] ./a.out(_ZNSt8ios_base4InitD1Ev+0x41) [0x407039]
[sunrise03:06503] *** End of error message ***
[sunrise03:06505] *** Process received signal ***
[sunrise03:06505] Signal: Segmentation fault (11)
[sunrise03:06505] Signal code: Address not mapped (1)
[sunrise03:06505] Failing at address: 0x100000037
Task 2 sending to 1
Task 2 received message tag 0 from task 0
[sunrise03:06505] [ 0] /lib64/libpthread.so.0 [0x3d6c20eb10]
[sunrise03:06505] [ 1] /n/home00/pjonsson/lib/libboost_mpi.so.1.47.0(_ZN5boost3mpi7request4testEv+0xc) [0x2b7c652b4a3c]
[sunrise03:06505] [ 2] ./a.out(main+0x35d) [0x40ab37]
[sunrise03:06505] [ 3] /lib64/libc.so.6(__libc_start_main+0xf4) [0x3d6b61d994]
[sunrise03:06505] [ 4] ./a.out(_ZNSt8ios_base4InitD1Ev+0x41) [0x407039]
[sunrise03:06505] *** End of error message ***
Sometimes it doesn't crash but just returns garbage for the source and
tag: every message reports source 0 and a tag equal to the sending
task's rank. Using test_some gives similar results.
If I write the Boost.MPI program above directly against the MPI C API, it works correctly:
#include <mpi.h>
#include <cstdio>
#include <iostream>
#include <vector>
using namespace std;
int main(int argc, char* argv[])
{
  MPI_Init(&argc, &argv);
  int rank, size;
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  MPI_Comm_size(MPI_COMM_WORLD, &size);
  vector<MPI_Request> handshake_reqs_;
  for(int i=0; i<size; ++i) {
    if(i!=rank) {
      // Post an empty receive with tag 13 from task i
      handshake_reqs_.push_back(MPI_Request());
      MPI_Irecv(0,0, MPI_INT, i, 13, MPI_COMM_WORLD, &handshake_reqs_.back());
      printf("Task %d sending to %d\n",rank, i);
      // Empty send with tag 13; the request r is never completed
      MPI_Request r;
      MPI_Isend(0,0, MPI_INT, i, 13, MPI_COMM_WORLD, &r);
    }
  }
  // Poll forever: report each completed receive, then re-post it
  while(true) {
    for(size_t i=0; i<handshake_reqs_.size(); ++i) {
      int complete;
      MPI_Status s;
      MPI_Test(&handshake_reqs_[i], &complete, &s);
      if(complete) {
        const int source_task = s.MPI_SOURCE;
        const int tag = s.MPI_TAG;
        printf("Task %d received message tag %d from task %d\n",rank, tag,
               source_task);cout.flush();
        // Re-post from the same task (index i skips our own rank, so it is not the task number)
        MPI_Irecv(0,0, MPI_INT, source_task, 13, MPI_COMM_WORLD, &handshake_reqs_[i]);
      }
    }
  }
}
prints:
[pjonsson_at_sunrise03 ~]$ mpirun -np 3 ./a.out
Task 2 sending to 0
Task 2 sending to 1
Task 0 sending to 1
Task 0 sending to 2
Task 0 received message tag 13 from task 2
Task 2 received message tag 13 from task 0
Task 1 sending to 0
Task 0 received message tag 13 from task 1
Task 1 sending to 2
Task 2 received message tag 13 from task 1
Task 1 received message tag 13 from task 0
Task 1 received message tag 13 from task 2
Any ideas what might be going wrong?
Regards,
/Patrik