Subject: [Boost-mpi] Is there a way to make a master process to ignore terminations of slave processes?
From: Izhar Wallach (izhar.wallach_at_[hidden])
Date: 2012-12-12 13:42:33


I have an MPI program that creates multiple slaves which send data to a
master process. The master just holds an array of requests, and whenever a
slave is ready, it receives the data the slave sent and sends that slave
another request. The jobs of the slaves are independent of each other and
of the master, so if a slave dies or for any other reason stops sending
data to the master, the execution of the program should not be affected.
This would be useful when running a long job on a large number of nodes,
where a single failure is more likely to occur.

I was wondering if there is a way to configure MPI to ignore process
failures. Right now, if I manually kill one of the slave processes, all the
other processes terminate as well. In other words, if I have 2 slaves and a
master, and one of the slave processes dies, I would like the remaining
master and slave processes to keep running.

I'm using Boost-MPI version 1.52 and mpich2 version 1.5.

