I found out that when I use a send -> recv communication between the nodes before calling communicator::barrier, it works. Maybe performing the point-to-point operation before the collective synchronizes both processes?
Now I need a solution that lets me use communicator::barrier as the first call.
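For reference, the workaround currently looks roughly like this (a sketch assuming exactly two processes; the tag and token value are arbitrary):

#include <boost/mpi.hpp>
namespace mpi = boost::mpi;

int main(int argc, char* argv[])
{
    mpi::environment env(argc, argv);
    mpi::communicator world;

    // Dummy point-to-point exchange before the first collective.
    // Assumes exactly two processes; tag 0 and the token are arbitrary.
    int token = 0;
    if (world.rank() == 0) {
        world.send(1, 0, token);
        world.recv(1, 0, token);
    } else {
        world.recv(0, 0, token);
        world.send(0, 0, token);
    }

    // With the exchange above in place, this barrier no longer crashes.
    world.barrier();
    return 0;
}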
Stephan
Hi there,
I have a big problem running MPI programs that use the Boost.MPI library. When I try to run programs on more than one node, collective operations like communicator::barrier or broadcast, or even the environment destructor (because of MPI_Finalize, which is collective), cause the program to crash. I get errors like this:
[1]terminate called after throwing an instance of 'boost::exception_detail::clone_impl<boost::exception_detail::error_info_injector<boost::mpi::exception> >'
[1] what(): MPI_Barrier: Other MPI error, error stack:
[1]PMPI_Barrier(362).................: MPI_Barrier(MPI_COMM_WORLD) failed
[1]MPIR_Barrier_impl(255)............:
[1]MPIR_Barrier_intra(79)............:
[1]MPIC_Sendrecv(186)................:
[1]MPIC_Wait(534)....................:
[1]MPIDI_CH3I_Progress(184)..........:
[1]MPID_nem_mpich2_blocking_recv(895):
[1]MPID_nem_tcp_connpoll(1746).......: Communication error with rank 0:
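For illustration, even a minimal program along these lines crashes for me (a sketch containing nothing but the environment and a single barrier):

#include <boost/mpi.hpp>
namespace mpi = boost::mpi;

int main(int argc, char* argv[])
{
    mpi::environment env(argc, argv);   // calls MPI_Init
    mpi::communicator world;            // wraps MPI_COMM_WORLD

    world.barrier();                    // crashes here when run across nodes

    return 0;                           // ~environment calls MPI_Finalize
}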
I also tested this with the simple broadcast example from the Boost.MPI tutorial - same errors.
But when I use the plain MPI equivalent without the Boost.MPI library, such as MPI_Barrier, the program runs fine. I am using MPICH2 on Ubuntu 10.04 machines.
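For comparison, this is roughly the plain MPI version I tested, which works:

#include <mpi.h>

int main(int argc, char* argv[])
{
    MPI_Init(&argc, &argv);

    MPI_Barrier(MPI_COMM_WORLD);   // runs fine across nodes

    MPI_Finalize();
    return 0;
}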
Has anyone had problems like this, or does anyone know a fix?
Regards,
Stephan