I forgot to mention that communicator::barrier is the first operation called when a process starts; I do this to synchronize the processes.
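
For reference, this is roughly the shape of the code in question (a minimal sketch, not the exact program):

#include <boost/mpi/environment.hpp>
#include <boost/mpi/communicator.hpp>

int main(int argc, char* argv[])
{
    boost::mpi::environment env(argc, argv);   // calls MPI_Init
    boost::mpi::communicator world;            // wraps MPI_COMM_WORLD

    world.barrier();   // first operation after startup -- this is where it crashes

    // ... actual work ...
    return 0;          // ~environment calls MPI_Finalize (also collective)
}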

Stephan

2010/8/21 Stephan Hackstedt <stephan.hackstedt@googlemail.com>
I found out that when I use a send -> recv communication between the nodes before calling communicator::barrier, it works. Maybe calling the point-to-point operation before the collective synchronizes both processes?
Now I need a solution that lets me use communicator::barrier as the first call.
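
The workaround looks roughly like this (a minimal sketch for two processes; the tag and dummy value are arbitrary):

#include <boost/mpi/environment.hpp>
#include <boost/mpi/communicator.hpp>

int main(int argc, char* argv[])
{
    boost::mpi::environment env(argc, argv);
    boost::mpi::communicator world;

    // Point-to-point exchange before the first collective.
    int dummy = 0;
    if (world.rank() == 0)
        world.send(1, 0, dummy);   // send to rank 1, tag 0
    else if (world.rank() == 1)
        world.recv(0, 0, dummy);   // receive from rank 0, tag 0

    world.barrier();               // succeeds after the send/recv
    return 0;
}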

Stephan

2010/8/20 Stephan Hackstedt <stephan.hackstedt@googlemail.com>

Hi there,

I have a big problem running MPI programs that use the Boost.MPI library. When I try to run a program on more than one node, collective operations like communicator::barrier or broadcast, or even the environment destructor (because of MPI_Finalize, which is collective), cause the program to crash. I get errors like this:

[1]terminate called after throwing an instance of 'boost::exception_detail::clone_impl<boost::exception_detail::error_info_injector<boost::mpi::exception> >'

[1]  what():  MPI_Barrier: Other MPI error, error stack:
[1]PMPI_Barrier(362).................: MPI_Barrier(MPI_COMM_WORLD) failed
[1]MPIR_Barrier_impl(255)............:
[1]MPIR_Barrier_intra(79)............:
[1]MPIC_Sendrecv(186)................:
[1]MPIC_Wait(534)....................:
[1]MPIDI_CH3I_Progress(184)..........:
[1]MPID_nem_mpich2_blocking_recv(895):
[1]MPID_nem_tcp_connpoll(1746).......: Communication error with rank 0:


I also tested this with the simple broadcast example from the Boost.MPI tutorial - same errors.
But when I use the plain MPI equivalent without the Boost.MPI library, such as MPI_Barrier, the program runs fine. I am using MPICH2 on Ubuntu 10.04.
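
For comparison, the plain MPI version that runs fine across the nodes (again a minimal sketch):

#include <mpi.h>

int main(int argc, char* argv[])
{
    MPI_Init(&argc, &argv);
    MPI_Barrier(MPI_COMM_WORLD);   // works across nodes
    MPI_Finalize();
    return 0;
}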
Has anyone had problems like this, or does anyone know a fix?

Regards,

Stephan