MPICH2 + Boost.MPI Collective Problems

Hi there,

I have a big problem running MPI programs that use the Boost.MPI library. When I try to run a program on more than one node, collective operations like communicator::barrier <http://boost.org/doc/libs/1_44_0/doc/html/boost/mpi/communicator.html#id918378-bb>, broadcast <http://boost.org/doc/libs/1_44_0/doc/html/boost/mpi/broadcast.html>, or even the environment <http://boost.org/doc/libs/1_44_0/doc/html/boost/mpi/environment.html> destructor (because it calls MPI_Finalize, which is collective) cause the program to crash. I get errors like this:

[1] terminate called after throwing an instance of 'boost::exception_detail::clone_impl<boost::exception_detail::error_info_injector<boost::mpi::exception> >'
[1] what(): MPI_Barrier: Other MPI error, error stack:
[1] PMPI_Barrier(362).................: MPI_Barrier(MPI_COMM_WORLD) failed
[1] MPIR_Barrier_impl(255)............:
[1] MPIR_Barrier_intra(79)............:
[1] MPIC_Sendrecv(186)................:
[1] MPIC_Wait(534)....................:
[1] MPIDI_CH3I_Progress(184)..........:
[1] MPID_nem_mpich2_blocking_recv(895):
[1] MPID_nem_tcp_connpoll(1746).......: Communication error with rank 0:

I also tested this with the simple broadcast example from the Boost.MPI tutorial and got the same errors. But when I use the plain MPI equivalent without the Boost.MPI library, such as MPI_Barrier <http://www.mpi-forum.org/docs/mpi-11-html/node66.html#Node66>, the program runs fine. I am using MPICH2 on Ubuntu 10.04 machines. Has anyone run into this, or does anyone know a fix?

Regards,
Stephan
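For reference, the tutorial example I tested is essentially the following (a sketch reconstructed from the Boost.MPI 1.44 tutorial, not my exact code). It broadcasts a string from rank 0 and crashes the same way across nodes:

    #include <boost/mpi.hpp>
    #include <boost/serialization/string.hpp>  // needed to send std::string
    #include <iostream>
    #include <string>
    namespace mpi = boost::mpi;

    int main(int argc, char* argv[])
    {
      mpi::environment env(argc, argv);  // MPI_Init here, MPI_Finalize in destructor
      mpi::communicator world;           // wraps MPI_COMM_WORLD

      std::string value;
      if (world.rank() == 0)
        value = "Hello, World!";

      // Collective operation: this is where it crashes across nodes
      // (or later in the environment destructor).
      mpi::broadcast(world, value, 0);

      std::cout << "Process #" << world.rank() << " says " << value << std::endl;
      return 0;
    }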

I found out that if I do a send -> recv communication between the nodes before calling communicator::barrier, it works. Maybe calling the point-to-point operation before the collective synchronizes both processes? Now I need a solution that lets me use communicator::barrier as the first call.

Stephan
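Roughly, the workaround looks like this (a sketch assuming exactly two processes; the tag and the dummy int value are arbitrary choices of mine):

    #include <boost/mpi.hpp>
    namespace mpi = boost::mpi;

    int main(int argc, char* argv[])
    {
      mpi::environment env(argc, argv);
      mpi::communicator world;

      // Dummy point-to-point exchange between the two ranks before the
      // first collective. Plain int needs no serialization support.
      int token = 0;
      if (world.rank() == 0) {
        world.send(1, 0, token);
        world.recv(1, 0, token);
      } else if (world.rank() == 1) {
        world.recv(0, 0, token);
        world.send(0, 0, token);
      }

      world.barrier();  // no longer crashes after the handshake
      return 0;
    }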

I forgot to say that communicator::barrier is the first operation called when a process starts; I do this to synchronize the processes.

Stephan
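So the failing pattern is simply this (sketch, not my exact code):

    #include <boost/mpi.hpp>
    namespace mpi = boost::mpi;

    int main(int argc, char* argv[])
    {
      mpi::environment env(argc, argv);
      mpi::communicator world;
      world.barrier();  // first communication after startup: crashes across nodes
      // ... rest of the program ...
      return 0;
    }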