Boost Users :

Subject: Re: [Boost-users] MPICH2 + Boost.MPI Collective Problems
From: Stephan Hackstedt (stephan.hackstedt_at_[hidden])
Date: 2010-08-21 03:46:47


I forgot to mention that communicator::barrier is the first operation
called when a process starts. I do this to synchronize the processes.

Stephan

2010/8/21 Stephan Hackstedt <stephan.hackstedt_at_[hidden]>

> I found out that when I use a send -> recv communication between the nodes
> before calling communicator::barrier, it works. Maybe calling the
> point-to-point operation before the collective synchronizes both processes?
> I still need a solution that allows communicator::barrier as the first call.
>
> Stephan
>
> 2010/8/20 Stephan Hackstedt <stephan.hackstedt_at_[hidden]>
>
>> Hi there,
>>
>> I have a big problem running MPI programs that use the Boost.MPI
>> library. When I try to run programs on more than one node,
>> collective operations like communicator::barrier
>> <http://boost.org/doc/libs/1_44_0/doc/html/boost/mpi/communicator.html#id918378-bb>
>> or broadcast <http://boost.org/doc/libs/1_44_0/doc/html/boost/mpi/broadcast.html>,
>> or even the environment
>> <http://boost.org/doc/libs/1_44_0/doc/html/boost/mpi/environment.html>
>> destructor (because of MPI_Finalize, which is collective), cause the
>> program to crash. I get errors like this:
>>
>> [1]terminate called after throwing an instance of
>> 'boost::exception_detail::clone_impl<boost::exception_detail::error_info_injector<boost::mpi::exception> >'
>>
>> [1] what(): MPI_Barrier: Other MPI error, error stack:
>> [1]PMPI_Barrier(362).................: MPI_Barrier(MPI_COMM_WORLD) failed
>> [1]MPIR_Barrier_impl(255)............:
>> [1]MPIR_Barrier_intra(79)............:
>> [1]MPIC_Sendrecv(186)................:
>> [1]MPIC_Wait(534)....................:
>> [1]MPIDI_CH3I_Progress(184)..........:
>> [1]MPID_nem_mpich2_blocking_recv(895):
>> [1]MPID_nem_tcp_connpoll(1746).......: Communication error with rank 0:
>>
>> I also tested this with the simple broadcast example from the Boost.MPI
>> tutorial and got the same errors.
>> But when I use the plain MPI equivalent without the Boost.MPI library,
>> such as MPI_Barrier <http://www.mpi-forum.org/docs/mpi-11-html/node66.html#Node66>,
>> the program runs fine. I am using MPICH2 on Ubuntu 10.04.
>> Has anyone had problems like this, or does anyone know a fix?
>>
>> Regards,
>>
>> stephan
>>
>>
>>
>


