Hi,

I have checked the OpenMPI settings using ompi_info and they seem to be correct. After some more debugging, the issue seems to be related to the multiple network interfaces on the head node of our cluster. It has three interfaces: one Ethernet interface for communication between the nodes, an InfiniBand interface that also connects the nodes, and a second Ethernet interface that connects the head node to the rest of the network.
When I pass the IP address of a machine on the local cluster networks, I get a connection error: no server is running on those machines, so the connection is refused rather than timing out. But when I pass the IP address of the machine that runs the server, which is reached via the third interface, the connection times out. If I instead pass the hostname of that machine to boost::asio::ip::tcp::resolver, I get the following error:
Host not found (non-authoritative), try again later.
It seems to me that there is some sort of confusion or conflict in how multiple network interfaces are handled when Boost.Asio is used together with mpicxx/mpirun, which might be the cause of my problem.
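
One way to test the multiple-interface hypothesis would be to bind the socket to the address of the external interface before connecting, so that the choice of source address is no longer left to the kernel. Below is a minimal sketch using plain POSIX sockets rather than Boost.Asio. The helper name connect_from is hypothetical, the code is IPv4 only, and it only fixes the source address: routing is still decided by the kernel's routing table. With Boost.Asio the corresponding step would be to call open() and bind() on the socket with a local endpoint before calling connect().

```cpp
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

#include <cerrno>
#include <cstdint>
#include <string>

// Hypothetical helper (not part of Boost.Asio or OpenMPI):
// open a TCP socket, bind it to the local IPv4 address `local_ip`
// (port 0 lets the kernel pick an ephemeral port), then connect to
// remote_ip:remote_port.
// Returns the connected descriptor, or -1 with errno set on failure.
int connect_from(const std::string& local_ip,
                 const std::string& remote_ip,
                 std::uint16_t remote_port) {
    sockaddr_in local{};
    local.sin_family = AF_INET;
    local.sin_port = htons(0);
    if (inet_pton(AF_INET, local_ip.c_str(), &local.sin_addr) != 1) {
        errno = EINVAL;  // not a valid dotted-quad IPv4 address
        return -1;
    }

    sockaddr_in remote{};
    remote.sin_family = AF_INET;
    remote.sin_port = htons(remote_port);
    if (inet_pton(AF_INET, remote_ip.c_str(), &remote.sin_addr) != 1) {
        errno = EINVAL;
        return -1;
    }

    int fd = ::socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0) return -1;

    // Bind first so the outgoing connection uses `local_ip` as its source.
    if (::bind(fd, reinterpret_cast<sockaddr*>(&local), sizeof(local)) != 0 ||
        ::connect(fd, reinterpret_cast<sockaddr*>(&remote), sizeof(remote)) != 0) {
        int saved = errno;  // keep the original error across close()
        ::close(fd);
        errno = saved;
        return -1;
    }
    return fd;
}
```

Calling connect_from with the address of the third interface as local_ip and the server's address as remote_ip either returns a connected descriptor or fails with errno set, which can then be compared with the behaviour of the unbound connect under mpirun.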

Kind Regards
Vibhor
 


From: Matthias Troyer <troyer@phys.ethz.ch>
To: boost-users@lists.boost.org
Sent: Mon, 15 February, 2010 2:25:34 PM
Subject: Re: [Boost-users] [Boost.Asio] Compiling Boost.Asio with OpenMPI

Have you checked which compiler, and which compiler and linker options, the mpicxx script uses? Maybe one of those is causing your problems.

Matthias

On 15 Feb 2010, at 01:41, vibhor aggarwal wrote:

Hi,

I have tried compiling with mpicxx with no MPI code in the program. It compiles fine, but I still have to use mpirun to execute the generated binary, and the connection still times out.

Kind Regards
Vibhor


From: Matthias Troyer <troyer@phys.ethz.ch>
To: boost-users@lists.boost.org
Sent: Mon, 15 February, 2010 12:44:57 AM
Subject: Re: [Boost-users] [Boost.Asio] Compiling Boost.Asio with OpenMPI


On 14 Feb 2010, at 06:33, vibhor aggarwal wrote:

> Hello,
> 
> I am trying to write a program which runs using OpenMPI on our cluster and communicates the results over TCP/IP to another machine which has a network connection to the cluster. I am using Boost.Asio for the TCP/IP connection. If I remove the MPI code from the executable and compile it with g++, it connects to the remote machine without a problem. But when I add some simple MPI code to it, compile with mpicxx and execute using mpirun, the call to socket.connect() fails with a connection timeout. Any idea what might be going wrong?

Have you tried compiling it with mpicxx without adding any MPI code to it?

Matthias

_______________________________________________
Boost-users mailing list
Boost-users@lists.boost.org
http://lists.boost.org/mailman/listinfo.cgi/boost-users

