Hi,
I have checked the OpenMPI settings using ompi_info and they seem to be correct. After some more debugging, the problem appears to be related to the multiple network interfaces on the head node of our cluster. It has three interfaces: one Ethernet interface for communication between the nodes, one InfiniBand interface that also connects the nodes, and a second Ethernet interface that connects the head node to the rest of the network.
When I pass an IP address on the local cluster networks, I get a connection error: the connection does not time out but is refused, which is expected because no server runs on those machines. But when I pass the IP address of a machine reached via the third interface, which does run the server, the connection times out. If I pass the hostname of that machine to boost::asio::ip::tcp::resolver, I get the following error:
Host not found (non-authoritative), try again later.
It seems to me that mpicxx/mpirun and Boost.Asio conflict in some way in how they handle multiple network interfaces, which might be the cause of my problem.
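One way to narrow this down is to print the interfaces and IPv4 addresses the process actually sees, once when started directly and once when started through mpirun, and compare the two. The following is only a minimal sketch under some assumptions: it uses the plain POSIX getifaddrs() call rather than Boost.Asio, it assumes a Linux or similar POSIX system, and the function name list_ipv4_interfaces is made up for illustration.

```cpp
#include <arpa/inet.h>
#include <ifaddrs.h>
#include <netinet/in.h>
#include <sys/socket.h>

#include <string>
#include <utility>
#include <vector>

// Hypothetical helper (not part of Boost.Asio or OpenMPI).
// Returns (interface name, IPv4 address) pairs for every local interface
// that currently has an IPv4 address. Returns an empty vector on failure.
std::vector<std::pair<std::string, std::string>> list_ipv4_interfaces() {
    std::vector<std::pair<std::string, std::string>> result;

    ifaddrs* head = nullptr;
    if (getifaddrs(&head) != 0) {
        return result;  // could not query the interfaces
    }

    for (const ifaddrs* it = head; it != nullptr; it = it->ifa_next) {
        // Skip entries without an address and anything that is not IPv4.
        if (it->ifa_addr == nullptr || it->ifa_addr->sa_family != AF_INET) {
            continue;
        }
        const sockaddr_in* sin =
            reinterpret_cast<const sockaddr_in*>(it->ifa_addr);
        char text[INET_ADDRSTRLEN] = {0};
        if (inet_ntop(AF_INET, &sin->sin_addr, text, sizeof(text)) != nullptr) {
            result.emplace_back(it->ifa_name, text);
        }
    }

    freeifaddrs(head);  // release the list allocated by getifaddrs()
    return result;
}
```

Printing each pair from main() in both runs would show whether the third interface is visible, and with the same address, in both cases. This only reports what the kernel exposes to the process; it does not by itself show which interface an outgoing connection is routed through.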
Kind Regards
Vibhor
From: Matthias Troyer <troyer@phys.ethz.ch>
To: boost-users@lists.boost.org
Sent: Mon, 15 February, 2010 2:25:34 PM
Subject: Re: [Boost-users] [Boost.Asio] Compiling Boost.Asio with OpenMPI
Have you checked which compiler, and which compiler and linker options, the mpicxx script uses? Maybe one of those is causing your problems.
Matthias
On 15 Feb 2010, at 01:41, vibhor aggarwal wrote:
Hi,
I have tried compiling with mpicxx with no MPI code in the program. It compiles fine, but I still have to use mpirun to execute the generated binary, and the connection times out.
Kind Regards
Vibhor
From: Matthias Troyer <troyer@phys.ethz.ch>
To: boost-users@lists.boost.org
Sent: Mon, 15 February, 2010 12:44:57 AM
Subject: Re: [Boost-users] [Boost.Asio] Compiling Boost.Asio with OpenMPI
On 14 Feb 2010, at 06:33, vibhor aggarwal wrote:
> Hello,
>
> I am trying to write a program that runs using OpenMPI on our cluster and communicates the results over TCP/IP to another machine that has a network connection to the cluster. I am using Boost.Asio for the TCP/IP connection. If I remove the MPI code from the executable and compile it with g++, it connects to the remote machine without a problem. But when I add some simple MPI code to it, compile with mpicxx, and execute using mpirun, the call to socket.connect() times out. Any idea what might be going wrong?
Have you tried compiling it with mpicxx without adding any MPI code to it?
Matthias
_______________________________________________
Boost-users mailing list
Boost-users@lists.boost.org
http://lists.boost.org/mailman/listinfo.cgi/boost-users