Subject: Re: [Boost-users] poor boost::mpi performance
From: Matthias Troyer (troyer_at_[hidden])
Date: 2009-10-19 21:26:16
On 19 Oct 2009, at 18:26, Jonas Juselius wrote:
> Hi! I'm using the boost::mpi library for an HPC project. I really
> like the interface, but I'm currently getting very poor performance
> from the library. I started out by serializing my objects (which are
> full of pointers and allocated memory, and what not), but that
> didn't perform at all, so I went for a more brute force approach
> instead. No luck there either.
>
> Essentially what I want to do in a typical case, is to send a set of
> indexes (4 integers) followed by an array of doubles. The array
> sizes are fixed at startup, between 10 and 60 kB each.
> There are usually many of these arrays, and the total amount of data
> to be communicated at the end of a calculation is of the order of
> 1GB. Here is my current implementation: 1) Pack the indexes into an
> array of 4 integers and send (or broadcast) to the receiver(s). The
> receiver figures out where to store the next packet based on the
> indexes (this takes next to no time). 2) Send the array to the
> receivers using:
>
> double *data = coefs->data();
> world.send(who, tag, data, nCoefs);
>
> where coefs is a pointer to an Eigen2 vector, and data is a pointer
> to a contiguous array of doubles.
>
> I'm running the code on a big HPC cluster with individual nodes with
> 8 cores and 16 GB memory, all connected with Infiniband. Using this
> setup I achieve a maximum transfer rate of 66 MB/s doing all-to-one
> communication, which is approx. 10 times less than what I'm supposed
> to get. I will not even mention how long a broadcast takes, but
> suffice to say that it takes 20-25 times longer than doing the
> calculation. I get the same poor performance regardless of whether
> I'm communicating only over 127.0.0.1 or over the net. Since our
> environment is homogeneous, I have compiled both the MPI library
> and my program defining the BOOST_MPI_HOMOGENEOUS macro. I will try
> to batch more packages into larger units, but earlier experiences
> (with basic MPI) have shown that with 65 kB arrays, transfer rates of
> 1 GB/s are possible over our Infiniband switch. Asynchronous
> transfer is an option, but that complicates the load balancing
> algorithms to a point where I really don't want to go unless at
> gunpoint. Any suggestions?
Dear Jonas,
I am surprised by that slow performance. It should actually be the
same as doing an MPI_Send with the same pointer and length, passing
MPI_DOUBLE as the data type. Could you please try this instead and let
me know if it speeds things up?
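
Something along these lines should do as a quick check (a minimal,
untested sketch; who, tag and nCoefs are placeholders standing in for
your values, and the receive loop just mimics the all-to-one pattern
you describe):

    #include <cstdio>
    #include <vector>
    #include <mpi.h>

    int main(int argc, char* argv[])
    {
        MPI_Init(&argc, &argv);

        int rank, size;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        const int who = 0;        // collecting rank (placeholder)
        const int tag = 0;        // message tag (placeholder)
        const int nCoefs = 8192;  // 8192 doubles = 64 kB (placeholder)
        std::vector<double> coefs(nCoefs, 1.0);

        double t0 = MPI_Wtime();
        if (rank == who) {
            // Receive one array from every other rank (all-to-one).
            for (int i = 1; i < size; ++i)
                MPI_Recv(&coefs[0], nCoefs, MPI_DOUBLE, MPI_ANY_SOURCE,
                         tag, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        } else {
            // Same pointer and length as world.send(who, tag, data,
            // nCoefs), but through the plain C API with MPI_DOUBLE as
            // the data type.
            MPI_Send(&coefs[0], nCoefs, MPI_DOUBLE, who, tag,
                     MPI_COMM_WORLD);
        }
        double t1 = MPI_Wtime();

        if (rank == who)
            std::printf("received %d arrays of %d doubles in %g s\n",
                        size - 1, nCoefs, t1 - t0);

        MPI_Finalize();
        return 0;
    }

If the raw call reaches the ~1 GB/s you saw with basic MPI while the
world.send call with the same pointer and count does not, then
something is wrong on our side and I would very much like to know
about it.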
Matthias