
Subject: [Boost-users] poor boost::mpi performance
From: Jonas Juselius (jonas.juselius_at_[hidden])
Date: 2009-10-19 06:26:00


Hi! I'm using the boost::mpi library for an HPC project. I really like
the interface, but I'm currently getting very poor performance from the
library. I started out by serializing my objects (which are full of
pointers, allocated memory, and whatnot), but that didn't perform at
all, so I went for a more brute-force approach instead. No luck there
either.

Essentially, in a typical case I want to send a set of indexes (4
integers) followed by an array of doubles. The array sizes are fixed at
startup, at between 10 and 60 kB each. There are usually many of these
arrays, and the total amount of data to be communicated at the end of a
calculation is on the order of 1 GB. Here is my current implementation:

1) Pack the indexes into an array of 4 integers and send (or broadcast)
   it to the receiver(s). The receiver figures out where to store the
   next packet based on the indexes (this takes next to no time).
2) Send the array to the receivers using:

double *data = coefs->data();
world.send(who, tag, data, nCoefs);

where coefs is a pointer to an Eigen2 vector, and data is a pointer to a
contiguous array of doubles.

I'm running the code on a big HPC cluster whose nodes have 8 cores and
16 GB of memory each, all connected with Infiniband. With this setup I
achieve a maximum transfer rate of 66 MB/s doing all-to-one
communication, which is approx. 10 times less than what I should be
getting. I will not even mention how long a broadcast takes; suffice it
to say that it takes 20-25 times longer than doing the calculation. I
get the same poor performance regardless of whether I'm communicating
only over 127.0.0.1 or over the net. Since our environment is
homogeneous, I have compiled both the MPI library and my program with
the BOOST_MPI_HOMOGENEOUS macro defined. I will try to batch more
packages into larger units, but earlier experience (with plain MPI) has
shown that with 65 kB arrays, transfer rates of 1 GB/s are possible over
our Infiniband switch. Asynchronous transfer is an option, but it
complicates the load-balancing algorithms to a point where I really
don't want to go unless at gunpoint. Any suggestions?

Best regards,

-jonas-

-- 
________________________________________________________________________
Dr. Jonas Jusélius
Centre for Theoretical and    E-mail    : jonas.juselius_at_[hidden]
Computational Chemistry       Telephone : +47 77644079
Department of Chemistry       Fax       : +47 77644765
University of Tromsø          Mobile ph.: +47 47419869
N-9037 Tromsø, NORWAY         http://jonas.iki.fi
_______________________________________________________________________
[ PGP key    : keyserver or http://jonas.iki.fi/pubkey.asc    ]
[ Fingerprint: 2516 A57A 3012 7962 287D  B66E C1A9 157F 0A59 7A66 ]
