Subject: Re: [Boost-users] Performance optimization in Boost using std::vector<>
From: Ilja Honkonen (ilja.honkonen_at_[hidden])
Date: 2015-02-12 10:51:59
> There is a known performance problem with serializing a std::vector
> over MPI.
> Basically, this prevents you from ever reaching the performance of C.
> The problem is on the receive side. When you receive a vector, if you
> don't know the size,
> the receive side has to:
> - get the number of elements of the vector
> - resize the vector (which initializes elements)
> - receive the elements in the vector data (reinitialize the elements)
> The C version of the idiom:
> - gets the number of elements
> - reserves (as opposed to resize) the memory for the elements
> - receive the element in the vector (initialize elements once).
> This might make a small or a large performance difference, profile!
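The two receive patterns described in the quoted text can be compared without MPI. Below is a minimal sketch that assumes only the standard library; the helper names (receive_with_resize, receive_with_reserve) and the "wire" buffer standing in for received data are hypothetical, and this is not how boost::mpi is actually implemented.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: the "resize, then overwrite" receive pattern.
// Every element is written twice: once by resize(), once by the copy.
std::vector<double> receive_with_resize(const double* wire, std::size_t n) {
    assert(wire != nullptr || n == 0);
    std::vector<double> v;
    v.resize(n);                        // value-initializes n doubles to 0.0
    for (std::size_t i = 0; i < n; ++i) {
        v[i] = wire[i];                 // second write of the same element
    }
    return v;
}

// Hypothetical helper: the "reserve, then construct" pattern.
// reserve() only allocates, so each element is written exactly once.
std::vector<double> receive_with_reserve(const double* wire, std::size_t n) {
    assert(wire != nullptr || n == 0);
    std::vector<double> v;
    v.reserve(n);                       // capacity() >= n, size() is still 0
    v.insert(v.end(), wire, wire + n);  // constructs elements from the buffer
    return v;
}
```

Both helpers return the same contents; only the number of passes over the memory differs. Note that a receive call cannot legally write into the reserved-but-unsized part of a std::vector, which is why code that receives straight into a vector usually has to resize first, while a plain C buffer does not.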
According to the attached program there seems to be a much larger
performance problem than initializing vector elements. The program first
sends a vector of doubles using MPI, then sends another identical vector
with boost::mpi and prints how long these took in seconds. Note that
boost::mpi also sends two messages for run-time sized containers. For
vectors of 1e6 items the program prints (the MPI rank is the first number;
the first two lines are the pure MPI transfer, the last two boost::mpi):
0 resize: 0.0126891, send: 0.00988925, recv: 0
1 resize: 0.0131643, send: 0, recv: 0.00955247
0 resize: 0.0096425, send: 0.279135, recv: 0
1 resize: 0, send: 0, recv: 0.295702
For vectors of 1e7 items:
0 resize: 0.0974027, send: 0.0538886, recv: 0
1 resize: 0.105708, send: 0, recv: 0.0456324
0 resize: 0.0517177, send: 2.70333, recv: 0
1 resize: 0, send: 0, recv: 2.82339
And vectors of 5e7 items:
0 resize: 0.590099, send: 0.226269, recv: 0
1 resize: 0.440719, send: 0, recv: 0.375706
0 resize: 0.198448, send: 13.5335, recv: 0
1 resize: 0, send: 0, recv: 14.0518
The boost::mpi version is always at least 10 times slower (between about
28 and 62 times in the runs above). It also seems to run out of memory
with a smaller number of items, implying that unnecessary copies of the
data are created somewhere. Based on experience with more complex
programs (e.g. http://dx.doi.org/10.1016/j.jastp.2014.08.012) I wouldn't
recommend boost::mpi for high performance computing. Or, if this is user
error on my part, high performance is at least easier to get with pure MPI...
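For reference, the slowdown factors can be computed directly from the printed timings. This is a small sketch, assuming that in each group of four lines the first two are the pure MPI transfer and the last two the boost::mpi transfer (the order in which the program sends them); the names Timing, kTimings and slowdown are made up for this example.

```cpp
#include <cassert>
#include <cstdio>

// One group of timings copied from the program output (seconds).
struct Timing {
    double items;       // vector length
    double mpi_send;    // rank 0, pure MPI
    double mpi_recv;    // rank 1, pure MPI
    double boost_send;  // rank 0, boost::mpi
    double boost_recv;  // rank 1, boost::mpi
};

const Timing kTimings[] = {
    {1e6, 0.00988925, 0.00955247, 0.279135, 0.295702},
    {1e7, 0.0538886,  0.0456324,  2.70333,  2.82339},
    {5e7, 0.226269,   0.375706,   13.5335,  14.0518},
};

// Ratio of boost::mpi time to pure MPI time for the same operation.
double slowdown(double boost_seconds, double mpi_seconds) {
    assert(mpi_seconds > 0.0);
    return boost_seconds / mpi_seconds;
}

// Prints one line per vector size with the send and recv slowdown.
void print_slowdowns() {
    for (const Timing& t : kTimings) {
        std::printf("%.0e items: send %.1fx, recv %.1fx\n", t.items,
                    slowdown(t.boost_send, t.mpi_send),
                    slowdown(t.boost_recv, t.mpi_recv));
    }
}
```

The resize times are left out on purpose, since rank 1 reports no separate resize for the boost::mpi transfer.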
I used boost-1.57.0, g++ (GCC) 4.8.3 20140911 (Red Hat 4.8.3-7) and
mpirun (Open MPI) 1.6.5.