[mpi] broadcast performance

Hi all - I need some advice on using the broadcast function of boost::mpi. We have a large buffer (sometimes gigabytes) that we need to get to all child nodes. We currently use boost::serialization with a binary archive to write the data into a std::vector<char>, send that buffer across with MPI_Bcast, and deserialize on the other side. I've started testing similar functionality using boost::mpi::broadcast to handle the serialization and deserialization. Tracing through the code, it seems that the data is sent to the child nodes via isend. Is there something I can do to ensure that Bcast is used instead? With only a couple of nodes the former is fine, but with more nodes the MPI implementation of Bcast may do a much better job (logarithmic, or even constant, time in the number of nodes). What are the suggestions for getting a fast broadcast in this case? I don't think that using skeletons will help, since each instance of the broadcast will have unique data with a potentially different layout. Thanks, Brian

Hi Brian, if I understood correctly, you're actually doing something like: std::vector<char> gigaVec; MPI_Bcast(blah, blah, ..., &gigaVec[0]) and want to replace that with boost::mpi::broadcast, is that correct? Just do it the same way: if the type of the container is an MPI type, you're guaranteed that the underlying MPI implementation will be called. Regards, Júlio.

Okay, I can do that. I was just wondering if there was a trick to make it happen under the hood. I'm curious as to why Bcast doesn't get called by boost::mpi::broadcast for non-trivial types. Thanks, Brian On Wed, Sep 5, 2012 at 7:00 PM, Júlio Hoffimann <julio.hoffimann@gmail.com> wrote:
_______________________________________________ Boost-users mailing list Boost-users@lists.boost.org http://lists.boost.org/mailman/listinfo.cgi/boost-users

Brian, you can think of Boost.MPI as a very well-designed wrapper. All it does is call the underlying C implementation (Open MPI, MPICH, others) when the types are covered by the MPI standard. On the other hand, I agree with you: maybe it would be possible to specialize a template for std::vector<T> that handles it as a raw buffer. Does anyone have an opinion on this? When I have time, I'll think it through carefully and see if I can contribute a patch. Regards, Júlio. 2012/9/6 Brian Budge <brian.budge@gmail.com>

Hi Júlio - I may be completely wrong, but I was under the impression that when a send call happens, serialization magic occurs that builds an MPI_Datatype, and that by then handing the data to MPI_Send etc. we avoid an extra copy? But perhaps that won't work in my case. I doubt that MPI_Recv is capable of building a complex hierarchy back up, including pointers, using operator new, etc. Perhaps you have to have a fully instantiated object of the same kind in order to use this functionality with MPI_Recv? I have a virtual message hierarchy, and the messages (or shared_ptrs to messages) perform virtual dispatch upon being recv'd. Is there anything performance-wise to be gained by using boost::mpi for send/recv/broadcast? Or is the MPI_Datatype performance gain only applicable to classes that have a (perhaps complex, but) concrete layout, with object instantiation on the stack? It seems that if I can't get the MPI_Datatype benefit for my types, I may be better off maintaining my own buffers for serialization, so I can potentially lower the number of memory allocations. Thanks, Brian On Thu, Sep 6, 2012 at 3:29 AM, Júlio Hoffimann <julio.hoffimann@gmail.com> wrote:

Hi Brian, I don't remember the details, but what you said is completely right: when we pass an object to any of the Boost.MPI methods, it can either be of an MPI type, in which case it's properly forwarded to the C implementation, or it can be of a serializable type, in which case the magic happens. At the other end of the wire, Boost.MPI will magically deserialize the object and you have no additional work; you keep working in high-level C++. The main bottleneck here is the act of serializing/deserializing repeatedly. As you already know, Boost.MPI solved this problem for some cases (those with a fixed layout): the skeleton-and-content approach. When that approach is not applicable, you have to live with C raw buffers and the &vec[0] trick. I'll take a better look to see if specializing the template with that trick is safe and covered by the C++ standard. You're also free to investigate and produce patches. :-) Regards, Júlio.

The only idea I had was potentially to use MPI_Hindexed and MPI_Address to create the full memory layout, and then go through the data calling placement new, etc. Given the lack of documentation around how to actually do this with MPI, I can't really think of anything better than what is currently happening inside boost::mpi. If I need the better performance, I will have to uglify my code :) Thanks, Brian On Thu, Sep 6, 2012 at 10:49 AM, Júlio Hoffimann <julio.hoffimann@gmail.com> wrote:
participants (2)
- Brian Budge
- Júlio Hoffimann