Boost Users :

Subject: Re: [Boost-users] [MPI, serialization] Segmentation fault in heterogeneous cluster
From: Francesco Biscani (bluescarni_at_[hidden])
Date: 2010-09-07 09:41:19


Just an update, in case anyone is still following this.

It turns out that even when I serialize the classes to a text archive,
convert the archive to a string, transmit the string via boost::mpi and
then rebuild the classes from the received string on the other side,
I still get the same error reported above on heterogeneous clusters
(on homogeneous clusters it seemingly works).

So what I'm doing now is sending the archive in string form directly with
the MPI_* primitives (using a std::vector<char> as the buffer and the
MPI_CHAR datatype). This works in every configuration I've tested.

I'm not entirely sure whether the problem is on my side or whether this is
a genuine bug, but I'm happy to provide any information or testing needed
to resolve this issue.

Thanks again,

  Francesco.

On Fri, Sep 3, 2010 at 6:36 PM, Francesco Biscani <bluescarni_at_[hidden]> wrote:
> Hi Matthias,
>
> probably I'm doing something really stupid, but the problem seems to be
> somehow related to shared_ptr. This code reproduces the "MPI message
> truncated" error:
>
> #include <boost/mpi/environment.hpp>
> #include <boost/mpi/communicator.hpp>
> #include <boost/serialization/assume_abstract.hpp>
> #include <boost/serialization/export.hpp>
> #include <boost/serialization/base_object.hpp>
> #include <boost/serialization/shared_ptr.hpp>
> #include <boost/serialization/tracking.hpp>
> #include <boost/serialization/vector.hpp>
> #include <boost/shared_ptr.hpp>
> #include <vector>
>
> struct base
> {
>        virtual void do_something() const = 0;
>        template <class Archive>
>        void serialize(Archive &ar, const unsigned int)
>        {
>                ar & values;
>        }
>        std::vector<double> values;
>        virtual ~base() {}
> };
>
> BOOST_SERIALIZATION_ASSUME_ABSTRACT(base);
>
> struct derived: public base
> {
>        void do_something() const {};
>        template <class Archive>
>        void serialize(Archive &ar, const unsigned int)
>        {
>                ar & boost::serialization::base_object<base>(*this);
>        }
> };
>
> BOOST_CLASS_EXPORT(derived);
>
> struct container
> {
>        template <class Archive>
>        void serialize(Archive &ar, const unsigned int)
>        {
>                ar & ptr;
>        }
>        boost::shared_ptr<base> ptr;
> };
>
>
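> int main()
> {
>        boost::mpi::environment env;
>        boost::mpi::communicator world;
>        // Run with exactly two processes (e.g. mpirun -np 2): rank 0
>        // sends the container to rank 1, which sends it straight back.
>        if (world.rank() == 0) {
>                boost::shared_ptr<container> c(new container());
>                world.send(1,0,c);
>                world.recv(1,0,c);
>        } else {
>                boost::shared_ptr<container> c(new container());
>                // The MPI_ERR_TRUNCATE exception is thrown from this recv.
>                world.recv(0,0,c);
>                world.send(0,0,c);
>        }
>        return 0;
> }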
>
> The error happens when rank 1 is receiving the object:
>
> terminate called after throwing an instance of
> 'boost::exception_detail::clone_impl<boost::exception_detail::error_info_injector<boost::mpi::exception> >'
>  what():  MPI_Unpack: MPI_ERR_TRUNCATE: message truncated
>
> Thanks,
>
>  Francesco.
>

