Boost Users :

From: Alex Cumberworth (alex_at_[hidden])
Date: 2021-06-20 12:12:30


I am trying to run some Monte Carlo simulation code I wrote in C++ with
MPI via the Boost.MPI wrapper. I use only blocking send and receive
calls (i.e. send and recv, never isend or irecv), but after running the
program for a few days, I inevitably end up with the following error:

terminate called after throwing an instance of 'boost::exception_detail::clone_impl<boost::exception_detail::error_info_injector<boost::mpi::exception> >'
   what(): MPI_Recv: MPI_ERR_TRUNCATE: message truncated

I have seen that this can happen when non-blocking calls are being made,
but I cannot see how it can happen with only blocking calls. I am
sending and receiving a vector of structs:

struct Chain {
     bool operator==(Chain chain_2);
     int index;
     int identity;
     vector<VectorThree> positions;
     vector<VectorThree> orientations;

   private:
     friend class boost::serialization::access;
     template <typename Archive>
     void serialize(Archive& ar, const unsigned int version) {
         ar& index;
         ar& identity;
         ar& positions;
         ar& orientations;
     }
};
using Chains = vector<Chain>;

where VectorThree is declared elsewhere as

class VectorThree {
   public:
     VectorThree(int x, int y, int z): m_container {{x, y, z}} {};
     VectorThree(): m_container {{0, 0, 0}} {};

     VectorThree operator-();

     VectorThree operator+(const VectorThree& v_2) const;
     VectorThree operator-(const VectorThree& v_2) const;
     bool operator!=(const VectorThree& v_2) const;

     int& operator[](const size_t& i) { return m_container[i]; };
     const int& at(const size_t& i) const { return m_container.at(i); };

     VectorThree rotate_half(VectorThree axis);
     VectorThree rotate(VectorThree origin, VectorThree axis, int turns);
     VectorThree rotate(VectorThree axis, int turns);
     int sum();
     int abssum();
     VectorThree absolute();
     VectorThree sort();

   private:
     array<int, 3> m_container;
     friend class boost::serialization::access;
     template <typename Archive>
     void serialize(Archive& arch, const unsigned int) {
         arch& m_container;
     }
};

It sends and receives instances of this object many times before the
error occurs. I have narrowed down the calls where the error occurs by
printing statements before and after both sending and receiving. The
sending code:

Chains chains_send {m_us_sim->get_chains()};
cout << "Win " << m_rank << ": Sending chains (size " << chains_send.size() << ") to " << win_i << "\n";
m_world.send(win_i, swap_i, chains_send);
cout << "Win " << m_rank << ": Sent chains to " << win_i << "\n";

The receiving code:

Chains chains_rec;
cout << "Win " << m_rank << ": Recieving chains from " << win_to_win[0] << "\n";
m_world.recv(win_to_win[0], swap_i, chains_rec);
cout << "Win " << m_rank << ": Recieved chains (size " << chains_rec.size() << ") from " << win_to_win[0] << "\n";

In the output file before the crash I have

Win 0: Recieving chains from 1
Win 1: Sending chains (size 2) to 0
Win 1: Sent chains to 0

I am using version 1.65 of Boost and OpenMPI 2.0.1.
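
For reference, MPI_ERR_TRUNCATE is reported when the message matched by a receive is longer than the buffer posted for it. The following is a minimal sketch of how that can arise, not Boost.MPI's actual implementation. It assumes that a serialized object travels as two messages on the same tag (a size header, then the payload) and that the receiver sizes its buffer from the header. ToyChannel, send_object and recv_object are hypothetical names invented for this illustration, and the one-byte header limits payloads to under 256 bytes.

```cpp
#include <cstddef>
#include <deque>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical in-memory stand-in for one (source, tag) message queue.
// This is not an MPI or Boost.MPI type.
struct ToyChannel {
    std::deque<std::vector<char>> queue;

    // Mimics a blocking send: messages are queued in the order sent.
    void send(const std::vector<char>& msg) { queue.push_back(msg); }

    // Mimics a receive into a buffer of fixed capacity. It fails when the
    // next queued message is longer than the buffer, which is the
    // condition MPI reports as MPI_ERR_TRUNCATE.
    std::vector<char> recv(std::size_t capacity) {
        if (queue.empty()) throw std::logic_error("no message queued");
        std::vector<char> msg = queue.front();
        queue.pop_front();
        if (msg.size() > capacity) throw std::runtime_error("message truncated");
        return msg;
    }
};

// Assumed two-step protocol: a one-byte size header, then the payload.
void send_object(ToyChannel& ch, const std::string& payload) {
    ch.send({static_cast<char>(payload.size())});
    ch.send(std::vector<char>(payload.begin(), payload.end()));
}

// Reads the header, then posts a buffer of exactly that size for the payload.
std::string recv_object(ToyChannel& ch) {
    std::vector<char> header = ch.recv(1);
    std::size_t size = static_cast<unsigned char>(header.at(0));
    std::vector<char> body = ch.recv(size);
    return std::string(body.begin(), body.end());
}
```

In this toy model a matched send_object/recv_object pair always succeeds. The truncation only appears if the message that follows a header is not the payload that header describes, for example when a different message with the same source and tag is matched in between.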


