Boost Users
From: Alex Cumberworth (alex_at_[hidden])
Date: 2021-06-20 12:12:30
I am trying to run some Monte Carlo simulation code I wrote in C++ with
MPI via the Boost.MPI wrapper. I use only blocking send and receive
calls (i.e., send and recv, never isend or irecv), but after running the
program for a few days, I inevitably end up with the following error:
terminate called after throwing an instance of 'boost::exception_detail::clone_impl<...> >'
  what(): MPI_Recv: MPI_ERR_TRUNCATE: message truncated
I have seen that this can happen when non-blocking calls are being made,
but I cannot see how it can happen with only blocking calls. I am
sending and receiving a vector of structs:
struct Chain {
    bool operator==(Chain chain_2);
    int index;
    int identity;
    vector<VectorThree> positions;
    vector<VectorThree> orientations;

  private:
    friend class boost::serialization::access;
    template<typename Archive>
    void serialize(Archive& ar, const unsigned int version) {
        ar& index;
        ar& identity;
        ar& positions;
        ar& orientations;
    }
};
using Chains = vector<Chain>;
where VectorThree is declared elsewhere as
class VectorThree {
  public:
    VectorThree(int x, int y, int z): m_container {{x, y, z}} {};
    VectorThree(): m_container {{0, 0, 0}} {};
    VectorThree operator-();
    VectorThree operator+(const VectorThree& v_2) const;
    VectorThree operator-(const VectorThree& v_2) const;
    bool operator!=(const VectorThree& v_2) const;
    int& operator[](const size_t& i) { return m_container[i]; };
    const int& at(const size_t& i) const { return m_container.at(i); };
    VectorThree rotate_half(VectorThree axis);
    VectorThree rotate(VectorThree origin, VectorThree axis, int turns);
    VectorThree rotate(VectorThree axis, int turns);
    int sum();
    int abssum();
    VectorThree absolute();
    VectorThree sort();

  private:
    array<int, 3> m_container;
    friend class boost::serialization::access;
    template<typename Archive>
    void serialize(Archive& arch, const unsigned int) {
        arch& m_container;
    }
};
It sends and receives instances of these objects many times before the
error occurs. I have narrowed down the calls where the error occurs by
printing messages before and after each send and receive. The
sending code:
Chains chains_send {m_us_sim->get_chains()};
cout << "Win " << m_rank << ": Sending chains (size " << chains_send.size() << ") to " << win_i << "\n";
m_world.send(win_i, swap_i, chains_send);
cout << "Win " << m_rank << ": Sent chains to " << win_i << "\n";
The receiving code:
Chains chains_rec;
cout << "Win " << m_rank << ": Recieving chains from " << win_to_win[0] << "\n";
m_world.recv(win_to_win[0], swap_i, chains_rec);
cout << "Win " << m_rank << ": Recieved chains (size " << chains_rec.size() << ") from " << win_to_win[0] << "\n";
Just before the crash, the output file contains:
Win 0: Recieving chains from 1
Win 1: Sending chains (size 2) to 0
Win 1: Sent chains to 0
I am using version 1.65 of Boost and OpenMPI 2.0.1.