Subject: [Boost-users] [Boost.MPI] segfault while accessing irecv request
From: Jiří Vyskočil (svzj_at_[hidden])
Date: 2013-10-15 04:43:57


I need to implement non-blocking communication in my application. I am
getting a segfault when calling wait() on my irecv requests:

Program received signal SIGSEGV, Segmentation fault.
0x00007ffff7903a69 in
boost::archive::detail::basic_iarchive_impl::load_preamble
(this=0x110b590, ar=..., co=...) at
libs/serialization/src/basic_iarchive.cpp:319
(gdb) bt
#0 0x00007ffff7903a69 in
boost::archive::detail::basic_iarchive_impl::load_preamble
(this=0x110b590, ar=..., co=...) at
libs/serialization/src/basic_iarchive.cpp:319
#1 0x00007ffff7904c52 in
boost::archive::detail::basic_iarchive_impl::load_pointer
(this=0x110b590, ar=..., t=@0x7fffffffd348: 0x2, bpis_ptr=0x0,
bpis_ptr_at_entry=0x950a50
<boost::serialization::singleton<boost::archive::detail::pointer_iserializer<boost::mpi::packed_iarchive,
std::vector<double, std::allocator<double> > > >::get_instance()::t>,
finder=finder_at_entry=0x44d5c0
<boost::archive::detail::load_pointer_type<boost::mpi::packed_iarchive>::find(boost::serialization::extended_type_info
const&)>) at libs/serialization/src/basic_iarchive.cpp:446
#2 0x00007ffff79042e8 in
boost::archive::detail::basic_iarchive::load_pointer
(this=this_at_entry=0x110b3b0, t=@0x7fffffffd348: 0x2,
bpis_ptr=bpis_ptr_at_entry=0x950a50
<boost::serialization::singleton<boost::archive::detail::pointer_iserializer<boost::mpi::packed_iarchive,
std::vector<double, std::allocator<double> > > >::get_instance()::t>,
finder=finder_at_entry=0x44d5c0
<boost::archive::detail::load_pointer_type<boost::mpi::packed_iarchive>::find(boost::serialization::extended_type_info
const&)>) at libs/serialization/src/basic_iarchive.cpp:550
#3 0x0000000000451f72 in
boost::archive::detail::load_pointer_type<boost::mpi::packed_iarchive>::invoke<std::vector<double,
std::allocator<double> >*> (ar=..., t=@0x7fffffffd348: 0x2) at
/usr/include/boost/archive/detail/iserializer.hpp:524
#4 0x00000000004521e8 in load<boost::mpi::packed_iarchive,
std::vector<double>*> (t=<optimized out>, ar=...) at
/usr/include/boost/archive/detail/iserializer.hpp:592
#5 load_override<std::vector<double>*> (t=<optimized out>,
this=0x110b3b0) at /usr/include/boost/archive/detail/common_iarchive.hpp:66
#6 load_override<std::vector<double>*> (version=0, x=<optimized out>,
this=0x110b3b0) at /usr/include/boost/mpi/packed_iarchive.hpp:101
#7 load_override<std::vector<double>*> (version=0, x=<optimized out>,
this=0x110b3b0) at /usr/include/boost/mpi/packed_iarchive.hpp:118
#8 operator>><std::vector<double>*> (t=<optimized out>, this=0x110b3b0)
at /usr/include/boost/archive/detail/interface_iarchive.hpp:60
#9 deserialize (stat=..., this=0x110b390) at
/usr/include/boost/mpi/communicator.hpp:1335
#10 boost::mpi::request::handle_serialized_irecv<std::vector<double,
std::allocator<double> >*> (self=0x11acf30, action=<optimized out>) at
/usr/include/boost/mpi/communicator.hpp:1457
#11 0x00007ffff7b5663f in boost::mpi::request::test
(this=this_at_entry=0x11acf30) at libs/mpi/src/request.cpp:66
#12 0x000000000045052d in
boost::mpi::wait_all<__gnu_cxx::__normal_iterator<boost::mpi::request*,
std::vector<boost::mpi::request, std::allocator<boost::mpi::request> > >
> (first=..., last=...) at /usr/include/boost/mpi/nonblocking.hpp:262
#13 0x000000000044be38 in opice::Communicator::waitall_recv
(this=0x9a76c0) at /media/data/prog/workspace/opice/src/communicator.cpp:257
#14 0x0000000000424c14 in opice::Boundary::set_J (this=0xbd2b40) at
/media/data/prog/workspace/opice/src/boundary.cpp:423
#15 0x00000000004354a6 in opice::Experiment::run
(this=this_at_entry=0x99d850) at
/media/data/prog/workspace/opice/src/experiment.cpp:211
#16 0x0000000000415b02 in main (argc=1, argv=0x7fffffffd708) at
/media/data/prog/workspace/opice/src/opice.cpp:41
(gdb)

In the program, parallelization is achieved using domain decomposition.
There is a Boundary class which calls the communication routines; the
boundary holds several "BorderComm" objects (two in the current version),
each of which communicates with one neighbor. There is one Communicator
object which holds the data about the MPI communicator, handles MPI
initialization, etc.

Relevant parts look somewhat like this:

class Boundary {
public:
    void set_J();
private:
    BorderComm *sbx_max, *sbx_min;
    Communicator* comm_;
};
void Boundary::set_J(){
    if(sbx_min) sbx_min->receive_J();
    if(sbx_max) sbx_max->receive_J();
    if(sbx_max) sbx_max->send_J();
    if(sbx_min) sbx_min->send_J();

    comm_->waitall_recv();

    if(sbx_max) sbx_max->save_J();
    if(sbx_min) sbx_min->save_J();

    comm_->waitall_send();
}

class BorderComm {
public:
    void send_J();
    void receive_J();
    void save_J();
    int neighbor_;
private:
    V3Slice *slice_J_border_;
    std::vector<double> *recv_array_;
    Communicator* comm_;
};
void BorderComm::send_J() {
    comm_->send_fields(slice_J_border_, TYPE_J, neighbor_);
}
void BorderComm::receive_J(){
    recv_array_ = new std::vector<double>(LEN);
    comm_->receive_fields(recv_array_, TYPE_J, neighbor_);
}
void BorderComm::save_J(){
    // ... put contents of recv_array_ into some V3Slice ...
    delete recv_array_;
}

class Communicator {
public:
    void send_fields(V3Slice* slice, PackType pack_type, int recnum);
    void receive_fields(std::vector<double>* recv_array,
                        PackType pack_type, int sendnum);
    void waitall_send();
    void waitall_recv();
protected:
    boost::mpi::communicator *world_;
    std::vector<boost::mpi::request> send_requests_;
    std::vector<boost::mpi::request> recv_requests_;
};
void Communicator::send_fields(V3Slice* slice, PackType pack_type,
                               int recnum) {
    std::vector<double> send_array(LEN);
    // ... fill send_array from the V3Slice object ...
    send_requests_.push_back(world_->isend(recnum, pack_type, send_array));
}
void Communicator::receive_fields(std::vector<double>* recv_array,
                                  PackType pack_type, int sendnum) {
    recv_requests_.push_back(world_->irecv(sendnum, pack_type, recv_array));
}
void Communicator::waitall_send(){
    boost::mpi::wait_all(send_requests_.begin(), send_requests_.end());
    send_requests_.clear();
}
void Communicator::waitall_recv(){
    boost::mpi::wait_all(recv_requests_.begin(), recv_requests_.end());
    recv_requests_.clear();
}

Now if I move to frame #13 in the backtrace, to get inside the
Communicator class, and look at the requests, I get:

(gdb) frame 13
#13 0x000000000044be38 in opice::Communicator::waitall_recv
(this=0x9a76c0) at /media/data/prog/workspace/opice/src/communicator.cpp:257

(gdb) print recv_requests_
$1 = std::vector of length 2, capacity 2 = {{m_requests = {0x94f0c0
<ompi_request_null>, 0x94f0c0 <ompi_request_null>}, m_handler = 0x452040
<boost::mpi::request::handle_serialized_irecv<std::vector<double,
std::allocator<double> >*>(boost::mpi::request*,
boost::mpi::request::request_action)>, m_data = {px = 0x110b390, pn =
{pi_ = 0x1103940}}}, {m_requests = {0xa40380, 0x94f0c0
<ompi_request_null>}, m_handler = 0x452040
<boost::mpi::request::handle_serialized_irecv<std::vector<double,
std::allocator<double> >*>(boost::mpi::request*,
boost::mpi::request::request_action)>, m_data = {px = 0x11ab900, pn =
{pi_ = 0x11a3ec0}}}}

(gdb) print send_requests_
$2 = std::vector of length 4, capacity 4 = {{m_requests = {0xa3f700,
0xa3f400}, m_handler = 0x0, m_data = {px = 0x11074c0, pn = {pi_ =
0x1105ff0}}}, {m_requests = {0xa3f100, 0xa3ee00}, m_handler = 0x0,
m_data = {px = 0x11ab640, pn = {pi_ = 0x118efc0}}}, {m_requests =
{0x114ec80, 0x114e980}, m_handler = 0x0, m_data = {px = 0x11aceb0, pn =
{pi_ = 0x11a3770}}}, {m_requests = {0x114e680, 0x114e380}, m_handler =
0x0, m_data = {px = 0x11ad570, pn = {pi_ = 0x11a1cd0}}}}

Until now, I have been using a 1D domain-decomposition algorithm, where
each domain communicated with its two neighbors using non-blocking sends
(each preceded by a wait) matched with blocking recvs. Now I need to move
to a 3D decomposition, where each domain gets 26 neighbors. If I were to
use blocking communication, I would have to devise a non-trivial
algorithm to avoid deadlocks, so non-blocking communication seems the
better option.
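
Just to illustrate what I mean by ordering: here is a minimal sketch (my
own illustration, not taken from the code above) of a deadlock-safe
blocking exchange for a single pair of neighbors. If both partners call
send() before recv(), the exchange can deadlock once the messages exceed
MPI's internal buffering; coordinating such an ordering consistently
across 26 neighbors is the non-trivial part I want to avoid.

#include <boost/mpi.hpp>
#include <vector>

// Blocking exchange with one neighbor: the lower rank sends first,
// the higher rank receives first, so this one pair cannot deadlock.
void blocking_exchange(boost::mpi::communicator& world, int neighbor,
                       const std::vector<double>& out,
                       std::vector<double>& in)
{
    const int tag = 0;
    if (world.rank() < neighbor) {
        world.send(neighbor, tag, out);
        world.recv(neighbor, tag, in);
    } else {
        world.recv(neighbor, tag, in);
        world.send(neighbor, tag, out);
    }
}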

So, in the previous version there was no need for the waitall functions.
Instead of:
    std::vector<boost::mpi::request> send_requests_;
    std::vector<boost::mpi::request> recv_requests_;
there was just a single boost::mpi::request in my Communicator class.
There was a call to request.wait() at the beginning of
Communicator::send_fields(), and Communicator::receive_fields() used
recv() instead of irecv().
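
For reference, here is a minimal sketch of that previous scheme as
described above (simplified, not the actual code: a plain int tag
instead of PackType, and the buffers are passed in directly):

#include <boost/mpi.hpp>
#include <vector>

class BlockingCommunicator {
public:
    explicit BlockingCommunicator(boost::mpi::communicator& world)
        : world_(world) {}

    void send_fields(const std::vector<double>& send_array,
                     int tag, int recnum) {
        send_request_.wait();  // finish the previous isend first
        send_request_ = world_.isend(recnum, tag, send_array);
    }

    void receive_fields(std::vector<double>& recv_array,
                        int tag, int sendnum) {
        world_.recv(sendnum, tag, recv_array);  // blocking receive
    }

private:
    boost::mpi::communicator& world_;
    boost::mpi::request send_request_;  // the single outstanding request
};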

This worked, but the order of communication in Boundary::set_J() had to
be fixed (send to left, recv from right, send to right, recv from left).
That would be very ugly for up to 26 neighbors, so the idea was to (a
minimal sketch of this pattern follows the list):
 1. isend everything
 2. irecv everything
 3. wait until everything is received
 4. save received data
 5. wait until everything is sent
 6. exit the function, continue other computation....
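
A minimal, self-contained sketch of that pattern (again my own
simplification, not the actual code: a plain int tag, a 1D ring of
neighbors, and the vectors passed to isend()/irecv() directly rather
than through pointers):

#include <boost/mpi.hpp>
#include <boost/mpi/nonblocking.hpp>
#include <vector>

int main(int argc, char* argv[])
{
    boost::mpi::environment env(argc, argv);
    boost::mpi::communicator world;

    const int tag   = 0;
    const int right = (world.rank() + 1) % world.size();
    const int left  = (world.rank() + world.size() - 1) % world.size();

    std::vector<double> send_buf(100, 0.0);
    std::vector<double> recv_left(100), recv_right(100);

    std::vector<boost::mpi::request> send_reqs, recv_reqs;

    // 1. + 2. isend and irecv everything
    send_reqs.push_back(world.isend(left,  tag, send_buf));
    send_reqs.push_back(world.isend(right, tag, send_buf));
    recv_reqs.push_back(world.irecv(left,  tag, recv_left));
    recv_reqs.push_back(world.irecv(right, tag, recv_right));

    // 3. wait until everything is received
    boost::mpi::wait_all(recv_reqs.begin(), recv_reqs.end());

    // 4. ... save/use recv_left and recv_right here ...

    // 5. wait until everything is sent
    boost::mpi::wait_all(send_reqs.begin(), send_reqs.end());
    return 0;
}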

All other functions, i.e. creating the recv_array_, copying from my
V3Slice object, etc., are the same as in the "blocking version".

