Subject: [Boost-users] [MPI] error in hybrid OpenMP + Boost.MPI application
From: Riccardo Murri (riccardo.murri_at_[hidden])
Date: 2010-06-22 16:40:45


Hello,

I have an OpenMP+MPI application, which crashes with an exception on
some inputs. The backtrace shows that the error originates in the
Boost.MPI code::

  terminate called after throwing an instance of 'boost::archive::archive_exception'
    what(): unregistered class
  ...
  [compute-0-7:23208] [ 0] /lib64/libpthread.so.0 [0x3110c0e4c0]
  [compute-0-7:23208] [ 1] /lib64/libc.so.6(gsignal+0x35) [0x3110430215]
  [compute-0-7:23208] [ 2] /lib64/libc.so.6(abort+0x110) [0x3110431cc0]
  [compute-0-7:23208] [ 3] /usr/lib64/libstdc++.so.6(_ZN9__gnu_cxx27__verbose_terminate_handlerEv+0x114) [0x31114bec44]
  [compute-0-7:23208] [ 4] /usr/lib64/libstdc++.so.6 [0x31114bcdb6]
  [compute-0-7:23208] [ 5] /usr/lib64/libstdc++.so.6 [0x31114bcde3]
  [compute-0-7:23208] [ 6] /usr/lib64/libstdc++.so.6 [0x31114bceca]
  [compute-0-7:23208] [ 7] /home/oci/murri/sw/lib/libboost_serialization.so.1.43.0(_ZN5boost7archive6detail19basic_iarchive_impl12load_pointerERNS1_14basic_iarchiveERPvPKNS1_25basic_pointer_iserializerEPFS9_RKNS_13serialization18extended_type_infoEE+0x23b) [0x2b3d485348ab]
  [compute-0-7:23208] [ 8] /home/oci/murri/rank/rank_wf_dbg(_ZN5boost7archive6detail17load_pointer_typeINS_3mpi15packed_iarchiveEE6invokeIPN9Waterfall9Processor9SparseRowEEEvRS4_RT_+0x49) [0x4d7989]
  [compute-0-7:23208] [ 9] /home/oci/murri/rank/rank_wf_dbg(_ZN5boost7archive4loadINS_3mpi15packed_iarchiveEPN9Waterfall9Processor9SparseRowEEEvRT_RT0_+0x22) [0x4d79f6]
  [compute-0-7:23208] [10] /home/oci/murri/rank/rank_wf_dbg(_ZN5boost7archive6detail15common_iarchiveINS_3mpi15packed_iarchiveEE13load_overrideIPN9Waterfall9Processor9SparseRowEEEvRT_i+0x28) [0x4d7a20]
  [compute-0-7:23208] [11] /home/oci/murri/rank/rank_wf_dbg(_ZN5boost7archive21basic_binary_iarchiveINS_3mpi15packed_iarchiveEE13load_overrideIPN9Waterfall9Processor9SparseRowEEEvRT_i+0x23) [0x4d7a45]
  [compute-0-7:23208] [12] /home/oci/murri/rank/rank_wf_dbg(_ZN5boost3mpi15packed_iarchive13load_overrideIPN9Waterfall9Processor9SparseRowEEEvRT_iN4mpl_5bool_ILb0EEE+0x23) [0x4d7a6b]
  [compute-0-7:23208] [13] /home/oci/murri/rank/rank_wf_dbg(_ZN5boost3mpi15packed_iarchive13load_overrideIPN9Waterfall9Processor9SparseRowEEEvRT_i+0x2a) [0x4d7a98]
  [compute-0-7:23208] [14] /home/oci/murri/rank/rank_wf_dbg(_ZN5boost7archive6detail18interface_iarchiveINS_3mpi15packed_iarchiveEErsIPN9Waterfall9Processor9SparseRowEEERS4_RT_+0x2a) [0x4d7ac4]
  [compute-0-7:23208] [15] /home/oci/murri/rank/rank_wf_dbg(_ZNK5boost3mpi12communicator9recv_implIPN9Waterfall9Processor9SparseRowEEENS0_6statusEiiRT_N4mpl_5bool_ILb0EEE+0x93) [0x4f41ef]
  [compute-0-7:23208] [16] /home/oci/murri/rank/rank_wf_dbg(_ZNK5boost3mpi12communicator4recvIPN9Waterfall9Processor9SparseRowEEENS0_6statusEiiRT_+0x3f) [0x4f427b]
  [compute-0-7:23208] [17] /home/oci/murri/rank/rank_wf_dbg(_ZN9Waterfall4rankEv+0x291) [0x4acf27]
  [compute-0-7:23208] [18] /home/oci/murri/rank/rank_wf_dbg(main+0x698) [0x4ad9b6]
  [compute-0-7:23208] [19] /lib64/libc.so.6(__libc_start_main+0xf4) [0x311041d974]
  [compute-0-7:23208] [20] /home/oci/murri/rank/rank_wf_dbg [0x4aa219]
  [compute-0-7:23208] *** End of error message ***

One MPI rank is started per compute node. Within a rank, any OpenMP
thread may call mpi::communicator::isend(), but only one thread calls
mpi::communicator::iprobe() and mpi::communicator::recv().

Although the error above is raised in mpi::communicator::recv(),
serializing the mpi::communicator::isend() calls apparently solves the
issue. The program also runs fine on a single node, and with some
other (smaller) inputs. This leads me to think that it is a
thread-safety issue with the MPI part. I have checked that the MPI
library (OpenMPI 1.4.2) is initialized with MPI_Init_thread() and
provides the threading level MPI_THREAD_MULTIPLE.
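
For concreteness, "serializing the isend() calls" could look roughly
like the sketch below. It is only an illustration under stated
assumptions, not the application's code: FakeChannel, Message,
serialized_send and run_demo are hypothetical names, and FakeChannel
stands in for the communicator so that the example builds without MPI.
The point is only that every send goes through one named OpenMP
critical section.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the communicator, so the sketch builds without
// MPI. In the real application the guarded call would be the communicator's
// isend().
struct Message { int dest; int tag; int value; };

struct FakeChannel {
    std::vector<Message> sent;  // push_back is not safe to call concurrently

    void isend(int dest, int tag, int value) {
        sent.push_back(Message{dest, tag, value});
    }
};

// Funnel every send through one named critical section, so that at most
// one OpenMP thread is inside the send path at any time.
void serialized_send(FakeChannel& ch, int dest, int tag, int value) {
    #pragma omp critical(mpi_send)
    {
        ch.isend(dest, tag, value);
    }
}

// Any thread may send; returns how many messages were recorded.
std::size_t run_demo(int n) {
    assert(n >= 0);
    FakeChannel ch;
    #pragma omp parallel for
    for (int i = 0; i < n; ++i) {
        serialized_send(ch, /*dest=*/1, /*tag=*/0, /*value=*/i);
    }
    return ch.sent.size();
}
```

When built without OpenMP support the pragmas are ignored and the loop
runs serially. With OpenMP enabled (for example -fopenmp on GCC), sends
from different threads may arrive in any order but never overlap.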

So, my question: is there a (known) thread-safety issue with Boost.MPI,
or should I look elsewhere?

Thanks for any help!

Riccardo

