Subject: [Boost-mpi] multiple irecv tests failure with MPI_ERR_TRUNCATE
From: Roy Hashimoto (roy.hashimoto_at_[hidden])
Date: 2014-02-12 13:29:34


Hi -

I'm trying to queue multiple irecv requests, but they fail in
request::test() with an MPI_ERR_TRUNCATE error from Open MPI:

$ mpiexec ./bmpi
libc++abi.dylib: terminating with uncaught exception of type
boost::exception_detail::clone_impl<boost::exception_detail::error_info_injector<boost::mpi::exception>
>: MPI_Unpack: MPI_ERR_TRUNCATE: message truncated
[shoebag:06238] *** Process received signal ***
[shoebag:06238] Signal: Abort trap: 6 (6)
[shoebag:06238] Signal code: (0)
[shoebag:06238] [ 0] 2 libsystem_platform.dylib
 0x00007fff876665aa _sigtramp + 26
[shoebag:06238] [ 1] 3 ???
0x0000000000000000 0x0 + 0
[shoebag:06238] [ 2] 4 libsystem_c.dylib
0x00007fff90281bba abort + 125
[shoebag:06238] [ 3] 5 libc++abi.dylib
0x00007fff8e5c5141 __cxa_bad_cast + 0
[shoebag:06238] [ 4] 6 libc++abi.dylib
0x00007fff8e5eaaa4 _ZL25default_terminate_handlerv + 240
[shoebag:06238] [ 5] 7 libobjc.A.dylib
0x00007fff8cf31322 _ZL15_objc_terminatev + 124
[shoebag:06238] [ 6] 8 libc++abi.dylib
0x00007fff8e5e83e1 _ZSt11__terminatePFvvE + 8
[shoebag:06238] [ 7] 9 libc++abi.dylib
0x00007fff8e5e7e6b
_ZN10__cxxabiv1L22exception_cleanup_funcE19_Unwind_Reason_CodeP17_Unwind_Exception
+ 0
[shoebag:06238] [ 8] 10 bmpi
 0x0000000100cb53bf _ZN5boost15throw_exceptionINS_3mpi9exceptionEEEvRKT_ +
111
[shoebag:06238] [ 9] 11 bmpi
 0x0000000100cb6b79
_ZN5boost3mpi17packed_iprimitive9load_implEPvP15ompi_datatype_ti + 169
[shoebag:06238] [10] 12 bmpi
 0x0000000100cb6a73
_ZN5boost3mpi17packed_iprimitive4loadIcEEvRNSt3__112basic_stringIT_NS3_11char_traitsIS5_EENS3_9allocatorIS5_EEEE
+ 307
[shoebag:06238] [11] 13 bmpi
 0x0000000100cb692f
_ZN5boost7archive11load_access14load_primitiveINS_3mpi15packed_iarchiveENSt3__112basic_stringIcNS5_11char_traitsIcEENS5_9allocatorIcEEEEEEvRT_RT0_
+ 47
[shoebag:06238] [12] 14 bmpi
 0x0000000100cb68ed
_ZN5boost7archive6detail21load_non_pointer_typeINS_3mpi15packed_iarchiveEE14load_primitive6invokeINSt3__112basic_stringIcNS8_11char_traitsIcEENS8_9allocatorIcEEEEEEvRS4_RT_
+ 29
[shoebag:06238] [13] 15 bmpi
 0x0000000100cb68a7
_ZN5boost7archive6detail21load_non_pointer_typeINS_3mpi15packed_iarchiveEE6invokeINSt3__112basic_stringIcNS7_11char_traitsIcEENS7_9allocatorIcEEEEEEvRS4_RT_
+ 39
[shoebag:06238] [14] 16 bmpi
 0x0000000100cb6862
_ZN5boost7archive4loadINS_3mpi15packed_iarchiveENSt3__112basic_stringIcNS4_11char_traitsIcEENS4_9allocatorIcEEEEEEvRT_RT0_
+ 34
[shoebag:06238] [15] 17 bmpi
 0x0000000100cb682b
_ZN5boost7archive6detail15common_iarchiveINS_3mpi15packed_iarchiveEE13load_overrideINSt3__112basic_stringIcNS7_11char_traitsIcEENS7_9allocatorIcEEEEEEvRT_i
+ 43
[shoebag:06238] [16] 18 bmpi
 0x0000000100cb67ee
_ZN5boost3mpi15packed_iarchive13load_overrideINSt3__112basic_stringIcNS3_11char_traitsIcEENS3_9allocatorIcEEEEEEvRT_iN4mpl_5bool_ILb0EEE
+ 46
[shoebag:06238] [17] 19 bmpi
 0x0000000100cb67b3
_ZN5boost3mpi15packed_iarchive13load_overrideINSt3__112basic_stringIcNS3_11char_traitsIcEENS3_9allocatorIcEEEEEEvRT_i
+ 35
[shoebag:06238] [18] 20 bmpi
 0x0000000100cb6761
_ZN5boost7archive6detail18interface_iarchiveINS_3mpi15packed_iarchiveEErsINSt3__112basic_stringIcNS7_11char_traitsIcEENS7_9allocatorIcEEEEEERS4_RT_
+ 49
[shoebag:06238] [19] 21 bmpi
 0x0000000100cb6597
_ZN5boost3mpi6detail21serialized_irecv_dataINSt3__112basic_stringIcNS3_11char_traitsIcEENS3_9allocatorIcEEEEE11deserializeERNS0_6statusE
+ 39
[shoebag:06238] [20] 22 bmpi
 0x0000000100cb5212
_ZN5boost3mpi7request23handle_serialized_irecvINSt3__112basic_stringIcNS3_11char_traitsIcEENS3_9allocatorIcEEEEEENS_8optionalINS0_6statusEEEPS1_NS1_14request_actionE
+ 1826
[shoebag:06238] [21] 23 libboost_mpi-mt.dylib
0x0000000100cf7732 _ZN5boost3mpi7request4testEv + 50
[shoebag:06238] [22] 24 bmpi
 0x0000000100cae1df main + 2047
[shoebag:06238] [23] 25 libdyld.dylib
0x00007fff85ac95fd start + 1
[shoebag:06238] [24] 26 ???
0x0000000000000001 0x0 + 1
[shoebag:06238] *** End of error message ***
abort trap: 6

I've attached a short test program that just makes 2 isend calls and 2
irecv calls from rank 0 to rank 0, sleeps briefly, then tries to test the
irecv requests.

Interestingly, the test succeeds if only one isend/irecv pair is queued, or
if a native MPI type is used. That is, I only get the failure with multiple
outstanding asynchronous requests of serialized types.

I've tried the test on a Mac with Boost 1.55 and Open MPI 1.7.3, and on
Debian Wheezy with Boost 1.49 and Open MPI 1.4.5. I see the same basic error
on both.

Is this something that should work, and I'm just doing it incorrectly? Or am
I trying to do something that isn't supported?

Thanks!
Roy
