Subject: Re: [Boost-users] Boost test and openmpi
From: Martin Vymazal (martin.vymazal_at_[hidden])
Date: 2014-03-21 11:06:58


What bothers me is that it doesn't segfault (with or without mpirun) as a
'classical' executable with a main() function, but it crashes when I run it
as a Boost test without mpirun. I must admit I didn't know that this isn't
guaranteed to work without mpirun, so maybe I'm complaining about a problem
where there isn't any.
 I ran the executable under gdb and, curiously enough, it terminated correctly
without reporting any problems. I also ran it under valgrind to check for
memory errors. The segfault happens when I call MPI_Finalize(), even though
at that point the MPI environment has been initialized and not yet finalized.
The valgrind output is below.

 Martin

==11986== Command: ./utest-mpi
==11986==
Global fixture constructor:
==11986== Syscall param writev(vector[...]) points to uninitialised byte(s)
==11986== at 0x14F0D9E7: writev (in /usr/lib/libc-2.19.so)
==11986== by 0x1790BF72: mca_oob_tcp_msg_send_handler (oob_tcp_msg.c:249)
==11986== by 0x1790D0B3: mca_oob_tcp_peer_send (oob_tcp_peer.c:204)
==11986== by 0x179109BB: mca_oob_tcp_send_nb (oob_tcp_send.c:167)
==11986== by 0x172FDC5A: orte_rml_oob_send (rml_oob_send.c:136)
==11986== by 0x172FE228: orte_rml_oob_send_buffer (rml_oob_send.c:270)
==11986== by 0x17D1B7BF: modex (grpcomm_bad_module.c:573)
==11986== by 0x5577324: ompi_mpi_init (ompi_mpi_init.c:541)
==11986== by 0x558E7D2: PMPI_Init (pinit.c:84)
==11986== by 0x40E52B:
boost::unit_test::ut_detail::global_fixture_impl<MPIFixture>::test_start(unsigned
long) (utest-Poisson.cpp:509)
==11986== by 0x6A44763: boost::unit_test::ut_detail::callback0_impl_t<int,
boost::unit_test::ut_detail::test_start_caller>::invoke() (in
/usr/lib/libboost_unit_test_framework.so.1.55.0)
==11986== by 0x6A36175:
boost::execution_monitor::catch_signals(boost::unit_test::callback0<int>
const&) (in /usr/lib/libboost_unit_test_framework.so.1.55.0)
==11986== Address 0x164d2341 is 161 bytes inside a block of size 256 alloc'd
==11986== at 0x4C2AA3E: realloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==11986== by 0x56060F7: opal_dss_buffer_extend
(dss_internal_functions.c:63)
==11986== by 0x560650D: opal_dss_copy_payload (dss_load_unload.c:164)
==11986== by 0x55DACC2: orte_grpcomm_base_pack_modex_entries
(grpcomm_base_modex.c:861)
==11986== by 0x17D1B6CE: modex (grpcomm_bad_module.c:563)
==11986== by 0x5577324: ompi_mpi_init (ompi_mpi_init.c:541)
==11986== by 0x558E7D2: PMPI_Init (pinit.c:84)
==11986== by 0x40E52B:
boost::unit_test::ut_detail::global_fixture_impl<MPIFixture>::test_start(unsigned
long) (utest-Poisson.cpp:509)
==11986== by 0x6A44763: boost::unit_test::ut_detail::callback0_impl_t<int,
boost::unit_test::ut_detail::test_start_caller>::invoke() (in
/usr/lib/libboost_unit_test_framework.so.1.55.0)
==11986== by 0x6A36175:
boost::execution_monitor::catch_signals(boost::unit_test::callback0<int>
const&) (in /usr/lib/libboost_unit_test_framework.so.1.55.0)
==11986== by 0x6A369B2:
boost::execution_monitor::execute(boost::unit_test::callback0<int> const&) (in
/usr/lib/libboost_unit_test_framework.so.1.55.0)
==11986== by 0x6A3FDB1: boost::unit_test::framework::run(unsigned long,
bool) (in /usr/lib/libboost_unit_test_framework.so.1.55.0)
==11986==
MPI environment is initialized: 1
MPI environment is finalized: 0
Running 1 test case...
Running dummy test case
Global fixture destructor
MPI environment is initialized: 1
MPI environment is finalized: 0
==11986== Invalid write of size 8
==11986== at 0x6A358AC: ??? (in
/usr/lib/libboost_unit_test_framework.so.1.55.0)
==11986== by 0x14E643FF: ??? (in /usr/lib/libc-2.19.so)
==11986== by 0x14F0D9E6: writev (in /usr/lib/libc-2.19.so)
==11986== by 0x1790BF72: mca_oob_tcp_msg_send_handler (oob_tcp_msg.c:249)
==11986== by 0x1790D0B3: mca_oob_tcp_peer_send (oob_tcp_peer.c:204)
==11986== by 0x179109BB: mca_oob_tcp_send_nb (oob_tcp_send.c:167)
==11986== by 0x172FDC5A: orte_rml_oob_send (rml_oob_send.c:136)
==11986== by 0x172FE228: orte_rml_oob_send_buffer (rml_oob_send.c:270)
==11986== by 0x55F6EEC: orte_routed_base_register_sync
(routed_base_register_sync.c:86)
==11986== by 0x17B17276: finalize (routed_binomial.c:115)
==11986== by 0x55F64F7: orte_routed_base_close
(routed_base_components.c:126)
==11986== by 0x55D6BB4: orte_ess_base_app_finalize (ess_base_std_app.c:265)
==11986== Address 0xa98 is not stack'd, malloc'd or (recently) free'd
==11986==
==11986==
==11986== Process terminating with default action of signal 11 (SIGSEGV)
==11986== Access not within mapped region at address 0xA98
==11986== at 0x6A358AC: ??? (in
/usr/lib/libboost_unit_test_framework.so.1.55.0)
==11986== by 0x14E643FF: ??? (in /usr/lib/libc-2.19.so)
==11986== by 0x14F0D9E6: writev (in /usr/lib/libc-2.19.so)
==11986== by 0x1790BF72: mca_oob_tcp_msg_send_handler (oob_tcp_msg.c:249)
==11986== by 0x1790D0B3: mca_oob_tcp_peer_send (oob_tcp_peer.c:204)
==11986== by 0x179109BB: mca_oob_tcp_send_nb (oob_tcp_send.c:167)
==11986== by 0x172FDC5A: orte_rml_oob_send (rml_oob_send.c:136)
==11986== by 0x172FE228: orte_rml_oob_send_buffer (rml_oob_send.c:270)
==11986== by 0x55F6EEC: orte_routed_base_register_sync
(routed_base_register_sync.c:86)
==11986== by 0x17B17276: finalize (routed_binomial.c:115)
==11986== by 0x55F64F7: orte_routed_base_close
(routed_base_components.c:126)
==11986== by 0x55D6BB4: orte_ess_base_app_finalize (ess_base_std_app.c:265)
==11986== If you believe this happened as a result of a stack
==11986== overflow in your program's main thread (unlikely but
==11986== possible), you can try to increase the size of the
==11986== main thread stack using the --main-stacksize= flag.
==11986== The main thread stack size used in this run was 8388608.
==11986==
==11986== HEAP SUMMARY:
==11986== in use at exit: 530,391 bytes in 4,383 blocks
==11986== total heap usage: 8,298 allocs, 3,915 frees, 13,119,175 bytes
allocated
==11986==
==11986== LEAK SUMMARY:
==11986== definitely lost: 5,064 bytes in 34 blocks
==11986== indirectly lost: 5,390 bytes in 22 blocks
==11986== possibly lost: 25,881 bytes in 584 blocks
==11986== still reachable: 494,056 bytes in 3,743 blocks
==11986== suppressed: 0 bytes in 0 blocks
==11986== Rerun with --leak-check=full to see details of leaked memory
==11986==
==11986== For counts of detected and suppressed errors, rerun with: -v
==11986== Use --track-origins=yes to see where uninitialised values come from
==11986== ERROR SUMMARY: 2 errors from 2 contexts (suppressed: 2 from 1)
Segmentation fault (core dumped)
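
For readers following the trace, here is a rough sketch of the fixture
pattern that the log lines above suggest: set up in the constructor, tear
down in the destructor, and report the environment state both times. It is
not the actual MPIFixture from utest-Poisson.cpp. FakeMpiApi,
MpiFixtureSketch and run_sketch are hypothetical names, and FakeMpiApi is
only a stand-in so that the sketch builds without an MPI stack. In a real
test the four operations would wrap MPI_Initialized, MPI_Finalized, MPI_Init
and MPI_Finalize.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical stand-in for the MPI runtime (an assumption, not a real API).
// A real version would wrap MPI_Initialized, MPI_Finalized, MPI_Init and
// MPI_Finalize.
struct FakeMpiApi {
    bool initialized = false;
    bool finalized = false;
    void init() { initialized = true; }
    void finalize() { finalized = true; }
};

// Fixture that initializes on construction and finalizes on destruction,
// logging the environment state each time.
template <typename Api>
class MpiFixtureSketch {
public:
    MpiFixtureSketch(Api& api, std::ostream& log) : api_(api), log_(log) {
        log_ << "Global fixture constructor:\n";
        if (!api_.initialized) api_.init();
        report();
    }

    ~MpiFixtureSketch() {
        log_ << "Global fixture destructor\n";
        report();
        // Finalize at most once, and only if initialization happened.
        if (api_.initialized && !api_.finalized) api_.finalize();
    }

    MpiFixtureSketch(const MpiFixtureSketch&) = delete;
    MpiFixtureSketch& operator=(const MpiFixtureSketch&) = delete;

private:
    void report() const {
        log_ << "MPI environment is initialized: " << api_.initialized << '\n'
             << "MPI environment is finalized: " << api_.finalized << '\n';
    }

    Api& api_;
    std::ostream& log_;
};

// Runs the fixture around a dummy "test" and returns everything it logged.
inline std::string run_sketch() {
    FakeMpiApi api;
    std::ostringstream log;
    {
        MpiFixtureSketch<FakeMpiApi> fixture(api, log);
        log << "Running dummy test case\n";
    }
    assert(api.initialized && api.finalized);
    return log.str();
}
```

The guard in the destructor only protects against finalizing twice or
finalizing without a prior init. In the run above both flags already have
the expected values when the crash happens, so a guard of this kind would not
by itself explain or avoid the segfault.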

On Friday 21 March 2014 08:50:25 Rhys Ulerich wrote:
> > it doesn't work for me.
>
> To be sure I understand because you were vague... You believe it
> shouldn't segfault when the binary is executed without using mpirun?
>
> If it's even possible depends on your MPI stack. There's zero
> guarantee in the MPI standard, IIRC, that an MPI-based binary can be
> executed without mpirun.
>
> The latter case does not segfault for me on MPICH2 1.4.1p1, gcc 4.6.3,
> Boost 1.5.1.
>
> I suggest you attach debugger and isolate the origin of the segfault.
>
> - Rhys

