
Hi Francesco!

Binary archives are meant for use on a single platform only. If you want to move archives between different platforms, you have to use something portable, such as XML or text archives. I guess x86_64 and ppc64 have different endianness, and your compilers might have different type sizes for int as well.

You can also have a look at my portable binary archive, which you can find in the Boost Vault. Let me know if you try it and whether it works in your case.

Greetings,

-- 
Christian Pfligersdorffer
Software Engineering
http://www.eos.info

boost-users-bounces@lists.boost.org on :
Hello,
I'm getting a segfault when using Boost.MPI on a cluster of heterogeneous machines (x86_64 and ppc64). The crash occurs when the "slave" machine, ppc64, receives its payload from the "master" machine, x86_64, and tries to unpack the archive. Running a debug build under valgrind points to this location:
==28632== Invalid write of size 8
==28632==    at 0x10429DDC: boost::archive::detail::basic_iarchive_impl::load_pointer(boost::archive::detail::basic_iarchive&, void*&, boost::archive::detail::basic_pointer_iserializer const*, boost::archive::detail::basic_pointer_iserializer const* (*)(boost::serialization::extended_type_info const&)) (basic_iarchive.cpp:453)
==28632==    by 0x1042772F: boost::archive::detail::basic_iarchive::load_pointer(void*&, boost::archive::detail::basic_pointer_iserializer const*, boost::archive::detail::basic_pointer_iserializer const* (*)(boost::serialization::extended_type_info const&)) (basic_iarchive.cpp:564)
==28632==    by 0x10468707: void boost::archive::detail::load_pointer_type<boost::mpi::packed_iarchive>::invoke<pagmo::population*>(boost::mpi::packed_iarchive&, pagmo::population*&) (iserializer.hpp:518)
==28632==    by 0x104683EF: void boost::archive::load<boost::mpi::packed_iarchive, pagmo::population*>(boost::mpi::packed_iarchive&, pagmo::population*&) (iserializer.hpp:586)
==28632==    by 0x10468223: void boost::archive::detail::common_iarchive<boost::mpi::packed_iarchive>::load_override<pagmo::population*>(pagmo::population*&, int) (common_iarchive.hpp:68)
==28632==    by 0x10468023: void boost::archive::basic_binary_iarchive<boost::mpi::packed_iarchive>::load_override<pagmo::population*>(pagmo::population*&, int) (basic_binary_iarchive.hpp:67)
==28632==    by 0x10467E27: void boost::mpi::packed_iarchive::load_override<pagmo::population*>(pagmo::population*&, int, mpl_::bool_<false>) (packed_iarchive.hpp:98)
==28632==    by 0x10467C27: void boost::mpi::packed_iarchive::load_override<pagmo::population*>(pagmo::population*&, int) (packed_iarchive.hpp:115)
==28632==    by 0x1046798F: boost::mpi::packed_iarchive& boost::archive::detail::interface_iarchive<boost::mpi::packed_iarchive>::operator>><pagmo::population*>(pagmo::population*&) (interface_iarchive.hpp:60)
==28632==    by 0x104676BB: void boost::serialization::nvp<pagmo::population*>::load<boost::mpi::packed_iarchive>(boost::mpi::packed_iarchive&, unsigned int) (nvp.hpp:87)
==28632==    by 0x104674AF: void boost::serialization::access::member_load<boost::mpi::packed_iarchive, boost::serialization::nvp<pagmo::population*> >(boost::mpi::packed_iarchive&, boost::serialization::nvp<pagmo::population*>&, unsigned int) (access.hpp:101)
==28632==    by 0x104672CF: boost::serialization::detail::member_loader<boost::mpi::packed_iarchive, boost::serialization::nvp<pagmo::population*> >::invoke(boost::mpi::packed_iarchive&, boost::serialization::nvp<pagmo::population*>&, unsigned int) (split_member.hpp:54)
==28632==  Address 0x4b65d98 is not stack'd, malloc'd or (recently) free'd
The issue is in the method basic_iarchive_impl::load_pointer, around line 450:
    int i = cid;
    cobject_id_vector[i].bpis_ptr = bpis_ptr;
Indeed, a printf confirms that i == 512 while cobject_id_vector.size() == 3. This also causes the assertion new_cid == cid to fail one line below (where new_cid == 2). The same code, run locally on the ppc64 machine acting as both master and slave with mpirun -np 2, runs fine. The Boost version is 1.42.0 and the MPI implementation is Open MPI 1.4.2.
Can this be related to some endianness issue? Is Boost.MPI expected to work on heterogeneous clusters?
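For what it is worth, the two values reported above fit a byte-order mismatch: 512 is 0x0200, which is 2 (0x0002) with its two bytes swapped. The sketch below assumes the class id travels as a 16-bit integer in the sender's native byte order (an assumption about this Boost version, not something confirmed in this thread). load_le16 and load_be16 are hypothetical helper names used only for illustration.

```cpp
#include <cstdint>

// Hypothetical helper: interpret two raw bytes as a 16-bit integer,
// least significant byte first (the in-memory layout on x86_64).
inline std::uint16_t load_le16(const unsigned char* p) {
    return static_cast<std::uint16_t>(p[0] | (p[1] << 8));
}

// Hypothetical helper: interpret the same two bytes most significant
// byte first (the layout a big-endian ppc64 machine would assume).
inline std::uint16_t load_be16(const unsigned char* p) {
    return static_cast<std::uint16_t>((p[0] << 8) | p[1]);
}
```

Given the bytes {0x02, 0x00}, which is how a little-endian machine stores the value 2, load_le16 yields 2 and load_be16 yields 512, matching new_cid and i in the report.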
Thanks,
Francesco.