
Boost Users :

Subject: Re: [Boost-users] [MPI, serialization] Segmentation fault in heterogeneous cluster
From: Martin Hünniger (m.huenniger_at_[hidden])
Date: 2010-09-16 07:21:47


Hello,

I have a similar problem here. I am trying to send data from one process to
another (mpirun -np 2). The data I use is serialized in the appropriate
way. If I write it to a text archive, I can restore it from that
text archive and everything is fine. But when I send the data between the
processes, something goes wrong and the data is not restored correctly.

I have the following serialization routine:

template<typename Float>
template<class Archive>
void Ball<Float>::serialize( Archive &ar, const unsigned int version )
{
   ar & BOOST_SERIALIZATION_NVP( members ); // std::vector<int> members
   // and so on...

   // Testing: print the members as they look after the archive operation
   for( std::size_t i=0; i<members.size(); ++i )
     std::cout << members[i] << " ";
   std::cout << std::endl;
}

The output when sending such an object is, for example:
0 1 3 2

The output when receiving the same object is:
0 0 0 0

So something seems to go wrong either while storing the data in ar or
while restoring it.

Has anyone in this thread already found a solution to this problem?

Cheers,
Martin

On 03.09.2010 11:31, Francesco Biscani wrote:
> Hi Matthias,
>
> I updated to Boost 1.44.0 but unfortunately the crash is now even in
> local mode (mpirun -np 2). The strange thing is that the serialization
> code is apparently working fine when used with text archives, but with
> MPI archives the slave process, upon reception, is deserializing the
> objects with seemingly random values (e.g., huge values instead of 1
> or 0 for an integer data member of a structure).
>
> I'm trying to isolate the problem right now and, in case I can
> reproduce it with a minimal example, I will post it here (though it is
> likely some mistake on my part, it's the first time I use MPI and
> serialization libraries).
>
> Cheers,
>
> Francesco
>
> On Thu, Sep 2, 2010 at 4:26 AM, Matthias Troyer <troyer_at_[hidden]> wrote:
>>
>>
>> On Sep 2, 2010, at 7:39, Francesco Biscani <bluescarni_at_[hidden]> wrote:
>>
>>> Hello,
>>>
>>> I'm getting a segfault when using Boost.MPI on a cluster of
>>> heterogeneous machines (x86_64 and ppc64). The problem arises when the
>>> "slave" machine, ppc64, receives its payload from the "master"
>>> machine, x86_64, and tries to unpack the archive. Tracing down the
>>> issue with valgrind and in debug mode, the problem arises here:
>>>
>>
>>
>>>
>>> Can this be related to some endianness issue? Is Boost.MPI expected to
>>> work on heterogeneous clusters?
>>>
>>
>> Hi Francesco,
>>
>> Have you checked whether a program using the MPI C API can correctly send data on your heterogeneous cluster? Boost.MPI uses the support for heterogeneous machines of the underlying MPI library unless you define the macro BOOST_MPI_HOMOGENEOUS.
>>
>> Have you also tried the latest Boost release?
>>
>> Matthias
>> _______________________________________________
>> Boost-users mailing list
>> Boost-users_at_[hidden]
>> http://lists.boost.org/mailman/listinfo.cgi/boost-users
>>

