Subject: Re: [Boost-users] [MPI, serialization] Segmentation fault in heterogeneous cluster
From: Martin Huenniger (m.huenniger_at_[hidden])
Date: 2010-09-21 10:01:12
Hi,
the problem is solved. The bug originated from two issues:
1.)
int size = ss.str().size();
It is not wise to forget to send the terminating \0 of a C-string. So
there are 2 solutions:
int size = ss.str().size() + 1;
or
char *buf = const_cast<char*>( ss.str().data() );
The first is to be preferred, because
2.) This fragment is _bad_:
MPI_Send( &size, 1, MPI_INT, w,
MW::send_job_data_size, MPI_Comm(comm) );
char *buf = const_cast<char*>( ss.str().c_str() );
MPI_Send( buf , size, MPI_CHAR,
w, MW::send_job_data_tag, MPI_Comm(comm) );
Why? ss.str() returns a temporary std::string, and char *buf gets
initialized with the address of that temporary's character buffer. The
temporary is destroyed at the end of the statement that created it, so
when MPI_Send is invoked the pointer buf points to memory that is no
longer guaranteed to hold the expected content.
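One way to sidestep the lifetime problem is to hold the string in a named variable for as long as the pointer is in use. The following is only a sketch under that assumption; fake_send is a hypothetical stand-in for MPI_Send so that the example runs without an MPI installation, and send_with_named_copy is an illustrative name, not part of the original code.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-in for MPI_Send: copies `count` bytes from `buf`
// into a vector so the example runs without MPI.
std::vector<char> fake_send(const char* buf, int count) {
    assert(buf != nullptr && count >= 0);
    return std::vector<char>(buf, buf + count);
}

// Write `text` into a stringstream and "send" it, including the
// terminating '\0', while a named std::string keeps the bytes alive.
std::vector<char> send_with_named_copy(const std::string& text) {
    std::stringstream ss;
    ss << text;
    const std::string payload = ss.str();  // named copy, lives until end of scope
    const int size = static_cast<int>(payload.size()) + 1;  // +1 for the '\0'
    const char* buf = payload.c_str();     // valid for as long as `payload` exists
    return fake_send(buf, size);
}
```

Because payload outlives buf, the pointer handed to the send call refers to live memory for the whole call.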
So the solution is:
std::stringstream ss;
{
Oarchive oa(ss);
oa << data;
}
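int size = ss.str().size() + 1;
MPI_Send( &size, 1, MPI_INT, w,
MW::send_job_data_size, MPI_Comm(comm) );
// the temporary returned by ss.str() lives until the end of this
// statement, so the pointer stays valid for the whole MPI_Send call
MPI_Send( const_cast<char*>( ss.str().c_str() ), size, MPI_CHAR,
w, MW::send_job_data_tag, MPI_Comm(comm) );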
The next problem is the receiving of binary_archives. Its cause is
also a bit hidden:
int size;
MPI_Status mstatus;
MPI_Recv( &size, 1, MPI_INT, 0,
MW::send_job_data_size, MPI_Comm(comm), &mstatus );
char *buf = static_cast<char*>( malloc( size ) );
MPI_Recv( buf, size, MPI_CHAR, 0,
MW::send_job_data_tag, MPI_Comm(comm), &mstatus );
std::string s( buf );
std::stringstream ss( s );
The problem here is the following: we receive size C-characters and try
to generate a C++ string from them using std::string s( buf ).
Here lies the error: string::string( const char * ) expects a C-string,
that is, a \0-terminated sequence of chars. If buf holds a
binary_archive, the probability of having a \0 in some place is very
high, and therefore s only holds a part of the transmitted information.
Better use
std::string s( buf, size );
to initialize a string of length size with the data in buf. And that fixes
the issue.
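The difference between the two constructors can be seen without MPI at all. The following is a small sketch; from_c_string and from_bytes are illustrative helper names, not part of the original code.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Build a string the C-string way: copying stops at the first '\0'.
std::string from_c_string(const char* buf) {
    assert(buf != nullptr);
    return std::string(buf);
}

// Build a string from an explicit byte count: embedded '\0' bytes survive.
std::string from_bytes(const char* buf, std::size_t size) {
    assert(buf != nullptr);
    return std::string(buf, size);
}
```

For a buffer such as {'a', 'b', '\0', 'c', 'd'}, from_c_string keeps only the first two characters, while from_bytes with size 5 keeps all five. Note that if the sender transmits size() + 1 bytes, the received count includes the trailing \0, so the receiver may want to pass size - 1 if that extra byte is unwanted.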
Thanks to our Ph.D student Jens for help with that.
Cheers,
Martin