

Subject: Re: [Boost-users] [MPI, serialization] Segmentation fault in heterogeneous cluster
From: Martin Huenniger (m.huenniger_at_[hidden])
Date: 2010-09-21 10:01:12


Hi,

The problem is solved.

The bug originated from two issues:
1)
int size = ss.str().size();

It is not wise to forget to send the terminating \0 of a C-string:
std::string::size() does not count it. So there are two solutions:

   int size = ss.str().size() + 1;
or

   char *buf = const_cast<char*>( ss.str().data() );
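To make the off-by-one concrete, here is a small standalone check in plain C++ (no MPI). The helper names bytes_with_terminator and serialized_example are made up for this sketch, and "abc" merely stands in for the archive output. It shows that size() excludes the terminator while c_str() guarantees a \0 at index size(), so size() + 1 bytes starting at c_str() are safe to send. (Since C++11, data() carries the same guarantee; in C++03 only c_str() did.)

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>

// Hypothetical helper: number of bytes to hand to MPI_Send when the
// receiver rebuilds the text with std::string(buf), which stops at '\0'.
// size() does not count the terminator, so one extra byte is needed.
std::size_t bytes_with_terminator(const std::string& s) {
    // c_str() guarantees s.c_str()[s.size()] == '\0', so reading
    // size() + 1 bytes starting at c_str() stays inside valid memory.
    return s.size() + 1;
}

// Hypothetical producer: builds a string the same way as in the post.
std::string serialized_example() {
    std::stringstream ss;
    ss << "abc";      // stands in for the archive output
    return ss.str();  // returned by value, so the caller owns a copy
}
```

For "abc", size() is 3 and bytes_with_terminator() is 4.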

The first is to be preferred, because of the second issue:

2) This fragment is _bad_:

MPI_Send( &size, 1, MPI_INT, w,
           MW::send_job_data_size, MPI_Comm(comm) );
char *buf = const_cast<char*>( ss.str().c_str() );
MPI_Send( buf , size, MPI_CHAR,
           w, MW::send_job_data_tag, MPI_Comm(comm) );

Why? ss.str() returns a temporary std::string by value, and c_str()
returns a pointer into that temporary's buffer. The temporary is
destroyed at the end of the statement that initializes char *buf, so
when MPI_Send is invoked, buf is a dangling pointer to memory that is
no longer guaranteed to hold the expected content.
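Another way to avoid the dangling pointer is to bind the result of ss.str() to a named std::string once, so its buffer stays alive for both sends. The sketch below assumes a hypothetical send_bytes() that only appends bytes to an in-memory "wire" vector, standing in for MPI_Send so the example runs without MPI.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-in for MPI_Send: appends `count` bytes from `buf`
// to an in-memory "wire" so the example runs without MPI.
void send_bytes(std::vector<char>& wire, const char* buf, int count) {
    wire.insert(wire.end(), buf, buf + count);
}

// Unsafe (do not do this):
//   const char* buf = ss.str().c_str();
// The std::string returned by ss.str() is destroyed at the end of that
// statement, leaving buf dangling.

// Safe: `payload` owns its buffer until the function returns, i.e. until
// after both sends have read from it.
std::vector<char> send_serialized(const std::stringstream& ss) {
    std::vector<char> wire;
    const std::string payload = ss.str();                    // one named copy
    const int size = static_cast<int>(payload.size()) + 1;   // include '\0'
    send_bytes(wire, reinterpret_cast<const char*>(&size),
               static_cast<int>(sizeof size));                // the length message
    send_bytes(wire, payload.c_str(), size);                 // the data message
    return wire;
}
```

This also calls ss.str() only once, so the stream's contents are copied once instead of once per use.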

So the solution is:

   std::stringstream ss;
   {
      Oarchive oa(ss);   // scope ensures oa is destroyed before ss is read
      oa << data;
   }
   int size = ss.str().size() + 1;
   MPI_Send( &size, 1, MPI_INT, w,
             MW::send_job_data_size, MPI_Comm(comm) );
   // the temporary returned by ss.str() lives until this call returns
   MPI_Send( const_cast<char*>( ss.str().c_str() ), size, MPI_CHAR,
             w, MW::send_job_data_tag, MPI_Comm(comm) );
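The fixed version still uses a temporary from ss.str(), yet it is safe: a temporary lives until the end of the full-expression that creates it, so the pointer from c_str() stays valid for the whole MPI_Send call. A minimal sketch of that rule follows. fake_send() and make_payload() are hypothetical stand-ins for MPI_Send and ss.str(), not real MPI or Boost functions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical stand-in for MPI_Send: copies `count` bytes from `buf` into `out`.
void fake_send(const char* buf, int count, std::string& out) {
    out.assign(buf, static_cast<std::size_t>(count));
}

// Hypothetical producer playing the role of ss.str(): returns a fresh
// std::string by value on every call.
std::string make_payload() { return "job data"; }

std::string send_inline() {
    std::string received;
    // Each make_payload() temporary lives until the end of this
    // full-expression, i.e. until fake_send has returned, so the pointer
    // obtained from c_str() is valid while fake_send reads from it.
    fake_send(make_payload().c_str(),
              static_cast<int>(make_payload().size()) + 1,  // +1 for '\0'
              received);
    return received;
}
```

send_inline() returns the 8 characters of "job data" followed by the terminating '\0', 9 bytes in total.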

The next problem is receiving binary archives. Its solution is also
a bit hidden under the hood:

   int size;
   MPI_Status mstatus;
   MPI_Recv( &size, 1, MPI_INT, 0,
             MW::send_job_data_size, MPI_Comm(comm), &mstatus );
   char *buf = static_cast<char*>( malloc( size ) );
   MPI_Recv( buf, size, MPI_CHAR, 0,
             MW::send_job_data_tag, MPI_Comm(comm), &mstatus );
   std::string s( buf );
   std::stringstream ss( s );

The problem here is the following: we receive size chars and try to
generate a C++ string from them with std::string s( buf ). Here lies
the error: the constructor string::string( const char* ) expects a
C-string, that is, a \0-terminated sequence of chars. If buf holds a
binary_archive, the probability of having a \0 somewhere inside it is
very high, and therefore s holds only the part of the transmitted data
up to the first \0.
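A small check of the truncation, with a hand-made buffer standing in for a received binary archive. The two helper names are invented for this sketch. The const char* constructor stops at the first '\0', while the (pointer, length) constructor copies exactly the given number of bytes.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Wrong for binary data: std::string(const char*) stops at the first '\0'.
std::string rebuild_as_c_string(const char* buf) {
    return std::string(buf);
}

// Right: std::string(const char*, size_type) copies exactly `size` bytes,
// embedded '\0' bytes included.
std::string rebuild_with_length(const char* buf, std::size_t size) {
    return std::string(buf, size);
}
```

For the 5-byte buffer {'a', '\0', 'b', 'c', '\0'}, the first helper yields a 1-character string and the second keeps all 5 bytes.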

Better to use

   std::string s( buf, size );

to initialize a string of length size with the data in buf. That fixes
the issue.
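Putting the receiving side together, here is a sketch that also replaces the malloc'ed buffer (not freed in the snippet above) with a std::vector<char> that frees itself. It assumes a hypothetical in-memory "wire" holding an int length followed by that many payload bytes, and uses std::memcpy where the real code would call MPI_Recv twice.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

// Assumes `wire` holds an int length followed by at least that many bytes.
std::string receive_payload(const std::vector<char>& wire) {
    int size = 0;
    // Stand-in for: MPI_Recv( &size, 1, MPI_INT, ... )
    std::memcpy(&size, wire.data(), sizeof size);
    if (size <= 0) return std::string();

    // std::vector releases its memory automatically, unlike malloc.
    std::vector<char> buf(static_cast<std::size_t>(size));
    // Stand-in for: MPI_Recv( buf.data(), size, MPI_CHAR, ... )
    std::memcpy(buf.data(), wire.data() + sizeof size, buf.size());

    // Length-based constructor: embedded '\0' bytes are kept.
    return std::string(buf.data(), buf.size());
}
```

The caller can then build the stream with std::stringstream ss( receive_payload( wire ) ) and hand it to the input archive. Because this constructor takes the length explicitly, the receiver no longer depends on the sender including the terminating \0.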

Thanks to our Ph.D student Jens for help with that.

Cheers,
Martin


Boost-users list run by williamkempf at hotmail.com, kalb at libertysoft.com, bjorn.karlsson at readsoft.com, gregod at cs.rpi.edu, wekempf at cox.net