I winnowed my own code down to a simple scatter call. I get the error under release mode but not under debug. Here's the code:

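#include <boost/mpi.hpp>
#include <exception>
#include <iostream>
#include <vector>

using namespace std;
namespace mpi = boost::mpi;

int main(int argc, char **argv) {
  mpi::environment env(argc, argv);
  mpi::communicator world;

  vector<int> out;
  vector<vector<int> > in;
  // Only the root fills the send buffer: one 10-element vector per rank.
  if (world.rank() == 0) {
    for (int i = 0; i < world.size(); ++i) {
      vector<int> vec(10, 12);
      in.push_back(vec);
    }
  }

  try {
    mpi::scatter(world, in, out, 0);
  } catch (std::exception& ex) {
    std::cerr << ex.what() << std::endl;
    throw;  // rethrow the original exception without slicing it
  }

  cout << world.rank() << " : " << out.size() << endl;
}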

And the error is:

ar size: 4293462132
MPI_Send: Invalid count, error stack:
MPI_Send(176): MPI_Send(buf=0x0016F78C, count=-1505164, MPI_PACKED, dest=1, tag=
2147483647, MPI_COMM_WORLD) failed
MPI_Send(101): Negative count, value is -1505164

MPI_Alloc_mem: Invalid argument, error stack:
MPI_Alloc_mem(119): MPI_Alloc_mem(size=-1505164, MPI_INFO_NULL, baseptr=0x003EF6
8C) failed
MPI_Alloc_mem(82).: Invalid value for size, must be non-negative but is -1505164

thanks,

Nick


On Oct 26, 2010, at 2:58 AM, Matthias Troyer wrote:

Hi Nick,

On 26 Oct 2010, at 00:52, Nick Collier wrote:

Sorry for not getting back sooner, and thanks for the initial reply. I added a print to packed_archive_send, so it now looks like:

void
packed_archive_send(MPI_Comm comm, int dest, int tag,
                    const packed_oarchive& ar)
{
std::cout << "ar size: " << ar.size() << std::endl;
  const void* size = &ar.size();
  ....

When I run this under release mode, I get:

ar size: 4265919640

MPI_Send: Invalid count, error stack:
MPI_Send(176): MPI_Send(buf=0x01E8AE18, count=-29047656, MPI_PACKED, dest=1, tag
=2147483647, MPI_COMM_WORLD) failed
MPI_Send(101): Negative count, value is -29047656

When I run it under debug mode, I don't get the error, and the output is:

ar size: 1622

Any suggestions on how to proceed are much appreciated.

This seems to point towards either a bug in your code, where you overwrite the memory location of the packed_oarchive, or a bug in the optimizer. Can you try a simple test program where you are sure that you don't write out of bounds anywhere?

You could also try to see what is going on by adding more output statements to the size() function of packed_oprimitive, or by adding a size() function to packed_oarchive that prints internal_buffer_.size() and then calls the size() function of the packed_oprimitive base class. It seems that either the internal_buffer_ vector or the reference to it held in the base class is being corrupted, by either a compiler bug or a bug in your code.

Matthias
