Subject: Re: [Boost-users] boost MPI issue on windows
From: Matthias Troyer (troyer_at_[hidden])
Date: 2010-10-19 20:20:23


On 19 Oct 2010, at 14:04, Nick Collier wrote:

> Hi,
>
> I've got an MPI application that uses the Boost MPI libraries. It runs fine on OS X and Linux when compiled with either debug or more optimized compiler flags. Unfortunately, on Windows, the application only runs when compiled using the Visual Studio 2008 debug configuration. When I run the executable compiled in the release configuration it crashes, but not in the same place in my code every time. It only occurs, though, when I make communicator::send and communicator::recv calls, sending and receiving vectors of ints. In those cases, I get exceptions like:
>
> {routine_=0x00d37d38 "MPI_Send" result_code_=805931010 message="MPI_Send: Invalid count, error stack:
> MPI_Send(176): MPI_Send(buf=0x001BF950, count=-1833296, MPI_PACKED, dest=1, tag=2001, MPI_COMM_WORLD) failed
> MPI_Send(101): Negative count, value is -1833296" }
>
> Sometimes it's for a send and sometimes for a receive, but the exception is always for a "negative count", which looks suspiciously like an uninitialized integer.
>
> I figured I'm doing something wrong either in how I've compiled the Boost libraries or in my own code (although it's odd that it works on both Linux and OS X). Any suggestions are obviously appreciated.
>
> This is for boost-1.39. We are prevented from using anything newer because the machine we ultimately deploy on has boost-1.39 installed.
>
> On Windows, I'm using Microsoft's MPI implementation (based on MPICH2) under Windows 7. The Boost MPI libraries are compiled in both static debug and release mt versions.

Not having Windows MPI and not having your code, I cannot help much. But here are some suggestions:

The failing send is probably the one in packed_archive_send in point_to_point.cpp.
Could you add some print statements there to check the size of the archive, ar.size()? This is very strange, since ar.size() is the size of a std::vector, and that cannot be negative. This sounds like some strange compiler bug that we should identify and then work around.
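
As a rough sketch of the kind of check meant here (not the actual Boost.MPI code): the helper below prints a buffer's size as the vector reports it, and again after conversion to the int count that an MPI call takes. The name checked_count and the std::vector<char> stand-in for the archive's buffer are assumptions made for illustration only.

```cpp
#include <climits>
#include <cstddef>
#include <iostream>
#include <vector>

// Hypothetical helper, not part of Boost.MPI: prints the buffer size and
// returns the int count that would be handed to MPI, or -1 if the size
// does not fit into an int.
inline int checked_count(const std::vector<char>& buffer, const char* where)
{
    // size() returns an unsigned type, so this value is never negative.
    const std::size_t n = buffer.size();
    std::cerr << where << ": buffer.size() = " << n << std::endl;

    if (n > static_cast<std::size_t>(INT_MAX)) {
        std::cerr << where << ": size does not fit into an int count" << std::endl;
        return -1;
    }

    // MPI counts are plain ints, so print the value after the conversion too.
    const int count = static_cast<int>(n);
    std::cerr << where << ": count as int = " << count << std::endl;
    return count;
}
```

If the first printed value is sensible but the count reported by MPI_Send is still negative, the corruption happens between reading the size and making the call, which would point at the compiler or the build settings rather than at your code.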

Matthias


