
Boost Users :

Subject: Re: [Boost-users] [MPI] all_gather() missing functionality or bug ?
From: Júlio Hoffimann (julio.hoffimann_at_[hidden])
Date: 2012-07-25 12:39:27


2012/7/25 Thomas Hisch <t.hisch_at_[hidden]>

> Hello list,
>
> I discovered that the all_gather function from Boost.MPI throws an
> exception if the vectors to be gathered have different local sizes:
>
> terminate called after throwing an instance of
> 'boost::exception_detail::clone_impl<boost::exception_detail::error_info_injector<boost::mpi::exception> >'
> what(): MPI_Allgather: MPI_ERR_IN_STATUS: error code in status
> [thisch:14211] *** Process received signal ***
>
>
> I used the following form of the all_gather function:
> all_gather(comm, myvec.data(), myvec.size(), totalvec)
> where myvec is a std::vector<double>. As already mentioned, the size of
> myvec can be different on each processor. The documentation [0] says
> that all_gather supports most uses of MPI_Allgatherv. Therefore I
> am not sure whether my use of all_gather is unsupported or just a bug.
>
> all_gather works as expected if myvec.size() is the same on all
> processors. ATM I use the following workaround:
>
> { // temporary all_gather fix: call MPI_Allgatherv directly
>   // (needs <mpi.h> and <vector>; N is the global number of elements)
>   std::vector<int> recvcts(numprocs);  // elements contributed by each rank
>   std::vector<int> displs(numprocs);   // offset of each rank's block in totalvec
>   const int nrealcoeffs = N;
>   const int initialcoeffsperrank = nrealcoeffs / numprocs;
>   const int remainder = nrealcoeffs % numprocs;
>   int displtmp = 0;
>   for (int i = 0; i < numprocs; i++) {
>     recvcts[i] = initialcoeffsperrank;
>     if (i < remainder) recvcts[i]++;  // the first ranks take one extra element
>     displs[i] = displtmp;
>     displtmp += recvcts[i];
>   }
>
>   totalvec.resize(N);  // receive buffer; assumed to be the same vector as allgradient
>   MPI_Allgatherv(myvec.data(), static_cast<int>(myvec.size()), MPI_DOUBLE,
>                  totalvec.data(), recvcts.data(), displs.data(),
>                  MPI_DOUBLE, comm);
> }
>
>
> Regards
> Thomas
>
> [0]
> http://www.boost.org/doc/libs/1_50_0/doc/html/mpi/tutorial.html#mpi.c_mapping

Hi Thomas,

I had a similar problem in the past, but with the scatterv/gatherv
pair. I submitted a patch (https://svn.boost.org/trac/boost/ticket/5292)
based entirely on the existing scatter/gather implementation, but the
Boost.MPI devs didn't like it. If you read table 17.5 (
http://www.boost.org/doc/libs/1_50_0/doc/html/mpi/tutorial.html#mpi.c_mapping),
you'll find their opinion about scatterv/gatherv.

I think you have two options:

1) Implement it yourself by adapting the existing Boost.MPI implementation,
like I did.
2) Do the tedious and error-prone task of dealing with different sizes by
hand, i.e., redistribute your chunk of data and do "point-to-point"
communication with the remainder.
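
If you stay with your MPI_Allgatherv workaround in the meantime, the
count/displacement bookkeeping can be isolated in a small helper so that it
is easier to check on its own. This is only a sketch under the same
assumption as your code (N items split as evenly as possible, the first
N % numprocs ranks getting one extra item); Layout and block_layout are
names I made up for illustration, they are not part of Boost.MPI.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper, not part of Boost.MPI or MPI itself.
// Holds the two integer arrays that MPI_Allgatherv expects.
struct Layout {
    std::vector<int> counts;  // number of elements contributed by each rank
    std::vector<int> displs;  // offset of each rank's block in the result
};

// Split n items over numprocs ranks as evenly as possible.
// The first (n % numprocs) ranks receive one extra item.
inline Layout block_layout(int n, int numprocs) {
    assert(n >= 0 && numprocs > 0);
    Layout layout;
    layout.counts.resize(numprocs);
    layout.displs.resize(numprocs);

    const int base = n / numprocs;
    const int remainder = n % numprocs;

    int offset = 0;  // running sum of the counts seen so far
    for (int rank = 0; rank < numprocs; ++rank) {
        layout.counts[rank] = base + (rank < remainder ? 1 : 0);
        layout.displs[rank] = offset;
        offset += layout.counts[rank];
    }
    return layout;
}
```

layout.counts.data() and layout.displs.data() would then take the place of
recvcts.data() and displs.data() in your MPI_Allgatherv call. If the local
sizes do not follow this even split, one possibility is to gather them first
with all_gather(comm, local_size, counts), which should be fine because every
rank contributes exactly one int, and build the displacements from that.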

Good luck!
Júlio.


