Boost Users:
Subject: Re: [Boost-users] mpi isend to group
From: Philipp Kraus (philipp.kraus_at_[hidden])
Date: 2012-12-30 17:20:42
On 30.12.2012 at 22:37, Andreas Schäfer wrote:
> On 00:02 Sun 30 Dec , Philipp Kraus wrote:
>> At the moment I use Open MPI (but it should also work under MS Windows)
>
> Did you enable progress threads in Open MPI?
Yes, I built the Open MPI library myself.
>
>> The system has 64 cores (each core can run 2 threads).
>
> Are we talking about a single machine with 64 cores, or a small cluster?
A small cluster with 8 nodes.
>
>>> - Which threading level do you request via MPI_Init_thread()?
>>
>> In my tests I use MPI_THREAD_SERIALIZED
>
> In your code fragment you called boost::thread::yield(). Do you ensure
> that no concurrent calls to MPI take place? Otherwise I'd assume that
> this doesn't comply with the MPI standard.
>
>>> - How do you ensure asynchronous progress? (i.e.: mostly MPI will only
>>> send/receive data when some MPI function is being called. Unless
>>> e.g. MPI_Test is being polled or your MPI supports progress threads,
>>> the bulk of communication won't be carried out until you call MPI_Wait())
>>>
>>
>> Each MPI process runs some database calls in its thread loop. After each
>> database block I check whether there is a message from MPI core 0, and if
>> there is, all cores should hit a barrier. Core 0 checks after its database
>> calls whether there is any data and, if so, sends it to the other cores.
>> So I cannot use an MPI_Wait call, because that would make the communication
>> blocking.
>
> From what I understood so far I fear that you have to change your
> architecture. MPI needs cycles to make progress. This can only be
> achieved by calling it. Also, your code fragment suggests that you
> repeatedly call MPI_Irecv() and MPI_Isend() without ever waiting for
> completion. This results in a memory leak as new handles for each
> communication will be created within MPI.
Sorry, I forgot an important piece of information: these calls are a
pre-execution step of the algorithm. The main algorithm runs in cycles and
uses blocking MPI communication, so only the pre-execution step needs to be
somewhat more relaxed.
> As your jobs are reasonably small and waiting for MPI 3 seems to be no
> option, I'd suggest using MPI_Iprobe(), similar to the following
> code. Notice that MPI_Send/Recv are only called when necessary and
> their blocking nature creates an implicit barrier. The fragment uses a
> hard-coded binary tree to implement a reasonably fast/scalable
> broadcast (even though Open MPI's MPI_Bcast() would be much faster).
>
> while (thread_is_running) {
>     int id = 0;
>
>     if (rank == 0) {
>         try {
>             id = getID();
>             MPI_Send(&id, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
>         } catch (...) {
>             // getID() failed: nothing is sent in this iteration
>         }
>     } else {
>         int flag = 0;
>         // non-blocking check for a pending message from the predecessor
>         MPI_Iprobe(predecessor, 0, MPI_COMM_WORLD, &flag, MPI_STATUS_IGNORE);
>
>         if (flag) {
>             MPI_Recv(&id, 1, MPI_INT, predecessor, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
>             // forward the ID to the (up to two) successors in the tree
>             if ((2 * rank + 0) < size) {
>                 MPI_Send(&id, 1, MPI_INT, 2 * rank + 0, 0, MPI_COMM_WORLD);
>             }
>             if ((2 * rank + 1) < size) {
>                 MPI_Send(&id, 1, MPI_INT, 2 * rank + 1, 0, MPI_COMM_WORLD);
>             }
>
>             do_something(id);
>         }
>     }
>
>     // no yield here, but you may call some worker function here
> }
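The quoted fragment never defines `predecessor`, so the tree it implies can be made explicit with a small sketch. The helper names `predecessor_of` and `successors_of` are hypothetical and not part of the original post. `rank / 2` is inferred from the send pattern (rank 0 sends to rank 1, every other rank r sends to 2r and 2r+1). The `size > 1` guard on the root is an addition that the fragment does not have.

```cpp
#include <cassert>
#include <vector>

// Rank from which `rank` receives the ID. Inferred from the sends in the
// quoted fragment: rank 1 receives from 0, any other rank r from r / 2.
int predecessor_of(int rank) {
    assert(rank > 0);  // the root (rank 0) has no predecessor
    return rank / 2;
}

// Ranks to which `rank` forwards the ID when `size` processes take part.
std::vector<int> successors_of(int rank, int size) {
    assert(rank >= 0 && rank < size);
    std::vector<int> out;
    if (rank == 0) {
        // The root sends to rank 1 only (guarded here for size == 1).
        if (size > 1) out.push_back(1);
        return out;
    }
    if (2 * rank + 0 < size) out.push_back(2 * rank + 0);
    if (2 * rank + 1 < size) out.push_back(2 * rank + 1);
    return out;
}
```

With 8 processes, rank 1 forwards to ranks 2 and 3, while rank 4 forwards to nobody. Every rank other than 0 appears as a successor of exactly one rank, so each process receives the ID once.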
Thanks for the code, but it is based on Open MPI. My program must also
work with MPICH2 (an MPI implementation for Windows-based systems),
so I would like to create a Boost-only solution.
At the moment I do it like this:
while (thread_is_running) {
    if (!l_mpicom.rank())
        for (std::size_t i = 1; i < l_mpicom.size(); ++i)
            l_mpicom.isend(i, 666, l_task.getID());
    else
        if (boost::optional<mpi::status> l_status = l_mpicom.iprobe(0, 666))
        {
            std::size_t l_taskid = 0;
            l_mpicom.recv(l_status->source(), l_status->tag(), l_taskid);
        }
}
Thanks
Phil
Boost-users list run by williamkempf at hotmail.com, kalb at libertysoft.com, bjorn.karlsson at readsoft.com, gregod at cs.rpi.edu, wekempf at cox.net