Subject: Re: [Boost-users] boostMPI asynchronous communication
From: Matthias Troyer (troyer_at_[hidden])
Date: 2010-06-30 06:32:03


The isend also returns a request object that you need to call wait on.

Matthias

Sent from my iPad
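
Concretely, each call to world.isend() returns an mpi::request, and that request has to be completed as well, either with wait()/test() or by passing it to wait_all() together with the receive requests; just discarding it is not enough. Below is a minimal, self-contained sketch of that pattern. It is only an illustration, not the poster's program: std::string stands in for the TaskPackage type, the tag values are arbitrary, and the ranks are hard-coded to 0 (master) and 1 (worker).

    #include <boost/mpi.hpp>
    #include <boost/serialization/string.hpp>
    #include <string>
    #include <vector>

    namespace mpi = boost::mpi;

    // Illustrative sketch; run with: mpirun -np 2 ./a.out
    int main(int argc, char* argv[])
    {
        mpi::environment  env(argc, argv);
        mpi::communicator world;

        const int downStreamTaskTag = 1;   // arbitrary tag values
        const int upStreamTaskTag   = 2;
        const int totalTaskNum      = 2;

        if (world.rank() == 0)             // master
        {
            std::vector<std::string>  tasks(totalTaskNum, "some task");
            std::vector<std::string>  results(totalTaskNum);
            std::vector<mpi::request> reqs; // holds BOTH send and receive requests

            for (int i = 0; i < totalTaskNum; ++i)
            {
                reqs.push_back(world.isend(1, downStreamTaskTag, tasks[i]));
                reqs.push_back(world.irecv(1, upStreamTaskTag, results[i]));
            }

            // Completes the isends as well as the irecvs.
            mpi::wait_all(reqs.begin(), reqs.end());
        }
        else if (world.rank() == 1)        // worker
        {
            for (int i = 0; i < totalTaskNum; ++i)
            {
                std::string task;
                world.recv(0, downStreamTaskTag, task);
                task += " -> done";                   // stands in for the local work
                world.send(0, upStreamTaskTag, task); // blocking send keeps the worker simple
            }
        }
        return 0;
    }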

On Jun 29, 2010, at 12:55 AM, Jack Bryan <dtustudy68_at_[hidden]> wrote:

> Thanks for your reply.
>
> I have checked the tags; the master and worker tags match.
>
> The deadlock happens in the case of 2 tasks scheduled on one processor.
>
> If there is only one task on one processor, there is no deadlock.
> It works well.
>
> The master is responsible for scheduling tasks to the workers, which run the
> assigned tasks and feed the results back to the master.
>
> If I assign one task to each worker, it works well.
>
> But when I increase the number of tasks per worker node to 2, it deadlocks.
>
> The master schedules only 2 tasks to one worker, in order to simplify the
> analysis of the potential deadlock.
>
> The worker receives the 2 tasks and runs them, but the master cannot get the
> results back from the worker.
>
>
> the main idea:
>
> master (node 0):
>
>     counter = 0;
>     totalTaskNum = 2;
>     while (counter < totalTaskNum)
>     {
>         TaskPackage myTaskPackage(world);
>
>         world.isend(node1, downStreamTaskTag, myTaskPackage);
>         recvReqs[counter] = world.irecv(node1, upStreamtaskTag, taskResultPackage[counter]);
>         counter++;
>     }
>     world.wait_all(recvReqs, recvReqs + totalTaskNum);
>
> worker (node 1):
>
>     while (1)
>     {
>         TaskPackage workerTaskPackage(world);
>         world.recv(node0, downStreamTaskTag, workerTaskPackage);
>
>         // do the local work
>
>         world.isend(node0, upStreamTaskTag, workerTaskPackage);
>
>         if (no new task)
>             break;
>     }
>
> My code has many classes; I am trying to work out how to cut the main part out of it.
>
> Any help is appreciated.
>
>
> thanks
>
> Jack
>
> > Date: Mon, 28 Jun 2010 21:28:47 +0200
> > From: riccardo.murri_at_[hidden]
> > To: boost-users_at_[hidden]
> > Subject: Re: [Boost-users] boostMPI asynchronous communication
> >
> > Hello Jack,
> >
> > On Mon, Jun 28, 2010 at 7:46 PM, Jack Bryan <dtustudy68_at_[hidden]> wrote:
> > > This is the main part of my code, which may have a deadlock.
> > >
> > > Master:
> > > for (iRank = 0; iRank < availableRank ; iRank++)
> > > {
> > > destRank = iRank+1;
> > > for (taski = 1; taski <= TaskNumPerRank ; taski++)
> > > {
> > > resultSourceRank = destRank;
> > > recvReqs[taskCounterT2] = world.irecv(resultSourceRank, upStreamTaskTag, resultTaskPackageT2[iRank][taskCounterT3]);
> > > reqs = world.isend(destRank, taskTag, myTaskPackage);
> > > ++taskCounterT2;
> > > }
> > >
> > > // taskTotalNum = availableRank * TaskNumPerRank
> > > // right now, availableRank =1, TaskNumPerRank =2
> > > mpi::wait_all(recvReqs, recvReqs+(taskTotalNum));
> > > -----------------------------------------------
> > > worker:
> > > while (1)
> > > {
> > > world.recv(managerRank, downStreamTaskTag, resultTaskPackageW);
> > > do its local work on received task;
> > > destRank = masterRank;
> > > reqs = world.isend(destRank, taskTag, myTaskPackage);
> > > if (recv end signal)
> > > break;
> > > }
> >
> > 1. I can't see where the outer for-loop in master is closed; is the
> > wait_all() part of that loop? (I assume it is not.) Can you send a
> > minimal program that I can feed to a compiler and test? This could
> > help.
> >
> > 2. Are you sure there is no tag mismatch between master and worker?
> >
> > master: world.isend(destRank, taskTag, myTaskPackage);
> > ^^^^^^^
> > worker: world.recv(managerRank, downStreamTaskTag, resultTaskPackageW);
> > ^^^^^^^^^^^^^^^^^
> >
> > unless master::taskTag == worker::downStreamTaskTag, the recv() will
> > wait forever.
> >
> > Similarly, the following requires that master::upStreamTaskTag ==
> > worker::taskTag:
> >
> > master: ... = world.irecv(resultSourceRank, upStreamTaskTag, ...);
> > worker: world.isend(destRank, taskTag, myTaskPackage); // destRank == masterRank
> >
> > 3. Do the source/destination ranks match? The master waits for messages from
> > destinations 1..availableRank (inclusive range), and the worker waits
> > for a message from "masterRank" (is this 0?)
> >
> > 4. Does the master work if you replace the main loop with the following?
> >
> > Master:
> >
> >     for (iRank = 0; iRank < availableRank; iRank++)
> >     {
> >         destRank = iRank + 1;
> >         for (taski = 1; taski <= TaskNumPerRank; taski++)
> >         {
> >             // XXX: the following code does not contain any reference to
> >             // "taski": it is sending "TaskNumPerRank" copies of the
> >             // same message ...
> >             reqs = world.isend(destRank, taskTag, myTaskPackage);
> >         }
> >     } // I assume the outer loop does *not* include the wait_all()
> >
> >     // expect a message from each task
> >     int n = 0;
> >     while (n < taskTotalNum) {
> >         mpi::status status = world.probe();
> >         world.recv(status.source(), status.tag(),
> >                    resultTaskPackageT2[status.source()][taskCounterT3]);
> >         ++n;
> >     }
> >
> >
> > Best regards,
> > Riccardo
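
Point 4 of the reply above, written out, would look roughly like the sketch below. Again this is only an illustration under assumptions, not code from the thread: std::string stands in for the TaskPackage type, the tag is passed in explicitly, and collect_results() is a hypothetical helper rather than anything in Boost.MPI. The idea is that the master probes for whatever result message arrives next and receives it from the rank reported by the status, so results can be collected in any order.

    #include <boost/mpi.hpp>
    #include <boost/serialization/string.hpp>
    #include <map>
    #include <string>

    namespace mpi = boost::mpi;

    // Hypothetical helper: gather `expected` result messages from any worker,
    // in whatever order they arrive.  probe() blocks until a matching message
    // is available; the returned status tells us which rank to recv() from.
    std::multimap<int, std::string>
    collect_results(mpi::communicator& world, int expected, int upStreamTaskTag)
    {
        std::multimap<int, std::string> results;   // keyed by sender rank
        for (int n = 0; n < expected; ++n)
        {
            mpi::status s = world.probe(mpi::any_source, upStreamTaskTag);
            std::string payload;
            world.recv(s.source(), s.tag(), payload);
            results.insert(std::make_pair(s.source(), payload));
        }
        return results;
    }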


