Subject: Re: [Boost-users] mpi isend to group
From: Andreas Schäfer (gentryx_at_[hidden])
Date: 2012-12-30 16:37:38


On 00:02 Sun 30 Dec, Philipp Kraus wrote:
> I currently use Open MPI (but it should also work under MS Windows)

Did you enable progress threads in Open MPI?
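
IIRC you'd have to build Open MPI with the --enable-progress-threads
configure option for that (I'm not sure how well this is supported in
current releases, so treat this as a hint, not a recipe):

    ./configure --enable-progress-threads ...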

> The system has 64 cores (each core can run 2 threads).

Are we talking about a single machine with 64 cores, or a small cluster?

> > - Which threading level do you request via MPI_Init_thread()?
>
> In my tests I use MPI_THREAD_SERIALIZED

In your code fragment you called boost::thread::yield(). Do you ensure
that no two threads ever call into MPI concurrently?
MPI_THREAD_SERIALIZED requires all MPI calls to be serialized;
otherwise your code wouldn't comply with the MPI standard.
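
For illustration, a minimal sketch of what such serialization could
look like with Boost.Thread (the mutex and function names are mine,
not from your code):

    #include <boost/thread.hpp>
    #include <mpi.h>

    boost::mutex mpi_mutex; // guards every MPI call in the process

    void check_for_message()
    {
        // Under MPI_THREAD_SERIALIZED no two threads may be inside the
        // MPI library at the same time, so each call site takes the lock:
        boost::lock_guard<boost::mutex> lock(mpi_mutex);
        int flag;
        MPI_Iprobe(MPI_ANY_SOURCE, 0, MPI_COMM_WORLD, &flag,
                   MPI_STATUS_IGNORE);
        // ... receive and handle the message if flag != 0 ...
    }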

> > - How do you ensure asynchronous progress? (i.e.: mostly MPI will only
> > send/receive data when some MPI function is being called. Unless
> > e.g. MPI_Test is being polled or your MPI supports progress threads,
> > the bulk of communication won't be carried out until you call MPI_Wait())
> >
>
> Each MPI process runs some database calls in its thread loop, so after each
> database block I check whether there is a message from MPI core 0, and if
> one exists, all cores should be "barriered". Core 0 checks after the
> database calls whether there is any data; if yes, it sends this data to the
> other cores. So I cannot use an MPI_Wait call, because that creates a
> blocking communication.

From what I understand so far, I fear that you will have to change your
architecture. MPI needs cycles to make progress, and it only gets them
when you call into it. Also, your code fragment suggests that you
repeatedly call MPI_Irecv() and MPI_Isend() without ever waiting for
completion. This results in a memory leak, as MPI creates a new request
handle for each communication.
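
To illustrate the usual completion pattern (just a sketch; `dest` is a
placeholder for the receiver's rank):

    // Each MPI_Isend()/MPI_Irecv() returns a request handle which must
    // be completed via MPI_Test()/MPI_Wait(), otherwise those handles
    // pile up inside the MPI library:
    MPI_Request request;
    MPI_Isend(&id, 1, MPI_INT, dest, 0, MPI_COMM_WORLD, &request);

    int done = 0;
    while (!done) {
        // polling MPI_Test() also gives MPI the cycles it needs to
        // make progress on the transfer:
        MPI_Test(&request, &done, MPI_STATUS_IGNORE);
        // ... do other work between polls ...
    }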

As your jobs are reasonably small and waiting for MPI 3 seems to be no
option, I'd suggest you use MPI_Iprobe(), similar to the following
code. Notice that MPI_Send()/MPI_Recv() are only called when necessary,
and their blocking nature creates an implicit barrier. The fragment
uses a hard-coded binary tree to implement a reasonably fast/scalable
broadcast (even though Open MPI's MPI_Bcast() would be much faster).

    // rank and size as obtained from MPI_Comm_rank()/MPI_Comm_size();
    // predecessor is this rank's parent in the binary tree:
    int predecessor = rank / 2;

    while (thread_is_running) {
        int id = 0;

        if (rank == 0) {
            try {
                id = getID();
                // the root injects the new ID at the top of the tree:
                MPI_Send(&id, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
            } catch (...) {
            }
        } else {
            int flag;
            // non-blocking check whether the parent has sent something:
            MPI_Iprobe(predecessor, 0, MPI_COMM_WORLD, &flag,
                       MPI_STATUS_IGNORE);

            if (flag) {
                MPI_Recv(&id, 1, MPI_INT, predecessor, 0, MPI_COMM_WORLD,
                         MPI_STATUS_IGNORE);
                // forward the ID to both children, if they exist:
                if ((2 * rank + 0) < size) {
                    MPI_Send(&id, 1, MPI_INT, 2 * rank + 0, 0, MPI_COMM_WORLD);
                }
                if ((2 * rank + 1) < size) {
                    MPI_Send(&id, 1, MPI_INT, 2 * rank + 1, 0, MPI_COMM_WORLD);
                }

                do_something(id);
            }
        }

        // no yield here, but you may call some worker function here
    }
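
Each rank receives from its parent in the tree (predecessor == rank / 2)
and forwards to its children (2 * rank) and (2 * rank + 1), so the
broadcast completes in about log2(size) steps instead of (size - 1).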

HTH
-Andreas

-- 
==========================================================
Andreas Schäfer
HPC and Grid Computing
Chair of Computer Science 3
Friedrich-Alexander-Universität Erlangen-Nürnberg, Germany
+49 9131 85-27910
PGP/GPG key via keyserver
http://www.libgeodecomp.org
==========================================================
(\___/)
(+'.'+)
(")_(")
This is Bunny. Copy and paste Bunny into your 
signature to help him gain world domination!


