Subject: Re: [Boost-users] hybrid parallelism
From: Hicham Mouline (hicham_at_[hidden])
Date: 2010-11-03 20:19:58
> -----Original Message-----
> From: boost-users-bounces_at_[hidden] [mailto:boost-users-bounces_at_[hidden]] On Behalf Of Dave Abrahams
> Sent: 03 November 2010 23:54
> To: boost-users_at_[hidden]
> Subject: Re: [Boost-users] hybrid parallelism
>
> On Thu, Nov 4, 2010 at 8:16 AM, Brian Budge <brian.budge_at_[hidden]> wrote:
> > Hi Hicham -
> >
> > Yes, you can use MPI (possibly through boost::mpi) to distribute tasks
> > to multiple machines, and then use threads on those machines to work
> > on finer grained portions of those tasks. From another thread on this
> > list, there are constructs in boost::asio that handle task queuing for
> > the thread tasks.
>
> If I were you I would start by trying to do this with N processes per
> machine, rather than N threads, since you need the MPI communication
> anyway.
>
> --
> Dave Abrahams
> BoostPro Computing
> http://www.boostpro.com
Just temporarily? After that, you would still add a layer of multithreading
to each process and run only one process per machine, no?
One process with N threads on a machine probably gives a better total wall
time than N single-threaded processes, because the input memory does not
have to be duplicated across processes.
What I really wanted to ask is this: I expect to have M*N outstanding
threads (M computers, N threads in each process) just sitting there waiting
for jobs.
Then, from the user interface, I click and that starts 100000 tasks, which
are spread across the M machines and the N threads in each process. The
results come back and are displayed. Then the user clicks again and the
same thing happens.
Are you saying this is doable with Boost.MPI plus an MPI implementation?
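On the per-process side, the "threads sitting there waiting for jobs" pattern can be sketched with the C++11 standard library. This is a minimal sketch, not Boost.MPI or Boost.Asio code: `BatchPool` and `run_batch` are hypothetical names, jobs are assumed not to throw, and the MPI step that delivers each machine's share of the tasks is not shown.

```cpp
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

// Hypothetical per-process pool: N threads are created once and then block
// on a condition variable until a batch of jobs arrives (e.g. on each click).
class BatchPool {
public:
    explicit BatchPool(unsigned n_threads) {
        for (unsigned i = 0; i < n_threads; ++i)
            workers_.emplace_back([this] { worker_loop(); });
    }

    ~BatchPool() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            stopping_ = true;
        }
        job_ready_.notify_all();
        for (auto& w : workers_) w.join();
    }

    BatchPool(const BatchPool&) = delete;
    BatchPool& operator=(const BatchPool&) = delete;

    // Enqueue a whole batch and block until every job in it has finished.
    void run_batch(std::vector<std::function<void()>> jobs) {
        std::unique_lock<std::mutex> lock(mutex_);
        pending_ += jobs.size();
        for (auto& job : jobs) queue_.push(std::move(job));
        job_ready_.notify_all();
        batch_done_.wait(lock, [this] { return pending_ == 0; });
    }

private:
    void worker_loop() {
        for (;;) {
            std::function<void()> job;
            {
                std::unique_lock<std::mutex> lock(mutex_);
                job_ready_.wait(lock, [this] { return stopping_ || !queue_.empty(); });
                if (stopping_ && queue_.empty()) return;  // shut down when idle
                job = std::move(queue_.front());
                queue_.pop();
            }
            job();  // run the job outside the lock
            std::lock_guard<std::mutex> lock(mutex_);
            if (--pending_ == 0) batch_done_.notify_all();  // batch finished
        }
    }

    std::vector<std::thread> workers_;
    std::queue<std::function<void()>> queue_;
    std::mutex mutex_;
    std::condition_variable job_ready_;
    std::condition_variable batch_done_;
    std::size_t pending_ = 0;
    bool stopping_ = false;
};
```

Each click would build one vector of jobs and call `run_batch`; the threads are created once in the constructor and reused between clicks, so there is no per-click thread start-up cost.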
I wasn't planning to divide the tasks into finer-grained ones. All the
tasks are atomic and have roughly the same execution time. It's rather:
pass 100000/M tasks to each machine, then split that number by N among the
threads of that process. This last part is up to me to code.
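The two-level split described above can be written as plain index arithmetic. A minimal sketch, assuming tasks are numbered 0 to total-1; `block_range` and `thread_range` are hypothetical helpers. When the count does not divide evenly, the first blocks get one extra task, so no task is dropped.

```cpp
#include <cstddef>
#include <utility>

// Half-open range [first, second) of task indices.
using TaskRange = std::pair<std::size_t, std::size_t>;

// Split `total` tasks into `parts` contiguous blocks and return block `index`.
// The first (total % parts) blocks get one extra task, so block sizes differ
// by at most one.
inline TaskRange block_range(std::size_t total, std::size_t parts, std::size_t index) {
    const std::size_t base = total / parts;
    const std::size_t extra = total % parts;
    const std::size_t begin = index * base + (index < extra ? index : extra);
    const std::size_t size = base + (index < extra ? 1 : 0);
    return TaskRange(begin, begin + size);
}

// Two-level split: first across M machines, then the machine's share across
// its N threads. The returned offsets are global task indices.
inline TaskRange thread_range(std::size_t total, std::size_t M, std::size_t N,
                              std::size_t machine, std::size_t thread) {
    const TaskRange m = block_range(total, M, machine);
    const TaskRange t = block_range(m.second - m.first, N, thread);
    return TaskRange(m.first + t.first, m.first + t.second);
}
```

With 100000 tasks, M = 4 and N = 8, each machine gets 25000 tasks and each thread 3125; thread 3 on machine 1 handles indices [34375, 37500). In a Boost.MPI program, `machine` would typically be the process's `boost::mpi::communicator::rank()` and M its `size()`.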
Ideally, a task is just a functor with an operator() member, and the M
machines and N threads are treated uniformly. I guess it's up to me to
write some abstraction layer that presents the whole M*N as one flat pool.
I have other questions, more architectural in nature; I'm not sure this is
the best place to ask them?
Regards,
Boost-users list run by williamkempf at hotmail.com, kalb at libertysoft.com, bjorn.karlsson at readsoft.com, gregod at cs.rpi.edu, wekempf at cox.net