
Craig Henderson wrote:
I have designed the class infrastructure to be as flexible as possible using templates. Job scheduling is a particular interest of mine, and it is a policy that can be specified. The current 'library' includes two schedulers: mapreduce::schedule_policy::cpu_parallel, as in the example, which maximises the use of the CPU cores in the machine, and mapreduce::schedule_policy::sequential, which runs one Map task followed by one Reduce task. The latter is useful for debugging the algorithms.
Do you have different kinds of parallel scheduling like OpenMP has: static, dynamic, etc.?
So, to answer your question, I don't have specific performance metrics and comparisons that I can share with you at this time. The principle of the library is that everything is templated (policy-based), so it can be swapped around and re-implemented to best suit the needs of the application. The supplied implementations provide the framework and a decent implementation of the policies, but will not be optimal for all users.

Well, some figures would be nice, at least to check we don't go slower than on a single CPU ;) A simple scalability test could already be enough.
The other quirks I have are:

* it seems to be a lot of work to take one user function and turn it into something your library can manage.
* it seems we have to write some loop ourselves at some point in the mapper and reducer. Can't this be leveraged somehow? What an end-user may want to write is the single element->element sequential function for map, and the element->element->element fold function to be used on top of the element list.

--
___________________________________________
Joel Falcou - Assistant Professor
PARALL Team - LRI - Universite Paris Sud XI
Tel : (+33)1 69 15 66 35