Subject: Re: [boost] Proposal: MapReduce library (single machine)
From: Craig Henderson (cdm.henderson_at_[hidden])
Date: 2009-06-15 15:59:35


> Do you have different kind of parallel scheduling like openMP can have :
> static, dynamic, etc ...

I've already answered this in other threads: the scheduling is implemented
in a policy class, so other threading approaches can be used. The current
implementations are Sequential (a single thread runs the Map phase and then
the Reduce phase) and CPU Parallel, which maximizes CPU core utilization.
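
To illustrate what "scheduling in a policy class" can look like, here is a
minimal sketch. It is not the proposed library's interface: the names
sequential_policy, cpu_parallel_policy and sum_of_squares are hypothetical,
and the parallel policy uses a simple static striding scheme over
std::thread workers as an assumed stand-in for a real scheduler.

```cpp
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical policy: run every task on the calling thread, in order.
struct sequential_policy {
    void run(const std::vector<std::function<void()>>& tasks) const {
        for (const auto& task : tasks) task();
    }
};

// Hypothetical policy: one worker thread per hardware core.
struct cpu_parallel_policy {
    void run(const std::vector<std::function<void()>>& tasks) const {
        std::size_t workers = std::thread::hardware_concurrency();
        if (workers == 0) workers = 1;  // hardware_concurrency() may return 0
        std::vector<std::thread> pool;
        for (std::size_t w = 0; w < workers; ++w) {
            pool.emplace_back([&tasks, w, workers] {
                // Static striding: worker w runs tasks w, w + workers, ...
                for (std::size_t i = w; i < tasks.size(); i += workers) tasks[i]();
            });
        }
        for (auto& t : pool) t.join();
    }
};

// A job parameterised on its scheduling policy; the job body is unchanged
// whichever policy is selected.
template <typename SchedulePolicy>
long sum_of_squares(const std::vector<int>& input) {
    std::vector<long> partial(input.size(), 0);
    std::vector<std::function<void()>> tasks;
    for (std::size_t i = 0; i < input.size(); ++i) {
        // Each task writes only its own slot, so no locking is needed.
        tasks.push_back([&partial, &input, i] {
            partial[i] = static_cast<long>(input[i]) * input[i];
        });
    }
    SchedulePolicy().run(tasks);
    long total = 0;
    for (long p : partial) total += p;  // combined sequentially after the run
    return total;
}
```

Switching from sum_of_squares<sequential_policy>(v) to
sum_of_squares<cpu_parallel_policy>(v) changes only the template argument. A
dynamic (work-queue) scheduler could be added as a third policy in the same
way.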

> > So, to answer your question, I don't have specific performance
> > metrics ...
> Well, some figures could be nice to at least check we don't go slower
> than on a single CPU ;) A simple scalability test could be already enough.

I'm running some tests and will update the site with performance comparisons
shortly.

> * it seems to have a lot of work to be done to take one user function
> and turn it into something your library could manage.

Can you expand on this a bit? Sure, there is some scaffolding for defining
types and constructing objects, but the WordCount example is just 5 lines
for the Map and 4 lines for the Reduce - that sounds quite lightweight to
me :) Seriously, though, I'd like to understand your concern about 'a lot of
work', and hear suggestions on reducing the overhead.

> * it seems we have to write some loop ourselves at some point in the
> mapper and reducer. Can't this be leveraged somehow ? What an end-user
> may want to write is the single element->element sequential function for
> map and the element->element->element fold function to be used on top of
> the element list.

The idea of MapReduce is to map (k1,v1) --> list(k2,v2) and then reduce
(k2,list(v2)) --> list(v2). This inevitably requires iteration over
collections. A generic Map & Reduce task could be written to delegate to
sequential functions as you suggest, but I see this as an extension to the
library rather than a core component.
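
As a rough sketch of such an extension (not part of the proposed library;
map_reduce, emit_words, add_counts and count_words are hypothetical names),
a generic driver can own the grouping and reduce loops and delegate to
element-level user functions. It is single-threaded and, as a
simplification, folds each key's values to a single v2 rather than a
list(v2).

```cpp
#include <cstddef>
#include <map>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Generic driver. The user supplies only
//   map_fn : (k1, v1) -> list of (k2, v2)
//   fold_fn: (v2, v2) -> v2
template <typename K2, typename V2, typename Input, typename MapFn, typename FoldFn>
std::map<K2, V2> map_reduce(const Input& input, MapFn map_fn, FoldFn fold_fn) {
    // Map phase: group every emitted value under its intermediate key.
    std::map<K2, std::vector<V2>> intermediate;
    for (const auto& kv : input) {
        for (const auto& out : map_fn(kv.first, kv.second)) {
            intermediate[out.first].push_back(out.second);
        }
    }
    // Reduce phase: fold each key's value list down to one value.
    // A key exists only if at least one value was pushed, so front() is safe.
    std::map<K2, V2> result;
    for (const auto& group : intermediate) {
        const std::vector<V2>& values = group.second;
        V2 acc = values.front();
        for (std::size_t i = 1; i < values.size(); ++i) acc = fold_fn(acc, values[i]);
        result[group.first] = acc;
    }
    return result;
}

// User code for word count. The map still loops to split the text into
// words, but the grouping and reduce loops live in the driver above.
std::vector<std::pair<std::string, int>> emit_words(const std::string& /*doc_name*/,
                                                    const std::string& text) {
    std::vector<std::pair<std::string, int>> out;
    std::istringstream words(text);
    std::string word;
    while (words >> word) out.emplace_back(word, 1);
    return out;
}

int add_counts(int a, int b) { return a + b; }

std::map<std::string, int> count_words(
    const std::vector<std::pair<std::string, std::string>>& docs) {
    return map_reduce<std::string, int>(docs, emit_words, add_counts);
}
```

Here the input pairs are (document name, document text), the intermediate
pairs are (word, 1), and the fold is plain addition.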

Thanks
-- Craig

