Subject: Re: [boost] Proposal: MapReduce library (single machine)
From: Joel Falcou (joel.falcou_at_[hidden])
Date: 2009-06-16 02:19:05


Craig Henderson wrote:
> I've already answered this in other threads... the scheduling is implemented
> in a policy class so other threading approaches can be used. Current
> implementations are Sequential (single thread Map followed by Reduce
> phases), and CPU Parallel to maximize CPU core utilization.
>
I saw that. My question was: for your parallel scheduler, how do you
generate the workload for each processor?
> I'm running some tests and will update the site with performance comparisons
> shortly
>
Great

> The idea of MapReduce is to map (k1,v1) --> list(k2,v2) and then reduce
> (k2,list(v2)) --> list(v2). This inevitably requires iteration over
> collections. A generic Map & Reduce task could be written to delegate to
> sequential functions as you suggest, but I see this as an extension to the
> library rather than a core component.
Well, canonically, running a map only requires the
(k1,v1)->(k2,v2) function; iterating over the sequence is handled by the
map skeleton. Similarly for Reduce, where only a fold-like function is
strictly needed. Having to specify how to iterate over the sequence
is unneeded IMHO and adds clutter to what you have to write. I don't see
an actual improvement on this point if I still have to iterate over my
data myself and only use your tool to generate the scheduling.
I can do that by hand with a thread_pool and it won't be more verbose.
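
To illustrate the point, here is a minimal sketch in which the skeleton owns
the loop and the user supplies only the per-element function. The names
map_skeleton and fold_skeleton are hypothetical and are not part of the
proposed library; the sketch is sequential and ignores scheduling entirely.

```cpp
#include <algorithm>
#include <iterator>
#include <numeric>
#include <utility>

// Hypothetical map skeleton: the iteration lives here, so the caller
// only writes the element-wise function f.
template <class InputSeq, class OutputSeq, class F>
void map_skeleton(const InputSeq& in, OutputSeq& out, F f) {
    std::transform(std::begin(in), std::end(in), std::back_inserter(out), f);
}

// Hypothetical fold skeleton: the caller only writes the binary
// combining function f and provides the initial value.
template <class Seq, class T, class F>
T fold_skeleton(const Seq& in, T init, F f) {
    return std::accumulate(std::begin(in), std::end(in), std::move(init), f);
}
```

With something like this, squaring a sequence is just
map_skeleton(in, out, square) and summing it is fold_skeleton(out, 0, plus);
no user code touches an iterator.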

An "optimal" interface would be:

map_reduce<SomeSchedulingPolicy>(input_seq, output_seq, map_func, reduce_func);

with each xxx_seq conforming to some IterableSequence concept and
each xxx_func being a function object or PFO (polymorphic function object)
conforming to the standard map/fold prototype. Introspection on types and
on the presence of given methods/functions then helps find how to
iterate over the sequence (using type_traits and such) and generate the
appropriate, optimized iteration code, calling map and fold where it should.


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk