Subject: Re: [boost] RFC - Updated MapReduce library
From: Craig Henderson (cdm.henderson_at_[hidden])
Date: 2009-08-09 12:17:47
> Quoting from the start of your docs:
> "The Boost.MapReduce library is a MapReduce implementation across
> plurality of CPU cores rather than machines."
> Isn't that rather missing the point of what MapReduce is supposed to be
> about? If I'm limited to one machine, I can write parallel code using
> the full repertoire of techniques.
You can, and this library is simply one more such technique. Writing
multithreaded applications is difficult and is often done badly, so this
library provides a framework that does the donkey-work and lets the
developer concentrate on solving their problem. Other libraries already
exist for single-machine map/reduce (search for "phoenix mapreduce"), and
there's an evaluation paper on it at
> By re-designing my application to
> fit into the MapReduce pattern I can potentially scale it over multiple
> machines. But if I can't scale over multiple machines, why bother?
In that scenario, indeed, don't bother. But if you want an easy way to
implement low-lock-contention multithreaded processing, then you might take
> Are you planning to support scaling over multiple machines in the
Yes, I am designing and developing a distributed file system that aims
to achieve this (see
integration with any other DFS could do the same.
The library is very much in its infancy, but I believe it is useful enough
to be a part of Boost in its single-machine state.
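
To make "low-lock-contention multithreaded processing" concrete, here is a
minimal sketch of the single-machine map/reduce pattern using plain C++11
threads. It is not the Boost.MapReduce API; the names Counts, map_chunk and
word_count are hypothetical and chosen only for illustration. Each worker
counts words into its own private map, so the map phase needs no locks, and
the partial results are merged only after every worker has joined.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <sstream>
#include <string>
#include <thread>
#include <vector>

// Hypothetical result type: word -> number of occurrences.
using Counts = std::map<std::string, std::size_t>;

// Map phase for one worker: count the words in lines [begin, end) into a
// map that only this worker touches, so no synchronisation is required.
void map_chunk(const std::vector<std::string>& lines,
               std::size_t begin, std::size_t end, Counts& local)
{
    for (std::size_t i = begin; i < end; ++i) {
        std::istringstream in(lines[i]);
        std::string word;
        while (in >> word)
            ++local[word];
    }
}

// Split the input across num_threads workers, then reduce on the caller's
// thread by summing the per-worker maps.
Counts word_count(const std::vector<std::string>& lines, unsigned num_threads)
{
    assert(num_threads > 0);

    // Sized up front so references to the elements stay valid.
    std::vector<Counts> partial(num_threads);
    std::vector<std::thread> workers;
    const std::size_t chunk = (lines.size() + num_threads - 1) / num_threads;

    for (unsigned t = 0; t < num_threads; ++t) {
        const std::size_t begin = std::min(lines.size(), t * chunk);
        const std::size_t end = std::min(lines.size(), begin + chunk);
        workers.emplace_back(map_chunk, std::cref(lines), begin, end,
                             std::ref(partial[t]));
    }
    for (auto& w : workers)
        w.join();

    // Reduce phase: runs single-threaded after all workers have finished.
    Counts total;
    for (const auto& p : partial)
        for (const auto& kv : p)
            total[kv.first] += kv.second;
    return total;
}
```

Calling word_count(lines, std::thread::hardware_concurrency()) would use one
worker per core, provided that call returns a non-zero value. A real
framework would also parallelise the reduce phase and balance uneven work,
which this sketch deliberately leaves out.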