Subject: Re: [boost] RFC - Updated MapReduce library
From: Craig Henderson (cdm.henderson_at_[hidden])
Date: 2009-08-09 12:17:47


Hi Phil,

> Quoting from the start of your docs:
>
> "The Boost.MapReduce library is a MapReduce implementation across a
> plurality of CPU cores rather than machines."
>
> Isn't that rather missing the point of what MapReduce is supposed to be
> about? If I'm limited to one machine, I can write parallel code using
> the full repertoire of techniques.

You can, and this is basically just another alternative technique. Writing
multithreaded applications can be difficult, and is often done badly, so
this library provides a framework to do the donkey-work and allow the
developer to concentrate on solving their problem. Other libraries already
exist for single-machine map/reduce (google for "phoenix mapreduce"), and
there's an evaluation paper on it at
http://csl.stanford.edu/~christos/publications/2007.cmp_mapreduce.hpca.pdf

> By re-designing my application to
> fit into the MapReduce pattern I can potentially scale it over multiple
> machines. But if I can't scale over multiple machines, why bother?

In that scenario, don't bother, indeed. But if you want an easy way to
implement low-lock-contention multithreaded processing, then you might take
a look.
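
To illustrate what "low lock contention" means here, below is a minimal
sketch of the general single-machine map/reduce pattern using plain
std::thread. It is not the Boost.MapReduce API; the names Counts, map_chunk
and word_count are hypothetical and exist only for this example. Each worker
counts words into its own private map, so the map phase needs no mutex, and
the partial results are merged on one thread after the workers have joined.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <map>
#include <sstream>
#include <string>
#include <thread>
#include <vector>

// Hypothetical result type for this sketch: word -> number of occurrences.
using Counts = std::map<std::string, std::size_t>;

// Map phase for one worker: count the words in lines [begin, end) into a
// map that only this worker writes to, so no locking is required.
void map_chunk(const std::vector<std::string>& lines,
               std::size_t begin, std::size_t end, Counts& local)
{
    for (std::size_t i = begin; i < end; ++i) {
        std::istringstream words(lines[i]);
        std::string word;
        while (words >> word)
            ++local[word];
    }
}

// Runs the map phase on `workers` threads, then reduces on the caller's thread.
Counts word_count(const std::vector<std::string>& lines, unsigned workers)
{
    workers = std::max(1u, workers);
    std::vector<Counts> partial(workers);  // one private result per worker
    std::vector<std::thread> pool;
    const std::size_t chunk = (lines.size() + workers - 1) / workers;

    for (unsigned w = 0; w < workers; ++w) {
        const std::size_t begin =
            std::min<std::size_t>(lines.size(), static_cast<std::size_t>(w) * chunk);
        const std::size_t end =
            std::min<std::size_t>(lines.size(), begin + chunk);
        pool.emplace_back(map_chunk, std::cref(lines), begin, end,
                          std::ref(partial[w]));
    }
    for (auto& t : pool)
        t.join();

    // Reduce phase: merge the per-worker maps once all workers have finished.
    Counts total;
    for (const auto& p : partial)
        for (const auto& kv : p)
            total[kv.first] += kv.second;
    return total;
}
```

Calling word_count(lines, std::thread::hardware_concurrency()) would use one
worker per reported core. The point of a framework is to take care of the
chunking, thread management and merging shown above, so that the developer
only supplies the map and reduce steps.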

> Are you planning to support scaling over multiple machines in the
> future?

Yes. I am designing and developing a distributed file system that aims to
achieve this (see
http://craighenderson.co.uk/blog/index.php/tag/distributed-file-system/).
Integration with any other DFS could do the same.

The library is very much in its infancy, but I believe it is useful enough
to be a part of Boost in its single-machine state.

Regards
-- Craig

