Subject: Re: [boost] RFC - Updated MapReduce library
From: joel (joel.falcou_at_[hidden])
Date: 2009-08-09 06:33:37

Craig Henderson wrote:
> This interface has changed several times and I can't decide the most
> appropriate. I have provided a base class to define the required types, hence the use of
> a function object. However, it is dangerous to use a real functor with an
> instance because the Map Tasks are independent of each other and run in
> different threads. If they had instance data, then synchronization becomes
> an issue, but more importantly, it breaks the programming model. In a true
> distributed system, map tasks will run on separate machines, and therefore
> unable to share data. Support for this is intended for a later release of
> the library, so I need to keep the design pure.
Why not give each thread a local copy of the functor? Ideally,
those are stateless anyway, so the copy is essentially free.
Another thing: why not allow anything that acts as a function and
provides the correct interface? You'll hit the dreaded
legacy-code-reuse wall if your users can't take their years-old
sequential function and turn it into a mapper or reducer. Storing a
boost::function inside the implementation to get this genericity may
be a good idea.
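To make the suggestion concrete, here is a minimal sketch, not the library's actual interface. All names (mapper_fn, count_words, run_map) are made up for illustration, and std::function / std::thread stand in for boost::function / boost::thread so the snippet is self-contained. Each worker captures the mapper by value, so no state is shared between map tasks:

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <thread>
#include <vector>

// Hypothetical type: any callable with this signature can act as a mapper.
using mapper_fn = std::function<std::size_t(std::string const&)>;

// A "legacy" sequential function, reused unchanged as a mapper.
std::size_t count_words(std::string const& line)
{
    std::size_t n = 0;
    bool in_word = false;
    for (char c : line) {
        if (c == ' ') {
            in_word = false;
        } else if (!in_word) {
            in_word = true;
            ++n;
        }
    }
    return n;
}

// Hypothetical driver: one thread per input chunk.
std::vector<std::size_t> run_map(mapper_fn mapper,
                                 std::vector<std::string> const& chunks)
{
    std::vector<std::size_t> results(chunks.size());
    std::vector<std::thread> workers;
    for (std::size_t i = 0; i < chunks.size(); ++i) {
        // 'mapper' is captured by value: every thread owns a private copy,
        // so even a stateful functor needs no synchronization.
        workers.emplace_back([mapper, &chunks, &results, i] {
            results[i] = mapper(chunks[i]);  // each thread writes its own slot
        });
    }
    for (auto& w : workers) {
        w.join();
    }
    return results;
}
```

Calling run_map(count_words, chunks) accepts the plain function as is; a lambda or a function object with the same signature would work the same way.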

> These stats are very useful for research and testing, but I agree are less
> important in a production environment. The timings need to be built into the
> library infrastructure because the library user does not have access to the
> granularity of timing (without writing a bespoke schedule_policy). I can
> look at making the timing another policy class, but I don't think the
> overhead is really that significant, is it?
I like it when my phone phones and my toaster toasts. When I want a phone
that toasts, I like to be able to explicitly decide so ;)
Making it a policy is definitely better in my book.
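Roughly what I have in mind, as a sketch only. The names no_timing, chrono_timing and job are hypothetical and not taken from the library, and std::chrono is used just to keep the snippet self-contained. The default policy does nothing, and the user opts in to timing explicitly:

```cpp
#include <cassert>
#include <chrono>
#include <vector>

// Hypothetical default policy: no timing, every call is an empty inline function.
struct no_timing
{
    void start() {}
    void stop() {}
    double elapsed_seconds() const { return 0.0; }
};

// Hypothetical opt-in policy: accumulates wall-clock time with a steady clock.
struct chrono_timing
{
    typedef std::chrono::steady_clock clock;

    void start() { begin_ = clock::now(); }
    void stop() { total_ += clock::now() - begin_; }
    double elapsed_seconds() const
    {
        return std::chrono::duration<double>(total_).count();
    }

private:
    clock::time_point begin_{};
    clock::duration total_{};
};

// Stand-in for the job runner: the timing behaviour is chosen at compile time.
template<class TimingPolicy = no_timing>
class job
{
public:
    long run(std::vector<int> const& data)
    {
        timing_.start();
        long sum = 0;
        for (int x : data) {
            sum += x;  // placeholder for the real map/reduce work
        }
        timing_.stop();
        return sum;
    }

    TimingPolicy const& timing() const { return timing_; }

private:
    TimingPolicy timing_;
};
```

job<> pays nothing for timing, while job<chrono_timing> collects the statistics for research and testing runs.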
> I'm disappointed you think this. I have worked really hard to make the
> interface as light as possible. If you compare the library interface to
> other implementations such as Phoenix, I hope you'll agree that this library
> is quite light.
It's mainly the need to write type::other_type::stuff and to have
to check/remember which comes first.
A unified thing like result_of<type(user type)>::type looks better.
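A minimal sketch of that idea, in the spirit of boost::result_of. The names map_result_of, word_length and apply_word_length are hypothetical and decltype is used for brevity; this is not the library's interface. The user's mapper needs no nested typedefs, and the one metafunction computes the intermediate type from the call signature:

```cpp
#include <cassert>
#include <string>
#include <type_traits>
#include <utility>

// Hypothetical metafunction: given the signature F(Arg), name the type
// produced by calling an F with an Arg.
template<class Signature>
struct map_result_of;

template<class F, class Arg>
struct map_result_of<F(Arg)>
{
    typedef decltype(std::declval<F>()(std::declval<Arg>())) type;
};

// A user-written mapper: just a call operator, no required nested types.
struct word_length
{
    std::pair<std::string, unsigned> operator()(std::string const& w) const
    {
        return std::make_pair(w, static_cast<unsigned>(w.size()));
    }
};

// The library side asks one question instead of digging through nested types.
typedef map_result_of<word_length(std::string)>::type intermediate;

static_assert(
    std::is_same<intermediate, std::pair<std::string, unsigned> >::value,
    "intermediate type is deduced from the mapper's call operator");

// Small helper showing the deduced type in use.
intermediate apply_word_length(std::string const& w)
{
    return word_length()(w);
}
```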
> I am keen to make it lighter if you can be specific with some suggestions,
> though?
This sample code should maybe be the FIRST thing shown to the user, really,
as it is far clearer about the intent and about how the library is structured.
> Agreed on the performance figures, and I'll provide some comparisons in the
> future. Jose on this list has helped with some comparison with Phoenix, and
> the results are comparable with the WordCount example. You'll appreciate
> that I am limited to the machines I have access to, and Phoenix isn't
> available on my platform.
I fully understand that and ...
> Only that I am not familiar with openMP, and haven't looked at it. It's
> unlikely that I'll be able to do this, but if someone in the Boost community
> would like to help out, I'd be delighted.
... this is something I can contribute.
> In the documentation I did say that I am not providing a tutorial on
> programming in MapReduce, but maybe I will one day. I do, however, recognize
> that one example does not demonstrate the possibilities for the library, and
> I will be providing more samples in the future.
A nice thing would be small-scale examples tied to tasks that one may have
to do in parallel, demonstrating that the MapReduce approach adds
value at some point (paraphrasing Murray Cole's statement of "show the

Joel Falcou - Assistant Professor
PARALL Team - LRI - Universite Paris Sud XI
Tel : (+33)1 69 15 66 35
