Subject: Re: [boost] RFC - Updated MapReduce library
From: joel (joel.falcou_at_[hidden])
Date: 2009-08-09 06:33:37
Craig Henderson wrote:
> This interface has changed several times and I can't decide the most
> appropriate. I have provided a base class to define the required types, hence the use of
> a function object. However, it is dangerous to use a real functor with an
> instance because the Map Tasks are independent of each other and run in
> different threads. If they had instance data, then synchronization becomes
> an issue, but more importantly, it breaks the programming model. In a true
> distributed system, map tasks will run on separate machines, and therefore
> unable to share data. Support for this is intended for a later release of
> the library, so I need to keep the design pure.
Why not give each thread its own local copy of the functor? Ideally,
those are stateless anyway, so the copy is essentially free.
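A minimal sketch of the per-thread copy idea, assuming C++11 std::thread rather than the library's actual scheduler. The names counting_mapper and run_map_tasks are hypothetical and are not part of the proposed MapReduce interface:

```cpp
#include <cstddef>
#include <string>
#include <thread>
#include <vector>

// Hypothetical mapper that carries instance data (a counter of items seen).
struct counting_mapper {
    std::size_t seen = 0;
    std::size_t operator()(const std::string& s) { ++seen; return s.size(); }
};

// Runs one map task per input chunk, each in its own thread. Every thread
// gets a private copy of `prototype`, so instance data is never shared and
// needs no synchronization.
template <class Mapper>
std::vector<std::size_t> run_map_tasks(
    Mapper prototype,
    const std::vector<std::vector<std::string>>& chunks)
{
    std::vector<std::size_t> totals(chunks.size(), 0);
    std::vector<std::thread> workers;
    for (std::size_t i = 0; i < chunks.size(); ++i) {
        // `prototype` is captured by value: the lambda owns its own copy.
        workers.emplace_back([prototype, &chunks, &totals, i]() mutable {
            for (const auto& item : chunks[i])
                totals[i] += prototype(item);  // each thread writes only totals[i]
        });
    }
    for (auto& w : workers) w.join();
    return totals;
}
```

For a stateless mapper the copy costs next to nothing, and a stateful one simply keeps its state local to its task.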
The other thing is: why not allow anything that acts as a function and
provides the correct interface? You'll hit the dreaded legacy-code-reuse
wall if your users can't take their years-old sequential function and
turn it into a mapper or reducer. Storing a boost::function inside the
implementation to get this genericity may be a good idea.
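A rough sketch of what storing a type-erased callable could look like. It uses std::function as a stand-in for boost::function so that it compiles without Boost. The map_job class, its fixed signature, and the two sample mappers are assumptions made for illustration, not the library's design:

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical job type: accepts any callable with the signature
// std::size_t(const std::string&) and stores it type-erased.
class map_job {
public:
    using map_fn = std::function<std::size_t(const std::string&)>;

    explicit map_job(map_fn f) : map_(std::move(f)) {}

    // Applies the stored mapper to every input element.
    std::vector<std::size_t> run(const std::vector<std::string>& in) const {
        std::vector<std::size_t> out;
        out.reserve(in.size());
        for (const auto& s : in) out.push_back(map_(s));
        return out;
    }

private:
    map_fn map_;
};

// A "legacy" free function written with no knowledge of the library.
inline std::size_t legacy_length(const std::string& s) { return s.size(); }

// A function object goes through exactly the same interface.
struct vowel_counter {
    std::size_t operator()(const std::string& s) const {
        std::size_t n = 0;
        for (char c : s)
            if (std::string("aeiou").find(c) != std::string::npos) ++n;
        return n;
    }
};
```

Both map_job(legacy_length) and map_job(vowel_counter{}) are accepted, with no base class required from the user.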
> These stats are very useful for research and testing, but I agree are less
> important in a production environment. The timings need to be built into the
> library infrastructure because the library user does not have access to the
> granularity of timing (without writing a bespoke schedule_policy). I can
> look at making the timing another policy class, but I don't think the
> overhead is really that significant, is it?
I like it when my phone phones and my toaster toasts. When I want a phone
that toasts, I like to be able to decide so explicitly ;)
Making it a policy is definitely better in my book.
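To make the policy idea concrete, here is a small sketch assuming a C++11 compiler with std::chrono. The names no_timing, chrono_timing and basic_job are hypothetical and do not come from the library; the point is only that timing is opt-in through a template parameter:

```cpp
#include <chrono>

// Default policy: records nothing, so the calls can be optimized away.
struct no_timing {
    void start() {}
    void stop() {}
    double elapsed_seconds() const { return 0.0; }
};

// Opt-in policy: accumulates wall-clock time between start() and stop().
class chrono_timing {
    using clock_type = std::chrono::steady_clock;

public:
    void start() { begin_ = clock_type::now(); }
    void stop() { total_ += clock_type::now() - begin_; }
    double elapsed_seconds() const {
        return std::chrono::duration<double>(total_).count();
    }

private:
    clock_type::time_point begin_{};
    clock_type::duration total_{};
};

// Hypothetical job skeleton: the timing behaviour is chosen by the user.
template <class TimingPolicy = no_timing>
class basic_job : private TimingPolicy {
public:
    template <class Work>
    void run(Work work) {
        TimingPolicy::start();
        work();
        TimingPolicy::stop();
    }
    double run_seconds() const { return TimingPolicy::elapsed_seconds(); }
};
```

A user who wants statistics writes basic_job<chrono_timing>; everyone else gets basic_job<> and pays nothing for timing.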
> I'm disappointed you think this. I have worked really hard to make the
> interface as light as possible. If you compare the library interface to
> other implementations such as Phoenix, I hope you'll agree that this library
> is quite light.
It's mainly the need to write type::other_type::stuff and to have
to check/remember which comes first.
A unified form like result_of<type(user type)>::type looks better.
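As an illustration of the result_of-style spelling, here is a sketch of a single trait that decodes an F(Arg) signature. The trait map_result and the sample length_mapper are hypothetical names; the library could equally reuse boost::result_of directly:

```cpp
#include <cstddef>
#include <string>
#include <type_traits>
#include <utility>

// Primary template is intentionally left undefined; only the "F(Arg)" form
// below is meaningful.
template <class Signature>
struct map_result;

// Decodes the signature and yields the type of calling an F with an Arg.
template <class F, class Arg>
struct map_result<F(Arg)> {
    using type = decltype(std::declval<F>()(std::declval<Arg>()));
};

// Sample mapper, used only to exercise the trait.
struct length_mapper {
    std::size_t operator()(const std::string& s) const { return s.size(); }
};

// One uniform query instead of a chain of nested typedefs.
static_assert(
    std::is_same<map_result<length_mapper(std::string)>::type,
                 std::size_t>::value,
    "map_result should report the mapper's return type");
```

The user then has one name to remember, and the order of nested types no longer matters.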
> I am keen to make it lighter if you can be specific with some suggestions,
This sample code should maybe be the FIRST thing shown to the user,
as it makes the intent of how the library is structured far clearer.
> Agreed on the performance figures, and I'll provide some comparisons in the
> future. Jose on this list has helped with some comparison with Phoenix, and
> the results are comparable with the WordCount example. You'll appreciate
> that I am limited to the machines I have access to, and Phoenix isn't
> available on my platform.
I fully understand that and ...
> Only that I am not familiar with openMP, and haven't looked at it. It's
> unlikely that I'll be able to do this, but if someone in the Boost community
> would like to help out, I'd be delighted.
... this is something I can contribute.
> In the documentation I did say that I am not providing a tutorial on
> programming in MapReduce, but maybe I will one day. I do, however, recognize
> that one example does not demonstrate the possibilities for the library, and
> I will be providing more samples in the future.
A nice addition would be small-scale examples tied to tasks that one might
have to do in parallel, demonstrating that the MapReduce approach adds
value at some point (paraphrasing Murray Cole's statement of "show the
--
___________________________________________
Joel Falcou - Assistant Professor
PARALL Team - LRI - Universite Paris Sud XI
Tel : (+33)1 69 15 66 35