Subject: Re: [Boost-users] [mapreduce] Prim Calculator
From: Craig Henderson (cdm.henderson_at_[hidden])
Date: 2009-08-22 05:55:20


> > Where in the sandbox is mapreduce? Thanks for all your comments! I'll
> > get back to you once I can run the prime calculator.
> >
>
> Here's a direct link to the files you need.
>
> https://svn.boost.org/svn/boost/sandbox/boost/mapreduce/job.hpp
> https://svn.boost.org/svn/boost/sandbox/boost/mapreduce/intermediates/in_memory.hpp
>
> -- Craig

I've had more time to look at this now. While the code I posted here
yesterday works with the library in the sandbox, there is a flaw in the
logic that I missed. I have re-written the Prime Calculator with tighter
type correctness and discovered the problem.

Christian, you are using std::size_t and unsigned for your reduce key/value
types, which allows emit() to swap keys and values around, so the final
output has the prime numbers in the key field instead of the value field.
This is incorrect: the result should be (true, (3,5,7,...)) and not
((3,0),(5,0),(7,0),...).
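
To make the intended result shape concrete, here is a minimal standalone
sketch. It does not use the library at all: key_type, value_type,
map_range() and reduce_primes() are hypothetical names invented for this
illustration, and the map/reduce steps are plain functions. The key is the
"is prime" flag and the value is the number, so the reduced output is one
key (true) holding a list of primes.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <utility>
#include <vector>

// Hypothetical stand-ins for illustration; NOT the Boost.MapReduce API.
typedef bool     key_type;    // "is this number prime?"
typedef unsigned value_type;  // the number itself

inline bool is_prime(unsigned n)
{
    if (n < 2) return false;
    for (unsigned d = 2; d * d <= n; ++d)
        if (n % d == 0) return false;
    return true;
}

// Map step: emit (is_prime(n), n) for every n in [first, last].
inline std::vector<std::pair<key_type, value_type> >
map_range(unsigned first, unsigned last)
{
    assert(first <= last);
    std::vector<std::pair<key_type, value_type> > out;
    for (unsigned n = first; n <= last; ++n)
        out.push_back(std::make_pair(is_prime(n), n));
    return out;
}

// Reduce step: group values by key, keeping only the key `true`
// (non-primes are not emitted).
inline std::map<key_type, std::vector<value_type> >
reduce_primes(const std::vector<std::pair<key_type, value_type> >& pairs)
{
    std::map<key_type, std::vector<value_type> > result;
    for (std::size_t i = 0; i < pairs.size(); ++i)
        if (pairs[i].first)
            result[true].push_back(pairs[i].second);
    return result;
}

// Usage: the result has a single key, true, with several values.
inline void demo_result_shape()
{
    std::map<key_type, std::vector<value_type> > r =
        reduce_primes(map_range(2, 10));
    assert(r.size() == 1);
    assert(r[true].size() == 4);  // 2, 3, 5, 7
}
```

Note that bool and unsigned still convert to each other implicitly, so
distinct typedefs alone will not make the compiler reject a swapped pair;
a wrapper struct around the value would be needed for that.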

This led me to discover a problem in the library when iterating over
the results. The assertion in boost/mapreduce/intermediates/in_memory.hpp,
line 141, catches cases where there are multiple values for a reduce key.
Such cases should be valid, so the assertion is incorrect; however, the
iteration code cannot currently cope with them. I need to revisit this and
post a fix.
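
For illustration only, this is roughly what iterating over a key with
multiple values looks like in plain standard C++. It is a sketch using
std::multimap, not the storage or iteration code in in_memory.hpp;
intermediate_store and values_for() are hypothetical names.

```cpp
#include <cassert>
#include <map>
#include <utility>
#include <vector>

// Hypothetical store: one reduce key may map to many values.
typedef std::multimap<bool, unsigned> intermediate_store;

// Collect every value stored under `key`, however many there are.
inline std::vector<unsigned>
values_for(const intermediate_store& store, bool key)
{
    std::vector<unsigned> values;
    std::pair<intermediate_store::const_iterator,
              intermediate_store::const_iterator> range =
        store.equal_range(key);
    for (intermediate_store::const_iterator it = range.first;
         it != range.second; ++it)
        values.push_back(it->second);
    return values;
}

// Usage: three values under the single key `true`, none under `false`.
inline void demo_multiple_values()
{
    intermediate_store store;
    store.insert(std::make_pair(true, 3u));
    store.insert(std::make_pair(true, 5u));
    store.insert(std::make_pair(true, 7u));
    assert(values_for(store, true).size() == 3);
    assert(values_for(store, false).empty());
}
```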

In the meantime, remove the iteration at the end of main() and define the
job as:

    typedef
    boost::mapreduce::job<
        prime_calculator::map_task
      , prime_calculator::reduce_task
      , boost::mapreduce::null_combiner
      , prime_calculator::number_source<prime_calculator::map_task>
      , boost::mapreduce::intermediates::in_memory<
            prime_calculator::map_task, prime_calculator::reduce_task>
      , boost::mapreduce::intermediates::reduce_file_output<
            prime_calculator::map_task, prime_calculator::reduce_task>
    > job;

This will create a file, mapreduce_2_of_2, containing the prime numbers in
column 2. The file mapreduce_1_of_2 will be empty because the non-primes
are not emitted in the reduce task.

Regards
-- Craig

