Subject: [Boost-commit] svn:boost r55134 - in sandbox/libs/mapreduce: . doc test test/wordcount
From: cdm.henderson_at_[hidden]
Date: 2009-07-23 15:04:46
Author: chenderson
Date: 2009-07-23 15:04:45 EDT (Thu, 23 Jul 2009)
New Revision: 55134
URL: http://svn.boost.org/trac/boost/changeset/55134
Log:
Initial upload; based on v0.2 from the Boost Vault.
Added:
sandbox/libs/mapreduce/
sandbox/libs/mapreduce/doc/
sandbox/libs/mapreduce/doc/future.html (contents, props changed)
sandbox/libs/mapreduce/doc/index.html (contents, props changed)
sandbox/libs/mapreduce/doc/platform.html (contents, props changed)
sandbox/libs/mapreduce/doc/schedule_policies.html (contents, props changed)
sandbox/libs/mapreduce/doc/tutorial.html (contents, props changed)
sandbox/libs/mapreduce/doc/wordcount.html (contents, props changed)
sandbox/libs/mapreduce/mapreduce.sln (contents, props changed)
sandbox/libs/mapreduce/mapreduce.vcproj (contents, props changed)
sandbox/libs/mapreduce/test/
sandbox/libs/mapreduce/test/wordcount/
sandbox/libs/mapreduce/test/wordcount/wordcount.cpp (contents, props changed)
sandbox/libs/mapreduce/test/wordcount/wordcount.vcproj (contents, props changed)
Added: sandbox/libs/mapreduce/doc/future.html
==============================================================================
--- (empty file)
+++ sandbox/libs/mapreduce/doc/future.html 2009-07-23 15:04:45 EDT (Thu, 23 Jul 2009)
@@ -0,0 +1,137 @@
+<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
+ "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
+
+<html xmlns="http://www.w3.org/1999/xhtml" lang="en" xml:lang="en">
+<head>
+ <title>Boost.MapReduce Future Work</title>
+ <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
+ <link href="http://www.boost.org/favicon.ico" rel="icon" type="http://www.boost.org/image/ico" />
+ <link rel="stylesheet" type="text/css" href="http://www.boost.org/style/basic.css" />
+</head>
+
+<body>
+ <div id="heading">
+ <div id="heading-placard"></div>
+
+ <h1 id="heading-title"><a href="/"><img src="http://www.boost.org/gfx/space.png" alt=
+ "Boost C++ Libraries" id="heading-logo" /><span id="boost">Boost</span>
+ <span id="cpplibraries">C++ Libraries</span></a></h1>
+
+ <p id="heading-quote"><span class="quote">“...one of the most highly
+ regarded and expertly designed C++ library projects in the
+ world.”</span> <span class="attribution">— <a href=
+ "http://www.gotw.ca/" class="external">Herb Sutter</a> and <a href=
+ "http://en.wikipedia.org/wiki/Andrei_Alexandrescu" class="external">Andrei
+ Alexandrescu</a>, <a href=
+ "http://safari.awprofessional.com/?XmlId=0321113586" class="external">C++
+ Coding Standards</a></span></p>
+ </div>
+
+ <div id="body">
+ <div id="body-inner">
+ <div id="content">
+ <div class="section">
+ <div class="section-0">
+ <div class="section-title">
+ <h1>Boost.MapReduce Future Work</h1>
+ <em>Note: This library is not yet part of the Boost Library and is still under development and review.</em>
+ </div>
+
+ <div class="section-body">
+ <p>
+ This is the first release of the MapReduce library, and there are a few features
+ that I'd still like to add.
+ </p>
+ <ul>
+ <li>
+ <p>Improve support for other platforms. This will require help from the Boost development community.</p>
+ </li>
+ <li>
+ <p>Add a <code>PartitioningFunction</code> parameter to the <code>local_disk</code> intermediate
+ handler to enable customisation of the partitioning of data into the final result files.</p>
+ </li>
+ <li>
+ <p>Add a template to the <code>SortFn</code> sort function to prevent expansion of duplicates
+ if required. (For example, this expansion counteracts the <code>combiner</code> in wordcount,
+ and eliminating the two would improve performance considerably.)</p>
+ </li>
+ <li>
+ <p>The only intermediate handler currently provided by the library is the <code>intermediates::local_disk<></code>
+ policy class. An early implementation of the library used in-memory storage for intermediates, and it
+ may be useful to redevelop this as a fully-fledged intermediate policy class.</p>
+ </li>
+ <li>
+ <p>An extension to the <code>intermediates::local_disk<></code> policy class could be to compress
+ the intermediate files, using the Boost.Iostreams zip/bzip2 compression libraries. This is a
+ long-term item that will be very useful when the library is extended to support cross-machine
+ MapReduce. Until then, the value is very limited.</p>
+ </li>
+ </ul>
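+ <p>
+ As a rough illustration of the partitioning item in the list above, a partitioning function could map each
+ intermediate key to one of a fixed number of result files. The sketch below is an assumption made for
+ illustration: the name <code>hash_partitioner</code> and its call signature are hypothetical and are not
+ part of the library, and <code>std::hash</code> requires a C++11 compiler.
+ </p>

```cpp
#include <cstddef>
#include <functional>
#include <string>

// Hypothetical partitioning policy; not part of Boost.MapReduce.
// Maps a key to a partition index in the range [0, partitions).
// The caller must pass partitions > 0.
struct hash_partitioner
{
    std::size_t operator()(std::string const &key, std::size_t partitions) const
    {
        return std::hash<std::string>()(key) % partitions;
    }
};
```

+ <p>
+ Hashing sends every occurrence of a key to the same result file; a range-based partitioner would instead
+ keep the result files ordered relative to one another.
+ </p>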
+ <h2>Multiple Machine Support</h2>
+ <p>
+ MapReduce was originally designed as a mechanism for working on large datasets across many (1000s of)
+ commodity servers. The current Boost library works across a plurality of CPU cores on a single machine.
+ There is a big jump to multi-machine support, so this is a long-term goal, but a goal nonetheless.
+ </p>
+ <h2>Distributed File System</h2>
+ <p>
+ To support MapReduce across multiple machines, some form of distributed file system is required. I
+ have <a href='http://craighenderson.co.uk/blog/index.php/tag/distributed-file-system/'>begun development
+ of one using Boost libraries</a> (primarily Boost.Filesystem and Boost.Asio). The
+ question is going to be whether this really sits within Boost as a C++ library, or whether it is really
+ a runtime environment for MapReduce to sit atop. My feeling is that there is some value in having a scalable
+ and resilient DFS which is peerless and heterogeneous across all platforms as a library that can be built into
+ an application, but whether that is really the case remains to be seen.
+ </p>
+ </div>
+ </div>
+ </div>
+ </div>
+ <div id="sidebar">
+ <a accesskey="p" href="./platform.html"><img src="http://www.boost.org/doc/html/images/prev.png" alt="Prev" /></a>
+ <a accesskey="u" href="http://www.boost.org/doc/libs"><img src="http://www.boost.org/doc/html/images/up.png" alt="Up" /></a>
+ <a accesskey="h" href="http://www.boost.org/"><img src="http://www.boost.org/doc/html/images/home.png" alt="Home" /></a>
+
+ <hr />
+ <p><a href='./index.html'>Boost.MapReduce</a></p>
+ <p><a href='./tutorial.html'>Tutorial</a></p>
+ <p><a href='./wordcount.html'>Example</a></p>
+ <hr />
+ <p><a href='./schedule_policies.html'>Schedule Policies</a></p>
+ <p><a href='./platform.html'>Platform Notes</a></p>
+ <p><a href='./future.html'>Future Work</a></p>
+ </div>
+ <div class="clear"></div>
+ </div>
+ </div>
+
+ <div id="footer">
+ <div id="footer-left">
+
+ <div id="copyright">
+ <p>Copyright (C) 2009 Craig Henderson.</p>
+ </div> <div id="license">
+ <p>Distributed under the <a href="/LICENSE_1_0.txt" class=
+ "internal">Boost Software License, Version 1.0</a>.</p>
+ </div>
+ </div>
+
+ <div id="footer-right">
+ <div id="banners">
+ <p id="banner-xhtml"><a href="http://validator.w3.org/check?uri=referer"
+ class="external">XHTML 1.0</a></p>
+
+ <p id="banner-css"><a href=
+ "http://jigsaw.w3.org/css-validator/check/referer" class=
+ "external">CSS</a></p>
+
+ <p id="banner-osi"><a href=
+ "http://www.opensource.org/docs/definition.php" class="external">OSI
+ Certified</a></p>
+ </div>
+ </div>
+
+ <div class="clear"></div>
+ </div>
+</body>
+</html>
\ No newline at end of file
Added: sandbox/libs/mapreduce/doc/index.html
==============================================================================
--- (empty file)
+++ sandbox/libs/mapreduce/doc/index.html 2009-07-23 15:04:45 EDT (Thu, 23 Jul 2009)
@@ -0,0 +1,194 @@
+<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
+ "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
+
+<html xmlns="http://www.w3.org/1999/xhtml" lang="en" xml:lang="en">
+<head>
+ <title>Boost.MapReduce Documentation</title>
+ <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
+ <link href="http://www.boost.org/favicon.ico" rel="icon" type="http://www.boost.org/image/ico" />
+ <link rel="stylesheet" type="text/css" href="http://www.boost.org/style/basic.css" />
+</head>
+
+<body>
+ <div id="heading">
+ <div id="heading-placard"></div>
+
+ <h1 id="heading-title"><a href="/"><img src="http://www.boost.org/gfx/space.png" alt=
+ "Boost C++ Libraries" id="heading-logo" /><span id="boost">Boost</span>
+ <span id="cpplibraries">C++ Libraries</span></a></h1>
+
+ <p id="heading-quote"><span class="quote">“...one of the most highly
+ regarded and expertly designed C++ library projects in the
+ world.”</span> <span class="attribution">— <a href=
+ "http://www.gotw.ca/" class="external">Herb Sutter</a> and <a href=
+ "http://en.wikipedia.org/wiki/Andrei_Alexandrescu" class="external">Andrei
+ Alexandrescu</a>, <a href=
+ "http://safari.awprofessional.com/?XmlId=0321113586" class="external">C++
+ Coding Standards</a></span></p>
+
+ </div>
+
+ <div id="body">
+ <div id="body-inner">
+ <div id="content">
+ <div class="section">
+ <div class="section-0">
+ <div class="section-title">
+ <h1>Boost.MapReduce</h1>
+ <em>Note: This library is not yet part of the Boost Library and is still under development and review.</em>
+ </div>
+
+ <div class="section-body">
+ <p><em>Copyright © 2009 Craig Henderson</em></p>
+ <p>Distributed under the Boost Software License, Version 1.0.<br />(See accompanying file LICENSE_1_0.txt
+ or copy at <a href='http://www.boost.org/LICENSE_1_0.txt' target='_blank'>http://www.boost.org/LICENSE_1_0.txt</a>)</p>
+
+ <h2>Motivation</h2>
+ <p>
+ MapReduce is a programming model and distributed processing platform implementation for generating and
+ processing large data sets using clusters of computers. Pioneered by Google and first presented in 2004,
+ the MapReduce programming model has gained significant momentum in commercial, research and open-source
+ projects since, and Google updated and republished their seminal paper in 2008.
+ </p>
+ <p>
+ The scalability achieved using MapReduce to implement data processing across a large number of CPUs, whether
+ on a single server or multiple machines, is an attractive proposition. The Boost.MapReduce library is a
+ MapReduce implementation across a plurality of CPU cores rather than machines. The library is implemented
+ as a set of C++ class templates, and is a header-only library. It does, however, depend upon many other
+ Boost libraries, such as Boost.System, Boost.Filesystem and Boost.Thread.
+ </p>
+ <h2>Other Implementations</h2>
+ <p>
+ The Google MapReduce framework is written in C++ and is not made available publicly. Hadoop is an Apache
+ project implementation of MapReduce, originally developed as an infrastructure for the Nutch Java Search
+ Engine project. Hadoop is written in Java, with interfaces to a number of programming languages including
+ C++ and Python. This system includes a distributed file system HDFS (Hadoop Distributed File System), which
+ is highly fault-tolerant and designed to be deployed on low-cost hardware. HDFS provides high throughput
+ access to application data and is suitable for applications that have large data sets.
+ </p>
+ <p>
+ Phoenix is a shared-memory implementation of MapReduce. Phoenix can be used to program multi-core chips as
+ well as shared-memory multiprocessors (SMPs and ccNUMAs) and is available from the original authors for the
+ Sun Solaris operating system. A port to the Linux operating system is also available. The Phoenix source code
+ is distributed under a BSD license and the copyright is held by Stanford University.
+ </p>
+ <p>
+ Phoenix runs on a single computer and implements MapReduce across a plurality of CPU cores rather than machines
+ as in the Google and Hadoop implementations. This single-machine restriction simplifies the architecture
+ significantly. In place of the distributed file system, Phoenix uses a shared-memory model for storing data to be
+ processed, and the results. Each Map or Reduce task runs on a CPU core and the Phoenix runtime is responsible
+ for consolidating results and load balancing (allocating data to Map and Reduce tasks). The complexities of
+ network communication and fault tolerance are not required for the Phoenix framework on a single server.
+ </p>
+ <h1>Change History</h1>
+ <dl class="fields">
+ <dt>21st July 2009</dt>
+ <dd>
+ <a href='http://www.boostpro.com/vault/index.php?action=downloadfile&filename=mapreduce_0_2.zip&directory=&'>
+ DOWNLOAD v0.2
+ </a><br />
+ <ul>
+ <li>Moved the library into the <code>boost</code> namespace.</li>
+ <li>Created <code>PartitionFn</code> template parameter on <code>intermediates::local_disk</code> to
+ enable customisation of the partitioning of data into result files.</li>
+ <li>Use of <code>BOOST_THROW_EXCEPTION</code> in place of <code>throw</code>.</li>
+ <li>Rationalised and completed include guards</li>
+ <li>Support for gcc 4.3.3 on Ubuntu Linux</li>
+ </ul>
+ </dd>
+ </dl>
+ <dl class="fields">
+ <dt>19th July 2009</dt>
+ <dd>
+ <a href='http://www.boostpro.com/vault/index.php?action=downloadfile&filename=mapreduce_0_1.zip&directory=&'>
+ DOWNLOAD v0.1
+ </a><br />
+ Initial public release on Boost Vault<br />
+ </dd>
+ </dl>
+ <h1>References</h1>
+ <dl class="fields">
+ <dt>Title</dt>
+ <dd>MapReduce: Simplified Data Processing on Large Clusters</dd>
+ <dt>Author(s)</dt>
+ <dd>Jeffrey Dean and Sanjay Ghemawat</dd>
+ <dt>Appeared in</dt>
+ <dd>OSDI'04: Sixth Symposium on Operating System Design and Implementation, San Francisco, CA, December, 2004.</dd>
+ <dt>URL</dt>
+ <dd><a target="_blank" href='http://labs.google.com/papers/mapreduce.html'>http://labs.google.com/papers/mapreduce.html</a></dd>
+ </dl>
+
+ <dl class="fields">
+ <dt>Title</dt>
+ <dd>MapReduce: Simplified Data Processing on Large Clusters</dd>
+ <dt>Author(s)</dt>
+ <dd>Jeffrey Dean and Sanjay Ghemawat</dd>
+ <dt>Appeared in</dt>
+ <dd>Communications of the ACM 51(1) January 2008</dd>
+ <dt>URL</dt>
+ <dd><a target="_blank" href='http://portal.acm.org/citation.cfm?id=1327492'>http://portal.acm.org/citation.cfm?id=1327492</a></dd>
+ </dl>
+
+
+ <dl class="fields">
+ <dt>Title</dt>
+ <dd>Evaluating MapReduce for Multi-core and Multiprocessor Systems</dd>
+ <dt>Author(s)</dt>
+ <dd>Ranger, C., Raghuraman, R., Penmetsa, A., Bradski, G., & Kozyrakis, C.</dd>
+ <dt>Appeared in</dt>
+ <dd>Proceedings of the 13th Intl. Symposium on High-Performance Computer Architecture (HPCA). Phoenix, AZ.</dd>
+ <dt>URL</dt>
+ <dd><a target="_blank" href='http://mapreduce.stanford.edu/'>http://mapreduce.stanford.edu/</a></dd>
+ </dl>
+ </div>
+ </div>
+ </div>
+ </div>
+ <div id="sidebar">
+ <a accesskey="u" href="http://www.boost.org/doc/libs"><img src="http://www.boost.org/doc/html/images/up.png" alt="Up" /></a>
+ <a accesskey="h" href="http://www.boost.org/"><img src="http://www.boost.org/doc/html/images/home.png" alt="Home" /></a>
+ <a accesskey="n" href="./tutorial.html"><img src="http://www.boost.org/doc/html/images/next.png" alt="Next" /></a>
+
+ <hr />
+ <p><a href='./index.html'>Boost.MapReduce</a></p>
+ <p><a href='./tutorial.html'>Tutorial</a></p>
+ <p><a href='./wordcount.html'>Example</a></p>
+ <hr />
+ <p><a href='./schedule_policies.html'>Schedule Policies</a></p>
+ <p><a href='./platform.html'>Platform Notes</a></p>
+ <p><a href='./future.html'>Future Work</a></p>
+ </div>
+ <div class="clear"></div>
+ </div>
+ </div>
+
+ <div id="footer">
+ <div id="footer-left">
+
+ <div id="copyright">
+ <p>Copyright (C) 2009 Craig Henderson.</p>
+ </div> <div id="license">
+ <p>Distributed under the <a href="/LICENSE_1_0.txt" class=
+ "internal">Boost Software License, Version 1.0</a>.</p>
+ </div>
+ </div>
+
+ <div id="footer-right">
+ <div id="banners">
+ <p id="banner-xhtml"><a href="http://validator.w3.org/check?uri=referer"
+ class="external">XHTML 1.0</a></p>
+
+ <p id="banner-css"><a href=
+ "http://jigsaw.w3.org/css-validator/check/referer" class=
+ "external">CSS</a></p>
+
+ <p id="banner-osi"><a href=
+ "http://www.opensource.org/docs/definition.php" class="external">OSI
+ Certified</a></p>
+ </div>
+ </div>
+
+ <div class="clear"></div>
+ </div>
+</body>
+</html>
Added: sandbox/libs/mapreduce/doc/platform.html
==============================================================================
--- (empty file)
+++ sandbox/libs/mapreduce/doc/platform.html 2009-07-23 15:04:45 EDT (Thu, 23 Jul 2009)
@@ -0,0 +1,200 @@
+<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
+ "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
+
+<html xmlns="http://www.w3.org/1999/xhtml" lang="en" xml:lang="en">
+<head>
+ <title>Boost.MapReduce platform notes</title>
+ <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
+ <link href="http://www.boost.org/favicon.ico" rel="icon" type="http://www.boost.org/image/ico" />
+ <link rel="stylesheet" type="text/css" href="http://www.boost.org/style/basic.css" />
+</head>
+
+<body>
+ <div id="heading">
+ <div id="heading-placard"></div>
+
+ <h1 id="heading-title"><a href="/"><img src="http://www.boost.org/gfx/space.png" alt=
+ "Boost C++ Libraries" id="heading-logo" /><span id="boost">Boost</span>
+ <span id="cpplibraries">C++ Libraries</span></a></h1>
+
+ <p id="heading-quote"><span class="quote">“...one of the most highly
+ regarded and expertly designed C++ library projects in the
+ world.”</span> <span class="attribution">— <a href=
+ "http://www.gotw.ca/" class="external">Herb Sutter</a> and <a href=
+ "http://en.wikipedia.org/wiki/Andrei_Alexandrescu" class="external">Andrei
+ Alexandrescu</a>, <a href=
+ "http://safari.awprofessional.com/?XmlId=0321113586" class="external">C++
+ Coding Standards</a></span></p>
+
+ </div>
+
+ <div id="body">
+ <div id="body-inner">
+ <div id="content">
+ <div class="section">
+ <div class="section-0">
+ <div class="section-title">
+ <h1>Boost.MapReduce platform notes</h1>
+ <em>Note: This library is not yet part of the Boost Library and is still under development and review.</em>
+ </div>
+
+ <div class="section-body">
+ <h2>Microsoft Windows and MSVC 8 (2005)</h2>
+ <p>
+ This library has been developed and tested using Microsoft Visual C++ v8, aka Visual Studio 2005.
+ The code compiles cleanly for, and runs as, 32-bit and 64-bit processes on Windows XP 32-bit and Windows
+ 2003 Server 64-bit Edition.</p>
+ <h2>STL</h2>
+ <p>
+ The STL implementation supplied with Microsoft Visual C++ v8 suffers significant performance
+ problems as it includes indiscriminate fine-grained synchronisation locking. The MapReduce
+ library is designed to be a high-performance library and partitions data such that multiple threads
+ can process data independently of other threads. The unnecessary overhead of locking in MSVC8's STL
+ library negates some of the high-performance benefits of the library.
+ </p>
+ <p>
+ I therefore recommend using an alternative STL implementation to achieve maximum performance. I have
+ tested the library with STLport 5.2.1, compiled without thread support:</p>
+ <pre>STLport-5.2.1>configure msvc8 -p winxp -x --without-thread --with-dynamic-rtl</pre>
+ <p>The time difference is significant. Using the <a href='./wordcount.html'>Word Count example</a> on a sample
+ dataset of six plain text files totalling 90.8 MB (95,284,354 bytes), the STLport version ran in about
+ 27% of the time taken using the MSVC STL (116 seconds against 434 seconds in the two runs shown below).
+ </p>
+<pre>
+MapReduce Wordcount Application
+2 CPU cores
+class mapreduce::job<class wordcount::map_task,class wordcount::reduce_task,clas
+s wordcount::combiner,class mapreduce::datasource::directory_iterator<class word
+count::map_task>,class mapreduce::intermediates::local_disk<class wordcount::map
+_task,struct mapreduce::detail::file_sorter,struct mapreduce::detail::file_merge
+r> >
+
+Running CPU Parallel MapReduce...
+CPU Parallel MapReduce Finished.
+
+MapReduce statistics:
+ MapReduce job runtime : 434 seconds, of which...
+ Map phase runtime : 418 seconds
+ Reduce phase runtime : 16 seconds
+
+ Map:
+ Total Map keys : 6
+ Map keys processed : 6
+ Map key processing errors : 0
+ Number of Map Tasks run (in parallel) : 2
+ Fastest Map key processed in : 8 seconds
+ Slowest Map key processed in : 389 seconds
+ Average time to process Map keys : 81 seconds
+
+ Reduce:
+ Number of Reduce Tasks run (in parallel): 2
+ Number of Result Files : 10
+ Fastest Reduce key processed in : 2 seconds
+ Slowest Reduce key processed in : 4 seconds
+ Average time to process Reduce keys : 5 seconds</pre>
+<pre>
+MapReduce Wordcount Application
+2 CPU cores
+class mapreduce::job<class wordcount::map_task,class wordcount::reduce_task,clas
+s wordcount::combiner,class mapreduce::datasource::directory_iterator<class word
+count::map_task>,class mapreduce::intermediates::local_disk<class wordcount::map
+_task,struct mapreduce::detail::file_sorter,struct mapreduce::detail::file_merge
+r> >
+
+Running CPU Parallel MapReduce...
+CPU Parallel MapReduce Finished.
+
+MapReduce statistics:
+ MapReduce job runtime : 116 seconds, of which...
+ Map phase runtime : 114 seconds
+ Reduce phase runtime : 2 seconds
+
+ Map:
+ Total Map keys : 6
+ Map keys processed : 6
+ Map key processing errors : 0
+ Number of Map Tasks run (in parallel) : 2
+ Fastest Map key processed in : 1 seconds
+ Slowest Map key processed in : 112 seconds
+ Average time to process Map keys : 19 seconds
+
+ Reduce:
+ Number of Reduce Tasks run (in parallel): 2
+ Number of Result Files : 10
+ Fastest Reduce key processed in : 0 seconds
+ Slowest Reduce key processed in : 1 seconds
+ Average time to process Reduce keys : 0 seconds
+</pre>
+ <h2>GCC 3.4.4 under Cygwin</h2>
+ <p>
+ I have successfully compiled using GCC 3.4.4 under Cygwin, but do not have a full
+ development environment with Boost et al. to run any tests.</p>
+ <pre>$ g++ -Wall -c -DLINUX -I../../../.. -I/cygdrive/c/root/Development/Library/Boost/boost_1_39_0 wordcount.cpp</pre>
+ <p>
+ There are also some missing functions in the <code>linux_os</code> namespace which
+ I have not implemented. Any help implementing these for non-Windows platforms is appreciated.</p>
+<pre>
+namespace linux_os {
+ unsigned const number_of_cpus(void); // !!! not implemented
+ std::string &get_temporary_filename(std::string &pathname); // !!! not implemented
+} // namespace linux_os
+</pre>
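+ <p>
+ The two functions above could be filled in along the following lines on Linux. This is a sketch under
+ stated assumptions rather than the library's implementation: it relies on
+ <code>sysconf(_SC_NPROCESSORS_ONLN)</code>, a widely available extension that is not guaranteed on every
+ POSIX system, and on POSIX <code>mkstemp</code>; the <code>/tmp/mapreduce_</code> prefix is made up for
+ the example.
+ </p>

```cpp
#include <stdlib.h>   // mkstemp
#include <unistd.h>   // sysconf, close
#include <string>
#include <vector>

namespace linux_os {

// Number of processors currently online; falls back to 1 if the query fails.
unsigned const number_of_cpus(void)
{
    long const count = sysconf(_SC_NPROCESSORS_ONLN);
    return count < 1 ? 1u : static_cast<unsigned>(count);
}

// Creates an empty, uniquely named file and stores its path in pathname.
// pathname is left empty if the file could not be created.
std::string &get_temporary_filename(std::string &pathname)
{
    std::string const pattern("/tmp/mapreduce_XXXXXX");   // assumed location and prefix
    std::vector<char> buffer(pattern.begin(), pattern.end());
    buffer.push_back('\0');

    int const fd = mkstemp(&buffer[0]);   // replaces XXXXXX and creates the file
    if (fd == -1)
        pathname.clear();
    else
    {
        close(fd);
        pathname.assign(&buffer[0]);
    }
    return pathname;
}

}   // namespace linux_os
```

+ <p>
+ Because <code>mkstemp</code> creates the file as well as naming it, the caller is responsible for deleting
+ it once it is no longer needed.
+ </p>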
+ <h2>GCC 4.3.3 on Ubuntu Linux 9.04</h2>
+ <p>
+ I have successfully compiled using GCC 4.3.3 on Ubuntu Linux 9.04 (32-bit), but do not yet have a full
+ development environment with Boost et al. to run any tests.</p>
+ <pre>$ g++ -Wall -c -DLINUX -I../../../.. -I/cygdrive/c/root/Development/Library/Boost/boost_1_39_0 wordcount.cpp</pre>
+
+ </div>
+ </div>
+ </div>
+ </div>
+ <div id="sidebar">
+ <a accesskey="p" href="./schedule_policies.html"><img src="http://www.boost.org/doc/html/images/prev.png" alt="Prev" /></a>
+ <a accesskey="u" href="http://www.boost.org/doc/libs"><img src="http://www.boost.org/doc/html/images/up.png" alt="Up" /></a>
+ <a accesskey="h" href="http://www.boost.org/"><img src="http://www.boost.org/doc/html/images/home.png" alt="Home" /></a>
+ <a accesskey="n" href="./future.html"><img src="http://www.boost.org/doc/html/images/next.png" alt="Next" /></a>
+
+ <hr />
+ <p><a href='./index.html'>Boost.MapReduce</a></p>
+ <p><a href='./tutorial.html'>Tutorial</a></p>
+ <p><a href='./wordcount.html'>Example</a></p>
+ <hr />
+ <p><a href='./schedule_policies.html'>Schedule Policies</a></p>
+ <p><a href='./platform.html'>Platform Notes</a></p>
+ <p><a href='./future.html'>Future Work</a></p>
+ </div>
+ <div class="clear"></div>
+ </div>
+ </div>
+
+ <div id="footer">
+ <div id="footer-left">
+
+ <div id="copyright">
+ <p>Copyright (C) 2009 Craig Henderson.</p>
+ </div> <div id="license">
+ <p>Distributed under the <a href="/LICENSE_1_0.txt" class=
+ "internal">Boost Software License, Version 1.0</a>.</p>
+ </div>
+ </div>
+
+ <div id="footer-right">
+ <div id="banners">
+ <p id="banner-xhtml"><a href="http://validator.w3.org/check?uri=referer"
+ class="external">XHTML 1.0</a></p>
+
+ <p id="banner-css"><a href=
+ "http://jigsaw.w3.org/css-validator/check/referer" class=
+ "external">CSS</a></p>
+
+ <p id="banner-osi"><a href=
+ "http://www.opensource.org/docs/definition.php" class="external">OSI
+ Certified</a></p>
+ </div>
+ </div>
+
+ <div class="clear"></div>
+ </div>
+</body>
+</html>
\ No newline at end of file
Added: sandbox/libs/mapreduce/doc/schedule_policies.html
==============================================================================
--- (empty file)
+++ sandbox/libs/mapreduce/doc/schedule_policies.html 2009-07-23 15:04:45 EDT (Thu, 23 Jul 2009)
@@ -0,0 +1,132 @@
+<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
+ "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
+
+<html xmlns="http://www.w3.org/1999/xhtml" lang="en" xml:lang="en">
+<head>
+ <title>Boost.MapReduce Schedule Policies</title>
+ <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
+ <link href="http://www.boost.org/favicon.ico" rel="icon" type="http://www.boost.org/image/ico" />
+ <link rel="stylesheet" type="text/css" href="http://www.boost.org/style/basic.css" />
+</head>
+
+<body>
+ <div id="heading">
+ <div id="heading-placard"></div>
+
+ <h1 id="heading-title"><a href="/"><img src="http://www.boost.org/gfx/space.png" alt=
+ "Boost C++ Libraries" id="heading-logo" /><span id="boost">Boost</span>
+ <span id="cpplibraries">C++ Libraries</span></a></h1>
+
+ <p id="heading-quote"><span class="quote">“...one of the most highly
+ regarded and expertly designed C++ library projects in the
+ world.”</span> <span class="attribution">— <a href=
+ "http://www.gotw.ca/" class="external">Herb Sutter</a> and <a href=
+ "http://en.wikipedia.org/wiki/Andrei_Alexandrescu" class="external">Andrei
+ Alexandrescu</a>, <a href=
+ "http://safari.awprofessional.com/?XmlId=0321113586" class="external">C++
+ Coding Standards</a></span></p>
+
+ </div>
+
+ <div id="body">
+ <div id="body-inner">
+ <div id="content">
+ <div class="section">
+ <div class="section-0">
+ <div class="section-title">
+ <h1>Boost.MapReduce Schedule Policies</h1>
+ <em>Note: This library is not yet part of the Boost Library and is still under development and review.</em>
+ </div>
+
+ <div class="section-body">
+ <p>
+ <em>Schedule Policies</em> are used by the MapReduce runtime system to schedule
+ execution of Map and Reduce tasks. The policy is specified in the call to
+ <code>mapreduce::job::run()</code>, which has two variants for coding convenience.
+ </p>
+<pre>
+template<typename SchedulePolicy>
+void run(specification const &spec, results &result);
+
+template<typename SchedulePolicy>
+void run(SchedulePolicy &schedule, specification const &spec, results &result);
+</pre>
+ <p>
+ Both overloads of <code>run()</code> are function templates where the template parameter
+ is a <code>SchedulePolicy</code>. The first variant will default construct a schedule policy
+ class, and the second variant will use the supplied policy class. This enables the library
+ user to develop their own scheduler policies that may need configuration before being used.
+ </p>
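+ <p>
+ The relationship between the two overloads can be sketched as follows. Everything here is a stand-in
+ written for illustration: the <code>specification</code>, <code>results</code> and policy types are
+ hypothetical, and invoking the policy through <code>operator()</code> is an assumption, not the interface
+ the library requires of a schedule policy.
+ </p>

```cpp
#include <string>

// Hypothetical stand-ins for the library's specification and results types.
struct specification { unsigned map_tasks; };
struct results { std::string scheduler; };

// A policy that needs no configuration, so it can be default constructed.
struct fixed_policy
{
    void operator()(specification const &, results &result) { result.scheduler = "fixed"; }
};

// A policy that must be configured before it is used.
struct named_policy
{
    explicit named_policy(std::string const &name) : name_(name) {}
    void operator()(specification const &, results &result) { result.scheduler = name_; }
    std::string name_;
};

struct job
{
    // First variant: default constructs the policy, then forwards to the second variant.
    template<typename SchedulePolicy>
    void run(specification const &spec, results &result)
    {
        SchedulePolicy schedule;
        this->run(schedule, spec, result);
    }

    // Second variant: uses the policy object supplied by the caller.
    template<typename SchedulePolicy>
    void run(SchedulePolicy &schedule, specification const &spec, results &result)
    {
        schedule(spec, result);
    }
};
```

+ <p>
+ With these definitions, <code>j.run<fixed_policy>(spec, result)</code> default constructs the policy,
+ while <code>j.run(policy, spec, result)</code> uses a policy object the caller has already configured.
+ </p>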
+ <p>
+ Boost.MapReduce provides two Schedule Policy implementations in the <code>mapreduce::schedule_policy</code>
+ namespace: <code>sequential</code> and <code>cpu_parallel</code>.
+ </p>
+ <h2>sequential</h2>
+ <p>
+ The <code>sequential</code> schedule policy runs the MapReduce job on the main execution thread,
+ first running a single Map Task followed by a number of Reduce Tasks in sequence. This schedule
+ policy provides a simple MapReduce execution system without any multi-threaded activity. While
+ unlikely to be useful in a production system, it is a very useful policy to aid debugging of a
+ MapReduce-implemented algorithm.
+ </p>
+ <h2>cpu_parallel</h2>
+ <p>
+ The <code>cpu_parallel</code> schedule policy is the main scheduling algorithm for Boost.MapReduce.
+ The class implements a multi-threaded execution of multiple simultaneous Map tasks followed by multiple
+ simultaneous Reduce tasks. Statistics from the individual Map and Reduce tasks are then collated into
+ statistics for the Job as a whole.
+ </p>
+ <p>The <em>Boost.Thread</em> library is used for the multi-threading to maximise portability.</p>
+ </div>
+ </div>
+ </div>
+ </div>
+ <div id="sidebar">
+ <a accesskey="p" href="./wordcount.html"><img src="http://www.boost.org/doc/html/images/prev.png" alt="Prev" /></a>
+ <a accesskey="u" href="http://www.boost.org/doc/libs"><img src="http://www.boost.org/doc/html/images/up.png" alt="Up" /></a>
+ <a accesskey="h" href="http://www.boost.org/"><img src="http://www.boost.org/doc/html/images/home.png" alt="Home" /></a>
+ <a accesskey="n" href="./platform.html"><img src="http://www.boost.org/doc/html/images/next.png" alt="Next" /></a>
+
+ <hr />
+ <p><a href='./index.html'>Boost.MapReduce</a></p>
+ <p><a href='./tutorial.html'>Tutorial</a></p>
+ <p><a href='./wordcount.html'>Example</a></p>
+ <hr />
+ <p><a href='./schedule_policies.html'>Schedule Policies</a></p>
+ <p><a href='./platform.html'>Platform Notes</a></p>
+ <p><a href='./future.html'>Future Work</a></p>
+ </div>
+ <div class="clear"></div>
+ </div>
+ </div>
+
+ <div id="footer">
+ <div id="footer-left">
+
+ <div id="copyright">
+ <p>Copyright (C) 2009 Craig Henderson.</p>
+ </div> <div id="license">
+ <p>Distributed under the <a href="/LICENSE_1_0.txt" class=
+ "internal">Boost Software License, Version 1.0</a>.</p>
+ </div>
+ </div>
+
+ <div id="footer-right">
+ <div id="banners">
+ <p id="banner-xhtml"><a href="http://validator.w3.org/check?uri=referer"
+ class="external">XHTML 1.0</a></p>
+
+ <p id="banner-css"><a href=
+ "http://jigsaw.w3.org/css-validator/check/referer" class=
+ "external">CSS</a></p>
+
+ <p id="banner-osi"><a href=
+ "http://www.opensource.org/docs/definition.php" class="external">OSI
+ Certified</a></p>
+ </div>
+ </div>
+
+ <div class="clear"></div>
+ </div>
+</body>
+</html>
\ No newline at end of file
Added: sandbox/libs/mapreduce/doc/tutorial.html
==============================================================================
--- (empty file)
+++ sandbox/libs/mapreduce/doc/tutorial.html 2009-07-23 15:04:45 EDT (Thu, 23 Jul 2009)
@@ -0,0 +1,182 @@
+<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
+ "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
+
+<html xmlns="http://www.w3.org/1999/xhtml" lang="en" xml:lang="en">
+<head>
+ <title>Boost.MapReduce Tutorial</title>
+ <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
+ <link href="http://www.boost.org/favicon.ico" rel="icon" type="http://www.boost.org/image/ico" />
+ <link rel="stylesheet" type="text/css" href="http://www.boost.org/style/basic.css" />
+</head>
+
+<body>
+ <div id="heading">
+ <div id="heading-placard"></div>
+
+ <h1 id="heading-title"><a href="/"><img src="http://www.boost.org/gfx/space.png" alt=
+ "Boost C++ Libraries" id="heading-logo" /><span id="boost">Boost</span>
+ <span id="cpplibraries">C++ Libraries</span></a></h1>
+
+ <p id="heading-quote"><span class="quote">“...one of the most highly
+ regarded and expertly designed C++ library projects in the
+ world.”</span> <span class="attribution">— <a href=
+ "http://www.gotw.ca/" class="external">Herb Sutter</a> and <a href=
+ "http://en.wikipedia.org/wiki/Andrei_Alexandrescu" class="external">Andrei
+ Alexandrescu</a>, <a href=
+ "http://safari.awprofessional.com/?XmlId=0321113586" class="external">C++
+ Coding Standards</a></span></p>
+
+ </div>
+
+ <div id="body">
+ <div id="body-inner">
+ <div id="content">
+ <div class="section">
+ <div class="section-0">
+ <div class="section-title">
+ <h1>Boost.MapReduce Tutorial</h1>
+ <em>Note: This library is not yet part of the Boost Library and is still under development and review.</em>
+ </div>
+
+ <div class="section-body">
+ <p>This tutorial introduces the concepts and framework for MapReduce programming using the Boost library.
+ Note that it is NOT a tutorial on the MapReduce programming idiom itself. Maybe that will follow one day...</p>
+ <h2>Principles</h2>
+ <p>
+ As a library user, you specify a <em>map</em> function object that processes a key/value pair to generate
+ a set of intermediate key/value pairs, and a <em>reduce</em> function object that merges all intermediate
+ values associated with the same intermediate key. These function objects are called MapTask and ReduceTask
+ respectively.
+ </p>
+<pre>
+map (k1,v1) --> list(k2,v2)
+reduce (k2,list(v2)) --> list(v2)</pre>
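+<p>As a rough illustration of the two signatures above, the following is a minimal in-memory sketch that does not use the library at all.
+ The names <code>map_words</code>, <code>reduce_counts</code> and <code>run_in_memory</code> are hypothetical and exist only for this example;
+ the grouping step in the middle is an assumption about what a MapReduce runtime does between the two phases.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <numeric>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical names throughout; none of this is the library's API.
typedef std::pair<std::string, unsigned> intermediate_pair;   // (k2, v2)

// map (k1,v1) --> list(k2,v2): emit (word, 1) for every whitespace-separated word.
std::vector<intermediate_pair> map_words(std::string const &/*name*/, std::string const &text)
{
    std::vector<intermediate_pair> out;
    std::istringstream in(text);
    std::string word;
    while (in >> word)
        out.push_back(intermediate_pair(word, 1));
    return out;
}

// reduce (k2,list(v2)) --> list(v2): sum the counts collected for one key.
unsigned reduce_counts(std::string const &/*word*/, std::vector<unsigned> const &counts)
{
    assert(!counts.empty());   // a key is only grouped once it has at least one value
    return std::accumulate(counts.begin(), counts.end(), 0u);
}

// Run map over every input, group intermediate values by key, then reduce each group.
std::map<std::string, unsigned> run_in_memory(std::map<std::string, std::string> const &inputs)
{
    std::map<std::string, std::vector<unsigned> > grouped;
    for (std::map<std::string, std::string>::const_iterator in = inputs.begin(); in != inputs.end(); ++in)
    {
        std::vector<intermediate_pair> const pairs = map_words(in->first, in->second);
        for (std::size_t i = 0; i < pairs.size(); ++i)
            grouped[pairs[i].first].push_back(pairs[i].second);
    }

    std::map<std::string, unsigned> results;
    for (std::map<std::string, std::vector<unsigned> >::const_iterator g = grouped.begin(); g != grouped.end(); ++g)
        results[g->first] = reduce_counts(g->first, g->second);
    return results;
}
```

+<p>Here <code>k1</code>/<code>v1</code> are a document name and its text, and <code>k2</code>/<code>v2</code> are a word and a count.
+ A real runtime would distribute the map and reduce calls rather than run them in one loop.</p>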
+ <h2>MapReduce Job</h2>
+ <p>
+ A single instance of execution in MapReduce is called a Job, and is implemented by <code>boost::mapreduce::job</code>.
+ The simplest definition of a MapReduce Job type just specifies the user-defined MapTask and ReduceTask:</p>
+<pre>typedef
+mapreduce::job<
+ wordcount::map_task,
+ wordcount::reduce_task>
+job;
+</pre>
+ <p>
+ The library's <code>job</code> class provides for more configuration than this, though.
+ <!-- !!! See <a href='./job.html'>Job</a> for more information. -->
+ </p>
+<pre>template<typename MapTask,
+ typename ReduceTask,
+ typename Combiner=null_combiner,
+ typename Datasource=datasource::directory_iterator<MapTask>,
+ typename IntermediateStore=intermediates::local_disk<MapTask> >
+class job;
+</pre>
+ <h2>MapTask</h2>
+ <p>The requirements of a MapTask function object are:</p>
+ <ul>
+ <li>Provide type definitions for Map Key (<code>k1</code>) and Map Value (<code>v1</code>);
+ <code>key_type</code> and <code> value_type</code></li>
+ <li>Provide type definitions for Intermediate Key (<code>k2</code>) and Intermediate Value (<code>v2</code>);
+ <code>intermediate_key_type</code> and <code> intermediate_value_type</code></li>
+ <li>Define a constructor taking a <code>job::map_task_runner</code> object by reference</li>
+ <li>Store a reference to the <code>job::map_task_runner</code> object passed to the constructor,
+ to be used to emit intermediate results</li>
+ <li>Define a function-call operator <code>void operator()(key_type const &key, value_type
+ const &value);</code> Note that the <code>const</code> qualifiers on these parameters
+ are optional, but recommended where possible.</li>
+ </ul>
+<pre>
+class map_task
+{
+ public:
+ typedef std::string key_type;
+ typedef std::ifstream value_type;
+ typedef std::string intermediate_key_type;
+ typedef unsigned intermediate_value_type;
+
+ map_task(job::map_task_runner &runner);
+ void operator()(key_type const &key, value_type const &value);
+
+ private:
+ job::map_task_runner &runner_;
+};
+</pre>
+ <h2>ReduceTask</h2>
+ <p>The requirements of a ReduceTask function object are:</p>
+ <ul>
+ <li>Provide type definitions for Reduce Value (<code>v2</code>);
+ <code> value_type</code></li>
+ <li>Define a constructor taking a <code>job::reduce_task_runner</code> object by reference</li>
+ <li>Store a reference to the <code>job::reduce_task_runner</code> object passed to the constructor,
+ to be used to emit results</li>
+ <li>Define a function-call operator <code>void operator()(typename map_task::intermediate_key_type
+ const &key, It it, It ite);</code> where <code>It</code> is an iterator type.</li>
+ </ul>
+<pre>
+class reduce_task
+{
+ public:
+ typedef unsigned value_type;
+
+ reduce_task(job::reduce_task_runner &runner);
+
+ template<typename It>
+ void operator()(typename map_task::intermediate_key_type const &key, It it, It ite);
+
+ private:
+ job::reduce_task_runner &runner_;
+};
+</pre>
+<p>See the <a href='./wordcount.html'>Word Count example</a> for a detailed breakdown of a simple implementation.</p>
+ </div>
+ </div>
+ </div>
+ </div>
+ <div id="sidebar">
+ <a accesskey="p" href="./index.html"><img src="http://www.boost.org/doc/html/images/prev.png" alt="Prev" /></a>
+ <a accesskey="u" href="./index.html"><img src="http://www.boost.org/doc/html/images/up.png" alt="Up" /></a>
+ <a accesskey="h" href="http://www.boost.org/"><img src="http://www.boost.org/doc/html/images/home.png" alt="Home" /></a>
+ <a accesskey="n" href="./wordcount.html"><img src="http://www.boost.org/doc/html/images/next.png" alt="Next" /></a>
+
+ <hr />
+ <p><a href='./index.html'>Boost.MapReduce</a></p>
+ <p><a href='./tutorial.html'>Tutorial</a></p>
+ <p><a href='./wordcount.html'>Example</a></p>
+ <hr />
+ <p><a href='./schedule_policies.html'>Schedule Policies</a></p>
+ <p><a href='./platform.html'>Platform Notes</a></p>
+ <p><a href='./future.html'>Future Work</a></p>
+ </div>
+ <div class="clear"></div>
+ </div>
+ </div>
+
+ <div id="footer">
+ <div id="footer-left">
+
+ <div id="copyright">
+ <p>Copyright (C) 2009 Craig Henderson.</p>
+ </div> <div id="license">
+ <p>Distributed under the <a href="/LICENSE_1_0.txt" class=
+ "internal">Boost Software License, Version 1.0</a>.</p>
+ </div>
+ </div>
+
+ <div id="footer-right">
+ <div id="banners">
+ <p id="banner-xhtml">XHTML 1.0</p>
+
+ <p id="banner-css"><a href=
+ "http://jigsaw.w3.org/css-validator/check/referer" class=
+ "external">CSS</a></p>
+
+ <p id="banner-osi"><a href=
+ "http://www.opensource.org/docs/definition.php" class="external">OSI
+ Certified</a></p>
+ </div>
+ </div>
+ <div class="clear"></div>
+ </div>
+</body>
+</html>
\ No newline at end of file
Added: sandbox/libs/mapreduce/doc/wordcount.html
==============================================================================
--- (empty file)
+++ sandbox/libs/mapreduce/doc/wordcount.html 2009-07-23 15:04:45 EDT (Thu, 23 Jul 2009)
@@ -0,0 +1,425 @@
+<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
+ "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
+
+<html xmlns="http://www.w3.org/1999/xhtml" lang="en" xml:lang="en">
+<head>
+ <title>Boost.MapReduce Word Count example</title>
+ <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
+ <link href="http://www.boost.org/favicon.ico" rel="icon" type="http://www.boost.org/image/ico" />
+ <link rel="stylesheet" type="text/css" href="http://www.boost.org/style/basic.css" />
+</head>
+
+<body>
+ <div id="heading">
+ <div id="heading-placard"></div>
+
+ <h1 id="heading-title"><a href="/"><img src="http://www.boost.org/gfx/space.png" alt=
+ "Boost C++ Libraries" id="heading-logo" /><span id="boost">Boost</span>
+ <span id="cpplibraries">C++ Libraries</span></a></h1>
+
+ <p id="heading-quote"><span class="quote">“...one of the most highly
+ regarded and expertly designed C++ library projects in the
+ world.”</span> <span class="attribution">— <a href=
+ "http://www.gotw.ca/" class="external">Herb Sutter</a> and <a href=
+ "http://en.wikipedia.org/wiki/Andrei_Alexandrescu" class="external">Andrei
+ Alexandrescu</a>, <a href=
+ "http://safari.awprofessional.com/?XmlId=0321113586" class="external">C++
+ Coding Standards</a></span></p>
+
+ </div>
+
+ <div id="body">
+ <div id="body-inner">
+ <div id="content">
+ <div class="section">
+ <div class="section-0">
+ <div class="section-title">
+ <h1>Boost.MapReduce Word Count example</h1>
+ <em>Note: This library is not yet part of the Boost Library and is still under development and review.</em>
+ </div>
+
+<div class="section-body">
+<p>
+By way of an example of using the MapReduce library, we implement a Word Count application.
+We'll use a <code>datasource</code> class supplied by the library to iterate through a directory
+of files containing words to be counted. The Map phase will create a list of words and a count of 1,
+and the Reduce phase will accept a list of words and corresponding counts, total the counts
+for each word, and produce a final list of words with their totals.
+</p>
+<pre>
+map (filename; string, file stream; ifstream) --> list(word; string, count; unsigned int)
+reduce (word; string, list(count; unsigned int)) --> list(count; unsigned int)</pre>
+
+
+<h2>Type Definitions</h2>
+<p>
+ For convenience, brevity and maintainability, define a <code>job</code> type for the MapReduce job.
+ This local <code>job</code> type will be defined in terms of the library's <code>mapreduce::job</code>
+ class with template parameters specific to the Word Count application.
+</p>
+
+<pre>
+typedef
+mapreduce::job<
+ wordcount::map_task,
+ wordcount::reduce_task>
+job;
+</pre>
+<p>The class <code>mapreduce::job</code> actually has five template parameters. The first two must be supplied; the last
+three have default values. The definition above is therefore equivalent to</p>
+<pre>
+typedef
+mapreduce::job<
+ class wordcount::map_task,
+ class wordcount::reduce_task,
+ struct mapreduce::null_combiner,
+ class mapreduce::datasource::directory_iterator<class wordcount::map_task>,
+ class mapreduce::intermediates::local_disk<
+ class wordcount::map_task,
+ struct mapreduce::detail::file_sorter,
+ struct mapreduce::detail::file_merger>
+ >
+job;
+</pre>
+
+<h2>MapTask</h2>
+<p>
+ The MapTask will be implemented by a function-object <code>wordcount::map_task</code>. There are four required
+ data types to be defined in the functor for the <code>key</code>/<code>value</code> types of the input and
+ output of the map task.
+</p>
+<pre>
+typedef std::string key_type;
+typedef std::ifstream value_type;
+typedef std::string intermediate_key_type;
+typedef unsigned intermediate_value_type;
+</pre>
+<p>
+ Now the function-call operator, which takes two parameters: the <code>key</code> and <code>value</code> for the
+ map task to process. Normally these parameters would be expected to be passed as a reference-to-const, but in
+ the Word Count example, the <code>value</code> parameter is defined as a <code>std::ifstream</code> object. If
+ this were passed as reference-to-const, the function would not be able to read from the file, because the read
+ operation modifies the state of the object. As a result, the <code>value</code> parameter is passed as a plain
+ reference.
+</p>
+<p>
+ The function simply loops until the end-of-file is reached on the supplied <code>std::ifstream</code> object.
+ In each iteration a <em>word</em> is read into a <code>string</code> object and converted to lowercase text, and
+ non-alphanumeric characters are stripped from the beginning and end. The <em>word</em> is then stored as an
+ intermediate <code>key</code> with a <code>value</code> of <code>1</code>, by calling the
+ <code>emit_intermediate()</code> function of the <code>job::map_task_runner</code> object which was passed to
+ the constructor of the <code>map_task</code> object.
+</p>
+<pre>
+// not a reference to const to enable streams to be passed
+void operator()(key_type const &/*key*/, value_type &value)
+{
+ while (!value.eof())
+ {
+ std::string word;
+ value >> word;
+ std::transform(word.begin(), word.end(), word.begin(),
+ std::bind1st(
+ std::mem_fun(&std::ctype<char>::tolower),
+ &std::use_facet<std::ctype<char> >(std::locale::classic())));
+
+ size_t length = word.length();
+ size_t const original_length = length;
+ std::string::const_iterator it;
+ for (it=word.begin();
+ it!=word.end() && !std::isalnum(*it, std::locale::classic());
+ ++it)
+ {
+ --length;
+ }
+
+ for (std::string::const_reverse_iterator rit=word.rbegin();
+ length>0 && !std::isalnum(*rit, std::locale::classic());
+ ++rit)
+ {
+ --length;
+ }
+
+ if (length > 0)
+ {
+ if (length == original_length)
+ runner_.emit_intermediate(word, 1);
+ else
+ runner_.emit_intermediate(std::string(&*it,length), 1);
+ }
+ }
+}
+</pre>
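+<p>The normalisation step performed inside the loop above can also be sketched in isolation, which makes it easier to test.
+ The helper name <code>normalise_word</code> is hypothetical and is not part of the library or of the Word Count source;
+ this version works on indices rather than iterators but is intended to produce the same trimmed, lowercased word.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <locale>
#include <string>

// Hypothetical helper (not part of the library): lowercase a token and strip
// non-alphanumeric characters from both ends, using the classic "C" locale.
std::string normalise_word(std::string word)
{
    std::locale const &loc = std::locale::classic();
    for (std::size_t i = 0; i < word.size(); ++i)
        word[i] = std::tolower(word[i], loc);

    // Skip leading characters that are not letters or digits.
    std::size_t first = 0;
    while (first < word.size() && !std::isalnum(word[first], loc))
        ++first;

    // Walk back over trailing characters that are not letters or digits.
    std::size_t last = word.size();
    while (last > first && !std::isalnum(word[last - 1], loc))
        --last;

    assert(first <= last);
    return word.substr(first, last - first);   // empty when nothing alphanumeric remains
}
```

+<p>Punctuation inside a word, such as the apostrophe in <code>don't</code>, is left untouched; only the ends are trimmed.</p>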
+
+<h2>ReduceTask</h2>
+<p>
+ The ReduceTask will be implemented by a function-object <code>wordcount::reduce_task</code>. There
+ is one required data type to be defined in the functor for the <code>value</code> type output of
+ the reduce task.
+</p>
+<pre>
+typedef unsigned value_type;
+</pre>
+<p>
+ The function-call operator takes three parameters: the <code>key</code> of the reduce task and a pair
+ of iterators dictating the range of <code>value</code> objects for the reduce task. In this Word Count
+ example, the <code>key</code> is a text string containing the <em>word</em>, and the iterators contain
+ a list of frequencies for the word. The ReduceTask simply sums the frequencies by calling
+ <code>std::accumulate</code> and stores the final result by calling the <code>emit()</code> function of
+ the <code>job::reduce_task_runner</code> object which was passed to the constructor of the
+ <code>reduce_task</code> object.
+</p>
+<pre>
+template<typename It>
+void operator()(typename map_task::intermediate_key_type const &key, It it, It const ite)
+{
+ runner_.emit(key, std::accumulate(it, ite, reduce_task::value_type()));
+}
+</pre>
+
+<h2>Program</h2>
+<p>
+ To run the MapReduce Word Count algorithm, we need a program to set up an
+ environment, run the algorithm and report the results.
+</p>
+<p>
+ The code below shows an example. Note that error handling has been removed for brevity.
+ A <code>datasource</code> object is created to iterate through a directory of files and
+ pass each file into a map task. A <code>mapreduce::specification</code> object is then
+ created. This is used to specify system parameters such as the number of map tasks to run.
+ <em>Note that this is a hint to the MapReduce runtime, and may differ from the actual
+ number of maps that are used.</em> The final supporting object that is created is an
+ instance of <code>mapreduce::results</code>. This structure will be populated by the
+ runtime to provide metrics and timings of the MapReduce job execution.
+</p>
+<p>
+ To run the MapReduce job, call the <code>run</code> function of the <code>job</code> class.
+ There are two overloads of <code>run</code>, for coding convenience.
+</p>
+<pre>
+ template<typename SchedulePolicy>
+ void run(specification const &spec, results &result);
+
+ template<typename SchedulePolicy>
+ void run(SchedulePolicy &schedule, specification const &spec, results &result);
+</pre>
+<p>
+ Both overloads of <code>run()</code> are template functions where the template parameter
+ is a <code>SchedulePolicy</code>. The first variant will default-construct a schedule policy
+ object, and the second variant will use the supplied policy object. This enables the library
+ user to develop their own scheduler policies that may need configuration before being used.
+ See <a href='./schedule_policies.html'>Schedule Policies</a> for more information.
+</p>
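+<p>The two-overload pattern described above can be sketched with toy types. Everything in this example
+ (<code>toy_job</code>, <code>sequential_policy</code>, <code>configurable_policy</code>) is hypothetical and greatly simplified;
+ it assumes a policy is a callable object that receives the job, which may not match how the library's policies are invoked.</p>

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy stand-ins; names are hypothetical and unrelated to the real library types.
struct toy_job
{
    std::vector<std::string> log;

    // Variant 1: default-construct the policy, then delegate to variant 2.
    template<typename SchedulePolicy>
    void run()
    {
        SchedulePolicy schedule;
        this->run(schedule);
    }

    // Variant 2: use a policy object the caller has already configured.
    template<typename SchedulePolicy>
    void run(SchedulePolicy &schedule)
    {
        schedule(*this);
    }
};

// A policy that needs no configuration, so it suits variant 1.
struct sequential_policy
{
    void operator()(toy_job &job) { job.log.push_back("sequential"); }
};

// A policy that has no default constructor, so it can only be used with variant 2.
struct configurable_policy
{
    explicit configurable_policy(unsigned threads) : threads_(threads)
    {
        assert(threads > 0);
    }

    void operator()(toy_job &job) { job.log.push_back(threads_ > 1 ? "parallel" : "single"); }

  private:
    unsigned threads_;
};
```

+<p>Calling <code>job.run&lt;sequential_policy&gt;()</code> names the policy type explicitly, whereas
+ <code>job.run(policy)</code> lets the compiler deduce it from a policy that was configured beforehand.</p>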
+
+<pre>
+int main(int argc, char **argv)
+{
+ wordcount::job::datasource_type datasource;
+ datasource.set_directory(argv[1]);
+
+ mapreduce::specification spec;
+ spec.map_tasks = atoi(argv[2]);
+
+ mapreduce::results result;
+ wordcount::job mr2(datasource);
+
+ mr2.run<mapreduce::schedule_policy::cpu_parallel<wordcount::job> >(spec, result);
+
+...
+</pre>
+<p>
+ At the end of the MapReduce job execution, the results can be written to the screen.
+</p>
+<pre>
+std::cout << std::endl << "\n" << "MapReduce statistics:";
+std::cout << "\n " << "MapReduce job runtime : " << result.job_runtime << " seconds, of which...";
+std::cout << "\n " << " Map phase runtime : " << result.map_runtime << " seconds";
+std::cout << "\n " << " Reduce phase runtime : " << result.reduce_runtime << " seconds";
+std::cout << "\n\n " << "Map:";
+std::cout << "\n " << "Total Map keys : " << result.counters.map_tasks;
+std::cout << "\n " << "Map keys processed : " << result.counters.map_tasks_completed;
+std::cout << "\n " << "Map key processing errors : " << result.counters.map_tasks_error;
+std::cout << "\n " << "Number of Map Tasks run (in parallel) : " << result.counters.actual_map_tasks;
+std::cout << "\n " << "Fastest Map key processed in : " << *std::min_element(result.map_times.begin(), result.map_times.end()) << " seconds";
+std::cout << "\n " << "Slowest Map key processed in : " << *std::max_element(result.map_times.begin(), result.map_times.end()) << " seconds";
+std::cout << "\n " << "Average time to process Map keys : " << std::accumulate(result.map_times.begin(), result.map_times.end(), boost::int64_t()) / result.map_times.size() << " seconds";
+
+std::cout << "\n\n " << "Reduce:";
+std::cout << "\n " << "Number of Reduce Tasks run (in parallel): " << result.counters.actual_reduce_tasks;
+std::cout << "\n " << "Number of Result Files : " << result.counters.num_result_files;
+std::cout << "\n " << "Fastest Reduce key processed in : " << *std::min_element(result.reduce_times.begin(), result.reduce_times.end()) << " seconds";
+std::cout << "\n " << "Slowest Reduce key processed in : " << *std::max_element(result.reduce_times.begin(), result.reduce_times.end()) << " seconds";
+std::cout << "\n " << "Average time to process Reduce keys : " << std::accumulate(result.reduce_times.begin(), result.reduce_times.end(), boost::int64_t()) / result.reduce_times.size() << " seconds";
+</pre>
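+<p>The reporting code above dereferences the results of <code>std::min_element</code> and <code>std::max_element</code> and divides by the
+ container size, which is only well defined when the timing containers are non-empty. Whether the runtime guarantees that is not stated here,
+ so a guarded summary might look like the sketch below. The <code>timing_summary</code> structure and <code>summarise</code> function are
+ hypothetical, and the use of <code>long long</code> seconds is an assumption about the element type.</p>

```cpp
#include <algorithm>
#include <cassert>
#include <numeric>
#include <vector>

// Hypothetical helper types; the element type of the library's timing
// containers is assumed here to be convertible to long long seconds.
struct timing_summary
{
    long long fastest;
    long long slowest;
    long long average;
};

// Returns false and leaves `out` untouched when there are no samples, so the
// caller never dereferences an end iterator or divides by zero.
bool summarise(std::vector<long long> const &times, timing_summary &out)
{
    if (times.empty())
        return false;

    out.fastest = *std::min_element(times.begin(), times.end());
    out.slowest = *std::max_element(times.begin(), times.end());
    out.average = std::accumulate(times.begin(), times.end(), 0LL)
                / static_cast<long long>(times.size());
    assert(out.fastest <= out.average && out.average <= out.slowest);
    return true;
}
```

+<p>The average uses integer division, as the reporting code above does, so fractional seconds are truncated.</p>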
+
+<h2>Output</h2>
+<p>
+ The wordcount program was run on a sample dataset consisting of six plain text files with
+ a total size of 90.8 MB (95,284,354 bytes). The smallest file is 163 KB (167,529 bytes) and the largest
+ is 88.1 MB (92,392,601 bytes).
+</p>
+<pre>
+MapReduce Wordcount Application
+2 CPU cores
+class mapreduce::job<class wordcount::map_task,class wordcount::reduce_task,stru
+ct mapreduce::null_combiner,class mapreduce::datasource::directory_iterator<clas
+s wordcount::map_task>,class mapreduce::intermediates::local_disk<class wordcoun
+t::map_task,struct mapreduce::detail::file_sorter,struct mapreduce::detail::file
+_merger> >
+
+Running CPU Parallel MapReduce...
+CPU Parallel MapReduce Finished.
+
+MapReduce statistics:
+ MapReduce job runtime : 141 seconds, of which...
+ Map phase runtime : 44 seconds
+ Reduce phase runtime : 97 seconds
+
+ Map:
+ Total Map keys : 6
+ Map keys processed : 6
+ Map key processing errors : 0
+ Number of Map Tasks run (in parallel) : 2
+ Fastest Map key processed in : 0 seconds
+ Slowest Map key processed in : 43 seconds
+ Average time to process Map keys : 7 seconds
+
+ Reduce:
+ Number of Reduce Tasks run (in parallel): 2
+ Number of Result Files : 10
+ Fastest Reduce key processed in : 12 seconds
+ Slowest Reduce key processed in : 36 seconds
+ Average time to process Reduce keys : 30 seconds
+</pre>
+
+<h2>Adding a Combiner</h2>
+<p>
+ In some circumstances, an optimisation can be made by consolidating the results of
+ the Map phase before they are passed to the Reduce phase. This consolidation is
+ done by a <code>combiner</code> functor.
+</p>
+<p>
+ In the case of the Word Count example, the Map phase will naturally produce a list of
+ words, each with a count of 1. The <code>combiner</code> can be used to total the
+ number of each word in the list and produce a shorter list with unique word occurrences.
+</p>
+<pre>
+class combiner
+{
+ public:
+ void start(map_task::intermediate_key_type const &)
+ {
+ total_ = 0;
+ }
+
+ template<typename IntermediateStore>
+ void finish(map_task::intermediate_key_type const &key, IntermediateStore &intermediate_store)
+ {
+ if (total_ > 0)
+ intermediate_store.insert(key, total_);
+ }
+
+ void operator()(map_task::intermediate_value_type const &value)
+ {
+ total_ += value;
+ }
+
+ private:
+ size_t total_;
+};
+</pre>
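+<p>To show how the three member functions of the combiner fit together, here is a self-contained sketch with a toy store and a toy driver.
+ <code>toy_store</code>, <code>toy_combiner</code> and <code>combine_sorted</code> are hypothetical names; in particular, the calling sequence
+ in <code>combine_sorted</code> (one <code>start</code> per distinct key, one call per value, then <code>finish</code>) is an assumption
+ about how a runtime would drive a combiner, not a description of the library's internals.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical in-memory stand-in for an intermediate store; only insert() is modelled.
struct toy_store
{
    std::map<std::string, unsigned> totals;
    void insert(std::string const &key, unsigned value) { totals[key] += value; }
};

// Same shape as the combiner shown above, written against plain types.
class toy_combiner
{
  public:
    toy_combiner() : total_(0) {}

    void start(std::string const &) { total_ = 0; }
    void operator()(unsigned const &value) { total_ += value; }

    template<typename Store>
    void finish(std::string const &key, Store &store)
    {
        if (total_ > 0)
            store.insert(key, total_);
    }

  private:
    unsigned total_;
};

// Assumed driver: walks pairs that are already sorted by key, calling start()
// once per distinct key, operator() per value, and finish() at the end of each key.
void combine_sorted(std::vector<std::pair<std::string, unsigned> > const &pairs, toy_store &store)
{
    toy_combiner combiner;
    std::size_t i = 0;
    while (i < pairs.size())
    {
        std::string const &key = pairs[i].first;
        assert(i == 0 || pairs[i - 1].first < key);   // input must be sorted by key
        combiner.start(key);
        for (; i < pairs.size() && pairs[i].first == key; ++i)
            combiner(pairs[i].second);
        combiner.finish(key, store);
    }
}
```

+<p>With the input <code>("a",1), ("a",1), ("b",1)</code> the store ends up holding two entries instead of three, which is the
+ shortening effect described above.</p>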
+
+<p>
+The <code>combiner</code> runs as a part of the Map Task, so introducing a combiner
+significantly increases the time taken for the Map phase (from 44 to 114 seconds in
+the two runs shown on this page), but the Reduce phase is reduced to almost no time
+at all (from 97 to 2 seconds).
+</p>
+
+<pre>
+MapReduce Wordcount Application
+2 CPU cores
+class mapreduce::job<class wordcount::map_task,class wordcount::reduce_task,clas
+s wordcount::combiner,class mapreduce::datasource::directory_iterator<class word
+count::map_task>,class mapreduce::intermediates::local_disk<class wordcount::map
+_task,struct mapreduce::detail::file_sorter,struct mapreduce::detail::file_merge
+r> >
+
+Running CPU Parallel MapReduce...
+CPU Parallel MapReduce Finished.
+
+MapReduce statistics:
+ MapReduce job runtime : 116 seconds, of which...
+ Map phase runtime : 114 seconds
+ Reduce phase runtime : 2 seconds
+
+ Map:
+ Total Map keys : 6
+ Map keys processed : 6
+ Map key processing errors : 0
+ Number of Map Tasks run (in parallel) : 2
+ Fastest Map key processed in : 1 seconds
+ Slowest Map key processed in : 112 seconds
+ Average time to process Map keys : 19 seconds
+
+ Reduce:
+ Number of Reduce Tasks run (in parallel): 2
+ Number of Result Files : 10
+ Fastest Reduce key processed in : 0 seconds
+ Slowest Reduce key processed in : 1 seconds
+ Average time to process Reduce keys : 0 seconds
+</pre>
+
+<h2>Source Code</h2>
+<p>The full source code for the Word Count example can be found in <code>libs/mapreduce/test/wordcount/wordcount.cpp</code>.</p>
+
+ </div>
+ </div>
+ </div>
+ </div>
+ <div id="sidebar">
+ <a accesskey="p" href="./tutorial.html"><img src="http://www.boost.org/doc/html/images/prev.png" alt="Prev" /></a>
+ <a accesskey="u" href="./index.html"><img src="http://www.boost.org/doc/html/images/up.png" alt="Up" /></a>
+ <a accesskey="h" href="http://www.boost.org/"><img src="http://www.boost.org/doc/html/images/home.png" alt="Home" /></a>
+ <a accesskey="n" href="./schedule_policies.html"><img src="http://www.boost.org/doc/html/images/next.png" alt="Next" /></a>
+
+ <hr />
+ <p><a href='./index.html'>Boost.MapReduce</a></p>
+ <p><a href='./tutorial.html'>Tutorial</a></p>
+ <p><a href='./wordcount.html'>Example</a></p>
+ <hr />
+ <p><a href='./schedule_policies.html'>Schedule Policies</a></p>
+ <p><a href='./platform.html'>Platform Notes</a></p>
+ <p><a href='./future.html'>Future Work</a></p>
+ </div>
+ <div class="clear"></div>
+ </div>
+ </div>
+
+ <div id="footer">
+ <div id="footer-left">
+
+ <div id="copyright">
+ <p>Copyright (C) 2009 Craig Henderson.</p>
+ </div> <div id="license">
+ <p>Distributed under the <a href="/LICENSE_1_0.txt" class=
+ "internal">Boost Software License, Version 1.0</a>.</p>
+ </div>
+ </div>
+
+ <div id="footer-right">
+ <div id="banners">
+ <p id="banner-xhtml">XHTML 1.0</p>
+
+ <p id="banner-css"><a href=
+ "http://jigsaw.w3.org/css-validator/check/referer" class=
+ "external">CSS</a></p>
+
+ <p id="banner-osi"><a href=
+ "http://www.opensource.org/docs/definition.php" class="external">OSI
+ Certified</a></p>
+ </div>
+ </div>
+ <div class="clear"></div>
+ </div>
+</body>
+</html>
\ No newline at end of file
Added: sandbox/libs/mapreduce/mapreduce.sln
==============================================================================
--- (empty file)
+++ sandbox/libs/mapreduce/mapreduce.sln 2009-07-23 15:04:45 EDT (Thu, 23 Jul 2009)
@@ -0,0 +1,26 @@
+
+Microsoft Visual Studio Solution File, Format Version 9.00
+# Visual Studio 2005
+Project("{8BC9CEB8-8B4A-11D0-8D11-00A0C91BC942}") = "mapreduce", "mapreduce.vcproj", "{F1A9A9FC-ACE9-4F93-8162-B888697FD81B}"
+EndProject
+Project("{8BC9CEB8-8B4A-11D0-8D11-00A0C91BC942}") = "wordcount", "test\wordcount\wordcount.vcproj", "{AB0444E8-E927-470A-BF0B-A60E67F91B06}"
+EndProject
+Global
+ GlobalSection(SolutionConfigurationPlatforms) = preSolution
+ Debug|Win32 = Debug|Win32
+ Release|Win32 = Release|Win32
+ EndGlobalSection
+ GlobalSection(ProjectConfigurationPlatforms) = postSolution
+ {F1A9A9FC-ACE9-4F93-8162-B888697FD81B}.Debug|Win32.ActiveCfg = Debug|Win32
+ {F1A9A9FC-ACE9-4F93-8162-B888697FD81B}.Debug|Win32.Build.0 = Debug|Win32
+ {F1A9A9FC-ACE9-4F93-8162-B888697FD81B}.Release|Win32.ActiveCfg = Release|Win32
+ {F1A9A9FC-ACE9-4F93-8162-B888697FD81B}.Release|Win32.Build.0 = Release|Win32
+ {AB0444E8-E927-470A-BF0B-A60E67F91B06}.Debug|Win32.ActiveCfg = Debug|Win32
+ {AB0444E8-E927-470A-BF0B-A60E67F91B06}.Debug|Win32.Build.0 = Debug|Win32
+ {AB0444E8-E927-470A-BF0B-A60E67F91B06}.Release|Win32.ActiveCfg = Release|Win32
+ {AB0444E8-E927-470A-BF0B-A60E67F91B06}.Release|Win32.Build.0 = Release|Win32
+ EndGlobalSection
+ GlobalSection(SolutionProperties) = preSolution
+ HideSolutionNode = FALSE
+ EndGlobalSection
+EndGlobal
Added: sandbox/libs/mapreduce/mapreduce.vcproj
==============================================================================
--- (empty file)
+++ sandbox/libs/mapreduce/mapreduce.vcproj 2009-07-23 15:04:45 EDT (Thu, 23 Jul 2009)
@@ -0,0 +1,229 @@
+<?xml version="1.0" encoding="Windows-1252"?>
+<VisualStudioProject
+ ProjectType="Visual C++"
+ Version="8.00"
+ Name="mapreduce"
+ ProjectGUID="{F1A9A9FC-ACE9-4F93-8162-B888697FD81B}"
+ RootNamespace="mapreduce"
+ Keyword="Win32Proj"
+ >
+ <Platforms>
+ <Platform
+ Name="Win32"
+ />
+ </Platforms>
+ <ToolFiles>
+ </ToolFiles>
+ <Configurations>
+ <Configuration
+ Name="Debug|Win32"
+ OutputDirectory="$(SolutionDir)$(ConfigurationName)"
+ IntermediateDirectory="$(SolutionDir)$(ConfigurationName)\compiler\$(ProjectName)"
+ ConfigurationType="4"
+ CharacterSet="1"
+ >
+ <Tool
+ Name="VCPreBuildEventTool"
+ />
+ <Tool
+ Name="VCCustomBuildTool"
+ />
+ <Tool
+ Name="VCXMLDataGeneratorTool"
+ />
+ <Tool
+ Name="VCWebServiceProxyGeneratorTool"
+ />
+ <Tool
+ Name="VCMIDLTool"
+ />
+ <Tool
+ Name="VCCLCompilerTool"
+ Optimization="0"
+ AdditionalIncludeDirectories="../.."
+ PreprocessorDefinitions="WIN32_LEAN_AND_MEAN"
+ MinimalRebuild="true"
+ BasicRuntimeChecks="3"
+ RuntimeLibrary="3"
+ UsePrecompiledHeader="0"
+ WarningLevel="4"
+ Detect64BitPortabilityProblems="true"
+ DebugInformationFormat="3"
+ />
+ <Tool
+ Name="VCManagedResourceCompilerTool"
+ />
+ <Tool
+ Name="VCResourceCompilerTool"
+ />
+ <Tool
+ Name="VCPreLinkEventTool"
+ />
+ <Tool
+ Name="VCLibrarianTool"
+ />
+ <Tool
+ Name="VCALinkTool"
+ />
+ <Tool
+ Name="VCXDCMakeTool"
+ />
+ <Tool
+ Name="VCBscMakeTool"
+ />
+ <Tool
+ Name="VCFxCopTool"
+ />
+ <Tool
+ Name="VCPostBuildEventTool"
+ />
+ </Configuration>
+ <Configuration
+ Name="Release|Win32"
+ OutputDirectory="$(SolutionDir)$(ConfigurationName)"
+ IntermediateDirectory="$(SolutionDir)$(ConfigurationName)\compiler\$(ProjectName)"
+ ConfigurationType="4"
+ CharacterSet="1"
+ WholeProgramOptimization="1"
+ >
+ <Tool
+ Name="VCPreBuildEventTool"
+ />
+ <Tool
+ Name="VCCustomBuildTool"
+ />
+ <Tool
+ Name="VCXMLDataGeneratorTool"
+ />
+ <Tool
+ Name="VCWebServiceProxyGeneratorTool"
+ />
+ <Tool
+ Name="VCMIDLTool"
+ />
+ <Tool
+ Name="VCCLCompilerTool"
+ AdditionalIncludeDirectories="../.."
+ PreprocessorDefinitions="WIN32_LEAN_AND_MEAN"
+ RuntimeLibrary="2"
+ UsePrecompiledHeader="0"
+ WarningLevel="3"
+ Detect64BitPortabilityProblems="true"
+ DebugInformationFormat="3"
+ />
+ <Tool
+ Name="VCManagedResourceCompilerTool"
+ />
+ <Tool
+ Name="VCResourceCompilerTool"
+ />
+ <Tool
+ Name="VCPreLinkEventTool"
+ />
+ <Tool
+ Name="VCLibrarianTool"
+ />
+ <Tool
+ Name="VCALinkTool"
+ />
+ <Tool
+ Name="VCXDCMakeTool"
+ />
+ <Tool
+ Name="VCBscMakeTool"
+ />
+ <Tool
+ Name="VCFxCopTool"
+ />
+ <Tool
+ Name="VCPostBuildEventTool"
+ />
+ </Configuration>
+ </Configurations>
+ <References>
+ </References>
+ <Files>
+ <Filter
+ Name="Source Files"
+ Filter="cpp;c;cc;cxx;def;odl;idl;hpj;bat;asm;asmx"
+ UniqueIdentifier="{4FC737F1-C7A5-4376-A066-2A32D752A2FF}"
+ >
+ </Filter>
+ <Filter
+ Name="Header Files"
+ >
+ <File
+ RelativePath="..\..\boost\mapreduce.hpp"
+ >
+ </File>
+ <Filter
+ Name="mapreduce"
+ >
+ <File
+ RelativePath="..\..\boost\mapreduce\datasource.hpp"
+ >
+ </File>
+ <File
+ RelativePath="..\..\boost\mapreduce\hash_partitioner.hpp"
+ >
+ </File>
+ <File
+ RelativePath="..\..\boost\mapreduce\intermediates.hpp"
+ >
+ </File>
+ <File
+ RelativePath="..\..\boost\mapreduce\job.hpp"
+ >
+ </File>
+ <File
+ RelativePath="..\..\boost\mapreduce\mergesort.hpp"
+ >
+ </File>
+ <File
+ RelativePath="..\..\boost\mapreduce\null_combiner.hpp"
+ >
+ </File>
+ <File
+ RelativePath="..\..\boost\mapreduce\platform.hpp"
+ >
+ </File>
+ <File
+ RelativePath="..\..\boost\mapreduce\schedule_policy.hpp"
+ >
+ </File>
+ <Filter
+ Name="schedule_policy"
+ >
+ <File
+ RelativePath="..\..\boost\mapreduce\schedule_policy\cpu_parallel.hpp"
+ >
+ </File>
+ <File
+ RelativePath="..\..\boost\mapreduce\schedule_policy\sequential.hpp"
+ >
+ </File>
+ </Filter>
+ <Filter
+ Name="intermediates"
+ >
+ <File
+ RelativePath="..\..\boost\mapreduce\intermediates\in_memory.hpp"
+ >
+ </File>
+ <File
+ RelativePath="..\..\boost\mapreduce\intermediates\local_disk.hpp"
+ >
+ </File>
+ </Filter>
+ </Filter>
+ </Filter>
+ <Filter
+ Name="Resource Files"
+ Filter="rc;ico;cur;bmp;dlg;rc2;rct;bin;rgs;gif;jpg;jpeg;jpe;resx;tiff;tif;png;wav"
+ UniqueIdentifier="{67DA6AB6-F800-4c08-8B7A-83BB121AAD01}"
+ >
+ </Filter>
+ </Files>
+ <Globals>
+ </Globals>
+</VisualStudioProject>
Added: sandbox/libs/mapreduce/test/wordcount/wordcount.cpp
==============================================================================
--- (empty file)
+++ sandbox/libs/mapreduce/test/wordcount/wordcount.cpp 2009-07-23 15:04:45 EDT (Thu, 23 Jul 2009)
@@ -0,0 +1,260 @@
+// Boost.MapReduce library
+//
+// Copyright (C) 2009 Craig Henderson.
+// cdm.henderson_at_[hidden]
+//
+// Use, modification and distribution is subject to the
+// Boost Software License, Version 1.0. (See accompanying
+// file LICENSE_1_0.txt or copy at http://www.boost.org/LICENSE_1_0.txt)
+//
+// For more information, see http://www.boost.org/libs/mapreduce/
+//
+
+/*
+Use a variant of Boost built with STLport IOStreams
+
+STLport-5.2.1>set include=C:\root\Development\Library\STLport\STLport-5.2.1\stlport;%include%
+STLport-5.2.1>configure msvc8 -p winxp -x --without-thread --with-dynamic-rtl
+STLport-5.2.1\build\lib>nmake clean install
+
+Edit boost_1_39_0\tools\build\v2\user-config.jam to
+# -------------------
+# MSVC configuration.
+# -------------------
+# Configure specific msvc version (searched for in standard locations and PATH).
+using msvc : 8.0 ;
+# ----------------------
+# STLPort configuration.
+# ----------------------
+using stlport : : .../STLport/STLport-5.2.1/stlport : .../STLport/STLport-5.2.1/lib/vc8 ;
+
+boost_1_39_0> set include=C:\root\Development\Library\STLport\STLport-5.2.1\stlport;%include%
+boost_1_39_0> set INCLUDE=%INCLUDE%;C:\root\Development\Library\zlib\include
+boost_1_39_0> set ZLIB_INCLUDE=C:\root\Development\Library\zlib\include
+boost_1_39_0> set LIBPATH=%LIBPATH%;C:\root\Development\Library\zlib\lib
+boost_1_39_0> set ZLIB_LIBPATH=C:\root\Development\Library\zlib\lib
+boost_1_39_0> set ZLIB_BINARY=zdll
+boost_1_39_0> ..\bjam --toolset=msvc stdlib=stlport "stdlib:stlport-iostream=on" --without-python --with-filesystem --with-thread --with-date_time
+*/
+
+#ifdef _WIN32
+#ifndef WINVER // Allow use of features specific to Windows XP or later.
+#define WINVER 0x0501 // Change this to the appropriate value to target other versions of Windows.
+#endif
+
+#ifndef _WIN32_WINNT // Allow use of features specific to Windows XP or later.
+#define _WIN32_WINNT 0x0501 // Change this to the appropriate value to target other versions of Windows.
+#endif
+
+#ifndef _WIN32_WINDOWS // Allow use of features specific to Windows 98 or later.
+#define _WIN32_WINDOWS 0x0410 // Change this to the appropriate value to target Windows Me or later.
+#endif
+
+#ifndef _WIN32_IE // Allow use of features specific to IE 6.0 or later.
+#define _WIN32_IE 0x0600 // Change this to the appropriate value to target other versions of IE.
+#endif
+
+#endif
+
+#include <boost/config.hpp>
+#if defined(BOOST_MSVC)
+#pragma warning(disable:4996 4512) // for wordcount std::transform
+#endif
+
+#include <boost/mapreduce.hpp>
+#include <iostream>
+#include <numeric> // accumulate
+
+#if defined(BOOST_MSVC) && defined(_DEBUG)
+#include <crtdbg.h>
+#endif
+
+namespace wordcount {
+
+class map_task;
+class reduce_task;
+class combiner;
+
+typedef
+boost::mapreduce::job
+ < wordcount::map_task
+ , wordcount::reduce_task
+ , wordcount::combiner
+#if 0 && defined(_DEBUG)
+ , boost::mapreduce::datasource::directory_iterator<wordcount::map_task>
+ , boost::mapreduce::intermediates::in_memory<wordcount::map_task>
+#endif
+ >
+job;
+
+class map_task : boost::noncopyable
+{
+ public:
+ typedef std::string key_type;
+ typedef std::ifstream value_type;
+ typedef std::string intermediate_key_type;
+ typedef unsigned intermediate_value_type;
+
+ map_task(job::map_task_runner &runner)
+ : runner_(runner)
+ {
+ }
+
+ // not a reference to const to enable streams to be passed
+ void operator()(key_type const &/*key*/, value_type &value)
+ {
+ while (!value.eof())
+ {
+ std::string word;
+ value >> word;
+ std::transform(word.begin(), word.end(), word.begin(),
+ std::bind1st(
+ std::mem_fun(&std::ctype<char>::tolower),
+ &std::use_facet<std::ctype<char> >(std::locale::classic())));
+
+ size_t length = word.length();
+ size_t const original_length = length;
+ std::string::const_iterator it;
+ for (it=word.begin();
+ it!=word.end() && !std::isalnum(*it, std::locale::classic());
+ ++it)
+ {
+ --length;
+ }
+
+ for (std::string::const_reverse_iterator rit=word.rbegin();
+ length>0 && !std::isalnum(*rit, std::locale::classic());
+ ++rit)
+ {
+ --length;
+ }
+
+ if (length > 0)
+ {
+ if (length == original_length)
+ runner_.emit_intermediate(word, 1);
+ else
+ runner_.emit_intermediate(std::string(&*it,length), 1);
+ }
+ }
+ }
+
+ private:
+ job::map_task_runner &runner_;
+};
+
+class reduce_task : boost::noncopyable
+{
+ public:
+ typedef unsigned value_type;
+
+ reduce_task(job::reduce_task_runner &runner)
+ : runner_(runner)
+ {
+ }
+
+ template<typename It>
+ void operator()(typename map_task::intermediate_key_type const &key, It it, It const ite)
+ {
+ runner_.emit(key, std::accumulate(it, ite, reduce_task::value_type()));
+ }
+
+ private:
+ job::reduce_task_runner &runner_;
+};
+
+class combiner
+{
+ public:
+ void start(map_task::intermediate_key_type const &)
+ {
+ total_ = 0;
+ }
+
+ template<typename IntermediateStore>
+ void finish(map_task::intermediate_key_type const &key, IntermediateStore &intermediate_store)
+ {
+ if (total_ > 0)
+ intermediate_store.insert(key, total_);
+ }
+
+ void operator()(map_task::intermediate_value_type const &value)
+ {
+ total_ += value;
+ }
+
+ private:
+ unsigned total_;
+};
+
+} // namespace wordcount
+
+
+int main(int argc, char **argv)
+{
+#if defined(BOOST_MSVC) && defined(_DEBUG)
+// _CrtSetBreakAlloc(380);
+ _CrtSetDbgFlag(_CrtSetDbgFlag(_CRTDBG_REPORT_FLAG) | _CRTDBG_LEAK_CHECK_DF);
+#endif
+
+ std::cout << "MapReduce Wordcount Application";
+ if (argc < 2)
+ {
+ std::cerr << "Usage: wordcount directory [num_map_tasks]\n";
+ return 1;
+ }
+
+ wordcount::job::datasource_type datasource;
+ datasource.set_directory(argv[1]);
+
+ std::cout << "\n" << std::max(1,(int)boost::thread::hardware_concurrency()) << " CPU cores";
+ std::cout << "\n" << typeid(wordcount::job).name() << "\n";
+
+#if 0 || defined(_DEBUG)
+ std::cout << "\nRunning Sequential MapReduce...";
+
+ boost::mapreduce::specification spec;
+ spec.map_tasks = 1;
+
+ boost::mapreduce::results result;
+ boost::mapreduce::schedule_policy::sequential<wordcount::job> scheduler;
+ wordcount::job mr1(datasource);
+ mr1.run(scheduler, spec, result);
+
+ std::cout << "\nFinished.";
+#else
+ std::cout << "\nRunning CPU Parallel MapReduce...";
+
+ boost::mapreduce::specification spec;
+ boost::mapreduce::results result;
+ wordcount::job mr2(datasource);
+
+ if (argc > 2)
+ spec.map_tasks = atoi(argv[2]);
+
+ mr2.run<boost::mapreduce::schedule_policy::cpu_parallel<wordcount::job> >(spec, result);
+
+ std::cout << "\nCPU Parallel MapReduce Finished.";
+#endif
+ std::cout << std::endl << "\n" << "MapReduce statistics:";
+ std::cout << "\n " << "MapReduce job runtime : " << result.job_runtime << " seconds, of which...";
+ std::cout << "\n " << " Map phase runtime : " << result.map_runtime << " seconds";
+ std::cout << "\n " << " Reduce phase runtime : " << result.reduce_runtime << " seconds";
+ std::cout << "\n\n " << "Map:";
+ std::cout << "\n " << "Total Map keys : " << result.counters.map_tasks;
+ std::cout << "\n " << "Map keys processed : " << result.counters.map_tasks_completed;
+ std::cout << "\n " << "Map key processing errors : " << result.counters.map_tasks_error;
+ std::cout << "\n " << "Number of Map Tasks run (in parallel) : " << result.counters.actual_map_tasks;
+ std::cout << "\n " << "Fastest Map key processed in : " << *std::min_element(result.map_times.begin(), result.map_times.end()) << " seconds";
+ std::cout << "\n " << "Slowest Map key processed in : " << *std::max_element(result.map_times.begin(), result.map_times.end()) << " seconds";
+ std::cout << "\n " << "Average time to process Map keys : " << std::accumulate(result.map_times.begin(), result.map_times.end(), boost::int64_t()) / result.map_times.size() << " seconds";
+
+ std::cout << "\n\n " << "Reduce:";
+ std::cout << "\n " << "Number of Reduce Tasks run (in parallel): " << result.counters.actual_reduce_tasks;
+ std::cout << "\n " << "Number of Result Files : " << result.counters.num_result_files;
+ std::cout << "\n " << "Fastest Reduce key processed in : " << *std::min_element(result.reduce_times.begin(), result.reduce_times.end()) << " seconds";
+ std::cout << "\n " << "Slowest Reduce key processed in : " << *std::max_element(result.reduce_times.begin(), result.reduce_times.end()) << " seconds";
+ std::cout << "\n " << "Average time to process Reduce keys : " << std::accumulate(result.reduce_times.begin(), result.reduce_times.end(), boost::int64_t()) / result.reduce_times.size() << " seconds";
+
+ return 0;
+}
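
The word handling in map_task above (lower-case each token, strip leading and trailing non-alphanumeric characters, emit a count of 1) can be tried without the library. The following is only a minimal sequential sketch, not part of the commit: normalize_word, count_words and word_counts are hypothetical names, a std::map stands in for the map, combine and reduce phases, and <cctype> classification is assumed to be an acceptable substitute for the std::locale::classic() facet used in wordcount.cpp.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <istream>
#include <map>
#include <sstream>
#include <string>

// Hypothetical helper: lower-case a token and strip leading and trailing
// non-alphanumeric characters. Returns an empty string when no
// alphanumeric character remains.
std::string normalize_word(std::string word)
{
    for (std::size_t i = 0; i < word.size(); ++i)
        word[i] = static_cast<char>(
            std::tolower(static_cast<unsigned char>(word[i])));

    std::size_t first = 0;
    std::size_t last = word.size();
    while (first < last &&
           !std::isalnum(static_cast<unsigned char>(word[first])))
        ++first;
    while (last > first &&
           !std::isalnum(static_cast<unsigned char>(word[last - 1])))
        --last;

    assert(first <= last); // the two scans never cross
    return word.substr(first, last - first);
}

typedef std::map<std::string, unsigned> word_counts;

// Hypothetical sequential stand-in for the map, combine and reduce
// phases: count every normalized, non-empty word read from the stream.
word_counts count_words(std::istream &in)
{
    word_counts counts;
    std::string token;
    while (in >> token) // loop ends as soon as an extraction fails
    {
        std::string const word = normalize_word(token);
        if (!word.empty())
            ++counts[word];
    }
    return counts;
}
```

Passing a std::istringstream (or an opened std::ifstream) to count_words yields the per-word totals; for example, the tokens "Hello," and "hello!" both normalize to "hello" and are counted together, while a token such as "--" is dropped. Characters inside a word, such as the apostrophe in "it's", are left untouched, which matches the trimming in map_task.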
Added: sandbox/libs/mapreduce/test/wordcount/wordcount.vcproj
==============================================================================
--- (empty file)
+++ sandbox/libs/mapreduce/test/wordcount/wordcount.vcproj 2009-07-23 15:04:45 EDT (Thu, 23 Jul 2009)
@@ -0,0 +1,206 @@
+<?xml version="1.0" encoding="Windows-1252"?>
+<VisualStudioProject
+ ProjectType="Visual C++"
+ Version="8.00"
+ Name="wordcount"
+ ProjectGUID="{AB0444E8-E927-470A-BF0B-A60E67F91B06}"
+ RootNamespace="wordcount"
+ Keyword="Win32Proj"
+ >
+ <Platforms>
+ <Platform
+ Name="Win32"
+ />
+ </Platforms>
+ <ToolFiles>
+ </ToolFiles>
+ <Configurations>
+ <Configuration
+ Name="Debug|Win32"
+ OutputDirectory="$(ConfigurationName)"
+ IntermediateDirectory="$(ConfigurationName)\compiler"
+ ConfigurationType="1"
+ CharacterSet="1"
+ >
+ <Tool
+ Name="VCPreBuildEventTool"
+ />
+ <Tool
+ Name="VCCustomBuildTool"
+ />
+ <Tool
+ Name="VCXMLDataGeneratorTool"
+ />
+ <Tool
+ Name="VCWebServiceProxyGeneratorTool"
+ />
+ <Tool
+ Name="VCMIDLTool"
+ />
+ <Tool
+ Name="VCCLCompilerTool"
+ Optimization="0"
+ AdditionalIncludeDirectories="../../../.."
+ PreprocessorDefinitions="WIN32_LEAN_AND_MEAN"
+ MinimalRebuild="true"
+ BasicRuntimeChecks="3"
+ RuntimeLibrary="3"
+ UsePrecompiledHeader="0"
+ WarningLevel="4"
+ WarnAsError="true"
+ Detect64BitPortabilityProblems="true"
+ DebugInformationFormat="3"
+ />
+ <Tool
+ Name="VCManagedResourceCompilerTool"
+ />
+ <Tool
+ Name="VCResourceCompilerTool"
+ />
+ <Tool
+ Name="VCPreLinkEventTool"
+ />
+ <Tool
+ Name="VCLinkerTool"
+ LinkIncremental="2"
+ AdditionalLibraryDirectories=""
+ GenerateDebugInformation="true"
+ SubSystem="1"
+ OptimizeForWindows98="1"
+ TargetMachine="1"
+ />
+ <Tool
+ Name="VCALinkTool"
+ />
+ <Tool
+ Name="VCManifestTool"
+ />
+ <Tool
+ Name="VCXDCMakeTool"
+ />
+ <Tool
+ Name="VCBscMakeTool"
+ />
+ <Tool
+ Name="VCFxCopTool"
+ />
+ <Tool
+ Name="VCAppVerifierTool"
+ />
+ <Tool
+ Name="VCWebDeploymentTool"
+ />
+ <Tool
+ Name="VCPostBuildEventTool"
+ />
+ </Configuration>
+ <Configuration
+ Name="Release|Win32"
+ OutputDirectory="$(ConfigurationName)"
+ IntermediateDirectory="$(ConfigurationName)\compiler"
+ ConfigurationType="1"
+ CharacterSet="1"
+ WholeProgramOptimization="1"
+ >
+ <Tool
+ Name="VCPreBuildEventTool"
+ />
+ <Tool
+ Name="VCCustomBuildTool"
+ />
+ <Tool
+ Name="VCXMLDataGeneratorTool"
+ />
+ <Tool
+ Name="VCWebServiceProxyGeneratorTool"
+ />
+ <Tool
+ Name="VCMIDLTool"
+ />
+ <Tool
+ Name="VCCLCompilerTool"
+ InlineFunctionExpansion="2"
+ AdditionalIncludeDirectories="../../../.."
+ PreprocessorDefinitions="WIN32_LEAN_AND_MEAN;BOOST_LIB_DIAGNOSTIC"
+ RuntimeLibrary="2"
+ UsePrecompiledHeader="0"
+ WarningLevel="4"
+ WarnAsError="true"
+ Detect64BitPortabilityProblems="true"
+ DebugInformationFormat="3"
+ />
+ <Tool
+ Name="VCManagedResourceCompilerTool"
+ />
+ <Tool
+ Name="VCResourceCompilerTool"
+ />
+ <Tool
+ Name="VCPreLinkEventTool"
+ />
+ <Tool
+ Name="VCLinkerTool"
+ LinkIncremental="1"
+ AdditionalLibraryDirectories=""
+ GenerateDebugInformation="true"
+ SubSystem="1"
+ OptimizeReferences="2"
+ EnableCOMDATFolding="2"
+ OptimizeForWindows98="1"
+ TargetMachine="1"
+ />
+ <Tool
+ Name="VCALinkTool"
+ />
+ <Tool
+ Name="VCManifestTool"
+ />
+ <Tool
+ Name="VCXDCMakeTool"
+ />
+ <Tool
+ Name="VCBscMakeTool"
+ />
+ <Tool
+ Name="VCFxCopTool"
+ />
+ <Tool
+ Name="VCAppVerifierTool"
+ />
+ <Tool
+ Name="VCWebDeploymentTool"
+ />
+ <Tool
+ Name="VCPostBuildEventTool"
+ />
+ </Configuration>
+ </Configurations>
+ <References>
+ </References>
+ <Files>
+ <Filter
+ Name="Source Files"
+ Filter="cpp;c;cc;cxx;def;odl;idl;hpj;bat;asm;asmx"
+ UniqueIdentifier="{4FC737F1-C7A5-4376-A066-2A32D752A2FF}"
+ >
+ <File
+ RelativePath=".\wordcount.cpp"
+ >
+ </File>
+ </Filter>
+ <Filter
+ Name="Header Files"
+ Filter="h;hpp;hxx;hm;inl;inc;xsd"
+ UniqueIdentifier="{93995380-89BD-4b04-88EB-625FBE52EBFB}"
+ >
+ </Filter>
+ <Filter
+ Name="Resource Files"
+ Filter="rc;ico;cur;bmp;dlg;rc2;rct;bin;rgs;gif;jpg;jpeg;jpe;resx;tiff;tif;png;wav"
+ UniqueIdentifier="{67DA6AB6-F800-4c08-8B7A-83BB121AAD01}"
+ >
+ </Filter>
+ </Files>
+ <Globals>
+ </Globals>
+</VisualStudioProject>