MapReduce

From View

Jump to: navigation, search
Mapreduce diagram, taken from "MapReduce: Simplified Data Processing on Large Clusters" by Jeffrey Dean and Sanjay Ghemawat.
Enlarge
Mapreduce diagram, taken from "MapReduce: Simplified Data Processing on Large Clusters" by Jeffrey Dean and Sanjay Ghemawat.

Contents

Description

This dwarf was originally called "Monte Carlo", after the technique of using statistical methods based on repeated random trials. The patterns defined by the programming model MapReduce are a more general version of the same idea: repeated independent execution of a function, with results aggregated at the end. Nearly no communication is required between processes.

Uniprocessor Mapping

MapReduce techniques are a general approach that can be applied to a variety of specific problems. The pattern of computation used will depend on the problem being run.

Parallel Mapping

MapReduce techniques are, in general, embarassingly parallel: since the overall problem involves a number of independent trials, no (or very little) communication is required. The only potential difficulty is that the trials might require different amounts of computation, meaning that different threads of computation might complete at different times.

Elaboration

Simplest Example:

The simplest example is the classic Monte Carlo computation of "pi" by throwing darts at a board and computing the fraction of those inside and outside the circle. This is in the UPC lecture (Lecture 13) on the CS267 web page: (http://www.cs.berkeley.edu/~demmel/cs267_Spr05/Lectures/lectures.html)

Examples:

o Lester Group: Quantum Monte Carlo (http://www.cchem.berkeley.edu/walgrp/) http://www.nersc.gov/projects/incite/incite_photosyn.php

o Global Optimization for Portien Structure Prediction: http://vis.lbl.gov/~scrivelli/ (http://thglab.lbl.gov/scientific/ps.html)

Specialized Languages:

o none (but perhaps BOINC can be thought of as a specification framework for monte-carlo problems?)

Specialized Hardware:

o “the grid” 

o BOINC infrastructure http://boinc.berkeley.edu/

How Current Computer Architectures Fail:

o No problems that are common across all Monte Carlo calculations…

References

Jeffrey Dean and Sanjay Chemawat, MapReduce: Simplified Data Processing on Large Clusters</a>. OSDI'04: Sixth Symposium on Operating System Design and Implementation, San Francisco, CA, December, 2004. http://labs.google.com/papers/mapreduce.html

Personal tools