The Landscape of Parallel Computing Research: A View From Berkeley
The recent switch to parallel microprocessors is a milestone in the history of computing. A multidisciplinary group of researchers here in Berkeley has been meeting since Spring 2005 to discuss this change from the conventional wisdom. Our white paper summarizes what we learned from these discussions. This wiki is a meeting place for us as a research community to explore the future of parallel processing. The video interview with Dave Patterson, Krste Asanovic and Kurt Keutzer and Dave Patterson's presentation at a recent Distinguished Colloquium here at Berkeley are great introductions to the Berkeley View project. Here are the slides from a related talk by Dave Patterson.
We believe that much can be learned by examining the success of parallelism at the extremes of the computing spectrum, namely embedded computing and high performance computing. This led us to frame the parallel landscape with seven questions under the following assumptions:
- The target should be 1000s of cores per chip, as this hardware is the most efficient in MIPS per watt, MIPS per area of silicon, and MIPS per development dollar.
- Instead of traditional benchmarks, use 7+ “dwarfs” to design and evaluate parallel programming models and architectures. (A dwarf is an algorithmic method that captures a pattern of computation and communication.)
- “Autotuners” should play a larger role than conventional compilers in translating parallel programs.
- To maximize programmer productivity, programming models should be independent of the number of processors.
- To maximize application efficiency, programming models should support a wide range of data types and successful models of parallelism: data-level parallelism, independent task parallelism, and instruction-level parallelism.
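As a rough illustration of the assumption that programming models should be independent of the number of processors, the sketch below expresses a data-parallel reduction in which the worker count is only a tuning parameter and never changes the result. The function name `parallel_sum_of_squares` and its `workers` parameter are hypothetical, chosen for this example. Python threads are used purely to show the structure; in CPython they are not expected to speed up this CPU-bound loop.

```python
from concurrent.futures import ThreadPoolExecutor


def parallel_sum_of_squares(values, workers):
    """Sum of squares of `values`.

    `workers` (hypothetical knob, must be >= 1) controls only how the work
    is split; the algorithm and its result do not depend on it.
    """
    def chunk_sum(chunk):
        # Independent task: each chunk is reduced on its own.
        return sum(v * v for v in chunk)

    # Ceiling division so that at most `workers` chunks are created.
    size = max(1, -(-len(values) // workers))
    chunks = [values[i:i + size] for i in range(0, len(values), size)]

    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Combine the per-chunk partial sums into the final result.
        return sum(pool.map(chunk_sum, chunks))


if __name__ == "__main__":
    data = list(range(10))
    for n in (1, 2, 4, 8):
        print(n, parallel_sum_of_squares(data, n))
```

The caller describes what is computed (a reduction over independent chunks) and leaves the degree of parallelism to a single parameter that a runtime or tuner could choose.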
Applications
1. What are the applications?
2. What are common kernels of the applications?
Architecture and Hardware
3. What are the HW building blocks?
4. How to connect them?
Programming Model and Systems Software
5. How to describe applications and kernels?
6. How to program the hardware?
7. How to measure success?
The discussions led to the following unconventional perspectives:
- Regarding multicore versus manycore: We believe that manycore is the future of computing. Furthermore, it is unwise to presume that multicore architectures and programming models suitable for 2 to 32 processors can incrementally evolve to serve manycore systems of 1000s of processors.
- Regarding the application tower: We believe a promising approach is to use 7+ dwarfs as stand-ins for future parallel applications, since applications are rapidly changing and because we need to investigate parallel programming models as well as architectures.
- Regarding the hardware tower: We advise limiting hardware building blocks to 50K gates, innovating in memory as well as in processor design, and considering separate latency-oriented and bandwidth-oriented networks as well as circuit switching in addition to packet switching.
- Regarding the programming models that bridge the two towers: To maximize programmer productivity, programming models should be independent of the number of processors and should naturally allow the programmer to describe the concurrency latent in the application. To maximize application efficiency, programming models should allow programmers to indicate locality and use a richer set of data types and sizes, and they should support successful and well-known models of parallelism: data-level parallelism, independent task parallelism, and instruction-level parallelism. We also think that autotuners should take on a larger, or at least complementary, role to compilers in translating parallel programs. Finally, we claim that parallel programming need not be difficult. Real-world applications are naturally parallel and hardware is naturally parallel; what is needed is a programming model that is naturally parallel.
- To provide an effective parallel computing roadmap quickly so that industry can safely place its bets, we encourage researchers to use autotuners and RAMP to explore this space rapidly and to measure success by how easy it is to program the 7+ dwarfs to run efficiently on manycore systems.
- While embedded and server computing have historically evolved along separate paths, in our view the manycore challenge brings them much closer together. By leveraging the good ideas from each path, we believe we will find better answers to the seven questions.
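To make the autotuner idea above more concrete, here is a minimal sketch of empirical tuning: run a kernel with several candidate parameter values on the target machine, time each one, and keep the fastest. The kernel `blocked_sum`, the function `autotune`, and the candidate block sizes are all hypothetical, invented for this illustration. Real autotuners typically generate and search a far larger space of code variants.

```python
import time


def blocked_sum(data, block):
    """Toy kernel: sum `data` in blocks of `block` elements.

    The result is the same for every block size; only the running time varies.
    """
    total = 0
    for start in range(0, len(data), block):
        total += sum(data[start:start + block])
    return total


def autotune(kernel, data, candidates, repeats=3):
    """Time `kernel(data, param)` for each candidate and return
    (fastest_param, timings), where timings maps param -> best seconds."""
    timings = {}
    for param in candidates:
        best = float("inf")
        for _ in range(repeats):
            t0 = time.perf_counter()
            kernel(data, param)
            best = min(best, time.perf_counter() - t0)
        timings[param] = best
    fastest = min(timings, key=timings.get)
    return fastest, timings


if __name__ == "__main__":
    data = list(range(100_000))
    best_block, timings = autotune(blocked_sum, data, [16, 256, 4096])
    print("fastest block size on this machine:", best_block)
```

The winning parameter is chosen by measurement rather than by a static compiler model, so the same source can settle on different settings on different hardware.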
Pages for Associated Projects
BeBop - Parallel Machines and Parameters
Upcoming meeting agenda for on-campus Berkeley View class
QuickTime video interview of Krste Asanovic, Dave Patterson, and Kurt Keutzer discussing some of the areas of the project (January 2007).
Interview with John Hennessy and Dave Patterson by Kunle Olukotun in the December 2006 / January 2007 issue of ACM Queue magazine, covering the parallel challenge in addition to the history of their computer architecture books. Queue added a podcast of the interview as well. Amazingly, the podcast was downloaded more than 100,000 times.
Dave Patterson's interview with HPCWire Magazine, March 2, 2007.
Essay by Dave Patterson on the parallelism challenge for a lay audience, in the January 2007 Intelligent Enterprise.
Dave Patterson's talk on the Berkeley View at the PARC Forum, November 9, 2006.