React, But Don’t Overreact, to Parallelism
As this report makes clear, software and hardware researchers and practitioners should address important concerns regarding parallelism. At such critical junctures, enthusiasm seems to dictate that all talents and resources be applied to the crisis at hand. Taking the longer view, however, a prudent research portfolio must include concomitant efforts to advance all systems aspects, lest they become tomorrow’s bottlenecks or crises.
For example, in the rush to innovate on chip multiprocessors (CMPs), it is tempting to ignore sequential core performance and to deploy many simple cores. That approach may prevail, but history and Amdahl's law suggest caution. Three decades ago, vectors were the hot technology. Pioneering vector machines, such as the Control Data STAR-100 and Texas Instruments ASC, advanced vector technology without great concern for improving other aspects of computation. Seymour Cray, in contrast, designed the Cray-1 to have great vector performance as well as to be the world's fastest scalar computer. Ultimately, his approach prevailed, and the early machines faded away.
Moreover, Amdahl’s law raises concern.2 Amdahl’s limit argument assumed that a fraction, P, of software execution is infinitely parallelizable without overhead, whereas the remaining fraction, 1 - P, is totally sequential. From that assumption, it follows that the speedup with N cores (execution time on one core divided by execution time on N cores) is governed by 1/[(1 - P) + P/N]. Many learn that equation, but it is still instructive to consider its harsh numerical consequences. For N = 256 cores and a parallel fraction P = 99%, for example, speedup is bounded by 72. To be sure, Gustafson3 made good “weak scaling” arguments for why some software will fare much better. Nevertheless, the committee is skeptical that most future software will avoid sequential bottlenecks. Even such a highly parallel approach as MapReduce4 has near-sequential activity as the reduce phase draws to a close.
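To make that arithmetic concrete, here is a minimal sketch, in Python, of the bound above; the function name and the 4,096-core example are illustrative assumptions, not values taken from the report:

    # Amdahl's-law speedup bound: parallel fraction p, core count n.
    def amdahl_speedup(p: float, n: int) -> float:
        """Speedup bound 1 / ((1 - p) + p / n) for parallel fraction p on n cores."""
        return 1.0 / ((1.0 - p) + p / n)

    # The report's example: 256 cores and a 99% parallel fraction bound speedup near 72.
    print(round(amdahl_speedup(0.99, 256), 1))   # ~72.1

    # Even many more cores cannot push the speedup past 1 / (1 - p) = 100 here.
    print(round(amdahl_speedup(0.99, 4096), 1))  # ~97.6

The 1 percent sequential residue, not the core count, dominates the bound: no number of cores can lift the speedup above 1/(1 - P) = 100 in this example.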
For those reasons, it is prudent to continue work on faster sequential cores,
Internet applications) has been important in the steady increase in available parallelism for these sorts of applications.
A large and growing collection of applications lies between those extremes. In these applications, there is parallelism to be exploited, but it is not easy to extract: the parallelism is less regular and less structured in its spatial and temporal control and in its data-access and communication patterns. One might argue that such applications have been the focus of the high-performance computing (HPC) research community for many decades and thus are well understood with respect to those aspects that are amenable to parallel decomposition. The research community also knows that the algorithms best suited to a serial machine (for example, quicksort, simplex, and gaston) differ from their counterparts best suited to parallel machines.
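As a rough illustration of that difference (a sketch, not an example from the report), the Python snippet below contrasts a recursive quicksort, which works well on a single core, with a merge-based sort whose two halves can be sorted in independent worker processes; the function names and the two-worker split are assumptions made only for illustration:

    from concurrent.futures import ProcessPoolExecutor


    def quicksort(xs):
        # Serial-friendly: the partition pass over each sublist is sequential, and
        # the sublist sizes it produces are unpredictable, which complicates load
        # balancing across cores.
        if len(xs) <= 1:
            return xs
        pivot, rest = xs[0], xs[1:]
        return (quicksort([x for x in rest if x < pivot]) + [pivot] +
                quicksort([x for x in rest if x >= pivot]))


    def merge(a, b):
        # Sequential final merge of two sorted lists.
        out, i, j = [], 0, 0
        while i < len(a) and j < len(b):
            if a[i] <= b[j]:
                out.append(a[i]); i += 1
            else:
                out.append(b[j]); j += 1
        return out + a[i:] + b[j:]


    def parallel_mergesort(xs):
        # Parallel-friendly: the two halves are independent and are sorted in
        # separate worker processes.
        if len(xs) <= 1:
            return xs
        mid = len(xs) // 2
        with ProcessPoolExecutor(max_workers=2) as pool:
            left, right = pool.map(sorted, [xs[:mid], xs[mid:]])
        return merge(left, right)


    if __name__ == "__main__":
        data = [5, 3, 8, 1, 9, 2, 7, 4, 6, 0]
        assert quicksort(data) == parallel_mergesort(data) == sorted(data)

Even in the parallel-friendly version, the final merge remains sequential, echoing the residual bottlenecks that Amdahl's law penalizes.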