4
Future Supercomputing and Research
As the uses of computing to address important societal problems continue to grow and the place of
supercomputing within the overall computing industry continues to change, the value of innovation in
supercomputer architecture, modeling, systems software, applications software, and algorithms will
endure. Drawing on the recent supercomputing reports summarized in Chapter 2 and on its own insights,
the committee outlines the main arguments for a vigorous research program in supercomputing. Some
characteristics of successful innovation in areas such as high-performance computing are described and
some key research problems are identified.
INNOVATION IN HIGH-END COMPUTING
A mature field is one characterized by small incremental improvements rather than large changes. By
that measure, computing, and in particular high-end computing, is not a mature field. The underlying
technology continues to evolve at a rapid pace, and there are ample opportunities for innovations in
architecture, systems software, and applications software to dramatically improve performance and
productivity. New architecture and software technologies are needed to maintain historical growth in
performance. To ensure that new technologies are available to form the basis for supercomputers in the
5-15 year time frame (a typical interval between research innovation and commercial deployment in the
computer industry), a significant and continuous investment in basic research is required. Historically,
such an investment in basic research has returned large dividends in terms of new technology. The need
for basic research in supercomputing is particularly acute. Although there has been basic research in
general-purpose computing technologies with broad markets, and there has been significant expenditure
in advanced development efforts such as the ASC program and the TeraGrid, there has been relatively
little investment in basic research in supercomputing architecture and software over the past decade,
resulting in few innovations to be incorporated into today's supercomputer systems. Because
supercomputing is affected by dislocations due to nonuniform technology scaling (described
subsequently) before mainstream computers are affected, the lack of that research will eventually weigh
on the broader computer industry as well.
Successful innovation programs in areas such as high-performance computing have a number of
characteristics:
1International Technology Roadmap for Semiconductors. 2002. Update. Available online at
.
· Continuous investment is needed at all stages of the technology pipeline, from initial
investigation of new concepts to technology demonstrations and products. With no initial, speculative
research, the pipeline dries out. With no technology demonstrations, new research ideas are much less
likely to migrate to products. With no investment in products, supercomputers are not built.
· Continuous investment is needed in all contributing sectors, including universities, national
laboratories, and vendors. All of these sectors, which often depend on government funding for their
existence, have small communities of researchers and developers that are necessary for the continued
evolution of supercomputing.
· A mix of small science projects and large efforts that create significant experimental prototypes is
necessary. Large numbers of small individual projects are often the best way of studying new concepts.
A smaller number of technology demonstration systems can draw on the successes of basic research in
architecture, software, and applications concepts, demonstrate their interplay, and validate concepts ahead
of their use in production systems. It is important that such pilot systems be built because without real
hardware platforms, systems software and applications programs will not be written nor will the
experience gained from such system building be acquired. For instance, pilot systems serve to identify
research issues associated with the integration of hardware and software and to address system-level
problems such as I/O performance in high-performance computing.
· Research is not a linear pipeline or a funnel, where losers are successively winnowed out until
one winning product emerges. Successful research projects often incorporate the best ideas from related
efforts. In addition, the experience gained from later stages often triggers reconsideration of earlier
decisions. Good research should be organized to maximize the flow of ideas and people across projects
and concepts.
ARCHITECTURE RESEARCH
The memory wall and the programming wall are two particularly challenging problems in
supercomputing that could benefit from architecture research. The memory wall, the growing mismatch
between memory bandwidth and latency and processor cycle time, is a major factor limiting performance.
On many applications, modern processors are limited by memory system performance to a small fraction
of peak performance. This problem appears to be growing worse over time.
The memory wall is a special case of nonuniform scaling. As technology improves, different aspects
of technology scale at different rates, leading to disparities in system parameters. When these disparities
become large enough, or when a new technology is introduced, there is a discontinuity in system design
that calls for innovative architecture and software.
To address the memory wall problem, innovative architectures are needed that increase memory
bandwidth (or perhaps memory bandwidth per unit cost), both to local memory and across an
interconnection network to remote memory. In addition, architectures must tolerate memory latency.
Memory latency (local and global) is expected to increase relative to processor cycle time. Processors,
however, can be designed to tolerate this latency without loss of performance by exploiting parallelism.
While waiting for one result to return from memory, the processor works on a different, parallel part of
the problem, perhaps using some combination of vectors and fine-grained multithreading. Innovation is
needed to better identify and exploit locality. The memory bandwidth required by an application can
often be reduced by a combination of architecture and software. The architecture provides local storage
locations (registers and caches), and the software transforms the program to reduce the volume of
intermediate data produced so that it fits in these local stores.
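As a rough illustration of the software transformation described above, the following Python sketch compares a two-pass computation that materializes a full-length intermediate result with a fused, blocked version whose intermediate data never exceeds one small block. The function names, the block size, and the sample data are assumptions made for this example and do not come from the report; in practice such transformations are usually applied by compilers or in compiled code, where a block is sized to fit registers or cache.

```python
def unfused(a, b, c):
    """Two passes: the intermediate list tmp has the same length as the inputs."""
    tmp = [x * y for x, y in zip(a, b)]        # full-length intermediate data
    return sum(t + z for t, z in zip(tmp, c))  # second pass re-reads tmp


def fused_blocked(a, b, c, block=4):
    """One pass over fixed-size blocks: at most `block` intermediate values are live."""
    total = 0.0
    for start in range(0, len(a), block):
        stop = start + block
        # Only this small slice of intermediate data exists at any one time.
        tmp = [x * y for x, y in zip(a[start:stop], b[start:stop])]
        total += sum(t + z for t, z in zip(tmp, c[start:stop]))
    return total


# Hypothetical sample data; both versions compute sum(a[i] * b[i] + c[i]).
a = [1.0, 2.0, 3.0, 4.0, 5.0]
b = [2.0, 2.0, 2.0, 2.0, 2.0]
c = [1.0, 1.0, 1.0, 1.0, 1.0]
assert unfused(a, b, c) == fused_blocked(a, b, c, block=2)
```

Both functions return the same value; the difference is only in how much intermediate data is alive at once, which is the quantity that determines whether it fits in local storage.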
A programming wall also exists: it is becoming increasingly difficult to write complex
codes for high-end computers. Moreover, considerable effort is required to port these codes to new
systems with different performance parameters. Innovation in architecture must take that issue into
account. Architecture research is needed to devise a stable and efficient interface to systems software and
applications, thereby masking, at least to some extent, the variety of architectural strategies that enhance
performance. For example, one could attempt to define an appropriate simple abstract virtual machine
that reflected the architectural characteristics. All applications would then be programmed and compiled
for that virtual machine. This approach could make it much easier for the software system to provide
performance portability, that is, the ability to move a code from one machine to another without
extensive performance tuning.2 In addition, widespread adoption of better abstractions than those we
have now would enable higher-level programming languages and simpler algorithm and software
development.
Sequential orientation, the one-operation-at-a-time nature of contemporary architecture and
software, is another major impediment to high-performance computing. While most high-end computers
are necessarily parallel, they are built using processors and programming languages that are
fundamentally serial. By incorporating notions of parallelism in the virtual machine described above,
some issues of sequential orientation also might be mitigated.3
SOFTWARE RESEARCH
The development of scalable scientific codes today is a laborious process. Mathematical algorithms
are translated by a programmer into detailed programs and tuned to a specific architecture using
programming notations that reflect the underlying architecture, a manual, error-prone process. The
resulting code is hard to maintain, evolve, and port to new machines. The programmer must provide a
wealth of detail that can obscure the high-level structure of the application solution, for example, the
strategy to obtain parallelism. Also, the programmer may have an imperfect understanding of how
low-level mechanisms are best used to achieve high performance.
High-performance computing offers unique challenges because of the need for large-scale parallelism
and for latency tolerance and because performance is important for large, expensive hardware platforms.
Research is needed to find fresh approaches to expressing both data and control parallelism at the
application level, so that the strategy for achieving latency tolerance, locality, and parallelism is devised
and expressed by the application developer, while separating out the low-level details that support
particular platforms. Both new compilation and operating system capabilities and new tools are needed to
realize high performance on modern supercomputing architectures.
Many of the languages, operating systems, and tools in current use have evolved by modifying
languages and operating systems designed for sequential systems to infer opportunities for parallelism
and to add explicit mechanisms to invoke parallel layouts and operations. Similarly, sequential tools have
been modified for use in parallel environments. While that evolution is natural (and leverages existing
knowledge and skills), it may no longer be sufficient. For example, the effort spent to maintain
compatibility by changing the sequential base may limit the time available for enhancing support for
parallelism and weaken the integrity of the parallel versions.
New software approaches for high-performance computing could exploit several special advantages.
First, many application solutions are derived from a precise mathematical formulation of a physical
problem, such as a finite difference discretization of a differential equation on a grid. By taking the
problem domain into account, the necessary relationships between states of the computation and states of
a mathematical system can facilitate both the mapping of the computation onto a large-scale parallel
machine and the ensuing code development and testing. Second, HPC codes are often developed by small
teams of highly capable scientists, who are often willing and able to use expert-friendly tools and
environments if those tools will enhance their productivity. Finally, the difficulties in using the current
tools for challenging and hard problems are a strong incentive for the user community to explore more
advanced software technology.
2MPI is sometimes cited as such a target, but it is a low-level abstraction.
3Threads provide an aspect of concurrency, but possibly not in the form most appropriate for some applications
developers and some hardware architectures.
It is important to pursue the most promising software research with perseverance and a long-term
view. It often takes significant investments and a long time to change widely used programming
paradigms. A significant investment also may be needed in compilers, run times, tools, and libraries to
induce a user community to shift paradigms, even if the new paradigm holds the promise for significant
productivity enhancements. An alternative approach, which is more common, is to fund a diversity of
small enhancements to current systems. That approach runs the risk that none of the enhancements will
make enough of a difference, and that few of them will be pushed far enough to be transferred to practice.
At the operating systems level, current HPC systems (especially cluster systems) have inherited a
design that is not well tuned to the needs of large-scale parallel processing. As an example, a Linux
cluster is managed by a large number of autonomous kernels, each making independent decisions on
memory or processor allocation even though the entire cluster (or large parts of it) needs to run as one
tightly coupled application. This discrepancy has been observed on many systems to have a negative
effect on performance. A parallel application is not an entity that is recognized as a whole by the
distributed operating system; there are no standardized parallel operating system services (e.g., parallel
scheduling, parallel I/O, parallel memory management), although some implementations exist and are
used. Communication and synchronization within a parallel application are achieved inefficiently using
the same operating system mechanisms that are used for communication and synchronization across
independent processes, or they bypass the system services. The mere need for bypass indicates that
current interfaces are inadequate. Research that addresses the performance and semantic inadequacies of
operating systems could lead to significant benefits in performance and software productivity and could
push high-performance computing into new realms, for example, the use of large-scale parallelism for
interactive computing.
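The cost of uncoordinated per-node decisions can be sketched with a small model. The model is an assumption made for illustration and is not taken from the report: the application is taken to proceed in synchronized steps, each node is independently interrupted by its local kernel with probability p per step, an interruption costs a fixed delay, and a step finishes only when the slowest node finishes. If interruptions were instead coordinated across nodes (as a parallel scheduling service might arrange), all nodes would be interrupted in the same steps.

```python
def expected_step_time(work, delay, p, nodes, coordinated):
    """Expected duration of one synchronized step under the toy model above.

    work:  time every node spends computing in a step
    delay: extra time added when a node is interrupted
    p:     per-node probability of an interruption in a given step
    """
    if coordinated:
        # All nodes are interrupted together, so a step is delayed with probability p.
        hit = p
    else:
        # Independent interruptions: the step is delayed if ANY node is interrupted.
        hit = 1.0 - (1.0 - p) ** nodes
    return work + delay * hit


# Hypothetical parameters chosen only to show the trend with node count.
for nodes in (1, 10, 100, 1000):
    independent = expected_step_time(1.0, 0.5, 0.01, nodes, coordinated=False)
    together = expected_step_time(1.0, 0.5, 0.01, nodes, coordinated=True)
    print(f"{nodes:5d} nodes: independent {independent:.3f}, coordinated {together:.3f}")
```

Under these assumptions, the coordinated case stays near the single-node cost as the node count grows, while the independent case approaches the point where nearly every step pays the full delay.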
Software research in HPC is likely to be more successful if closely coupled with research on
algorithms and on architecture. Innovation often comes from a redesign of the interfaces between the
various layers and from a better match between functionality across layers.
A major challenge in building revolutionary architectures and software systems is dealing with the
large volume of legacy code. On the one hand, innovative research should not be constrained by
compatibility needs of existing instruction sets, programming languages, operating systems, and
application implementations. Advanced HPC research is likely to be more productive if it is free to
explore paradigm-shifting approaches. On the other hand, a transition plan is needed to encourage
adoption of new technologies. The transition plan should leverage the existing code base, because the
cost of rewriting all of the legacy code from scratch is prohibitive.
RESEARCH ON APPLICATIONS AND ALGORITHMS
Supercomputing applications exist in a number of well-established and important fields, such as
national security (cryptanalysis, intelligence, defense systems design, and nuclear stockpile stewardship),
weather and climate forecasting, and automotive and aircraft design. New applications are emerging in
the life sciences and biochemistry, among others. This interim report was written in advance of an
applications workshop to be held by the committee that will help it to formulate the supercomputing-
related research needs in these areas. Following are some tentative observations.
The need for higher performance continues to grow. The limitations of present-day
supercomputers prevent many applications from being run using realistic parameter ranges of spatial
resolution and time integration. For such applications, a significant increase of simulation and prediction
quality can be attained by applying more computer power with primarily the same algorithms. For
example, mesh resolution can be increased. In other applications, new algorithms and/or new processes
are required to substantially advance the application involved. Increased mesh resolution often requires
the development of new physics or algorithms for sub-grid-scale processes. In some cases, submodels of
detailed processes may be required within a coarser mesh (e.g., cloud-resolving submodels embedded
within a larger climate model grid).
THE FUTURE OF SUPERCOMPUTING: AN INTERIM REPORT
As applications evolve, the workload characteristics change. Many codes evolve toward more
complex, time-dependent and data-dependent control logic and more irregular data structures. This
evolution taxes current architectures.
Huge amounts of model output and real data are an integral part of almost all supercomputing
applications. The ways in which large datasets are prepared, stored, visualized, and analyzed highlight
the need for new software, input/output, storage, and communication capabilities to go along with
enhanced supercomputers and advanced methods.
Improvements in algorithms can sometimes improve performance much more than improvements in
hardware and software do. For example, algorithms for solving the special linear system arising from the
Poisson equations on a regular grid have improved over time from needing O(n^2) arithmetic operations to
O(n log n) or even O(n). Such algorithmic improvements can contribute to increased supercomputer
performance as much as decades of hardware evolution. While such breakthroughs are hard to predict,
the rewards can be significant. Further research can lead to such breakthroughs in the many complicated
domains to which supercomputers are applied.
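To give a sense of scale for the comparison with hardware evolution, the short Python sketch below estimates the gain from moving from O(n^2) to O(n log n) operations and converts it into an equivalent number of years of hardware improvement. Two assumptions are made here that are not in the report: constant factors are ignored, and hardware performance is taken to double every 1.5 years. The function names and the problem size are likewise illustrative.

```python
import math


def speedup(n):
    """Ratio of n^2 to n*log2(n) operation counts, ignoring constant factors."""
    return (n * n) / (n * math.log2(n))


def equivalent_hardware_years(factor, doubling_years=1.5):
    """Years of steady hardware doubling needed to match a given speedup factor.

    doubling_years is an assumed doubling period, not a figure from the report.
    """
    return math.log2(factor) * doubling_years


n = 2 ** 20  # about one million unknowns (hypothetical problem size)
gain = speedup(n)
years = equivalent_hardware_years(gain)
print(f"n = {n}: gain of about {gain:.0f}x, roughly {years:.1f} years of doubling")
```

For this problem size the estimated gain is on the order of tens of thousands, which under the assumed doubling period corresponds to a little over two decades of hardware improvement.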
New algorithmic demands are driven by the following needs:
· Disciplinary needs. The need for higher-resolution analyses leads to larger problems, which in turn
call for faster algorithms (e.g., O(n log n) instead of O(n^2)). As the resolution increases,
completely different physical models may be required (e.g., particle models instead of continuum
models), which in turn require different solution methods. In some problems (such as turbulence),
physically unresolved processes at small length or time scales may have large effects on macroscopic
phenomena, requiring approximations that differ from those for the resolved processes.
· Interdisciplinary needs. Many real-world phenomena involve two or more coupled physical
processes for which individual models and algorithms may be known (clouds, winds, ocean currents, heat
flow inside and between the atmosphere and the ocean, atmospheric chemistry, and so on) but where the
coupled system must be solved. Vastly differing time and length scales of the different disciplinary
models frequently make this coupled model much harder to solve.
· Synthesis and optimization replacing analysis. After one has a model that can be used to analyze
(predict) the behavior of a physical system (such as an aircraft or weapons system), it is often desirable to
use that model to try to synthesize or optimize a system so that it has certain desired properties. Such a
problem can be much more challenging than analysis alone. As an example, a typical analysis computes,
from the shape of an airplane wing, the lift resulting from air flow over the wing, by solving a differential
equation. The related optimization problem is to choose the wing shape that maximizes lift, incorporating
the constraints that ensure that the wing can be manufactured. Solving that problem requires determining
the direction of change in wing shape that causes the lift to increase, either by repeating the analysis as
changes to shape are tried or by analytically computing the appropriate change in shape.
· Huge data sets. Many fields (biology is one) that previously had relatively few quantitative data
to analyze now have very large quantities, often of varying type, meaning, and uncertainty. These data
may be represented by a diversity of data structures, including tables of numbers, irregular graphs,
adaptive meshes, relational databases, two- or three-dimensional images, text, or various combined
representations. Extracting scientific meaning from these data requires coupling numerical, statistical,
and logical modeling techniques in ways that are unique to each discipline.
· Changing machine models. A machine model is the set of operations and their costs presented to
the programmer by the underlying hardware and software. As the machine model changes between
technology generations, an algorithm will probably have to be changed to maintain performance and
scalability. This could involve adjusting a few parameters in the algorithm describing data layouts,
running a combinatorial optimization scheme to rebalance the load, or using a completely different
algorithm that trades off computation and communication in different ways. Some success has been
achieved in automating this process, but only for a few important algorithmic kernels. For example, the
ATLAS5 and FFTW6 systems automatically choose implementations of matrix-matrix multiplication and
the fast Fourier transform, respectively, to maximize performance on a particular architecture, depending
on properties such as memory speed and number of registers.
4A Poisson equation is an equation that describes many physical systems, including heat flow, fluid flow,
diffusion, electrostatics, and gravity, with n unknowns.
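The general idea of automatically choosing an implementation for a particular machine can be sketched as empirical selection: time several functionally equivalent variants on the target system and keep the fastest. The Python code below is a minimal sketch of that idea only. It does not reproduce how ATLAS or FFTW work internally, and the variant names, the helper functions, and the sample matrices are assumptions made for this example.

```python
import time


def matmul_ijk(a, b):
    """Matrix product with the inner loop running over the summation index k."""
    n, m, p = len(a), len(b), len(b[0])
    c = [[0.0] * p for _ in range(n)]
    for i in range(n):
        for j in range(p):
            s = 0.0
            for k in range(m):
                s += a[i][k] * b[k][j]
            c[i][j] = s
    return c


def matmul_ikj(a, b):
    """Same product with loops reordered so the inner loop walks along rows."""
    n, m, p = len(a), len(b), len(b[0])
    c = [[0.0] * p for _ in range(n)]
    for i in range(n):
        row_c = c[i]
        for k in range(m):
            aik = a[i][k]
            row_b = b[k]
            for j in range(p):
                row_c[j] += aik * row_b[j]
    return c


def time_once(fn, a, b):
    """Wall-clock time of a single call to fn(a, b)."""
    start = time.perf_counter()
    fn(a, b)
    return time.perf_counter() - start


def pick_fastest(candidates, a, b, repeats=3):
    """Return the name of the variant with the smallest best-of-`repeats` time."""
    best_name, best_time = None, float("inf")
    for name, fn in candidates.items():
        elapsed = min(time_once(fn, a, b) for _ in range(repeats))
        if elapsed < best_time:
            best_name, best_time = name, elapsed
    return best_name


candidates = {"ijk": matmul_ijk, "ikj": matmul_ikj}
size = 40  # hypothetical tuning problem size
a = [[float(i + j) for j in range(size)] for i in range(size)]
b = [[float(i - j) for j in range(size)] for i in range(size)]
print("selected variant:", pick_fastest(candidates, a, b))
```

Which variant wins depends on the machine and the run, so the sketch prints the selection rather than assuming it; the point is that the choice is made by measurement on the target system instead of being fixed in advance.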
Emerging application areas also drive the need for new algorithms and applications. Bioinformatics,
for example, is driving the need to couple equation-driven numerical computing with probabilistic and
constraint-driven computing. Large volumes of data from the Human Genome Project, clinical trials,
statistics, population genetics, and imaging and visualization research stress the I/O capabilities of
contemporary systems.
Many simulation and optimization codes that are now evolved and maintained by independent
software vendors (ISVs) originated in research labs and universities. The arguments for government
funding of supercomputing that are outlined in the next section apply as well to supercomputing
application software: markets are likely to underinvest in such software, the government is an early and
main customer for many such packages, and ISV codes are heavily used in weapon design. Thus, there is
a strong case for government investment, not only in algorithm and application research, but also in the
development of robust and scalable application software.
5See .
6See .