clock cycle).5 Transistors were also used to achieve higher frequencies than were supported by the raw transistor speedups, for example, by duplicating logic and by reducing the depth of logic between pipeline latches to allow faster clock cycles. Both of these efforts yielded diminishing returns in the mid-2000s. ILP improvements are continuing, but also with diminishing returns.6
Continued progress in semiconductor scaling, whether used for multiple cores or not, now depends on innovation in structures and materials to compensate for the loss of the performance gains traditionally provided by Dennard scaling.7
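As a rough illustration of why the loss of Dennard scaling matters, the sketch below applies the standard first-order relation for active (dynamic) power, P ≈ C·V²·f, which is consistent with the observation in footnote 7 that active power is proportional to the square of the supply voltage. The function name `power_density_change` and the shrink factor `k` are illustrative assumptions, not terms from this report. The idealized rules used (capacitance shrinking as 1/k, area as 1/k², frequency rising as k) are the textbook Dennard assumptions, and leakage is ignored.

```python
def power_density_change(k: float, voltage_scales: bool) -> float:
    """Relative change in power density after one idealized shrink by factor k.

    Assumes first-order dynamic power P ~ C * V^2 * f and ignores leakage.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    capacitance = 1.0 / k   # per-transistor capacitance shrinks with dimensions
    frequency = k           # shorter gate delay permits a faster clock
    voltage = 1.0 / k if voltage_scales else 1.0  # supply voltage may stop scaling
    area = 1.0 / k**2       # each transistor occupies less area
    power_per_transistor = capacitance * voltage**2 * frequency
    return power_per_transistor / area


if __name__ == "__main__":
    k = 2.0  # hypothetical shrink factor, chosen only to keep the arithmetic simple
    print("voltage scales with dimensions:", power_density_change(k, True))
    print("voltage held constant:         ", power_density_change(k, False))
```

Under these assumptions, power density stays constant when the supply voltage scales with the dimensions and grows as k² when the voltage is held fixed, which is why slower voltage scaling translates so directly into chip power constraints.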
Continued scaling also depends on continued innovation in lithography. Current state-of-the-art manufacturing uses a 193-nanometer wavelength to print structures that are only tens of nanometers in size. This apparent violation of optical laws has been made possible by innovations in mask patterning and by increasingly complex computational optics. Further lithography scaling will require comparable advances.
1.1.3 The Shift to Multicore Architectures and Related Architectural Trends
The shift to multicore architectures meant that architects began using the still-increasing transistor counts per chip to build multiple cores per chip rather than higher-performance single-core chips. Higher-performance cores were eschewed in part because of diminishing performance returns and emerging chip power constraints, which made small performance gains at the cost of larger power use unattractive. When single-core scaling slowed, a shift in emphasis to multicore chips was the obvious choice, in part because it was the only alternative that could be deployed rapidly. Multicore chips consisting of less complex cores that exploited only the most effective ILP ideas were developed. These chips offered the promise of performance scaling linearly with power. However, this scaling was possible only if software could make effective use of the additional cores (a significant challenge). Moreover, early multicore chips with just a few cores could be used effectively either at the operating system level, avoiding the need to change application software, or by a select group of applications retargeted for multicore chips.
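The dependence of multicore scaling on software can be made concrete with Amdahl's law, a standard model that this report does not itself invoke and that is used here only as an illustration. The minimal sketch below assumes a program in which a fixed fraction of the work parallelizes perfectly across cores and the rest runs serially; the function name `amdahl_speedup` and the sample fractions are hypothetical.

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Speedup over a single core when only part of the work runs in parallel."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if cores < 1:
        raise ValueError("cores must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    # Serial work takes the same time; parallel work is divided among the cores.
    return 1.0 / (serial_fraction + parallel_fraction / cores)


if __name__ == "__main__":
    # Hypothetical workloads: from half parallel to fully parallel.
    for fraction in (0.5, 0.9, 0.99, 1.0):
        row = ", ".join(
            f"{n} cores: {amdahl_speedup(fraction, n):.2f}x" for n in (2, 4, 16, 64)
        )
        print(f"parallel fraction {fraction:.2f} -> {row}")
```

In this model, speedup grows linearly with core count only when the parallel fraction is 1.0; any serial remainder caps the benefit, which is one way to see why a few cores could be exploited easily while larger core counts demand changes to application software.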
With the turn to multicore, at least three other related architectural trends are important for understanding how computer designers and architects seek to optimize performance: a shift toward increased data parallelism, accelerators and reconfigurable circuit designs, and system-on-a-chip (SoC) integrated designs.
First, a shift toward increased data parallelism is evident particularly in graphics processing units (GPUs). GPUs have evolved from fixed-function pipelines to somewhat configurable ones and then to sets of throughput-oriented “cores” that allow more successful general-purpose GPU (GP-GPU) programming.
Second, accelerators and reconfigurable circuit designs have matured to provide an intermediate alternative between software running on fixed hardware, for example, a multicore chip, and a complete hardware solution such as an application-specific integrated circuit, albeit with their own cost and configuration challenges. Accelerators perform fixed functions well, such as encryption-decryption and compression-decompression, but do nothing else. Reconfigurable fabrics, such as field-programmable gate arrays (FPGAs), sacrifice some of the performance and power benefits of fixed-function accelerators but can be retargeted to different needs. Both offer intermediate solutions in at least four ways: time needed to design and test, flexibility, performance, and power.
Reconfigurable accelerators pose some serious challenges in building and configuring applications; tool chain issues need to be addressed before FPGAs can become widely used as cores. To use accelerators and reconfigurable logic effectively, their costs must be overcome when they are not in use. Fortunately, if power, not silicon area, is the primary cost measure,
5Achieved application performance depends on the characteristics of the application’s resource demands and on the hardware.
6ILP improvements are incremental (10–20 percent), leading to single-digit compound annual growth rates.
7According to Mark Bohr, “Classical MOSFET scaling techniques were followed successfully until around the 90nm generation, when gate-oxide scaling started to slow down due to increased gate leakage” (Mark Bohr, February 9, 2009, “ISSCC Plenary Talk: The New Era of Scaling in an SOC World”). At roughly the same time, subthreshold leakage limited the scaling of the transistor threshold voltage (Vt), which in turn limited the scaling of the supply voltage in order to maintain performance. Since the active power of a circuit is proportional to the square of the supply voltage, this reduced scaling of supply voltage had a dramatic impact on power. This interaction between leakage power and active power has led chip designers to a balance where leakage consumes roughly 30 percent of the power budget. Several approaches have been taken to address these limits. Copper interconnects have replaced aluminum. Strained silicon and silicon-on-insulator have provided improved transistor performance. Use of a low-k dielectric material for the interconnect layers has reduced the parasitic capacitance, improving performance. High-k metal gate transistor structures restarted gate “oxide” scaling with orders-of-magnitude reduction in gate leakage. Transistor structures such as FinFET or Intel’s Tri-Gate have improved control of the transistor channel, allowing additional scaling of Vt for improved transistor performance and reduced active and leakage power.