Selected Perspectives on Computer Science
The character of computer science research can perhaps best be appreciated by seeing some of the things computer scientists do and why they choose to do them. In Part Two, computer scientists explain not only some of the results achieved in several areas of computer science research but also what interests and excites them about the research. The diversity of the topics addressed in these essays reflects the diversity of the field itself.
The essays in Part Two are organized by chapter into several clusters:
Exponential growth, computability, and complexity. How computer science research makes possible a sustained growth in computing power and how theoretical models of computation help us understand intrinsic limits on what is computable (Chapter 2).
Simulation. How computer models can be used to simulate aspects of the physical world (Chapter 3).
Abstraction, representation, and notations. How abstraction is used to express understanding of a problem, manage complexity, and select the appropriate level of detail and degree of generality (Chapter 4).
Data, representation, and information. How computer science has developed new ways of storing, retrieving, and manipulating data, and how these techniques can profoundly influence the models of a wide range of professionals (Chapter 5).
Achieving intelligence. How computer science’s aspiration to emulate human intelligence has resulted in advances in machine learning,
computers’ natural-language understanding, and improved strategies for game-like situations (Chapter 6).
Building computing systems of practical scale. How the design, development, and large-scale deployment of working computing systems—notably, the Internet—not only are of great practical value but also constitute a diverse and fruitful area of research by which computer scientists may improve these systems and create new ones (Chapter 7).
Research behind everyday computation. How computer scientists’ efforts in the service of human effectiveness have led to such advances as spreadsheets, text-formatting programs, and information retrieval from the Internet, and how these and other innovations have had a very broad impact (Chapter 8).
Personal passion about computer science research. How several computer scientists explain their commitment to computer science research (Chapter 9).
Exponential Growth, Computability, and Complexity
The essay by Hill that follows describes how exponential growth in computing capability drives technological and intellectual progress and how computer science research works to sustain this remarkable growth.
Next, Kleinberg and Papadimitriou address the intellectual essence of computation. They provide glimpses not only into the foundational models that define computation but also into theoreticians’ thinking about these models—what’s convincing, what’s surprising, what’s not.
Finally, Bennett describes how quantum computing research seeks to dodge a fundamental physical limit of current computing technology and stretch our conception of computation.
HARNESSING MOORE’S LAW
Mark D. Hill, University of Wisconsin, Madison
For most products, the claim “better and cheaper” casts suspicion on the salesperson. For computer-related products, however, “better and cheaper” has been wildly true for decades. In this essay, we seek to explain this success so as to give readers a foundation with which to appreciate what the future might hold. Specifically, we explain how rapid technological progress (e.g., the technologist’s Moore’s law) has been harnessed to enable better and cheaper computers (the popular media’s Moore’s law). We then touch upon future prospects for technological progress, computer implementation challenges, and potential impacts.
How Does Computer Hardware Work?
Before delving into our rapid progress in improving computers, it is important to reflect on how computers work. As discussed in Chapter 1 of this volume, computers are not designed to perform a single task. Instead they are machines that can perform many different computational tasks, including tasks that are not specified or even conceived until after the computer is deployed.
We enable this flexibility by using software. Software represents a computational task to hardware as a collection of commands. Each command is called an instruction and the set of instructions that the hardware supports—its vocabulary—is its instruction set architecture. Most instructions are simple. An “add” instruction adds two numbers and indicates which instruction is next. A “branch” instruction compares two numbers to choose which instruction is next. The instructions manipulate data, such as numbers and characters.
Moreover, instructions can manipulate other instructions, since most modern computers store instructions just like data. Treating instructions as data enables programs, such as compilers, that can translate high-level language programs (e.g., written in Java) into machine instructions (see Aho and Larus in Chapter 4).
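The fetch-execute cycle described above can be sketched in a few lines of Python. The instruction names ("add", "branch", "halt") and the register layout here are invented for illustration and do not correspond to any real instruction set architecture:

```python
def run(program, registers):
    """Execute a list of (opcode, operands) tuples until a halt instruction."""
    pc = 0  # program counter: index of the next instruction
    while True:
        op, *args = program[pc]
        if op == "add":              # add two registers into a third
            dst, a, b = args
            registers[dst] = registers[a] + registers[b]
            pc += 1                  # "indicates which instruction is next"
        elif op == "branch":         # compare two registers to choose what's next
            a, b, target = args
            pc = target if registers[a] < registers[b] else pc + 1
        elif op == "halt":
            return registers

# Compute 1 + 2 + 3 + ... until the running sum reaches the limit.
regs = run(
    [
        ("add", "sum", "sum", "i"),      # sum += i
        ("add", "i", "i", "one"),        # i += 1
        ("branch", "sum", "limit", 0),   # if sum < limit, loop back to 0
        ("halt",),
    ],
    {"sum": 0, "i": 1, "one": 1, "limit": 10},
)
print(regs["sum"])  # 10
```

Note that the program itself is ordinary data (a list of tuples), which is exactly the property the paragraph above describes.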
Computer hardware has three basic components:
A processor executes instructions. Today, it is most common to implement a processor on a single silicon chip called a microprocessor. Most computers today employ a single microprocessor, but larger computers employ multiple microprocessors.
Memory stores instructions and data. Current desktop computer memories are implemented in several chips and backed up by one or
more magnetic disks. Larger computers can employ hundreds of memory chips and disks.
Input/output devices connect a computer both to humans (e.g., keyboards, displays, mice, and speakers) (see Foley in Chapter 8) and to other computers via networks (see Peterson and Clark in Chapter 7).
Hardware designers seek to make processors that execute instructions faster, memory that provides more information, and input/output devices that communicate more effectively. Furthermore, we can and do focus on making these primitive elements faster without worrying about what the software running on the hardware is actually doing.
But how do we know this all works? Why is relatively simple hardware sufficient for most computing? Which instructions do I need in my instruction set architecture? Kleinberg and Papadimitriou in this chapter address these questions using the formal foundations of the Universal Turing Machine and the Church-Turing hypothesis. These foundations allow hardware designers to concentrate on the practical questions of engineering effective hardware.
Moore’s Law and Exponential Growth
The key enabler of “better and cheaper” computers has been rapid technological progress. Arguably, the most important enabler has been the progress in the number of transistors (switches) per semiconductor integrated circuit (chip). In 1965, Gordon Moore used four data points to predict that the number of transistors per chip would double every 2 years. He was not far off! The trend over the last 35 years has indeed been exponential, with the number of transistors per chip doubling approximately every 18 months. Technologists call this trend Moore’s law.
Some commentators have implied that exponential growth, such as with Moore’s law, is unique to computer technology. This belief is incorrect. In fact, exponential growth is common and occurs whenever the rate of increase of a quantity is proportional to the size of the quantity. Examples include compound interest and unconstrained biological population growth. For example, $100 earning compound interest at a rate of 10 percent will more than double in 8 years: $214 ≈ $100 × (1 + 0.10)⁸.
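The compound-interest arithmetic is easy to check, using the numbers from the example above:

```python
# $100 at 10 percent compound interest, after 8 years.
value = 100 * (1 + 0.10) ** 8
print(round(value, 2))  # 214.36: more than double
```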
To understand the implications of rapid exponential growth, consider the absurd example of your annual salary starting at a modest $16 but then doubling every 18 months for 36 years. This improvement corresponds to an annual growth rate of 59 percent. In the early years, your finances would be very tight (e.g., $64 after 3 years). You would have to live with your parents. With patience, you would eventually be able to buy a car ($16,000 after 15 years). Then you could move out and buy a house ($1 million after 24 years). Eventually you would be very rich and have to dream up fundamentally new ways to spend money ($300 million after 36 years).
The potency of Moore’s law is evident when we observe that it has followed the rate and duration of the absurd income example just given. Sixteen transistors per chip is a mid-1960s number, while 300 million transistors per chip is an early 21st century count. Of course, Moore’s law is not a law of nature. Rather it is a business expectation in the semiconductor industry: to stay even with their competitors, technologists should solve the problems to double the number of transistors per chip every 18 months.
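One can verify that the salary example tracks Moore’s law: 36 years at one doubling every 18 months is 24 doublings. A quick check in Python:

```python
# 24 doublings, starting from 16 transistors per chip.
doublings = 36 / 1.5
count = 16 * 2 ** doublings
print(int(count))  # 268435456, roughly 300 million

# The equivalent annual growth rate of doubling every 18 months.
annual_rate = 2 ** (1 / 1.5) - 1
print(round(annual_rate * 100))  # 59 percent
```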
Fortunately, the number of transistors per chip is not the only technological aspect that is achieving rapid, exponential growth rates. The smaller transistors enabling Moore’s law also switch much faster. Moreover, improvements rivaling (or exceeding) Moore’s law have occurred in magnetic disk storage capacity and effective fiber optic network bandwidth.
Unfortunately, some other technologies are improving much more slowly, notably the round-trip delays to memory chips, to disks, and across networks. Memory chip delays are improving slowly because memory chip manufacturers optimize for larger memory capacity instead. Disk delays seem limited by the inertia of mechanically moving macroscopic mass, while network delays are ultimately limited by the speed of light.
How We Have Harnessed Exponential Growth
While it is obvious that rapid technological improvements provide great opportunities for implementing computers, it is also the case that rapid and differential rates pose great challenges. Rapid change means that ideas that were too wasteful (e.g., in number of transistors) soon become appropriate and then become inadequate. The processors in 1960s computers required hundreds or thousands of chips. It was not until 1970 that it was even possible to implement a primitive processor on a single chip (and that first so-called microprocessor could only access a memory of 300 bytes, i.e., 0.0003 megabytes!).
Subsequently, Moore’s law enabled microprocessors to use more and faster transistors. New microprocessor designers exploited this transistor bonanza to obtain computers that got faster more rapidly than the transistors got faster. Central to this effort are methods of using many transistors to cooperate side-by-side in what we call parallelism. Not surprisingly, approaches for exploiting transistor parallelism depend on scale or “level” (much as approaches for organizing humans to work in parallel depend
greatly on whether 10, 1000, or 1 million people are available). With the transistors in a microprocessor, we have so far focused on bit-level and instruction-level parallelism. Near the end of this essay, we discuss how the currently modest use of thread-level parallelism needs to be greatly expanded.
During the 1970s and early 1980s, microprocessor development focused on accelerating the execution of each instruction using bit-level parallelism. Designers made changes to enable each instruction to both (a) manipulate a larger range of numbers (and symbols) and (b) perform those manipulations faster using more transistors side by side. Early microprocessors, for example, slowly performed manipulations on integers taking on one of 65,536 (2¹⁶) values (or fewer). In a single instruction, a current microprocessor can rapidly transform data whose values range through billions (2³²) or quintillions (2⁶⁴) of integer and fractional values.
Since the mid-1980s, many microprocessor improvements have focused on parallelism between instructions, not within instructions, with what is called instruction level parallelism. Thus, instead of executing instruction A, then instruction B, and then instruction C, we try to do instructions A, B, and C at the same time. To exploit instruction level parallelism past a few instructions, however, it is necessary to predict the outcomes of program decisions (often embodied in branch instructions) and speculatively execute subsequent instructions. Instruction level parallelism techniques have been so successful that modern microprocessors can have several dozen instructions in execution at any one time! Nevertheless, most microprocessors still appear to execute instructions one at a time. This illusion allows these microprocessors to run existing software, unmodified. It enables performance gains without endangering the billions of dollars invested to develop and deploy existing software.
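The scheduling idea behind instruction-level parallelism can be sketched as a toy dependency analysis: an instruction may issue in a cycle only after everything it depends on has issued in an earlier cycle. This simplified model assumes unlimited issue width and ignores branch prediction and resource limits, both of which constrain real processors:

```python
def schedule(instrs, deps):
    """instrs: instruction names in program order.
    deps: {name: set of names whose results it needs}.
    Returns the earliest cycle in which each instruction can issue."""
    cycle_of = {}
    for name in instrs:
        # Ready one cycle after the latest instruction it depends on.
        cycle_of[name] = max(
            (cycle_of[d] + 1 for d in deps.get(name, ())), default=0
        )
    return cycle_of

# A, B, C are independent; D needs A and B; E needs D.
cycles = schedule(
    ["A", "B", "C", "D", "E"],
    {"D": {"A", "B"}, "E": {"D"}},
)
print(cycles)  # {'A': 0, 'B': 0, 'C': 0, 'D': 1, 'E': 2}
```

Five instructions complete in three cycles instead of five, which is the essence of executing "instructions A, B, and C at the same time."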
The fact that technologies progress at different rates also poses challenges to computer implementations. Twenty years ago, the time to execute a multiply instruction and the time to access a computer’s main memory were comparable. Differential rates of improvement now make a multiplication more than 100 times faster than accessing main memory. Computer architects have responded with a plethora of types and layers of caches. Caches are smaller, faster memories that transparently hold the most-recently accessed subset of the information from a larger, slower memory. Caches are faster than main memory, because they are smaller (fewer bits to reach and select among) and can be implemented in technologies more optimized for speed. Caches often hold a very small fraction of the larger memory (e.g., a 64 kilobyte cache for a 64 megabyte memory). Nevertheless, they are able to satisfy a substantial fraction of requests quickly (e.g., 98 percent) due to locality (the property that, at many sizes and time scales, recently accessed information is highly likely
to be re-accessed quickly). In some ways, caches work for reasons similar to why your cellular phone can hold many of the numbers you actually call. In fact, caches are so effective that current microprocessor designers spend most of their transistors on caches! More generally, caching benefits many aspects of computer systems, as locality is common and smaller memories are faster than larger ones in many technologies.
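The effect of locality on cache hit rates is easy to demonstrate with a toy simulation. The model below is a fully associative cache with least-recently-used replacement, far simpler than real hardware, and the workload parameters (hot-set size, access mix) are invented for illustration:

```python
from collections import OrderedDict
import random

def hit_rate(accesses, cache_size):
    """Simulate an LRU cache and return the fraction of accesses that hit."""
    cache, hits = OrderedDict(), 0
    for addr in accesses:
        if addr in cache:
            hits += 1
            cache.move_to_end(addr)        # mark as most recently used
        else:
            cache[addr] = True
            if len(cache) > cache_size:
                cache.popitem(last=False)  # evict the least recently used
    return hits / len(accesses)

# A workload with locality: 90 percent of accesses touch a hot set of
# 50 addresses out of a "memory" of 100,000 addresses.
random.seed(0)
accesses = [
    random.randrange(50) if random.random() < 0.9 else random.randrange(100_000)
    for _ in range(20_000)
]
rate = hit_rate(accesses, cache_size=64)
print(f"hit rate: {rate:.2f}")  # high, even though the cache holds
                                # well under 0.1 percent of memory
```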
Meeting the above challenges has enabled computers whose performance has been doubling every 2 years. Ambiguously, this trend is also called Moore’s law (e.g., by the popular press). At this point, we have two Moore’s laws:
Technologist’s Moore’s law: number of transistors per chip doubles every 18 months, and
Popular Moore’s law: computer performance doubles every 2 years.
This popular Moore’s law has provided incredible opportunities for the rest of computer science. Everyone from researchers to product designers can dream up many ideas and ask not whether something will be practical, but when (assuming, of course, one is not trying to solve problems that cannot be solved or require execution times that defeat exponential growth with exponential complexity—see Kleinberg and Papadimitriou). This performance has facilitated major achievements, such as beating a chess grandmaster (Koller and Biermann, in Chapter 6), more realistic computer graphics (Fedkiw, in Chapter 3), better natural-language processing (Lee, in Chapter 6), and sequencing the human genome. In these and other cases, however, tremendous algorithmic and software advances were also necessary to effectively use faster hardware.
Many times, however, cost reduction matters more than increasing performance. In these cases, rather than using more transistors to obtain more performance at constant cost, it makes more sense to use a constant number of transistors to obtain the same performance at reduced cost. Fortuitously, it turns out that every 2 years or so one can obtain the same level of performance for half the cost. In a decade, Jim Gray has observed, it will be possible to buy an equivalent computer for the sales tax on one today (even if the sales tax is as low as 3 percent, which approximately equals 1/2⁵). Cost matters because more cost-effective computation can effectively be more widely applied. This cost reduction has enabled the rapid spread of inexpensive computers and enabled the explosive growth of the Internet (Peterson and Clark, in Chapter 7). Once again, creative innovations, such as word processors (Ullman, in Chapter 8), spreadsheets (Foley, in Chapter 8), and compelling multi-user environments (Bruckman, in Chapter 7) are necessary to make cost-effective hardware useful.
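The sales-tax arithmetic checks out: a decade of halving the cost every 2 years is five halvings, leaving 1/2⁵ of the original cost.

```python
halvings = 10 / 2           # a decade, halving the cost every 2 years
fraction = 0.5 ** halvings
print(fraction)             # 0.03125, about a 3 percent sales tax
```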
Future of Moore’s Law and Beyond
Recently, there have been several accounts predicting the end of the technologist’s Moore’s law. Since at least the early 1970s, there have been numerous predictions of its demise. However, technologists’ creativity has repeatedly solved the challenges to keep it on track. We are bullish that Moore’s law is safe for at least another decade, due to technologies already operating in the laboratory, backed by the motivation and vision of a trillion-dollar industry.[1]
Nevertheless, it seems probable that the doubling time for conventional chips will increase and the doubling will eventually halt as atomic limits are approached. There is already evidence that the doubling time for memory chips is now closer to every 2 years instead of every 18 months. A contributing factor to this slowdown is the exponentially increasing cost of building factories for fabricating chips.
Taking a longer view, however, innovation beyond the technologist’s Moore’s law is likely. Computing has already been implemented by a series of technologies: mechanical switches, vacuum tubes, discrete transistors, and now chips driven by Moore’s law. Eventually, almost all computing technologies will be supplanted by newer ones. One emerging candidate uses synthetic organic molecules to perform switching and storage. Another seeks to exploit quantum superposition to change both the model and the implementation of some future computers (see Bennett in this chapter). Both approaches, however, are unproven and may yet be surpassed by other technologies, some of which are not yet invented.
Moreover, we are not yet close to “fundamental” limits. We are many orders of magnitude from subatomic energy and storage limits imposed by currently understood physics. Furthermore, there is an existence proof that it is possible to organize computers in a way that is much better for many tasks: the human brain.
As an aside, some argue that we do not need more computing performance. While it is indeed the case that other constraints, such as low power, low noise, and low environmental impact, are becoming more important, we argue for more cost-effective computer performance. First, all past predictions that there was enough computing performance have been wrong. Two decades ago, some predicted that the ultimate personal computer would support three M’s: one million instructions per second, one megabyte, and one million display elements. Today’s personal computers routinely exceed the first two attributes by two orders of magnitude and still seem inadequate. Second, there are clear opportunities for
[1] See the International Technology Roadmap for Semiconductors Web site at http://public.itrs.net.
applying more computing performance (and more algorithms) to make human-computer interactions closer to the natural-language exchanges envisioned in 2001: A Space Odyssey a third of a century ago. Third, as computing and communication get more cost-effective, surprising new applications are likely to be invented. Some of us used mainframes for decades without predicting the spreadsheet, while others used the Internet for years without predicting the World Wide Web. It will be surprising if an order-of-magnitude improvement in cost-effective computer performance does not enable a new disruptive application.
Harnessing Future Exponential Growth
In the coming decades, we seek to harness future technology growth, be it slow, fast, or discontinuous. How will we use more transistors (or more exotic technologies) to enable hardware that will support the creative applications being forecast in the rest of this volume?
How do we exploit billions and trillions of transistors? (A computer with one gigabyte of memory, for example, has more than 8 billion transistors.) One known way is to go beyond bit- and instruction-level parallelism to also exploit thread-level parallelism. When a processor executes a sequence of instructions, we say it is executing a thread. Thread-level parallelism is exploited when software specifies multiple threads to execute and hardware is capable of executing the threads concurrently. Today, several important applications are specified with multiple threads, including database management systems (Gray, in Chapter 5) and Web crawlers (Norvig, in Chapter 8); multiple threading is also used to beat human chess champions (Koller and Biermann, in Chapter 6). Unfortunately, too many current applications are programmed with a single thread, even if the problem could be solved in parallel. Much established business software, for example, was written for a single thread, even when the problem was once solved by many clerks operating in parallel. Nevertheless, creating multi-threaded software has been difficult. In addition, the motivation to create it has been reduced by the fact that many current computers execute only one thread at a time. Emerging techniques for supporting multiple threads on a microprocessor, however, promise to make multi-threading hardware much more widely available. Moreover, there is also tremendous additional potential for using threads executing in parallel on multiple computers that communicate via local- or wide-area networks. In the extreme, computers around the world could be harnessed to solve problems on demand. This vision is now known as grid computing, since it strives to deliver customers access to computing resources much as the power grid delivers customers power. We encourage future efforts to exploit parallel threads in all their forms, because doing so represents
the best-known way to grow computing performance much faster than Moore’s law enables for single processors.
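Thread-level parallelism, as described above, means software that specifies several threads and hardware that can execute them concurrently. A minimal sketch in Python follows; it illustrates the programming structure only, since CPython’s interpreter lock prevents these particular threads from truly running in parallel on multiple processors:

```python
from concurrent.futures import ThreadPoolExecutor

def count_matches(chunk, target):
    """One thread's share of the work: scan its chunk independently."""
    return sum(1 for item in chunk if item == target)

data = list(range(1000)) * 4             # 4000 items to search
chunks = [data[i::4] for i in range(4)]  # split the problem four ways

# Four threads, each scanning one chunk; their partial counts are combined.
with ThreadPoolExecutor(max_workers=4) as pool:
    partials = pool.map(count_matches, chunks, [7] * 4)
total = sum(partials)
print(total)  # 4: each copy of range(1000) contributes one 7
```

The key design point is that each thread works on an independent portion of the data, like the clerks operating in parallel mentioned above, so no coordination is needed until the final combining step.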
We also see additional opportunity provided by technology discontinuities and synergies. First, Moore’s law will soon enable a complete (albeit simple) computer on a chip. Today’s personal computers, for example, use a complete processor on a chip—the microprocessor—together with several memory and support chips. Single-chip computers could dramatically cut costs and expand the effectiveness of systems. While we have primitive single-chip systems today, more powerful ones might unleash progress in a manner similar to that unleashed by the increasing performance of microprocessors over the last three decades. For at least the next decade, however, single-chip solutions must focus on systems smaller than personal computers (because personal computers use too much memory to fit on a chip). Nevertheless, the history of computing has shown that new smaller systems are great catalysts for change: mainframes to minicomputers to personal computers to personal digital assistants to cellular phones. Second, technologists are now fabricating sensors and emitters on chips. This technology holds the promise of systems that integrate with their environment at unprecedentedly low cost. A current success story is an accelerometer for triggering automobile airbag deployment. Third, the further integration of computers with communication will make the world even smaller. Communication, with and without wires, will enable ubiquitous connectivity. The World Wide Web shows us how communication magnifies the value of computation. Now, imagine a Web where you are always online everywhere!
Each of these trends will further integrate computers into our lives. In many cases, integration allows the computers to “disappear.” When Emily clicked to buy a book in the prelude to this volume, she was not even aware of most of the computers that implemented the transaction. Furthermore, she may not recognize her cellular phone, personal digital assistant, or pacemaker as computers, an integration that is an appropriate and natural consequence of our abilities to hide complexity from users.
Our success in hiding computers when they work, however, brings with it a responsibility to hide them when they fail. Imagine Web services as available as telephones and personal computers as dependable as televisions. Numerous solutions will be needed to enable this dependability, taking into account needs and appropriate costs. Large commercial systems may seek 10 to 100 times improvements in availability for small overheads (e.g., 10 percent), while critical functions like pacemakers may tolerate tripling hardware costs. In many cases, the underlying hardware may get more unreliable, because transistors are so small (and susceptible to cosmic rays) and numerous (more chances to fail). While some improvements can be done in hardware, transparently to software, other
solutions will require hardware and software changes. In many cases, we will have to design systems assuming that parts will fail. Unlike the current Web, however, we should seek to ensure that all systems mask almost all of those failures from users. By laying a more reliable foundation, we can expand the realms in which society can depend on information technology.
The last half-century has seen substantial computing advances and impacts on society. We expect the synergies just discussed to provide plenty of non-technical and technical challenges and opportunities. For society, the real information revolution may be coming soon. On the technical side, there is much work to be done. Arthur C. Clarke said, “Any sufficiently advanced technology is indistinguishable from magic.” Let’s create some more magic!
COMPUTABILITY AND COMPLEXITY
Jon Kleinberg, Cornell University, and Christos Papadimitriou, University of California, Berkeley
The Quest for the Quintic Formula
One of the great obsessions of Renaissance sages was the solution of polynomial equations: find an x that causes a certain polynomial to evaluate to 0. Today we all learn in school how to solve quadratic equations (polynomial equations of degree two, such as ax² + bx + c = 0), even though many of us have to look up the formula every time (it’s x = (−b ± √(b² − 4ac)) / 2a). Versions of this formula were known to the Babylonians as early as 2000 BC, and they were rediscovered in many ancient cultures. The discovery of similar but much more complicated formulas for solving equations of degree three and four—the cubic and quartic formulae—had to wait until the 16th century AD. During the next three centuries, the greatest minds in Europe strove unsuccessfully to discover the quintic formula, cracking equations of degree five, until the flowering of modern algebra brought the quest to a sudden, surprising resolution: a proof that there is no quintic formula.
This story, on first hearing, can engender a few natural reactions. Among them, surprise—what’s the obstacle to a quintic formula? Why was it so hard to prove it didn’t exist? And, more subtly, a mild sense of perplexity—what do we mean by a quintic formula anyway? Why can’t we write “the largest x such that ax⁵ + bx⁴ + cx³ + dx² + ex + f = 0” and declare this to be a formula?
So we back up. By a “formula” in this story, we meant a particular thing: a finite sequence of steps that begins with the given values of the coefficients and ends with a root x; each step consists of one of the basic arithmetic operations applied to certain of the quantities already computed, or else it consists of the extraction of a root of one of the quantities already computed. Now we can assert more precisely, thanks to the work of the 19th-century mathematicians Abel and Galois: there is no quintic formula.
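The definition just given can be made concrete: the quadratic formula is a straight-line sequence of arithmetic steps plus one root extraction, with no loops and no conditional tests. A sketch in Python:

```python
import math

def quadratic_root(a, b, c):
    """One root of ax^2 + bx + c = 0, computed as a straight-line 'formula':
    each step applies an arithmetic operation or a root extraction to
    quantities already computed."""
    t1 = b * b            # arithmetic on the given coefficients
    t2 = 4 * a * c        # arithmetic
    t3 = t1 - t2          # arithmetic: the discriminant
    t4 = math.sqrt(t3)    # the one root extraction
    t5 = -b + t4          # arithmetic
    return t5 / (2 * a)   # arithmetic

print(quadratic_root(1, -5, 6))  # 3.0, since x^2 - 5x + 6 = (x - 2)(x - 3)
```

The proof of Abel and Galois says that no sequence of steps of this restricted kind, however long, computes the roots of a general degree-five polynomial.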
Viewed from the safe distance of a few centuries, the story is clearly one about computation, and it contains many of the key ingredients that arise in later efforts to model computation: We take a computational process that we understand intuitively (solving an equation, in this case), formulate a precise model, and from the model derive some highly unexpected consequences about the computational power of the process. It is precisely this approach that we wish to apply to computation in general. But moving from this example to a fully general model of computation
requires some further fundamental ideas, because the notion of a “formula”—a straight-line recipe of arithmetic operations—omits two of the crucial properties of general-purpose computation. First, computation can be repetitive; we should be able to perform some action over and over until a certain stopping condition is satisfied. Second, computation should contain “adaptive” steps of the following form: test whether a certain condition is satisfied; if it is, then perform action A; if it isn’t, then perform action B. Neither of these is present in straight-line formulas; but a little reflection convinces one that they are necessary to specify many of the other activities that we would consider computational.
Computation as a Universal Technology
So, guided by this intuition, let us move beyond stylized forms of computation and seek to understand general-purpose computation in all its richness—for if we could do this, then we might find similarly surprising consequences that apply much more broadly. Such was the goal of Alan Turing in the 1930s, and such was also the goal of a host of other mathematical logicians at that time.
Turing’s entry in this field is particularly compelling, not so much because of its mathematical elegance but because of its basic, commonsensical motivation and power. He sought a streamlined mathematical description of what goes on when a person is performing calculations in a large notebook: he or she writes down and erases symbols, turns the pages left or right, keeps a limited number of symbols in his or her memory. The computing device Turing proposed—the Turing machine—has access to an infinite sequence of “pages,” each of which can hold only one symbol. At any time, the machine can be in one of a finite set of possible “states of mind”—its working memory. The flow of control proceeds simply as follows: based on its current state, and the symbol it is currently reading, the machine may write a new symbol on the current page (erasing the existing one), move to the next or preceding page, and change its state. Subject to these rules, the Turing machine processes the input it is given and may eventually choose to halt, at which point the notebook contains its output.
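Turing’s model is simple enough that a complete simulator fits in a dozen lines of Python. The rule encoding below is one arbitrary choice among many:

```python
def run_turing(rules, tape, state="start", blank=" ", max_steps=10_000):
    """Minimal Turing machine. rules maps (state, symbol) to
    (new_symbol, move, new_state), where move is -1 (left) or +1 (right)."""
    tape = dict(enumerate(tape))  # unbounded "pages", sparsely stored
    pos = 0
    for _ in range(max_steps):
        if state == "halt":
            break
        symbol = tape.get(pos, blank)          # read the current page
        new_symbol, move, state = rules[(state, symbol)]
        tape[pos] = new_symbol                 # write (erasing the old symbol)
        pos += move                            # turn to an adjacent page
    return "".join(tape[i] for i in sorted(tape)).strip()

# A machine that walks right over a string of 1s and appends one more 1
# (increment in unary notation), then halts.
rules = {
    ("start", "1"): ("1", +1, "start"),  # keep moving right over the 1s
    ("start", " "): ("1", +1, "halt"),   # first blank: write a 1 and halt
}
print(run_turing(rules, "111"))  # 1111
```

Note that the machine’s entire “state of mind” is the single string held in the variable `state`, exactly as in Turing’s description of a person computing with a notebook.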
Why should we accept this model? First, it accords well with common sense. It seems to be the way that symbolic computation, as performed slowly and painfully by humans, proceeds. Indeed, with some practice, one can implement seemingly any natural symbolic task on a Turing machine. Second, it is robust—a version of the model with very small sets of available symbols and states (say, eight of each) is, in a precise sense, just as powerful as a version with an arbitrary finite set of each, only the control rules become more complicated. Moreover, it does not matter that
the “pages” on which the computation is performed are arranged in a linear sequence; it would not add to the power of the model if we arranged them instead in an arbitrary web of connections. Finally, and most crucially, Turing’s model is precisely equivalent to the other formalisms proposed in his day, among them, and preceding it, Alonzo Church’s lambda calculus—and it is also equivalent to modern general-purpose programming languages such as C and Java (with access to an arbitrary amount of memory).
For the accumulation of these reasons, we are justified in believing that we have arrived at the “right” model of computation; and this is the content of the Church-Turing thesis: a symbolic function is computable if and only if it can be computed on a Turing machine (or its equivalents). It is important to notice what is being claimed: we have not derived the notion of “computability” from a set of more primitive axioms; rather, after extensive thought experiments, we have asserted that “computability” corresponds precisely to what can be computed on a Turing machine.
Accepting the Turing machine as the basis for our precise definition of computability is a momentous step. Its first consequence is that of universality: there is a “universal Turing machine” U that does the following. As input, U receives the description of a Turing machine M (the “code” of M, written in U’s book) and an input n to M (a later chapter in the same book). As output, U returns the result, if any, of running M on n. Today, we would think of U simply as an interpreter—it executes a step-by-step simulation of any Turing machine M presented to it. Indeed, our style of writing programs and then executing them is so ingrained in our view of computation that it takes a moment to appreciate the consequences that flow from a universal machine. It means that programs and data are really the same thing: a program is just a sequence of symbols that looks like any other piece of input; but when fed to a universal machine, this input wakes up and begins to compute. Think of mobile code, Java applets, e-mail viruses: your computer downloads them as data and then runs them as programs.
The principle of interchangeable parts—that components of a machine can be mass-produced independently, and a working whole can be assembled from a random selection of one part from each kind—was the disruptive insight that underpinned the Industrial Revolution. Universality is perhaps an even more radical approach to assembling systems: a single computer on your desk can run your word processor, your spreadsheet, and your online calendar, as well as new applications not yet conceived of or written. And while this may seem completely natural, most of technology in fact does not work this way at all. In most aspects of one’s technological life, the device is the application; they are one and the same. If you own a radio and want to watch TV, you must buy a new
device. If you want to drive, you use a car; but if you want to fly, you use an airplane. Your car cannot download a new set of instructions and suddenly be able to fly, or to maneuver underwater. But the computational world has a flexibility of application that cannot really be imagined elsewhere—and that is because the world of computation is powered by universal machines.
We have all seen the consequences of this flexibility very clearly in the past decade, as the World Wide Web became a new medium within a mere 7 years of its introduction. If we look at other media—at the phone, the radio, the television—it took a much longer time from their inceptions to widespread prominence. What was the difference? Of course there are many factors that one can point to, but mingled among them is the universality of the computers on which Web protocols and interfaces run. People had already bought personal computers for their homes, and built office information systems around them, before the Web ever existed. When the Web arrived, it could spread through this infrastructure with amazing speed. One cannot really imagine an analogue of this process for the television—it would be as though millions of Americans had been induced to buy large inert boxes for their living rooms, and a decade later someone dreamed up the technology to begin broadcasting pictures to them. But this is more or less what happened with the Web.
The Limits of Computation
Computer science was born knowing its own limitations. For the strength of the universal machine leads directly to a second, and more negative, consequence—uncomputability, the fact that certain natural computational tasks cannot be carried out by Turing machines or, by extension, computer programs. The leap from the notion of universality to this impossibility result is surprisingly effortless, if ingenious. It is rooted in two basic issues—first, the surprising difficulty in determining the “ultimate” behavior of a program; and second, the self-referential character of the universal machine U.
To appreciate the first of these, recall that our universal machine U simply performed a step-by-step simulation of a Turing machine M on an input n. This means that if M computes forever, never halting, then U’s simulation of M will run forever as well. This is the notion of an “infinite loop,” familiar to beginning (and experienced) programmers everywhere—your program keeps running with no sign of any output. Do you stop it and see what’s wrong, or wait to see if it’s just taking longer than expected to come up with an answer? We might well want something stronger than the blind simulation that U provides; we might want a “Universal Termination Detector”—a Turing machine D that behaves as
follows. Given a description of a Turing machine M and an input n to M, the machine D performs a finite number of steps, and then correctly reports whether or not M will ever halt with a result when it is run on n. (So in particular, the machine D itself halts on every input.)
Could one build such a thing? One’s first reaction is to start dreaming up tricks by which one could look at a program and determine whether it will halt or not—looking for obviously repetitive behavior with no stopping condition. But gradually the problem begins to look hopelessly difficult. Maybe the program you’re analyzing for termination is systematically enumerating the natural numbers, searching for a counter-example to a famous conjecture in mathematics; and it will only stop when it finds one. So a demonstration that this single program eventually terminates must implicitly resolve this mathematical conjecture! Could detecting the termination of programs really be as hard as automating mathematics?
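Such a conjecture-hunting program is easy to write. The sketch below searches for a counterexample to Goldbach's conjecture (every even number greater than 2 is a sum of two primes), chosen purely as an illustration: run without a limit, it halts exactly when a counterexample exists, so predicting its termination amounts to settling the conjecture. A search limit is included only so the sketch can actually be run.

```python
def is_prime(n):
    """Trial-division primality test, adequate for a small demonstration."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True

def goldbach_counterexample(limit=None):
    """Search even numbers 4, 6, 8, ... for one that is NOT a sum of two
    primes.  With limit=None, this loop halts only if Goldbach's
    conjecture is false; a Termination Detector aimed at it would
    therefore have to settle the conjecture itself."""
    n = 4
    while limit is None or n <= limit:
        if not any(is_prime(p) and is_prime(n - p) for p in range(2, n // 2 + 1)):
            return n  # a counterexample!
        n += 2
    return None  # none found below the limit
```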
This thought experiment suggests that we should perhaps approach the problem from the other direction, and try to show that it is not possible to build a Universal Termination Detector. Another line of reasoning that points toward such an impossibility result is, as suggested above, the self-referential nature of the universal machine U: U is a Turing machine that can simulate the behavior of any Turing machine. So, in particular, we could run U on a description of itself; what would happen? When you find yourself asking questions like this about a mathematical object—questions in which the object refers to itself—there are often explosive consequences lying just ahead. Indeed, the dangerous properties of self-reference appear in ordinary discourse. From ancient Greece we have Eubulides’ paradox, which asserts, “This statement is false”: is this a true statement or a false one? Or consider Bertrand Russell’s hypothetical barber, who shaves exactly those people who do not shave themselves—who, then, shaves the barber?
In the case at hand, we can exploit the self-reference inherent in universality to prove that there is no Universal Termination Detector by showing that there is no Turing machine that correctly implements a Universal Termination Detector (Box 2.1).
This is a first, fundamental impossibility result for computation—a natural problem that cannot be solved computationally. And starting with this result, impossibility spreads like a shock wave through the space of problems. We might want a Universal Equivalence Tester: given two Turing machines M and M′, are they equivalent? Do they produce the same result on every input? But it is easy to convince yourself that if we could build an Equivalence Tester, we could use it to build a Termination Detector, which we know cannot exist. And so: There is no Universal Equivalence Tester. We might want a Universal Code Verifier: given a
We begin by observing that the set of all Turing machines, while clearly infinite, can be enumerated in a list M1, M2, M3, . . . in such a way that each Turing machine appears once in the list.
To do this, we can write a description of each Turing machine as a sequence of symbols and then order these descriptions in alphabetical order; we first list all descriptions with one symbol (if there are any), then all descriptions with two symbols, and so forth.
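This shortest-first, alphabetical ordering (sometimes called "shortlex") is easy to generate mechanically; the sketch below, an illustration over a two-letter toy alphabet rather than over actual machine descriptions, enumerates every string exactly once in that order.

```python
from itertools import count, islice, product

def shortlex(alphabet):
    """Yield every string over `alphabet`: all length-1 strings in
    alphabetical order, then all length-2 strings, and so on.  Each
    string appears exactly once, so the output is an enumeration."""
    for length in count(1):
        for chars in product(sorted(alphabet), repeat=length):
            yield "".join(chars)

# The enumeration is infinite; islice takes just its first few entries.
first_six = list(islice(shortlex("ab"), 6))
```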
Our impossibility proof proceeds by assuming that there exists a Universal Termination Detector; we then show how this leads to a contradiction, establishing that our initial assumption cannot be valid. So to begin, let D be a Turing machine that is a Universal Termination Detector.
We construct, from D, a Turing machine X that will lead to the contradiction. On input n, here is what X does. It first invokes the Termination Detector D to determine whether the machine Mn will ever halt when run with input n. (This is the core of the self-reference: we investigate the behavior of Mn on its own position n in the alphabetical listing of Turing machines.) If it turns out that Mn will never halt on n, then X halts. But if it turns out that Mn will halt when run on n, then X gratuitously throws itself into an infinite loop, never halting.
X is not a very useful program; but that is not the point. The point is to notice that X is itself a Turing machine, and hence is one of the machines from our list; let us suppose that X is really Mk. The self-reference paradoxes we mentioned above—Eubulides’ and Russell’s—were both triggered by asking a natural question that exposed the latent contradiction. Our proof here employs such a question, and it is this: does X halt when it is run on input k?
We consider this question as follows. Suppose that X halts when it is run on k. Then, since we know X is really Mk, it follows that Mk halts on k; and so, by our construction of X, X should not halt when it is run on k. On the other hand, suppose that X does not halt when it is run on k. Then, again using the fact that X is really Mk, it follows that Mk does not halt on k; and so X should halt on k. So neither outcome is possible—X cannot halt on k, and it cannot fail to halt on k! This is a contradiction. Since X was constructed mechanically from D, the only assumption that can be at fault is the existence of D: there cannot be a Universal Termination Detector.
This style of proof is often referred to as diagonalization, and it was introduced by Cantor to show that one cannot put the real numbers in one-to-one correspondence with the integers. The term “diagonalization” here comes from the following intuition. We imagine drawing an infinite two-dimensional table whose rows correspond to the Turing machines M1, M2, M3, . . . and whose columns correspond to all the possible inputs 1, 2, 3, . . . . Each entry of this table—say the entry at the meeting of row i and column j—indicates whether or not machine Mi halts when it is run on input j. Viewed in these terms, our supposed machine X “walks down the diagonal” of this table; on input k, it consults the table entry at the meeting of row k and column k, and essentially inverts the answer that it finds there.
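The diagonal move can be seen in miniature on a table of 0/1 sequences. Given any list of rows (each a function from positions to bits), the construction below builds a sequence that disagrees with row i at position i, and so appears nowhere in the list. This is a finite illustration of Cantor's trick, not the halting-problem proof itself.

```python
def diagonalize(rows):
    """Given 0/1 sequences, each represented as a function from
    positions to bits, build a new sequence that differs from rows[i]
    at position i.  The result therefore cannot equal any row."""
    return lambda i: 1 - rows[i](i)

# Three sample "rows" of an infinite 0/1 table.
rows = [
    lambda i: 0,       # all zeros
    lambda i: 1,       # all ones
    lambda i: i % 2,   # alternating 0, 1, 0, 1, ...
]
anti = diagonalize(rows)
```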
Turing machine M and an “undesirable” output n, is there any input that will cause M to produce the output n? But again, from a Code Verifier we could easily build a Termination Detector. No Termination Detector, no Code Verifier.
Suppose you want to verify that the program you’ve just written will never access a restricted area of memory; or suppose you want to ensure that a certain revision—a transformation of the program—will not cause it to change its behavior. Research in the area of programming languages has developed powerful techniques for program verification and transformation tasks, and they are used effectively to analyze complex programs in practice (Aho and Larus, in Chapter 4, discuss the transformation problem in the context of compiler optimization). Such techniques, however, are developed in a constant struggle against the negative results discussed above: over the years researchers have carved out broader and broader tractable special cases of the problem, but to solve these verification and transformation tasks in their full generality—to perform them correctly on all possible programs—is provably impossible. Such results impose fundamental limitations on our ability to implement and reason about complex pieces of software; they are among the laws that constrain our world.
When Finite Is Not Good Enough
Computers, as we think of them now, did not exist when Turing carried out his seminal work. But by the 1950s and 1960s, as truly automated computation became increasingly available, a growing amount of attention was devoted to the study and development of algorithms—step-by-step computational procedures, made precise by the notion of computability. And as this development began to gather force, it became clear that uncomputability was only one of the laws that constrained our ability to solve problems. The world abounded in problems whose solvability was not in doubt—but for which solving any but the smallest instances seemed practically infeasible.
Some of the most vivid of these problems came from the area of operations research, a field that sprang in large part from the epiphany—conceived during World War II, and spilling into civilian life ever after—that there was a science to the efficient coordinated movement of armies and organizations, the efficient allocation of supplies and raw materials. Thus, we can consider the Traveling Salesman Problem: You are given a map of N cities and the distances between them, as well as a “budget” B; you must visit all the cities via a tour whose total length is at most B. Or consider the Matching Problem: You must pair up 2N newly admitted
college students—some of whom don’t like one another—into N pairs of roommates, so that each pair consists of students who will get along.
In the 1960s, Jack Edmonds came up with a beautiful and efficient algorithm to solve the Matching Problem; and he wrote a paper describing the method. But how should one describe the result, actually? “A computational solution to the Matching Problem”?—this is not quite right, since there’s an obvious way to solve it: try all possible pairings, and see if any of them works. There was no question that the Matching Problem had a computable solution in Turing’s sense. The crux of the result was in the efficiency of the solution. Jack Edmonds understood this difficulty very clearly: “I am claiming, as a mathematical result, the existence of a good algorithm for finding a . . . matching in a graph. There is an obvious finite algorithm, but that algorithm increases in difficulty exponentially with the size of the graph. It is by no means obvious whether or not there exists an algorithm whose difficulty increases only algebraically with the size of the graph.”
It is hard to find much to add to this. There is clearly an algorithm that solves the Matching Problem in a number of steps that is exponential in N—at least N factorial. But try to imagine how long this would take. Even on the fastest computers we have today the problem of forming 30 pairs could require an amount of time comparable to the age of the universe. Yes, the solution is computable; yes, we can even imagine how the whole computation will proceed; but such a method is of no real value to us at all if we are seeking a solution to a problem of even moderate size. We need to find a qualitatively faster method; and this is exactly what Jack Edmonds had accomplished.
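The arithmetic behind the age-of-the-universe claim is easy to check: the number of ways to pair up 2N people is (2N)!/(N! · 2^N). The sketch below evaluates this for N = 30; the hardware speed and the age of the universe are rough figures assumed purely for illustration.

```python
from math import factorial

def pairings(n_pairs):
    """Number of ways to split 2N people into N unordered pairs:
    (2N)! / (N! * 2**N)."""
    n = n_pairs
    return factorial(2 * n) // (factorial(n) * 2 ** n)

total = pairings(30)                  # all pairings of 60 students: ~10**40
ops_per_second = 10 ** 12             # a very generous machine (assumed)
age_of_universe_s = 4.3 * 10 ** 17    # roughly 13.8 billion years, in seconds
seconds_needed = total / ops_per_second
```

Even at a trillion pairings examined per second, the brute-force search exceeds the age of the universe by many orders of magnitude.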
Edmonds’s concern with “good algorithms” fit naturally into a research agenda that was being pursued contemporaneously by Juris Hartmanis and Richard Stearns—that of determining the intrinsic computational complexity of natural problems, by determining the smallest number of computational steps that are required to produce a solution. And so, following this line of attack, we seek algorithms that require only “polynomial time”—on an input of size N, a “good” algorithm should use a number of steps that is bounded by a polynomial function of N such as N^2 or N^5. Clearly polynomial functions grow much more slowly than exponential ones, and they have a very desirable scaling property—if you increase your input size by a constant multiplicative factor, the running time goes up by a predictable constant factor as well. But however much one tries to justify our interest in polynomial time on theoretical grounds, its primary justification follows the same lines as the Church-Turing thesis that we saw earlier: it accords well with our experience from practice. Problems solvable by polynomial-time algorithms tend overwhelmingly to be efficiently solvable in practice; and for problems that lack polynomial-time algorithms, one tends to encounter enormously difficult instances with some regularity.
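The scaling property is easy to see numerically: doubling the input multiplies an N^2 running time by a fixed factor of 4 at every size, while the corresponding ratio for a 2^N algorithm grows without bound. A small sketch, with illustrative cost functions:

```python
def growth_ratio(cost, n):
    """How much more work does an input of size 2n need than one of size n?"""
    return cost(2 * n) / cost(n)

def poly(n):
    return n ** 2   # a polynomial-time cost function

def expo(n):
    return 2 ** n   # an exponential-time cost function
```

For the polynomial, the ratio is the same constant at every input size; for the exponential, the ratio itself is exponential in n.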
The choice of polynomial time as a mathematical surrogate for efficiency has served computer science very well—it has been a powerful guide to the design of good algorithms. And algorithmic efficiency has proved to be a far subtler concept than we could have imagined. Consider again the Traveling Salesman Problem and the Matching Problem. For each, the “search space” is enormous: for the Traveling Salesman, any ordering of the N cities forms a tour that must in principle be considered; for the Matching Problem, any set of N pairs that covers all 2N students is a candidate solution. And yet, despite similarities at this level, their behavior from the point of view of computational difficulty seems to be utterly divergent. Matching has a polynomial-time algorithm, and very large problems are solved every day in practice. For the Traveling Salesman Problem, on the other hand, we are famously without a polynomial-time algorithm, and the solution of relatively small instances can still require a major computational effort.
Where does this enormous difference in computational complexity lie? What features of a computational problem determine its underlying difficulty? The ongoing effort to resolve these questions is a core research activity in computer science today; it has led to a rich theory of computational intractability—including the notion of NP-completeness, which has spread from computer science into the physical, natural, and social sciences. It has also led to the celebrated “P versus NP” question (Box 2.2), which has drawn the attention of mathematicians as well as computer scientists and is now featured on several lists of the foremost open questions in mathematics.
Exponential growth is a recurring theme in computer science, and it is revealing to juxtapose two of its fundamental roles: in Moore’s law, which charts the exponential growth in the power of computing machinery (see Hill’s essay in this chapter), and in computational complexity, which asserts that the effort needed to solve certain problems grows exponentially in their size. Do these two principles cancel out? If our computing power is growing exponentially over time, will we really be bothered by problems of exponential complexity? The answer is that Moore’s law does not make our concerns about exponential complexity go away, and it is important to realize why. The full search space for a 17-city instance of the Traveling Salesman Problem is 16 times larger than the search space for a 16-city instance. So if exhaustively searching the space of solutions for a 16-city problem is at the limit of your current computer’s abilities, and if computing power doubles every year and a half, then you’ll need to wait 6 years before your new computer can tackle a 17-city problem by brute force—6 years to be able to solve a problem that is only
We have been concerned with the set of all problems that can be solved by a polynomial-time algorithm; let’s use P to denote this set of problems.
Now, we believe that the Traveling Salesman Problem is very difficult to solve computationally; it is likely that this problem does not belong to the set P we have just defined. But there is at least one good thing we can say about its tractability. Suppose we are given N cities, the distances between them, and a budget B; and suppose that in fact there exists a tour through all these cities of length at most B. Then there exists a way for someone to prove this to us fairly easily: he or she could simply show us the order in which we should visit the cities; we would then tabulate the total distance of this tour and verify that it is at most B.
So the Traveling Salesman Problem may not be efficiently solvable, but it is at least efficiently verifiable: if there is a short tour among the cities we are given, then there exists a “certificate”—the tour itself—that enables us to verify this fact in polynomial time. This is the crux of the Traveling Salesman Problem’s complexity, coiled like the “trick” that helps you solve a difficult puzzle: it’s hard to find a short tour on your own, but it’s easy to be convinced when the short tour is actually revealed to you. This notion of a certificate—the extra piece of information that enables you to verify the answer—can be formalized for computational problems in a general way. As a result, we can consider the set of all problems that are efficiently verifiable in this sense. This is the set NP—the set of all problems for which solutions can be checked (though not necessarily solved) in polynomial time.
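Checking such a certificate really is a polynomial-time task. A sketch, using a small made-up distance table as its instance:

```python
def verify_tour(distance, tour, budget):
    """Check a claimed Traveling Salesman certificate in polynomial time:
    the tour must visit every city exactly once, and its total length
    (returning to the starting city) must be at most the budget."""
    cities = list(distance)
    if sorted(tour) != sorted(cities):       # every city, exactly once
        return False
    total = sum(distance[tour[i]][tour[(i + 1) % len(tour)]]
                for i in range(len(tour)))
    return total <= budget

# A hypothetical 4-city instance; distance is a symmetric nested dict.
distance = {
    "a": {"a": 0, "b": 1, "c": 4, "d": 2},
    "b": {"a": 1, "b": 0, "c": 1, "d": 5},
    "c": {"a": 4, "b": 1, "c": 0, "d": 1},
    "d": {"a": 2, "b": 5, "c": 1, "d": 0},
}
```

Verification takes a handful of additions and one comparison per city; it is finding the tour in the first place that appears hard.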
It is easy to show that any problem in P must also belong to NP; essentially, this is as easy as arguing that if we can solve a problem on our own in polynomial time, then we can verify any solution in polynomial time as well—even without the help of an additional certificate. But what about the other side of the question: is there a problem that belongs to NP but not to P, a problem for which verifying is easy but solving is hard? Although there is widespread belief that such problems must exist, the issue remains unresolved; this is the famous “P versus NP” question.
To address this question, it is natural to seek out the “hardest” problems in NP, for they are the best candidates for problems that belong to NP but not to P.
one city larger! Waiting for Moore’s law to deliver better computing power only gets you so far, and it does not beat down the exponential complexity of a deeply intractable problem. What is needed is not just better hardware on which to apply brute force, but also a better algorithm for finding a solution—something like what Edmonds found for the Matching Problem.
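The six-year figure is just Moore's-law arithmetic: a factor-of-16 larger search space needs log2(16) = 4 doublings, at 18 months apiece. A sketch (the 18-month doubling period is the usual rough figure, assumed here for illustration):

```python
from math import log2

def years_until_feasible(growth_factor, months_per_doubling=18):
    """Years of hardware doubling needed to absorb a given multiplicative
    increase in brute-force search space (Moore's-law arithmetic only)."""
    return log2(growth_factor) * months_per_doubling / 12

extra_city_factor = 16   # a 17-city tour space is 16 times a 16-city one
wait = years_until_feasible(extra_city_factor)
```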
The fact that simply-stated problems can have enormous complexity, with solutions that are computationally very difficult to find, has led to new perspectives on a number of well-studied ideas. Cryptography has
How can we formalize the notion that one problem is at least as hard as another? The answer lies in reducibility, an idea that came up implicitly when we discussed computational impossibility. We say that a problem A is “reducible” to a problem B if, given a “black box” capable of solving instances of B in polynomial time, we can design a polynomial-time algorithm for A. In other words, we are able to solve A by drawing on a solution to B as a “resource.” It follows that if we actually had a polynomial-time algorithm for problem B, we could use this as our “black box,” and hence design a polynomial-time algorithm for problem A. Or, simply running this reasoning backward, if there is no polynomial-time algorithm for A, then there cannot be a polynomial-time algorithm for B: problem B is at least as hard as problem A.
So here is a natural thing we might search for: a single problem B in NP with the property that every problem in NP is reducible to B. Such a problem would, quite conclusively, be among the hardest problems in NP—a solution to it would imply a solution to everything in NP. But do such problems exist? Why should there be a single problem that is this powerful?
In the early 1970s, Steve Cook in North America and Leonid Levin in the Soviet Union independently made the crucial breakthrough, discovering a number of natural problems in NP with precisely this property: everything in NP can be reduced to them. Today we say that such a problem is “NP-complete,” and research over the past decades has led to the discovery that there are literally thousands of natural NP-complete problems, across all the sciences. For example, determining the winner(s) under a wide variety of election and auction schemes is NP-complete. Optimizing the layout of the gates in a computer chip is NP-complete. Finding the folding of minimum energy for discrete models of proteins is NP-complete. The Traveling Salesman Problem—the tough nut that started us on this road—is NP-complete. And an important thing to bear in mind is this: because every problem in NP is reducible to any of these, they are all reducible to each other. There is a polynomial-time algorithm for one if and only if there is a polynomial-time algorithm for all. So we have come to realize that researchers in a host of different areas, struggling over a spectrum of computationally intractable problems, have in a fairly precise sense all been struggling over the same problem; this is the great insight that NP-completeness has given us. The original question remains open.
been revolutionized by the theory of complexity, for the design of secure communication protocols is a field that exists in a mirror-world where difficult computational problems—codes that are easy to apply and hard to break—are resources to be cultivated (see Sudan in Chapter 7). The RSA public-key cryptosystem was inspired by the presumed difficulty of factoring large integers, with the prime factors of a number N forming the hidden “key” whose knowledge allows for easy decryption. The age-old notion of randomness—a concept that is intuitively apparent but notoriously tricky to define—has been given an appealing formalization based
on computational complexity: a sequence of digits appears “random” to an observer if it is computationally difficult for the observer to predict the next digit with odds significantly better than guessing.
For designers of algorithms, we have seen that their struggle with the complexity of computation has proceeded at a number of different levels. One boundary divides the computable from the uncomputable—it is feasible to build a step-by-step interpreter for computer programs, but one cannot design an algorithm that decides whether arbitrary programs will terminate. Another boundary divides polynomial-time solvability from the exponential growth of brute-force search. But while polynomial time is indeed a good high-level means for gauging computational tractability, there are an increasing number of applications, typically involving very large datasets, where simply having a polynomial-time algorithm is far from adequate. Suppose the size of the input is measured in terabytes (millions of megabytes); an algorithm that takes a number of steps equal to the cube of the input size is no more useful in practice than one that never terminates.
None of this is to say that polynomial time has lost its relevance to the design of algorithms. But for many large-scale problems, we are faced with a reprise of the situation in the 1950s and 1960s, when we tumbled from a concern with computability to a concern with computational complexity: as our real-world computational needs expand, our guidelines for what constitutes an “acceptable” algorithm become increasingly stringent. And this in turn has led to new lines of research, focusing for example on algorithms that must run in time very close to the size of the input itself; algorithms that must “stream” through the input data in one pass, unable to store significant parts of it for post-processing.
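One classic instance of such a one-pass discipline, offered here as an illustrative sketch, is reservoir sampling: it maintains a uniform random sample of fixed size k over a stream of unknown length, storing only k items no matter how much data flows past.

```python
import random

def reservoir_sample(stream, k, rng=random):
    """Keep a uniform random sample of k items from a single pass over a
    stream, using O(k) memory regardless of the stream's length."""
    sample = []
    for i, item in enumerate(stream):
        if i < k:
            sample.append(item)                # fill the reservoir first
        else:
            j = rng.randrange(i + 1)           # item i survives with prob k/(i+1)
            if j < k:
                sample[j] = item
    return sample
```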
While algorithm design has deepened into the study of increasingly time-efficient techniques, it has also opened outward, revealing that running time is just one of many sources of complexity that must be faced. In many applications—scheduling under real-time conditions, or managing a busy network—the input is not a static object but a dynamic one, with new data arriving continuously over time. Decisions must be made and solutions must be constructed adaptively, without knowledge of future input. In other applications, the computation is distributed over many processors, and one seeks algorithms that require as little communication, and synchronization, as possible. And through all these settings, from NP-complete problems onward, algorithms have been increasingly designed and analyzed with the understanding that the optimal solution may be unattainable, and that the optimum may have to serve only as an implicit benchmark against which the quality of the algorithm’s solution can be measured.
The Lens of Computation
Our contemplation of computation has led us quickly to the “P versus NP” question, now considered among the deepest open problems in mathematics and computer science. More recently, our views on complexity have been influenced by the striking confluence of computation and quantum physics: What happens to our standard notions of running time and complexity when the computation unfolds according to the principles of quantum mechanics? It is now known that access to such a hypothetical “quantum computer” would yield polynomial-time algorithms for certain problems (including integer factorization and the breaking of the RSA cryptosystem) that are believed to be intractable on standard computers. What are the ultimate limits of quantum computers? And are there theoretical obstacles to building them (in addition to the practical ones currently braved in labs all over the world)? These questions, purely computational in their origin, present some of the most daunting challenges facing theoretical physics today.
Biology is a field where the synergy with computation seems to go ever deeper the more we look. Let us leave aside all the ways in which sophisticated algorithmic ideas are transforming the practice of biology with vast genomic and structural databases, massive numerical simulations of molecules, and the fearsome symbol-crunching of the Human Genome Project. Instead, consider how we might view molecular biology itself through a computational lens, with the cell as an information-processing engine. For fundamentally, a biological system like a cell is simply a chemical system in which the information content is explicit. The genome is part of the overall chemical system, but it is not there to take part in the chemistry itself; rather, it is there as a sophisticated encoding of the chemical processes that will take place. It is a programming abstraction—it is the representation of the real thing, co-existing with the thing itself.
Just as computation distinguishes itself from the rest of technology, so are biological systems intrinsically different from all other physical and chemical systems—for they too have separated the application from the device. We can take a cell and change a few symbols in its genome—splice in a new gene or two—and we can cause its chemistry to change completely. We have replaced one piece of code with another; the device—the cellular hardware—has been left alone, while we simply changed the application running on it. This phenomenon really seems to have no parallel in the other sciences. Surely, the non-biological world obeys fundamental laws, but it does not contain—and actually implement—an explicit representation of these laws. Where in the solar system are the few molecules, encoding the laws of gravity and motion, which we could modify to cause the planets to follow more eccentric orbits?
Computation as a technology that follows its own laws; computation as the quintessence of universality; computation as a powerful perspective on the world and on science—these are issues that still drive our study of the phenomenon today. And the more we grapple with the underlying principles of computation, the more we see their reflections and imprints on all disciplines—in the way structured tasks can be cast as stylized computational activities; in the surprising complexity of simple systems; and in the rich and organic interplay between information and code.
QUANTUM INFORMATION PROCESSING
Charles H. Bennett, IBM Research
Computer science is based on a very fruitful abstraction whose roots are as old as mathematics but whose power was fully appreciated only in the 20th century: the notion that information and computation are worthwhile objects of study in their own right, independent of the physical apparatus used to carry the information or perform the computation. While at the most obvious level, Moore’s law (see Hill in this chapter) is a hardware success story, this success has only been possible because information is such an abstract stuff: make a bit a thousand times smaller and it remains useful for the same purposes as before, unlike a thousand-fold smaller car or potato.
Although Moore’s law has survived many early predictions of its demise, no exponential growth can go on forever. The present few decades are clearly an exceptional time in the history of computing. Sooner or later something will have to give, since at the present rate of shrinkage, information technology will reach atomic dimensions within 20 years. Accordingly, considerable thought and long-range planning are already being devoted to the challenges of designing and fabricating devices at the atomic scale and getting them to work reliably, a field broadly known as nanotechnology. However, it has long been known that atoms and other tiny objects obey laws of quantum physics that in many respects defy common sense. For example, observing an atom disturbs its motion, while not observing it allows it to spread out and behave as if it were in several different places at the same time. Until recently, computer designers considered such quantum effects mostly as a nuisance that would cause small devices to be less reliable and more error-prone than their larger cousins.
What is new, and what makes quantum informatics a coherent discipline, rather than a vexing set of technological problems, is the realization that quantum effects are not just a nuisance; rather, they offer a new and more comprehensive way of thinking about information, which can be exploited to perform important and otherwise impossible information-processing tasks. Already they have been used to create unconditionally secure cryptographic key agreement protocols, and in the future, if a quantum computer can be built, it could easily perform some computations (most notably the factorization of large numbers) that would take longer than the age of the universe by the best known algorithms, not only on today’s supercomputers, but also on the supercomputers of 2050 (by which time we predict Moore’s law will have ended).
The way in which quantum effects can speed up computation is not a simple quantitative improvement, such as would result from using a faster processor, or some fixed number of parallel processors to speed up computations by some constant factor depending on the hardware. Rather it is a qualitative improvement in the functional form of the computation cost, similar to what one typically gets by discovering a smarter algorithm to solve a problem that previously seemed hard. With quantum information processing the physical form of information, for the first time, has a qualitative bearing on the efficiency with which it can be processed, and the things that can be done with it. Quantum computers do not speed up all computations equally: a few, like factoring, are sped up superpolynomially; general NP search problems like the Traveling Salesman Problem are sped up quadratically, while other problems are not sped up at all.
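These contrasting growth rates can be made concrete with a little arithmetic. The sketch below (in Python; the function names and the heuristic cost formulas are illustrative additions, not from the essay) compares the best known classical scalings with their quantum counterparts: Grover’s quadratic speedup for unstructured search, and Shor’s superpolynomial speedup for factoring.

```python
import math

# Unstructured search over M items: a classical exhaustive search needs
# on the order of M queries, while Grover's quantum algorithm needs only
# about sqrt(M) -- a quadratic, not exponential, speedup.
def classical_search_queries(M):
    return M

def grover_queries(M):
    return math.isqrt(M)

# Factoring an n-bit number N: the best known classical algorithm (the
# number field sieve) runs in roughly exp(c * (ln N)^(1/3) * (ln ln N)^(2/3))
# steps, with c = (64/9)^(1/3); Shor's quantum algorithm is polynomial,
# on the order of (log N)^3 operations.
def nfs_cost(bits):
    ln_n = bits * math.log(2)
    c = (64 / 9) ** (1 / 3)
    return math.exp(c * ln_n ** (1 / 3) * math.log(ln_n) ** (2 / 3))

def shor_cost(bits):
    return bits ** 3

# Searching a trillion items: ~10^12 classical queries vs. ~10^6 quantum.
assert grover_queries(10**12) == 10**6
# For a 2048-bit number, the classical cost dwarfs the quantum cost.
assert nfs_cost(2048) > shor_cost(2048)
```

The quadratic speedup helps but does not make NP-hard search feasible; the superpolynomial speedup for factoring is what makes the contrast with classical computing dramatic.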
But to say quantum computers offer the hope of using physical effects to dramatically speed up some computations is putting things backwards. In hindsight, it should rather be said that the laws of quantum physics, which as far as we know today apply to everything in the universe, provide a more powerful arsenal of physically performable mathematical operations than Turing imagined when he formalized the notion of a computer. Availing ourselves of this more powerful arsenal, although it does not enlarge the class of computable functions, makes some computational problems qualitatively easier than they seemed before. Unlike the earlier hope of a qualitative speedup from analog processing, which has largely been dashed by the ability of digital computers, via discrete approximation, to simulate analog processes more accurately than they can be reproducibly performed in nature, quantum computation offers a realistic hope of qualitatively enlarging the scope of feasible computation.
However, it should be noted that actually building a useful quantum computer presents formidable technical challenges, which will probably be overcome eventually, but not any time soon. The confidence that these obstacles can ultimately be overcome rests largely on the theory of quantum error correction and fault tolerance, developed since 1995. This theory, which is analogous to the classical theory of fault tolerance developed by von Neumann and others in the days when computers were made of vacuum tubes and relays, allows arbitrarily large reliable quantum computations to be done on imperfect hardware, provided the hardware exceeds some threshold of reliability. The technical problem is that the quantum threshold is higher than the classical threshold discovered by von Neumann, while today’s quantum computing hardware processes quantum information far less reliably than vacuum tubes process classical information, so there remains a gap of several orders of magnitude that still needs to be closed. There is every reason to believe it can be closed, but how and when remain to be seen.
Just how do quantum information and the hardware used to process it differ from the ordinary classical type of information processing formalized by Turing? States of a Turing machine tape, or any other digital storage medium, are in principle reliably distinguishable, and the tape can be copied accurately without disturbing it. This is a reasonable idealization of the behavior of macroscopic information processing hardware like punch cards, electromechanical relays, and even today’s most advanced microprocessors and memory chips. But it has been known since the early 20th century that at an atomic and subatomic scale actual matter behaves more subtly: not all states are reliably distinguishable even in principle, and information stored in such states cannot be copied without disturbing it. Speaking metaphorically, quantum information is like the information in a dream: attempting to describe your dream to someone else changes your memory of it, so you begin to forget the dream and remember only what you said about it.
This dreamlike behavior notwithstanding, quantum information obeys exact and well-understood laws. The so-called superposition principle holds that the possible states of any physical system correspond to directions in a d-dimensional space (“Hilbert space”), where the dimension d is characteristic of the system and represents the system’s maximum number of reliably distinguishable states. Two states are reliably distinguishable if and only if their Hilbert space directions are orthogonal (as is usually the case with macroscopic systems). Physically implementable operations on the system always conserve or reduce distinguishability, corresponding roughly to rigid rotations and projections in the Hilbert space. In the simplest nontrivial case, d = 2, and the system (e.g., the internal state of an electron or photon) is called a qubit. If two reliably distinguishable states of the qubit (e.g., horizontal and vertical polarizations for a photon) are arbitrarily designated |0> and |1>, a general state may be expressed as a linear combination α|0> + β|1>, where α and β are complex numbers such that |α|^2 + |β|^2 = 1. Another quantum principle, the projection postulate, holds that if a qubit in this state is subjected to an observation that would reliably distinguish |0> from |1>, the qubit behaves like |0> with probability |α|^2 and like |1> with probability |β|^2. Observing the system again yields no new information: the system again behaves like |0> or |1> according to the result of the first observation. More generally, observing a quantum system causes it to behave probabilistically, losing information about its previous state, except when the system was already in one of the states the observation was designed to distinguish. The art of quantum computing consists of accurately preparing the quantum computer in a desired initial state, performing a sequence of accurately controlled manipulations (rotations) of its state without observing it during the intermediate stages of the computation, and then finally observing the final state to obtain a useful output.
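The projection postulate lends itself to a short simulation. The sketch below (Python with NumPy; an illustrative addition, not from the essay, and purely classical: it merely samples the probabilities the postulate prescribes) shows a qubit in the state α|0> + β|1> collapsing on first observation and then giving the same answer on every later observation.

```python
import numpy as np

rng = np.random.default_rng(seed=1)

def measure(qubit):
    """Observe a qubit in the {|0>, |1>} basis (projection postulate).

    The qubit behaves like |0> with probability |alpha|^2 and like |1>
    with probability |beta|^2, and collapses to the observed basis state,
    so repeating the observation yields no new information.
    """
    alpha, beta = qubit
    p0 = abs(alpha) ** 2
    outcome = 0 if rng.random() < p0 else 1
    # Collapse: the state becomes the basis vector that was observed.
    qubit[:] = [1.0, 0.0] if outcome == 0 else [0.0, 1.0]
    return outcome

# A qubit in the equal superposition (|0> + |1>)/sqrt(2).
q = [1 / np.sqrt(2), 1 / np.sqrt(2)]
first = measure(q)          # random: 0 or 1, each with probability 1/2
assert measure(q) == first  # a second observation repeats the first
```

The second assertion always holds: after the first observation, the state is exactly |0> or |1>, so later observations are deterministic.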
This may sound a lot like analog computation, which is not believed to be significantly more powerful than conventional digital computation. How, one might ask, can a qubit—say a photon, which may be polarized at an arbitrary angle θ relative to the horizontal—be more powerful than an analog system—say a mechanical wheel oriented at angle θ relative to some standard position? At first sight, the wheel would appear more powerful. Not only can its orientation be accurately manipulated, but also the orientation (unlike a photon’s) can be observed quite precisely without significantly disturbing it. The essential difference, which makes quantum computation more powerful than analog computation, comes from the way individual information-bearing subsystems (e.g., qubits) combine to form a larger system (e.g., a quantum register, or a whole quantum computer). An n-qubit register has 2^n reliably distinguishable states, corresponding to the n-bit strings |00…0> through |11…1>; more generally, the Hilbert space of a compound quantum system has a dimensionality equal to the product of the Hilbert space dimensions of its parts. By the superposition principle, the general state of an n-qubit register corresponds to a direction in a 2^n-dimensional space, and during the computation the state may undergo controlled rotations in this large space. By contrast, an n-wheel analog computer’s state lives in a parameter space of only n dimensions; more generally, an analog system’s parameter space has a dimensionality equal to the sum of the dimensionalities of its parts. The quantum computer’s advantage comes from its enormously larger state space. However, the advantage is rather subtle, because at the end of a quantum computation, in order to get a classical output, it is necessary to make an observation, and, by the projection postulate, doing so collapses the 2^n-parameter quantum state back down to n classical bits.
This severe bottleneck at the end of the computation means that an n qubit quantum computer cannot process any greater volume of information than an n bit classical computer. However, for some computations, it can do the processing in far fewer steps because of the extra maneuvering room the large Hilbert space provides during intermediate stages of the computation.
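The contrast between product and sum of dimensions can be checked directly. In the sketch below (Python with NumPy; an illustrative addition, not from the essay), the joint state of an n-qubit register is built as a tensor (Kronecker) product of single-qubit states, and its dimension grows as 2^n, whereas n analog wheels would need only n parameters, one angle each.

```python
import numpy as np

# Single-qubit basis states |0> and |1> as vectors in a 2-dimensional space.
ket0 = np.array([1.0, 0.0])
ket1 = np.array([0.0, 1.0])

def register_state(bits):
    """Joint state of an n-qubit register holding the given bit string.

    The compound Hilbert space is the tensor product of the parts, so
    its dimension is the *product* 2 * 2 * ... * 2 = 2^n, not the sum.
    """
    state = np.array([1.0])
    for b in bits:
        state = np.kron(state, ket1 if b else ket0)
    return state

for n in range(1, 11):
    quantum_dim = register_state([0] * n).size  # 2^n amplitudes
    analog_dim = n                              # n wheel angles
    assert quantum_dim == 2 ** n
# Already at n = 10, the register's state space has 1024 dimensions,
# while 10 wheels are described by just 10 numbers.
```

The state vector for the bit string b is the standard basis vector whose index is b read as a binary number, which is why the 2^n distinguishable register states line up with the 2^n n-bit strings.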
As noted earlier, the big technical barrier to be overcome in constructing a quantum computer is to make the rate of hardware errors, during the unobserved portion of the computation preceding the final observation, sufficiently small. This is qualitatively similar to the problem digital computers faced in the era of vacuum tubes and relays, but quantitatively worse, because a quantum computer needs to be isolated much more carefully from its environment to attain a given level of reliability. In particular, because of the disturbing effect of observation on quantum systems, quantum computers must be designed to prevent information about the data being processed from leaking out of the computer into the environment before the end of the computation. (If such premature leakage took place, the computation would begin to behave like a classical probabilistic computation instead of a quantum one, and the advantages of quantum speedup would be lost.) Fortunately, the formalism of quantum error correction and fault tolerance allows arbitrarily good protection against such leakage to be achieved, provided the basic hardware exceeds some finite threshold of reliability.
Aside from computation per se, quantum information science includes the disciplines of quantum communication and quantum cryptography. The latter field is at a far more mature stage of development than is quantum computation itself, with successful laboratory experiments and even a few startup companies. Quantum cryptography is based on the fact that eavesdropping on quantum systems disturbs them. In a typical implementation, a random sequence of N faint polarized light pulses is sent through an optical fiber, after which the sender and receiver, by public discussion of the sent and received signals, estimate the amount of disturbance and hence of potential eavesdropping. If it is too great, they abort the protocol. Otherwise they can proceed to derive from the sent and received signals a shorter sequence of random secret key bits on which the eavesdropper has arbitrarily little information. More precisely, it can be shown that any eavesdropping strategy obeying the laws of quantum physics, even when assisted by unlimited computing power, yields only exponentially little expected information on the final key sequence, if any. The eavesdropper thus faces the dilemma of eavesdropping gently and learning essentially nothing, or eavesdropping strongly and causing the protocol to abort. Quantum cryptography is practical because it does not require a full-fledged quantum computer, only classical computers supplemented by equipment for generating, transporting, and detecting quantum signals, all of which are available today. Unlike classical methods of key agreement, it is unconditionally secure, and thus impervious to future improvements in algorithms or computing power.
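The sift-and-estimate procedure just described can be illustrated with a toy classical simulation (Python with NumPy; the 20 percent sampling fraction and the abort threshold are illustrative choices, not from the essay, and no actual quantum states are modeled). With no eavesdropper present, the estimated disturbance is zero and the protocol completes.

```python
import numpy as np

rng = np.random.default_rng(seed=7)
N = 1000

# Sender: random key bits, each encoded in a randomly chosen polarization
# basis (0 = rectilinear, 1 = diagonal).
sent_bits = rng.integers(0, 2, N)
sent_bases = rng.integers(0, 2, N)

# Receiver measures each pulse in a random basis. When the bases match,
# the bit comes through intact; when they differ, quantum mechanics
# gives a 50/50 random result (modeled here by a coin flip).
recv_bases = rng.integers(0, 2, N)
match = sent_bases == recv_bases
recv_bits = np.where(match, sent_bits, rng.integers(0, 2, N))

# Sifting: by public discussion, keep only the positions where the bases
# agreed, then sacrifice a random sample to estimate the disturbance.
sift_sent = sent_bits[match]
sift_recv = recv_bits[match]
sample = rng.random(sift_sent.size) < 0.2
error_rate = np.mean(sift_sent[sample] != sift_recv[sample])

ABORT_THRESHOLD = 0.11  # illustrative; real protocols derive this bound
if error_rate > ABORT_THRESHOLD:
    print("abort: disturbance too high, possible eavesdropping")
else:
    key = sift_sent[~sample]  # the remaining sifted bits form the raw key
```

An eavesdropper who intercepts and re-sends pulses necessarily guesses the basis wrong about half the time, injecting errors into the sifted positions and driving the estimated error rate above the threshold.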
Although it was inspired by physics, quantum information processing is a mathematically coherent and well-characterized extension of classical information processing. Indeed, the latter can now best be viewed as a useful but limited subset of the larger subject of quantum information processing, somewhat as the real numbers are a useful but limited subset of the complex numbers. To continue the analogy, quantum information processing provides solutions, or improved solutions, to some problems in classical computation and cryptography, just as the complex plane provides solutions and insights into problems in real analysis not explicitly involving complex variables. For this reason, even aside from its technological implications, quantum informatics is an intellectually exciting discipline, with far-reaching implications for the basic mathematical and physical sciences, both theoretical and experimental. It is already providing new ways of thinking about a wide variety of scientific and technical questions, and has begun to affect how science is taught, in a way that will bring a deeper understanding of the fruitful abstraction that is information not only to computer scientists but also to a broad segment of the lay public.
For further information:
Jozef Gruska, Quantum Computing (McGraw-Hill, 1999, ISBN 007 709503 0)
Michael A. Nielsen and Isaac L. Chuang, Quantum Computation and Quantum Information (Cambridge University Press, 2000, ISBN 0 521 63235 8)
U.S. National Science Foundation report on quantum information science:
Online course notes:
Other quantum information Web sites: