Paper 3

THEORETICAL RESEARCH: ITS PURPOSES AND INFLUENCE

Theoretical computer science is concerned with discovering the inherent power and limitations of computing techniques. This agenda proceeds in both an abstract vein of existence and nonexistence results for various models of computation and a concrete vein of algorithms and general methods for particular classes of problems. Both universities and industry contribute to the development of theory, but this is an area in which the universities lead. Good theory codifies, synthesizes, and transmits ideas. It lives most comfortably in universities and plays a dominant role in the advanced curriculum of computer science. The agenda and style of theory contribute both directly and indirectly to industrial development.

Theory provides a common body of understanding that practitioners automatically employ in attacking technical tasks. Practical computing would be different indeed without ideas instilled by theory, ideas such as recursion, assignment, tree, grammar, critical section, divide-and-conquer, NP-completeness, and many more. These words are so fully familiar to every practitioner that it is hard to realize that only a couple of decades ago none were in common use in the field and some did not exist.

Theory pervades education and thereby invades industry. Industry counts on fresh college graduates to know the concepts referred to in the previous paragraph, to have a ready stock of techniques, and to understand the power and limitations of these techniques. In effect, education injects theory into the industrial milieu as a free good.

Theory solves certain problems that would otherwise be out of reach. This truth is particularly obvious in the domain of applied mathematics--witness theoretically based developments such as linear programming, the fast Fourier transform (FFT), or finite element techniques. Automata theory has given us fast search techniques suitable for bibliographic retrieval. Formal language theory has given us automatic consistency tests for language designs. Algebra has given us new cryptographic techniques. Complexity theory has given us combinatoric algorithms to help with geometric layout.

Theory gives insight into the capabilities of computers. Nonexistence results advise practitioners not to waste time in quests such as absolutely optimal compilation or general verification of programs.

Finer theoretical analysis often tells how closely, or in what cases, some of these goals might actually be achieved. Proven bounds give objective standards of performance for algorithms and point out opportunities for improvement. An interesting recent example is the Ziv-Lempel result that certifies the efficacy of dictionary-style data compression. Knowing definitively that one class of statistical methods for text compression is good whether or not we know the statistics of the source, we can now attack new problems in data compression with a degree of confidence similar to that with which we approach sorting.
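
To make the flavor of dictionary-style compression concrete, here is a minimal sketch, in Python and in the general style of an LZ78 phrase parser (the routine names are invented for illustration, not drawn from the paper): the dictionary is grown from the text as it is read, so nothing need be known about the source statistics in advance.

    # A minimal LZ78-style dictionary compressor.  The phrase dictionary is
    # built adaptively from the text itself, so nothing about the statistics
    # of the source needs to be known in advance.

    def lz78_compress(text):
        """Return a list of (phrase_index, next_char) pairs."""
        dictionary = {"": 0}            # phrase -> index; 0 is the empty phrase
        output = []
        phrase = ""
        for ch in text:
            if phrase + ch in dictionary:
                phrase += ch            # keep extending the longest known phrase
            else:
                output.append((dictionary[phrase], ch))
                dictionary[phrase + ch] = len(dictionary)
                phrase = ""
        if phrase:                      # leftover phrase is already in the dictionary
            output.append((dictionary[phrase[:-1]], phrase[-1]))
        return output

    def lz78_decompress(pairs):
        """Rebuild the text by replaying the same dictionary construction."""
        phrases = [""]
        text = []
        for index, ch in pairs:
            phrases.append(phrases[index] + ch)
            text.append(phrases[-1])
        return "".join(text)

For example, lz78_compress("ababab") yields [(0, 'a'), (0, 'b'), (1, 'b'), (1, 'b')], from which lz78_decompress recovers the original string; on realistic text the phrases, and hence the savings, grow as material repeats.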

Theory suggests generalizations that simplify computing. The conventional core of programming language technology is strongly informed by theory: environments and binding, functional abstraction and composition, and nested scope and activity of variables are matters upon which common agreement has been reached after refining in the theoreticians' fire. In other equally important matters, no comprehensive theory exists (e.g., global scopes, libraries, dynamic storage allocation, and asynchrony). Industrial software applications that fit within the ground covered by well-understood theory--and there are many--can be promptly and reliably implemented. Furthermore, second-order tools (program generators) suggest themselves in these areas and can be built with confidence.

THE THEORETICIAN'S PLACE

The world of computer science is in general an artificial one, divorced from concern with most physical laws, which are most decidedly not digital. In this respect, computer science has more in common with humanly constructed fields like accounting than with hard science: many of its artifacts merely reflect convention. Thus theoretical computer science, in pursuing mathematical truth, can touch only on certain core aspects of computing, which pervade, but by definition cannot embrace, the mutable, artificial world of systems built by or used in industry. Theory cannot prescribe how systems should work, but it can illuminate inherent costs and trade-offs that may be considered in designing them.

Some prevalent modes of theoretical research engender isolation. Considerable work in theoretical computer science has been devoted to excessively inward-looking corners of the field whose raison d'etre is partial fulfillment of Ph.D. requirements. Moreover, even good work often goes unsung beyond the main audience of peers for whom many theoreticians play. And potentially interesting questions from outside that group may go unnoticed because they are unstylish or not worded in the accepted jargon. The industrial theorist is better protected from these perils than his university colleague.

Isolation is abetted by computer science curricula. Hard and deep science has tended to disappear from standard computer science curricula. A typical bachelor's graduate in computer science is almost illiterate in science in general, and is much less able to understand the hard sciences (except for elementary discrete mathematics) than other science or engineering graduates are to comprehend computer science. With this preparation, it is no wonder that much graduate research tends to look inward, and to touch other disciplines only in simplistic fashion. For this reason, industry may weigh the extracurricular or hobby-like experience of a research candidate heavily: a "pure" computer scientist is often seen as a dubious investment.

Direct applications of theory tend to be invisible. The nature of software products is such that even a central, critical algorithm supplied by theoretical research may be completely submerged, since in most cases (outside of applied mathematics) the basic solubility of the problem with which it deals is easily perceived. What theory contributes may "only" be efficiency, the difficulty and importance of achieving which is hard to appreciate unless one happens to have tried to do it oneself. Who but an insider would imagine, for example, that most fast pattern-matching routines are direct and conscious applications of the equivalence of deterministic and nondeterministic finite automata? Especially when diffuse, the role played by theoretical insight in devising particular pieces of software will often be hard to perceive. Only the original programmer will know which of a hundred minor algorithmic choices in a piece of software were shaped by theoretical generalizations.
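
As a small illustration of that equivalence, the following sketch (in Python, invented for this discussion rather than taken from any actual matcher) finds a literal pattern by running the obvious nondeterministic automaton deterministically, keeping as its single state the set of pattern positions still alive; tabulating the transitions of these sets in advance is exactly the subset construction behind fast, table-driven matchers.

    # Search a text for a literal pattern by simulating the obvious
    # nondeterministic automaton (guess where a match starts, then check
    # character by character) as a deterministic machine whose single state
    # is the SET of pattern positions still alive.

    def nfa_search(pattern, text):
        """Yield the starting index of every occurrence of pattern in text."""
        m = len(pattern)
        if m == 0:
            return
        active = set()                  # pattern positions matched so far
        for i, ch in enumerate(text):
            # advance each live position on ch, and start a new attempt here
            active = {p + 1 for p in active if p < m and pattern[p] == ch}
            if pattern[0] == ch:
                active.add(1)
            if m in active:
                yield i - m + 1

For example, list(nfa_search("aba", "ababa")) returns [0, 2].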

Theory is not a journeyman trade. This truism explains industry's widespread unwillingness to hire, and inability to exploit, people to do theory. Save for an exceptionally insightful few, being "good at theory" suggests native brightness more than it does immediate utility. A theoretician will do industry little good except as he evinces a lively interest in seeking challenges from and propagating results within the firm. Unlike a cobbler, a theoretician will serve industry ill by minding his own last. He must be something of a showman, for his product does not speak for itself, and something of a sponge, for his questions must bear on those of others.

CONVERSION OF THEORY INTO PRACTICE

We have argued that theory provides much of the common body of understanding that practitioners automatically employ in attacking new technical tasks. For lack of a better phrase we may call this immediate recognition that causes us to do things right without thinking about them "permanent knowledge." Codification and transmission of permanent knowledge is the theoretician's responsibility. Let us examine some concrete examples of how theoretical ideas in computer science have, or have not, impinged on practice, with special reference to the mechanisms of technology transfer.

Steady Progress: Formal Language Theory in Programming Languages

Programming languages, notably FORTRAN, came first, well before any recognized applicable theory. (For a more extended discussion of compiler technology, see Paper 4.) That is not to say that early compiler technology was unscientific. Quite to the contrary, the seminal FORTRAN project led by John Backus at IBM had a strong research flavor, but the goal was a working demonstration, not scholarly publication. FORTRAN provided an overwhelming existence proof; it became obvious to most people in the field that they could in principle do the same without further instruction, but only by dint of mighty effort. The science that followed immediately upon FORTRAN was largely derivative, seeking generality of syntax and semantics by obvious analogy to conventional mathematical notation. However, when ALGOL 60 syntax was described in a notation much like that which Chomsky had recently brought to linguistics, and when LISP was defined by McCarthy in terms from the theory of recursive functions, it became clear that programming languages could be investigated and implemented systematically. The injection of a simple idea borrowed by theoretically knowledgeable scientists from other domains, and the creation of a coherent mode of discourse, was all it took to start the explosion.

University researchers enthusiastically took up formal syntax during the 1960s, but commercial purveyors of compilers, while aware of the field, were essentially uninfluenced by it except as a descriptive tool. In the commercial world, everybody "knew" you could do better by hand methods than by formalized techniques. Only a few groups tried to exploit the new knowledge. Massachusetts Computer Associates, for instance, developed and used precedence parsing and reduction analysis for their contract software business. In this exploratory period, various grammar models were tried with different trade-offs between simplicity and generality. Eventually, sufficient expertise developed that various research groups undertook to prove they could write production compilers with their new techniques. GE's FORTRAN Y, for example, convincingly demonstrated the practicality of so-called LL parsing. Later, the so-called LALR model gained ascendancy. With the technology now fully established, new languages like CHILL and ADA have benefited from analysis of their grammars while the languages were being defined.
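
For readers unfamiliar with the term, LL parsing is the formal counterpart of recursive descent: one procedure per nonterminal, each selecting its production from a single token of lookahead. The toy sketch below (in Python; the grammar and names are invented for illustration and have no connection to FORTRAN Y or any production compiler) parses and evaluates simple arithmetic expressions in that style.

    # A toy LL(1) parser written by recursive descent: one procedure per
    # nonterminal, each deciding what to do from one token of lookahead.
    # Grammar (illustrative only):
    #
    #   expr   ->  term   (('+' | '-') term)*
    #   term   ->  factor (('*' | '/') factor)*
    #   factor ->  NUMBER | '(' expr ')'

    import re

    def tokenize(s):
        return re.findall(r"\d+|[-+*/()]", s)

    class Parser:
        def __init__(self, tokens):
            self.tokens = tokens
            self.pos = 0

        def peek(self):
            return self.tokens[self.pos] if self.pos < len(self.tokens) else None

        def eat(self, expected=None):
            tok = self.peek()
            if tok is None or (expected is not None and tok != expected):
                raise SyntaxError(f"expected {expected!r}, found {tok!r}")
            self.pos += 1
            return tok

        def expr(self):
            value = self.term()
            while self.peek() in ("+", "-"):
                if self.eat() == "+":
                    value += self.term()
                else:
                    value -= self.term()
            return value

        def term(self):
            value = self.factor()
            while self.peek() in ("*", "/"):
                if self.eat() == "*":
                    value *= self.factor()
                else:
                    value /= self.factor()
            return value

        def factor(self):
            if self.peek() == "(":
                self.eat("(")
                value = self.expr()
                self.eat(")")
                return value
            return int(self.eat())

For example, Parser(tokenize("2 * (3 + 4)")).expr() evaluates to 14.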

Grudging Awareness: Theoretical Algorithms in Optimizing Compilers

Optimization and flow analysis remained outside of fashionable theoretical research for 15 years, until they were perceived as pragmatically significant examples of the combinatoric problems that came in the 1970s to dominate the subject of algorithms. Individual practitioners began to introduce formal techniques into this corner of combinatoric algorithms over a decade ago, but in that whole period the subject of optimizing compilers remained outside the domain of permanent knowledge, and even today most optimizing compilers are undertaken mainly as prodigious, largely heuristic, projects. Moreover, modest compiler efforts have not seen optimization as a necessary goal. Nevertheless, a transmissible framework now exists, and code optimization may be expected to join syntactic analysis as a well-established part of programming language technology.

Scant Contact: Formal Semantics in Languages

The history of formal semantics illustrates a different story. The primary groundwork of the theory dates to the 1930s. Its specific application to accurate definition of the meaning of programming languages began in the early 1960s. By now, a great deal of theory has developed in this area. Nevertheless, the influence of this theory on practice has been minimal. No catalyzing event like the ALGOL 60 report has yet occurred to demonstrate its utility to the world at large. Certain derivative ideas, such as the virtue of applicative languages, have gained wide currency, and numerous ex post facto unofficial formal descriptions have been published. An attempt to tighten and debug the informal definition of DOD's ADA language by formal (and automated) techniques is in progress today. But overall this theory remains an academic rather than a pragmatically influential subject.

Contagious Spread: Cryptography and FFT

Spurred by the NBS data encryption standard, cryptographic theory burst upon computer science theoreticians and practitioners almost simultaneously. Natural curiosity concerning a type of activity customarily carried out in deepest secrecy drew the attention of the scientific and popular press to the theorists' work. Important theoretical innovations, notably public-key cryptosystems, entered the general consciousness almost immediately. The contagious spread of these cryptographic ideas is not unique. The FFT (fast Fourier transform) caught on even faster upon its rediscovery by Tukey and Cooley 20 years ago, and knowledge of radix exchange sort, then Hoare's quicksort, exploded similarly. These techniques speeded up particular computing processes dramatically, and, more importantly, they impressed the community at large with the importance of considering the complexity of computing processes.

Saturation: NP-Completeness

Originally promulgated by Cook, the importance of the idea of NP-completeness was hammered home when Karp published a portfolio of NP-complete problems with something for everybody. NP-completeness has taken its place as a prime concept concerning computing.

HOW IDEAS SPREAD

The preceding discussion suggests that the fruits of theory are imprinted in permanent knowledge by catalyzing events, such as the publication of the Algol report or of the quicksort algorithm, but that the ultimate influence of the ideas thus introduced may greatly overshadow the events that initially make them known. The contributions of theory have diverse provenances: the FFT came from outside the computer science community, Algol from a committee, quicksort from an individual, and cryptography from an underground bureaucracy. When the time is right, ideas can spread more rapidly than the printed word if there is a network ready and eager to propagate them. The network did not operate as well for formal semantics: Scott's results were transmitted quickly and with respect, but not with fanfare, and not into the general consciousness.

What accounts for the difference? First and foremost, one observes that the catalyzing events were publications of significant demonstrations of an idea. The network advertised them, but did not constitute their prime publication. Second, though the most pragmatically successful theoretical contributions were in one sense breakthroughs, in another sense they were continuous advances--they did not require a totally new way of looking at the world. (Even FORTRAN, so remote syntactically from machine language, was transparently equivalent to it, and FORTRAN was comfortably like the mathematics for which machines were being used at the time.) Third, regardless of any step toward abstraction--often the most important contribution of a theoretical idea--applicability to known problems was demonstrated. In short, it is immediately applicable theory, rather than pure theory, that spreads fastest. A further proviso is that there be a theoretician to popularize it; it is difficult to find examples where a popularizer, not himself theoretically inclined, has succeeded in spreading useful theoretical gospel.

Before Popularity

We should not conclude, however, that theory of value, or theory that pays off, comes first and foremost from theorists with a bent for popularization, or that only work from which a dramatic development can be anticipated pays off. Far from it. Popularization is only the visible part of the iceberg of theory. The clarity of the Algol report derived from a hidden prior decade of theoretical contributions. How did an astute organizer (Bauer) know that this was the time to act, or achieve such unity of purpose among a convention, or happen to cultivate a Thomas Jefferson (Naur) to draft this report? Could it be done again for semantic description, or for data base technique, or for command languages? The most we can assert is a platitude: Good results are bound to emerge from creative people of vision and purpose.