Below are the first 10 and last 10 pages of uncorrected machine-read text (when available) of this chapter, followed by the top 30 algorithmically extracted key phrases from the chapter as a whole.
Intended to provide our own search engines and external engines with highly rich, chapter-representative searchable text on the opening pages of each chapter.
Because it is UNCORRECTED material, please consider the following text as a useful but insufficient proxy for the authoritative book pages.
Do not use for reproduction, copying, pasting, or reading; exclusively for search engines.
OCR for page 37
Paper 3
I~ORETICAL "SEA=H: ITS PURPOSES AD INFL==E
Theoretical computer science is concerned with discovering the inherent
power and limitations of computing techniques. This agenda proceeds in
both an abstract vein of existence and nonexistence results for various
models of computation and a concrete vein of algorithms and general
methods for particular classes of problems. Both universities and
industry contribute to the development of theory, but this is an area
in which the universities lead.
Good theory codifies, synthesizes, and transmits ideas. It lives
most comfortably in universities and plays a dominant role in the
advanced curriculum of computer science. The agenda and style of
theory contributes both directly and indirectly to industrial
development.
Theory Provides a common body of understanding that practitioners
automatically employ in attacking technical tasks. Practical computing
would be different indeed without ideas instilled by theory, ideas such
as recursion, assignment, tree, grammar, critical section, divide-and-
conquer, NP-completeness, and many more. mese words are so fully
familiar to every practitioner that it is hard to realize that only a
couple of decades ago none were in common use in the field and some did
not exist.
Theory pervades education and thereby invades industry. Industry
.
counts on fresh college graduates to know the concepts referred to in
the previous paragraph, to have a ready stock of techniques, and to
understand the power and limitations of these techniques. In effect,
education injects theory into the industrial milieu as a free good.
Theory solves certain Problems that would otherwise be out of
reach. m is truth is particularly obvious in the domain of applied
mathematics--witness theoretically based developments such as linear
programming, fast Fourier transform (FFT), or finite element
techniques. Automata theory has given us fast search techniques
suitable for bibliographic retrieval. Formal language theory has given
us automatic consistency tests for language designs. Algebra has given
us new cryptographic techniques. Complexity theory has given us
combinatoric algorithms to help with geometric layout.
Theory gives insight into the capabilities of computers.
Nonexistence results advise practitioners not to waste time in quests
such as absolutely optimal compilation or general verification of
37
OCR for page 38
38
programs. Finer theoretical analysis often tells how closely, or in
what cases, some of these goals might actually be achieved. Proven
bounds give objective standards of performance for algorithms and point
out opportunities for improvement.
An interesting recent example is the Ziv-Lempel result that
certifies the efficacy of dictionary-style data compression. Knowing
definitively that one class of statistical methods for text compression
is good whether or not we know the statistics of the source, we can now
attack new problems ' in data compression with a degree of confidence
similar to that with which we approach sorting.
Theory suggests generalizations that simplify computing. m e
conventional core of programming language technology is strongly
informed by theory: environments and binding, functional abstraction
and composition, and nested scope and activity of variables are matters
upon which common agreement has' been reached after refining in the
theoreticians' fire. In other equally important matters, no
comprehensive theory exists (e.g., global scopes, libraries, dynamic
storage allocations, and asynchrony).
Industrial software applications that fit within the ground covered
by well-understood theory--and there are many--can be promptly and
reliably implemented. Furthermore, second-order tools (program
generators) suggest themselves in these areas and can be built with
confidence.
THE THEORETICIAN'S PLACE
The world of computer science is in general an artificial one, divorced
from concern with most physical laws, which are most decidedly not
digital. In this respect, computer science has more in common with
humanly constructed fields like accounting than with hard science:
many of its artifacts merely reflect convention. Thus theoretical
computer science, in pursuing mathematical ' truth, can touch only on
certain core aspects of computing, which pervade, but by definition
cannot embrace, the mutable, artificial world of systems built by or
used in industry. Theory cannot prescribe how systems should work, but
it can illuminate inherent costs and trade-offs that may be considered
in designing them.
Some prevalent modes of theoretical research engender isolation.
Considerable work in theoretical computer science has been devoted to
excessively inward-looking corners of the field whose raison d'etre is
partial fulfillment of Ph.D. requirements. Moreover, even good work
often goes unsung beyond the main audience of peers for whom many
theoreticians play. And potentially interesting questions from outside
that group may go unnoticed because they are unstylish or not worded in
the accepted jargon. The industrial theorist is better protected from
these perils than his university colleague.
Isolation is abetted by computer science curricula. Hard and deep
science has tended to disappear from standard computer science
curricula. A typical bachelor in computer science graduates almost
illiterate in science in general, and is much less able to understand
OCR for page 39
39
the hard sciences (except for elementary discrete mathematics) than
other science or engineering graduates are to comprehend computer
science. With this preparation, it is no wonder that much graduate
research tends to look inward, and to touch other disciplines only in
simplistic fashion. For this reason, industry may weigh the
extracurricular or hobby-like experience of a research candidate
heavily: a ~pure" computer scientist is often seen as a dubious
investment.
Direct applications of theory tend to be invisible. The nature of
software products is such that even a central, critical algorithm
supplied by theoretical research may be completely submerged, since in
most cases (outside of applied mathematics) the basic solubility of the
problem with which it deals is easily perceived. What theory
contributes may ~only" be efficiency, the difficulty and importance of
achieving which is hard to appreciate unless one happens to have tried
to do it oneself. Who but an insider would imagine, for example, that
most fast pattern-matching routines are direct and conscious
applications of the equivalence of deterministic and nondeterministic
finite automata?
Especially when diffuse, the role played by theoretical insight in
devising particular pieces of software will often be hard to perceive.
Only the original programmer will know which of any of a hundred minor
algorithmic choices in a piece of software were shaped by theoretical
generalizations.
Theory is not a journeyman trade. This truism explains industry's
widespread unwillingness to hire and inability to exploit people to do
theory. Save for an exceptionally insightful few, being "good at
theory" suggests native brightness more than it does immediate
utility. A theoretician will do industry little good except as he
evinces a lively interest in seeking challenges from and propagating
results within the firm. Unlike a cobbler, a theoretician will serve
industry ill by minding his own last. He must be something of a
showman, for his product does not speak for itself, and something of a
sponge, for his questions must bear on those of others.
CONVERSION OF THEORY INS O PRACTICE
We have argued that theory provides much of the common body of
understanding that practitioners automatically employ in attacking
new technical task. For lack of a better phrase we may call this
immediate recognition that causes us to do things right without
thinking about them Permanent knowledge. n Codification and
transmission of permanent knowledge is the theoretician's
responsibility.
Let us examine some concrete examples of how theoretical work ideas
in computer science have, or have not, impinged on practice, with
special reference to the mechanisms of technology transfer.
OCR for page 40
40
Steady Progress: Formal Language Theory in Programming Languages
Programming languages, notably FORTRAN, came first, well before any
recognized applicable theory. (For a more extended discussion of
compiler technology, see Paper 4.) That is not to say that early
compiler technology was unscientific. Quite to the contrary, the
seminal FORTRAN project led by John Backus at IBM had a strong research
flavor, but the goal was a working demonstration, not scholarly
publication. FORTRAN provided an overwhelming existence proof; it
became obvious to most people in the field that they could in principle
do the same without further instruction, but only by dint of mighty
effort.
m e science that followed immediately upon FORTRAN was largely
derivative, seeking generality of syntax and semantics by obvious
analogy to conventional mathematical notation. However, when ALGOL 60
syntax was described in a notation much like that which Chomsky had
recently brought to linguistics, and when LISP was defined by McCarthy
in terms from the theory of recursive functions, it became clear that
programming languages could be investigated and implemented
systematically. m e injection of a simple idea borrowed by
theoretically knowledgeable scientists from other domains and the
creation of a coherent mode of discourse was all it took to start the
.
explosion.
University researchers enthusiastically took up formal syntax
during the 1960s, but commercial purveyors of compilers, while aware of
the field, were essentially uninfluenced by it except as a descriptive
tool. In the commercial world, everybody "knew" you could do better by
hand methods than by formalized techniques. Only a few groups tried to
exploit the new knowledge. Massachusetts Computer Associates, for
instance, developed and used precedence parsing and reduction analysis
for their contract software business.
In this exploratory period, various grammar models were tried with
different trade-offs between simplicity and generality. Eventually,
sufficient expertise developed that various research groups undertook
to-prove they could write production compilers with their new
techniques. GE's FORTRAN Y. for example, convincingly demonstrated the
practicality of so-called LL parsing. Later, the so-called LALR model
gained ascendancy. With the technology now fully established, new
languages like CHILL and ADA have benefited from analysis of their
grammars while the languages were being defined.
Grudging Awareness: Theoretical Algorithms in Optimizing Compilers
Optimization and flow analysis remained outside of fashionable
theoretical research for 15 years, until they were perceived as
pragmatically significant examples of the combinatoric problems that
came in the 1970s to dominate the subject of algorithms. Individual
practitioners began to introduce formal techniques into this corner of
combinatoric algorithms over a decade ago, but in that whole period the
subject of optimizing compilers remained outside the domain of
OCR for page 41
41
permanent knowledge, and even today, most optimizing compilers are
undertaken mainly as prodigious, largely heuristic, projects.
Moreover, modest compiler efforts have not seen optimization as a
necessary goal. Nevertheless a transmissible framework now exists, and
code optimization may be expected to join syntatic analysis as a
well-established part of programming language technology.
Scant Contact: Formal Semantics in Languages
The history of formal semantics illustrates a different story. The
primary groundwork of the theory dates to the 1930s. Its specific
application to accurate definition of the meaning of programming
languages began in the early 1960s. By now, a great deal of theory has
developed in this area. Nevertheless, the influence of this theory on
practice has been minimal. No catalyzing event like the ALGOL 60
report has yet occurred to demonstrate its utility to the world at
large. Certain derivative ideas, such as the virtue of applications
languages, have gained wide currency, and numerous ex post facto
unofficial formal descriptions have been published. An attempt to
tighten and debug the informal definition of DOD's ADA language by
formal (and automated) techniques is in progress today. But overall
this theory remains an academic rather than a pragmatically influential
subject.
Contagious Spread: Cryptography and FFT
Spurred by the NBS data encryption standard, cryptographic theory burst
upon computer science theoreticians and practitioners almost
simultaneously. Natural curiosity concerning a type of activity
customarily carried out in deepest secrecy drew the attention of the
scientific and popular press to the theorists' work. Important
theoretical innovations, notably public-key cryptosystems, entered the
general consciousness almost immediately.
The contagious spread of these cryptographic ideas is not unique.
The FFT (fast Fourier transform) caught on even faster upon its
rediscovery by Tukey and Cooley 20 years ago, and knowledge of radix
exchange sort, then Hoare's quicksort, exploded similarly. mese
techniques speeded up particular computing processes dramatically, and,
more importantly, they impressed the community at large with the
importance of considering the complexity of computing processes.
Saturation: NP-Completeness
Originally promulgated by Cook, the importance of the idea of
NP-Completeness was hammered home when Karp published a portfolio on
NP-complete problems with something for everybody. NP-Completeness has
taken its place as a prime concept concerning computing.
OCR for page 42
42
HOW IDEAS SPREAD
The preceding discussion suggests that the fruits of theory are
imprinted in permanent knowledge by catalyzing events, such as the
publication of the Algol report or of the quicksort algorithm, but that
the ultimate influence of the ideas thus introduced may greatly
overshadow the events that initially make them known. m e contributions
of theory have diverse provenances: the FET came from outside the
computer science community, Algol from a committee, quicksort from an
individual, and cryptography from an underground bureaucracy. When the
time is right, ideas can spread more rapidly than the printed word if
there is a network ready and eager to propagate them. The network did
not operate as well for formal semantics: Scott's results were
transmitted quickly and with respect but not with fanfare, and not into
the general consciousness. What accounts for the difference?
First and foremost, one observes that the catalyzing events were
publications on significant demonstration of an idea. The network
advertised them, but did not constitute their prime publication.
Second, though the most pragmatically successful theoretical
contributions were in one sense breakthroughs, in another sense they
were continuous advances--they did not require a totally new way of
looking at the world.
_ _ _ , . _
(Even FORTRAN, so remote syntactically from
machine language, was transparently equivalent to it, and FORTRAN was
comfortably like the mathematics for which machines were being used at
the time.) Third, regardless of any step toward abstraction--often the
most important contribution of a theoretical idea--applicability to
known problems was demonstrated. In short, it is immediately
applicable rather than pure theory that spreads fastest. A further
proviso is that there be a theoretician to popularize it; it is
difficult to find examples where a popularizer, not himself
theoretically inclined, has succeeded in spreading useful theoretical
gospel.
Before Popularity
We should not conclude, however, that theory of value, or theory that
pays off, comes first and foremost from theorists with a bent for
popularization, or that only work from which a dramatic development can
be anticipated pays off. Far from it. Popularization is only the
visible part of the iceberg of theory. m e clarity of the Algol report
derived from a hidden prior decade of theoretical contributions. How
did an astute organizer (Bauer) know that this was the time to act, or
achieve such unity of purpose among a convention, or happen to
cultivate a m omas Jefferson (Naur) to draft this report? Could it be
done again for semantic description, or for data base technique, or for
command languages? The most we can assert is a platitude: Good
results are bound to emerge from creative people of vision and purpose.
Representative terms from entire chapter:
theoretical computer