
The National Academies | 500 Fifth St. N.W. | Washington, D.C. 20001
Copyright © National Academy of Sciences. All rights reserved.

Paper 8

RESEARCH IN DATA PROCESSING: THE PRIMACY OF PRACTICE

Data processing can most appropriately be defined as that branch of computing that is principally concerned with data-intensive applications, i.e., situations in which a data base is the central organizing component for the application as a whole and in which the amount of computation per unit of data is typically small. Historically, the term data processing has been associated with business applications, such as basic accounting systems. More generally, data processing is concerned with the use of the computer to support information systems in which the computer serves principally as a repository of information to support the operations, the control, or the planning of an organization or enterprise.

The characteristics of data processing systems have changed over time, evolving from isolated programs based on files having rather elementary structure to integrated large-scale information systems based on shared data bases. But the importance of this activity to the field of computing as a whole has remained unchanged. Data processing accounts for by far the greatest proportion of actual computer usage; as organizations become more complex and the demands on them more exigent, data processing software will have to continue to evolve and advance.

However, despite the prominence of this pragmatic subfield of computer science, comparatively little research (as opposed to product development) has focused on the problems characteristic of data processing. Very little of this research is classifiable as basic, and of the little research devoted to this field, even less can be counted as truly effective. A quick overview of the development of the data processing field will confirm this assertion. Major milestones include the following:

1. Advances in programming languages, especially COBOL, and more recently a number of higher-level or nonprocedural languages such as NOMAD.
2. The evolution of the concept of a data base management system and of the technology underlying it, together with such ancillary notions as query languages and data dictionaries.
3. The emergence of decision support systems combining facilities for retrieval, analysis, and manipulation as a means of providing decision makers with supporting tools.
4. A variety of application development systems, ranging from transaction processors to automatic documentation facilities, which simplify and streamline the process of building data processing application programs.

By and large, these major advances have not resulted from research programs conducted either in universities or in industrial research centers. Rather, as detailed below, they have grown out of field practice. That is, advances in data processing have typically been generated by practitioners with very immediate problems seeking incrementally improved but still feasible techniques for coping with them. For example, COBOL evolved from earlier programming languages for business applications. Mark IV and a number of file management systems are descendants of a sequence of systems developed by a number of commercial organizations, and most current data base management systems (especially IMS and the CODASYL family of systems) were initially developed by user organizations rather than by farsighted hardware or software systems vendors. The leaders in data processing innovation tend to be those groups and organizations most directly and broadly in contact with real field requirements.

As noted, research concerned with several subareas of data processing is not entirely absent. In particular, research in data base management systems has been pursued at a number of universities, although, as already noted, the initial identification of this whole field must be credited to industrial and commercial users. Subsequently, several universities (most notably the University of Michigan and the University of Pennsylvania, but others as well) have built up a substantial tradition of data base research. Similarly, a number of industrial research laboratories (most notably the IBM San Jose Laboratory and the Computer Corporation of America) have contributed to this area. By now, formally organized data management research has grown to the point that it has its own journal, the ACM Transactions on Database Systems.
Several annual conferences for the presentation of research results have been organized, and a thriving research community has grown up. However, with a few exceptions, the impact of this research on data base management systems used in the field has been quite small. The most noteworthy case in which theoretically minded researchers have contributed techniques of real practical interest is the development of the B*-tree and of various other closely related data access structures. Although its roots go back even further, the B*-tree as currently known derives from initial work by Bayer and McCreight, one from the industrial and one from the academic research community. The superiority of the techniques they introduced was recognized at once, and the B*-tree idea was rapidly and widely disseminated to become the file structure of choice in modern data base systems.

Another data management research theme, worked out by theoretically minded researchers during the 1970s, is the relational data base model. This idea, first proposed by Codd of IBM Research in 1970, seeks to simplify the user's view of the data base and the access languages with which he uses and manipulates it. However, the first relational systems are just now beginning to appear on the market; the full impact of this research will probably not be seen for another year or two, and its pragmatic value remains to be proven.

Further research beyond the two efforts cited could be adduced, some of which is having a measurable impact on the world of operational data base systems: for example, studies of the design of new query languages, methods for modeling the performance of data base access methods and file structures, and studies of the formal semantics of data base systems. Some of this work can even be characterized as relating to the deeper semantics of data base systems. However, a notable trend in recent data base management research has been its distance from the world of applications. It is as though the research community has seized on the concept of data base management as an important issue worthy of study in its own right, and then proceeded to investigate it from purely aesthetic points of view having little to do with the problems that concern data base users. Increasingly rarefied, much of this research has lost its vitalizing contact with the practical world of data processing that motivated data base systems in the first place.

Other research programs originally inspired by the pragmatic concerns of the practical software builder can claim even less success when measured in terms of impact on applied research and development. Much of the enthusiastic academic work in automatic programming, i.e., the numerous attempts to develop systems with embedded expertise that would enable data application programs to be built more easily, seems to have miscarried. Despite major funding from the Defense Advanced Research Projects Agency, this research did not achieve any major results in the area that it was addressing (although some potentially important groundwork was laid for subsequent research in natural language processing and formal program specification techniques).
A second area for which a similarly pessimistic conclusion seems justified is decision-support systems. Little has been transferred from this research to commercial environments. The main value of this research seems to have been in promoting the notion of decision-support systems as an important direction for data processing and information systems. However, the concept of decision-support systems did not in fact originate in research centers such as the MIT Sloan School of Management and the Wharton School, both of which have been active in this area. Rather, the decision-support notion was initially promulgated by innovative user organizations that were seeking to provide decision makers with more ready access to information. The role played by the research groups was the narrower one of identifying, conceptualizing, formalizing, and disseminating this innovation, which, like many other innovations in the data processing field, grew directly out of the pressures of practice.

We can summarize the preceding observations as follows. Research in data processing exhibits a different pattern from that observed in most other areas of science and many other areas of computing. Research laboratories are not in a position to make independent discoveries that the rest of the world then adopts. Rather, practitioners are the driving force in the field: they identify the new problems worthy of solution, find innovative solutions for them, and apply ideas once they have been codified by the research community. The role played by the research community is that of an intermediary: researchers formulate perspectives covering the field as a whole and advertise new and important intellectual trends. When research laboratories working in data processing become disconnected from the practical world of data processing and lose touch with practitioners, who are both the source and the consumers of the researchers' general formulations, their work falls prey to an occupational hazard: it becomes artificial and irrelevant to practice. Such research can often be the most appealing "academically," in the narrow sense of precision and formality. However, it is often research that bears little fruit and is destined to have little impact.

These observations suggest that for the conduct of effective research in data processing, close working relationships between the practical world of data processing and the research community, including personnel transfer, consulting, etc., are an absolute necessity. The vital advantage of a software builder in the field is that direct pressures tell him what commercial end users want and need to get the job done.