networks could provide high-speed access to a wide range of mass data using low-cost, high-speed, on-line mass storage coupled to high-performance processors feeding graphics workstations. Social scientists, scholars, planners, students, and even the public at large could access, manage, analyze, and visualize larger data sets more easily using a broader range of conventional and graphics tools than heretofore possible. But most of those who use these data today work with the data in ways remarkably similar to methods used by their predecessors three decades ago. Large data sets comprising records for thousands and even millions of individuals or other units of analysis have been, and continue to be, costly to use in terms of the dollars, time, computing facilities, and technical expertise required to handle them. These barriers can now be removed.

This paper looks briefly at the nature of the problem, at the opportunities offered by computing and information system technology, and at one effort that has been made to realize the potential of these opportunities to revolutionize the manner in which massive census and survey data sets are handled. As this one example illustrates, realizing these opportunities has the potential for more than just a change of degree in terms of numbers of users, ease of use, and speed of response. Users of demographic, social, economic, behavioral, health, and environmental data can experience a qualitative change in how they work, interacting with data and tools in ways never before possible.

2 The Problem

Demographers, social scientists, and others who work with census and survey data are often faced with the necessity of working with data sets of such magnitude and complexity that the human and technological capabilities required to make effective use of the data are stretched to their limits, and often beyond. Even today, researchers may find it necessary to coordinate three or more layers of support personnel to assist them with their efforts to retrieve information from data sets ranging to gigabytes (GB) in size. Yet these data are among the most valuable resources available for gaining insight into the social processes that are changing our world. These challenges are compounded today by the recognition that many of our pressing local, national, and international problems require multi-disciplinary approaches if the problems are to be understood and resolved. The success of these multi-disciplinary endeavors will depend in part upon how readily and how effectively researchers and analysts from the social sciences, environment, public policy, and public health can bring data from their disciplines, along with geographic and topological data, to bear on these problems.

Consequently, public data such as the Public Use Microdata Samples (PUMS), Current Population Surveys (CPS), American Housing Surveys (AHS), Census Summary Tape Files (STF), and National Center for Health Statistics mortality files are of greater potential value to a broader range of researchers, scholars, students, and planners than ever before. Yet these are data that, because of the cost and difficulty in working with them, have historically been underutilized relative to their potential to lend insight into social, economic, political, historical, health, and educational issues. These are data that are relevant at levels ranging from personal to global concerns.

Unlocking information requires more than just access to data. Getting answers to even

The National Academies of Sciences, Engineering, and Medicine
500 Fifth St. N.W. | Washington, D.C. 20001

Copyright © National Academy of Sciences. All rights reserved.