APPENDIX D

Research Uses of Census Data

The 1990 census constitutes one of the most important sources of U.S. data for basic and applied social research across the decade of the 1990s and will retain its importance into the long-term future. While the primary use of census data may be more limited in some fields (e.g., economics), the census is often used indirectly for weighting sample surveys or for supplying the denominators for demographic rates. Decennial census data are of primary importance in three different respects. First, taken in themselves, they constitute the basic data for extensive research into U.S. society, the economy, the work force, housing stock, and population distribution, as well as the characteristics of the nation and its various regions. The census is the only source of comparable data for all geographic regions in the nation. Similarly, for many purposes it is the only large-scale source of data for minority groups and for other special populations, such as specific income, age, or occupation groups. Second, census data are often necessary for research on noncensus data, such as school, hospital, legal, and administrative records, and provide the necessary denominators for calculating rates and proportions in many arenas of research. Third, the data are vital to the design of research, including the most obvious example, the drawing of survey samples.
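
The denominator role described above can be made concrete with a small sketch. It is illustrative only: the function name `rate_per_1000` and all figures are hypothetical, chosen for the arithmetic rather than drawn from any census or administrative file.

```python
def rate_per_1000(event_count: int, census_population: int) -> float:
    """Return events per 1,000 persons, using a census count as the denominator."""
    if census_population <= 0:
        raise ValueError("census_population must be positive")
    return 1000.0 * event_count / census_population


# Hypothetical figures for illustration only (not real data).
hospital_admissions = 1_250   # numerator: count taken from a noncensus record system
area_population = 50_000      # denominator: census count for the same area

admission_rate = rate_per_1000(hospital_admissions, area_population)
print(f"{admission_rate:.1f} admissions per 1,000 residents")
# prints "25.0 admissions per 1,000 residents"
```

The numerator comes from the noncensus source and the denominator from the census, so the rate is only as comparable across areas as the census counts themselves.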

Census data are indispensable for a wide range of research questions, research programs, and instructional programs. A large cadre of established researchers and newcomers (including graduate students) awaits the release of census data every 10 years. After the 1990 census, these social scientists were ready to mine the data to describe what happened during the 1980s, to confirm or refute the hypotheses






of extant studies, to exploit new opportunities for study of small geographic areas, and to examine temporal changes.

RESEARCH VALUE

Much of our knowledge of social transformations over recent decades has been obtained from analyses of census data. The decennial census has developed into an indispensable tool of government and a servant of many other purposes. A small number of questions deemed suitable and necessary for the government to ask has sufficed to delineate many of the most important features of the nation's peoples and their principal activities. Age, sex, race, ethnicity, family relationship, parenthood, place of current and previous residence, birthplace, education, employment, income: these topics covered in the census provide raw data for specification and analysis of each of the social and economic transformations mentioned above, and for an endless array of reports and research by public and private agencies.

The census of 1940 was the first to go beyond the traditional complete enumeration and employ sampling for some questions. Sampling continued in subsequent censuses, with a roughly 1-in-8 sample used in the 1990 census for most census data. Development of sampling theory and methods and the need for much more frequent temporal detail on many topics gave rise to a wide variety of federal, state, and privately sponsored surveys from the 1940s to the present. These surveys have not replaced the need for the census to provide data for small geographic areas, for small and widely dispersed population groups (such as American Indians), and for a benchmark for intercensal surveys.

Forms of Census Data

The Census Bureau releases decennial census data in two forms. One form is aggregated data for geographic units and administrative jurisdictions. The aggregated data are for varying geographic sizes ranging from city blocks to metropolitan statistical areas (MSAs) to states.
Aggregated data are made available in published reports, which are widely available in libraries and research centers, and in summary tapes. The Census Bureau released a number of different summary tape files containing an immense quantity of census data arranged by geographic category. The data consist of detailed tabulations for each geographic area and tables by conventional categories, such as age and race. A typical table might show the population by age, sex, and race for the census tracts of a city. Such a table would allow an educational planner to note the elementary school-age population of Asian children in small areas for purposes of planning the needs for bilingual language instruction. With the aid of the Census Bureau's TIGER software for geographical referencing, aggregated data may be further combined into any collection of geographic units, including local service areas, congressional districts, and areas generated by the researcher. Given further information, changes in areas can be projected. A typical research task is to project the size and characteristics of a particular ethnic or age group using birth, mortality, and migration rates.

A second form of census data is the Public Use Microdata Sample (PUMS) files. PUMS files provide data for a sample of housing units, as well as for each individual residing in those housing units. Geographic identification in the 1990 PUMS files extends down to areas of 100,000 or more people. This level of identification allows analysis of specific cities, groups of cities, and counties or groups of counties that reach the threshold of at least 100,000 population. PUMS files do not include any individual-level identification and are analogous to the data of common sample surveys of persons or households, but on a much larger scale. PUMS files for 1990 have sample sizes of 1 and 5 percent of the nation's population, including about 2.5 million and 12.5 million persons in the samples, respectively. The large PUMS data files permit research on literally tens of millions of observations, including the creation of samples of minority populations that cannot be found in any other available national sample. Although the PUMS files sacrifice the geographic detail of aggregated census data, they compensate by freeing the researcher to produce tables that do not appear in the aggregated data. For example, race, ethnicity, gender, income, and employment can be related to fertility, age, or any other characteristic, a multivariate association that does not exist in the aggregated data.

Census data for 1990 are a snapshot of the structure and characteristics of the nation's population and of its geographic and jurisdictional components as of April 1. The value of the data is greatly enhanced, moreover, by the ready availability of computer-readable data from earlier censuses.
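
To illustrate the point made above about PUMS files, that microdata let a researcher build tables absent from the published aggregates, the following is a minimal sketch of a weighted cross-tabulation. The record layout, the field names, and the uniform weight of 100 (each person in a 1 percent sample standing for roughly 100 people) are simplifying assumptions for illustration, not the actual PUMS format.

```python
from collections import defaultdict

# Hypothetical person records; fields and weights are illustrative assumptions.
records = [
    {"sex": "F", "employed": True,  "children_ever_born": 2, "weight": 100},
    {"sex": "F", "employed": False, "children_ever_born": 3, "weight": 100},
    {"sex": "F", "employed": True,  "children_ever_born": 0, "weight": 100},
    {"sex": "M", "employed": True,  "children_ever_born": 0, "weight": 100},
]


def weighted_crosstab(rows, row_key, col_key, weight_key="weight"):
    """Sum the sample weights for every (row value, column value) pair."""
    table = defaultdict(float)
    for record in rows:
        table[(record[row_key], record[col_key])] += record[weight_key]
    return dict(table)


# The researcher, not the publisher, picks which variables to cross.
table = weighted_crosstab(records, "sex", "employed")
for cell, estimate in sorted(table.items()):
    print(cell, estimate)
```

Any other pair of fields (for instance `employed` by `children_ever_born`) can be crossed by changing the two key arguments, which is the flexibility the text attributes to microdata.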
Comparable PUMS files are available for earlier censuses of population and housing. Access to historical PUMS census data is described later. Aggregate data for essentially comparable geographic units and jurisdictions for 1980 and 1970 can be combined with 1990 census data for research on changes over time.

The research value of these comparable sources is apparent. Data from several censuses and related sources support research into change in size and age structure of population groups, comparison of cohorts over time, and examination of comparable population groups at different times. Thus, period, age, and cohort effects can be examined, and changes in living conditions, income, housing, and other aspects of the well-being of population groups can be traced through time.

Identification of patterns and rates of change allows projections of the future size, characteristics, and distribution of the population. These projections have fundamental importance to the formulation and evaluation of public policies. They allow estimation of the kinds and extent of services that will be required or demanded in the future and the magnitude of social resources necessary to meet those needs and demands.

As an example, the nature and adequacy of the housing of particular groups can be assessed. Projections of the future size of the population, its household size, and its residential patterns together will present patterns of home ownership, and provide a basis for estimating the housing needs of the future for particular metropolitan and rural areas as well as for the nation.

As a further example, census materials provide data on the number of children ever born, marital status, family and household size, and family relationships within households for immigrant groups. Knowledge of time trends in these variables (available neither from vital statistics nor in sufficient sample sizes from any national survey) provides a basis for estimating future fertility and family trends of particular population groups and, in this way, facilitates understanding of future national population growth and identification of the local social services and support needs that are likely to confront those population groups in the future.

In similar fashion, census data support exploration of the relationship between health and functional abilities, on the one hand, and socioeconomic status on the other. Knowledge of that relationship provides further support for estimation of future service needs and costs.

Advantages of PUMS Files

Researchers are well aware of the general utility of the census Public Use Microdata Sample (PUMS) files. Research-manipulable data for large numbers of individual units (persons, families, or households) provide an enormously powerful tool in contemporary social science and have fueled and been fueled by new statistical and analytical methods. Where the large data files are of such scale and proven utility as the national censuses, the potential becomes great. By working with individual microdata files, researchers realize several gains over working with published tabulations.

Maximum feasible comparability.
Data in PUMS files can be recoded or regrouped from existing detailed codes. The existing detailed coding of occupational titles, for example, can be recoded into different groups of occupations, of the researcher's selection, for new research projects.

Multivariate analysis. Researchers can undertake multivariate analysis to explore new hypotheses and models of such complexity as current techniques and sample size allow. Published tabulations usually include only a few variables in each table, with information sacrificed to effect groupings that highlight the data while still fitting on the page, and with combinations of variables that reflect a particular view of which associations among variables would be of general interest.1

Attain comparability in data organization. The most effective time-series analysis, whether the intervals are 1 year or 10 years, requires that the researcher be able to attain comparability in the organization of data and maintain control over treatment of age. In general, there are three distinct forms of analysis that may be used in the study of trends with census data:

(a) Aggregate analysis of trends, for which the researcher seeks to attain maximum comparability on all variables. Age is primarily a defining attribute (as in specifying population of labor-force age or reproductive age). Comparisons may be made of single variables or patterns of relationships among variables at each census date.

(b) Between-cohort analysis comparing persons of the same age at successive times. If a cohort is defined as those persons born in a given time interval, this style of analysis permits comparison of successive cohorts on such variables as educational attainment at age 25, or on patterns of relationship such as that of the number of children ever born at wife's age 45-49 years to a variety of social and economic attributes of wife and husband.

(c) Within-cohort analysis tracing life-cycle patterns. For persons born during 1940-1950, for example, measures of male and female labor-force participation may be examined for successive ages as the cohorts grow older: ages 20-30 years in the 1970 census, ages 30-40 years in the 1980 census, and ages 40-50 years in the 1990 census.

By combining types (b) and (c), additional interpretative potential is gained. Life-cycle patterns and trends for different cohorts may be compared. Matching these to the specific dates of observation used for aggregate comparisons brings all three styles of analysis together for joint assessment of what are frequently identified as period, cohort, and age effects.

Examples of Research

It would be impossible to describe or list even a small portion of the many research applications that aggregated and PUMS census data support. It is possible to give a few examples as provided by various scholars (taken from descriptions in Rockwell and Austin, 1991).

Aging of the Population

The aging of the population is an increasingly common experience among industrial nations with lower fertility levels. The "baby boom" generation in the United States will enter late middle age at the beginning of the twenty-first century, transforming the society from a primarily young one (in the 1970s) to a predominantly old one. Heightened research attention is being given to examining the nature of the elderly population, and decennial census data are one of the chief tools used in such examinations.

Researchers are increasingly aware of the heterogeneity of the elderly population. A chief topic pursued is the varied retirement situations of older individuals. Retirement has come to be referenced, rather ambiguously, by both the sources of one's income and the extent of one's labor-force activity. Data from the 1990 census on labor-force activity and on income sources are used to distinguish alternative states of retirement among the elderly population, ranging from those who either never worked or previously worked and currently receive a retirement form of income, to those who work part or full time and receive a retirement form of income.

Additional research questions concerning the elderly also require the type of geographic specificity presented by decennial census data. The nature and extent of geographic concentration of the elderly have important policy consequences for planning and for cost-effective service delivery. Knowledge of the spatial distribution of older individuals permits the testing of generalizations about "unique" living conditions of the elderly that have already begun to break down our rather simplistic views about this segment of the population. Included in this nexus of factors are such 1990 census indicators as disability or severe health limitations, living arrangements and household composition, alternative sources of income, and the relative size of the older population groups in various locales. No other single source of research data contains the diversity of indicators or contextual richness necessary for examining the condition of the older population of the nation.

Race Relations

A major component of the study of race relations in the United States focuses on racial residential segregation. An extensive literature, based on research conducted over the past three decades, has demonstrated variation in the degree to which racial groups live in close proximity only to members of their own racial or ethnic group.
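
One common way to quantify the variation just described is the index of dissimilarity, which can be computed directly from tract-level counts. The text here does not name a specific index, so the choice of this measure is an assumption made for illustration, as are the function name and the tract counts below. The index is half the sum, over tracts, of the absolute difference between each tract's share of one group and its share of the other; 0 means identical distributions and 1 means complete separation.

```python
def dissimilarity_index(tracts):
    """Index of dissimilarity for a list of (group_a_count, group_b_count) tracts."""
    total_a = sum(a for a, _ in tracts)
    total_b = sum(b for _, b in tracts)
    if total_a == 0 or total_b == 0:
        raise ValueError("each group needs a nonzero total")
    # Half the summed absolute gap between the two groups' tract shares.
    return 0.5 * sum(abs(a / total_a - b / total_b) for a, b in tracts)


# Hypothetical tract counts, not real census figures.
fully_separated = [(100, 0), (0, 100)]
evenly_mixed = [(50, 50), (50, 50)]
partly_mixed = [(80, 20), (20, 80)]

print(dissimilarity_index(fully_separated))  # prints 1.0
print(dissimilarity_index(evenly_mixed))     # prints 0.0
print(round(dissimilarity_index(partly_mixed), 3))
```

Because the inputs are simple counts per small area, the same calculation can be repeated on tract tables from different census years to compare levels over time.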
The consequences of racial residential segregation, for racial and ethnic minority groups as well as for society as a whole, are enormous, affecting the distribution of political power in cities, quality of education received, and the differential socialization of children and adults. Decennial census data contain the basic information used to measure racial residential segregation and to construct widely used indexes of racial concentrations and of contact with persons of different racial or ethnic groups. Researchers use the aggregated data from the 1990 census to measure this type of segregation and to compare results with similar data from the 1970 and 1980 censuses. Such measurements permit researchers to investigate the economic consequences of residential segregation, as well as to probe into racial attitudes and discriminatory practices that may promote different levels of racial segregation in various locales. While early results of investigations tend to show a decline in residential segregation since 1980, the pattern is uneven, with some ethnic groups exhibiting increases in segregated living areas over the past decade. A great deal more research will be conducted on this topic, covering both metropolises and less urbanized areas across the whole country.

Education

Census data are essential for research on the process of education and for examination of the consequences of education on individuals' life chances. Educational researchers use information from the decennial censuses to control their other analyses and research results for demographic variables. Such investigations, for example, use small-area census data to explore the socioeconomic context within which individuals' educational attainment occurs. Attributes such as a neighborhood's median years of school completed, average income, and employment patterns have been found to have a profound impact on educational outcomes of persons residing there. Among the important research investigations that are conducted with census data are studies of the effect of school desegregation on employment chances of members of minority groups and the degree to which school cohorts are segregated on factors such as racial composition, graduation rates, migration, income, and housing quality. Such studies of the importance of context on individual behavior and actions are greatly enhanced by using maps to display the geographic distribution of various demographic and economic characteristics of the population.

Concentrated Poverty

In the mid-1980s, analyses of census data revealed a large increase during the 1970s in the number of urban neighborhoods in which more than 40 percent of the households had incomes below the official poverty line. The number of people living in such areas also increased dramatically. A large proportion of these people were blacks living in a handful of the nation's largest cities. These rather straightforward tabulations attracted research attention to a set of issues that were emerging in social policy debates. Is there something about geographically concentrated poverty that is distinctive?
Is there a "natural order of the inner-city ghetto" that changes the character of ghetto problems and requires different kinds of interventions?

In the 1930s and the 1950s, national social policy included a major component directed toward slum clearance and urban renewal. In the 1960s and 1970s, these programs were less favored than efforts focused on general economic vitality. In the 1980s, an increasing number of social science articles and books raised new concerns about the urban poor, persistent poverty, the growth of an "underclass," and the neighborhood and community problems of the "disadvantaged." Researchers began to use a much richer array of census data in the 1980s, along with noncensus data keyed to census geography, to document levels and changes in a medley of indicators of undesirable social conditions. With the availability of 1990 census data within the past 2 years, these earlier studies are now being replicated, updated, and expanded. One example of the kinds of questions now being researched is: Do poor single parents living in an urban poverty area have different behaviors and life prospects than poor single parents in other circumstances? Census data are essential to the analysis of such issues, although they are rarely sufficient in themselves.

During the last two decades, much of the best and most innovative research on issues of poverty, race, and class was fueled by new national sample surveys such as the Panel Study of Income Dynamics (PSID), General Social Survey, Current Population Survey, Survey of Income and Program Participation, and National Longitudinal Survey. The rediscovery in the late 1980s of neighborhoods and slums required social scientists to turn some of their attention to other kinds of data. Because of their limited sample sizes and sample designs, sample surveys are relatively weak for the study of geographically concentrated phenomena. They can be strengthened by the incorporation of contextual data from the census; for example, the PSID has successfully linked its individual and family data to contextual information for census tracts and counties in which its sample families live. Only the decennial census offers the abundant detail needed for research that makes use of small areas throughout the entire nation.

The 1990 census offers two dramatic innovations to facilitate this research and also to spur development of new modes of analysis and hence new ways of thinking about these issues. One innovation is delineation of small areas (census blocks) throughout the entire country, with enormous quantities of aggregate data available for each area. The second innovation is new technology for manipulating census spatial data and matching noncensus data to census geography: the TIGER system.
Social scientists have rapidly increasing access to the powerful personal computer systems and workstations required for use of these new tools. Research that 30 years ago could only be done with one of the world's most powerful computers and with a team of skilled programmers and analysts can now be accomplished by a single researcher at a personal computer.

Global Change

One of the most pressing fundamental and applied research tasks of the present and coming decades concerns the interrelation between human and natural processes as they affect the natural environment and the quality (or, for some areas in the long term, the possibility) of human life. These are, of course, issues of global significance, but they must be addressed at the local, regional, and national levels as well. For research into social processes as they affect and are affected by the natural environment, the census provides data on population distribution and density, and limited data on land use, resource consumption, and certain kinds of effluent production. The census data thus allow research into the relation between human habitation, activities, and behavior on the one hand, and environmental hazards and environmental deterioration on the other. The geographically comprehensive nature of the data and their geographic resolution allow research to be conducted at multiple levels and aid in assessing the environmental impact of specific localities on each other and on regional and larger areas. The availability of data from earlier censuses, as well as other sources, allows projections of environmental change and human activities and better identification of the patterns of human behavior that retard or accelerate adverse environmental change.

The low level of geographic aggregation characteristic of census data combined with the availability of TIGER files contributes in important ways to the achievement of these goals. At present, the census is probably the only source of national data on social and economic processes that can be effectively linked through geographical information systems to satellite-produced remote sensing data measuring natural processes. In these terms, the census materials afford what may be at this time a unique opportunity to explore directly the relations between natural and human processes.

DEVELOPMENT OF CENSUS MICRODATA FILES

The incredible volume of raw data produced by census enumeration has compelled the Census Bureau to be a persistent innovator in techniques of mass data processing. Successive developments in punched card technology, machine-readable coding, and computers enabled the Census Bureau to keep up with the processing demands and to issue an ever-expanding shelf of publications. During the 1940s and 1950s there were occasional instances in which published tabulations were deemed inadequate and particular users made arrangements for special tabulations. During the past 30 years, large census microdata files containing individual anonymous records have been developed, and access to them has expanded.

Development of PUMS Files

With increased demand for access to census data, the Census Bureau in 1962 released a 1-in-1,000 public-use sample from the 1960 census basic record files (coded so as to preclude breach of confidentiality rules) to the community of researchers (academic, governmental, and business). This first public sample of individual census data provided researchers with their own census database and freedom to tabulate or manipulate without the constraints imposed by a fixed set of printed tables in bound volumes. Access by social scientists to large files of data at the individual and household level is a comparatively recent development of profound significance to social research.

At the time of release of the 1960 0.1 percent sample, only a few research centers had access to both computers and programs to make use of the computer tapes, and most researchers either struggled with punched card tabulation machines or neglected the new data resource. By the time of the 1970 census, computer technology had improved, researchers' access to appropriate hardware and software had become more common, and the increasing statistical sophistication of social scientists led to a variety of analytical techniques that depended on computer data processing and analysis. The distinctions between survey research (defined traditionally to exclude the decennial census) and demographic research (traditionally based on work with limited sets of published cross-tabulations) became blurred, to the benefit of both traditions.

The use of large census data files for social science research has increased steadily since 1962. Despite a series of initial difficulties with the 1960 0.1 percent public-use sample file, researchers soon became so accustomed to it that they ceased acknowledging the file's sponsor (the Census Bureau) and simply cited the 1960 1-in-1,000 census sample data. Several 1.0 percent public-use sample tapes from the 1970 census were released as routine census products, along with many other alternatives to the traditional bound sets of tables, and a 1.0 percent public-use sample from the 1960 census was subsequently produced by the Census Bureau in the 1970s, providing large comparable samples for the 1960 and 1970 censuses.

Development of Summary Tape Files

Summary tape files are selected census items and cross-tabulations arranged by geography. The files contain information for states, places, and small areas down to the census block. The 1970 census was the first census for which summary tape files were released as regular products, in addition to the printed reports (tapes from the 1960 census that had been made solely to generate printed reports were made available on request; relatively few copies were released outside of the Census Bureau).
Summary tape files from the 1970 and later censuses covered more subjects and provided more geographic detail than the reports. In 1980 and 1990, the summary tape files were expanded in geography and content (e.g., the long-form data in 1980 were made available for block groups in addition to census tracts) and, in some cases, replaced the published reports entirely (e.g., no block data were printed in 1980 or 1990, but they were available in the summary tape files).

Historical Census Files

As soon as the social science research community became familiar with the feasibility, flexibility, and virtues of the 1960 and 1970 public-use samples, the idea took hold of producing public-use samples from earlier censuses. Social historians were particularly interested in the nineteenth century censuses, and were spurred by the public availability (without confidentiality restrictions) of microfilm copies of the original manuscript enumeration forms. Demographers and many other social scientists were particularly interested in the two censuses preceding 1960. The 1940 census was the first "modern" census: the first with a question on income and a wide range of other social and economic information, the first to use sampling (to collect long-form data), and the first to be designed and planned by a full-time professional staff that included social scientists.

Starting in the 1960s, demographers began to express an interest at professional meetings in constructing public-use samples from the 1940 and 1950 censuses. By the late 1970s, the Census Bureau had prepared cost estimates and preliminary procedural plans for 1940 and 1950 public-use census samples. Eventually, with financial support from several federal agencies and strong encouragement from social science researchers, the first samples from historical censuses (1940 and 1950) were produced.

During the past 20 years, work has continued on taking samples from much earlier censuses, from 1850 onward. Just recently, public-use samples have been completed of the 1850 and 1880 censuses (Ruggles, 1993). Work on the 1850 census manuscripts began in September 1992 to take a 1 percent sample of the free population of the United States.2 The 1850 enumeration was the first individual-level census in the United States and is therefore the first census for which a public-use microdata sample is possible. The 1850 census included questions on fertility, urban and rural residence, foreign-born status, occupation, and household relationship. The new public-use sample allows the construction of tabulations on a wide range of topics that are not covered in census publications or were incompletely tabulated. Availability of the 1850 PUMS files will provide an important baseline for historical studies.
The availability of the 1850 census microdata will extend the current series of census PUMS data back to a period prior to the Civil War, filling in a critical gap in the study of long-term social change.

The 1880 PUMS file is a 1 percent sample of the 1880 census. The 1880 census was innovative in a number of ways: it had improved completeness of coverage, enhanced accuracy of enumeration, and included a greater range and detail of questions. For the 1880 census, the supervision of enumerators was shifted from a part-time responsibility of U.S. Marshals to 150 census supervisors who were specifically appointed for the purpose. It was the first census to inquire about marital status, a critical variable for analysis of fertility and household arrangement. A question on relationship to head of family was added, which makes it possible to distinguish immediate family relatives from secondary individuals and allows construction of a wide variety of variables on family structure. Other valuable new questions in the 1880 census included birthplace of father and mother, condition of health, married within the past year, and number of months unemployed during the census year.

The preliminary 1850 PUMS files are now available, including 177,000

individuals and representing 90 percent of the final sample. The penultimate version of the 1880 PUMS files is also now available, consisting of information on 503,000 individuals. Work for producing the 1850 and 1880 PUMS files was supported by grants from the National Science Foundation and the National Institute of Child Health and Human Development.

At the same time that work has proceeded on censuses from the nineteenth century, researchers have worked with the Census Bureau to take samples from the remaining censuses of the twentieth century. With the recent availability of the 1850 and 1880 censuses, 11 census PUMS files are now available from 1850 to 1990 (1850, 1880, 1900, 1910, 1920, 1940, 1950, 1960, 1970, 1980, and 1990), offering a usable and accessible data series covering 140 years of major changes in U.S. society. These historical files are our most important quantitative resource for the study of social change. Altogether, they provide individual-level data on 65 million Americans from the middle of the nineteenth century to near the beginning of the twenty-first.

ACCESS TO CENSUS DATA

Researchers have access to census data in two ways. One approach is to purchase census data directly from the Census Bureau. The Census Bureau releases census data in a variety of formats, including printed tabulations, computer tapes, floppy diskettes, and CD-ROMs. A second approach is to obtain data from one of several organizations that preprocess census data and release them in a form that is easier to analyze. The Inter-university Consortium for Political and Social Research (ICPSR), a nonprofit organization located at the University of Michigan and supported by about 360 member institutions (primarily colleges and universities), provides a common source of census data for researchers. Other sources of census data include state data centers, individual universities, and groups organized to provide data on a consortium basis.
(The Association of Public Data Users, for example, purchased and recopied the 1980 and 1990 data for its members.)

Role of the Inter-University Consortium for Political and Social Research

All census PUMS and summary tape files are maintained at the Inter-university Consortium for Political and Social Research (ICPSR) and are inexpensive and accessible to researchers. Researchers affiliated with the 360 institutional members of ICPSR have access to census data (especially the PUMS data from the censuses from 1850 to 1990) without charge. Individuals at nonmember institutions also have access to census data for a nominal charge, well below the rate charged by the Census Bureau. In the event that an institution is unable to pay the charge, ICPSR provides limited amounts of data to individuals at nonmember institutions without charge.

The advantages of census data are well recognized. But an awareness of general potential is not a sufficient specification of benefits to weigh in the balance against the rather high costs of producing PUMS files. During the past 30 years the specific benefits of the PUMS data have been demonstrated many times. For several decades, the federal government, state and local governments, and research foundations have given financial support for the preparation of PUMS data, for their distribution to thousands of researchers and research agencies, and for their use.

To cite one example for 1990 PUMS data: the ICPSR acquired census data and provided training and assistance for access to the data. ICPSR contributed $270,000 of its membership fees to the project. Additional support was received from the Census Bureau, the National Science Foundation, and jointly from the National Institute of Child Health and Human Development and the National Institute on Aging, each of the three contributing about $250,000 individually.

Underutilization

In principle, the widespread research use of census data demonstrates that the continued collection and creation of public-use data files are in the public interest. There is a set of fundamental questions about the character of U.S. society and how it has been changing; those questions can receive better answers from social science researchers only if census data files are produced and reasonably accessible.

The inventory of topics and questions in the census is, of course, not complete. The census can never be complete or provide the sole data set for research. Because social science, like all science, responds to new techniques and new data sources as well as to the continually cumulating body of prior work, forecasting developments is difficult.
As a case in point, when the 0.1 percent sample file from the 1960 census was first released, the sponsors at the Census Bureau were disappointed by the slow rate of purchases and utilization. The subsequent experience has been one of extraordinarily frequent and diverse use of the 1960 sample files and of the subsequent census public-use files that have been released.

Still, there are difficulties in access to 1990 census public-use files. Census files are large and complicated. Researchers who try to use the large 5 percent PUMS files for the 1990 census find that they receive several dozen computer tapes, that the initial processing of the tapes requires trained computer programming staff, and that considerable cost and time are required to ready the data for analysis.

Online Access

One of the most important recent developments in access to census data has

been the online availability of 1980 and 1990 1 percent PUMS data. The Consortium for International Earth Science Information Network (CIESIN), located in Saginaw, Michigan, maintains both the 1980 and 1990 PUMS data on workstations. Users can dial in to the Saginaw workstations on one of several telephone lines, access either the 1980 or 1990 data, request specific tabulations or descriptive statistics, and typically receive the requested data within 5-8 seconds.

CIESIN's computers are online continuously and allow users to obtain tabulations for any population of interest instantaneously. For example, a user might be interested in the labor-force status of Hispanic and black men and women, aged 18-24 years, who completed 4 years of high school and who live in a specific metropolitan area or a specific state. One might want to compare their unemployment rates to the national average or to compare the 1990 rates to those for 1980. CIESIN's workstation computers and online software allow researchers to obtain such tabulations almost instantly.

The widespread and continuous use of census data on CIESIN's computers is evidence that there is strong demand for even greater accessibility to census data. There are now more researchers using the CIESIN census data in a single day than used the 1960 PUMS data over the entire decade of the 1960s. The future trend seems to be toward increasingly easy access to public-use census samples, probably with improved online systems and more user-friendly CD-ROMs, for the 2000 census.

NOTES

1   PUMS data support a great deal of multivariate analysis that can be tailored to specific research questions. Although the summary tape files with aggregated data may limit the particular cross-tabulations available, aggregated data have often been used for studies of residential settlement, especially residential segregation, based on the spatial analysis of particular social groups.
2   Individual microdata were not collected from the slave population in the 1850 census.

REFERENCES

Rockwell, R.C., and E.W. Austin
1991 1990 U.S. Census Data Project. Proposal submitted to the National Science Foundation. Inter-university Consortium for Political and Social Research, Ann Arbor, Mich.

Ruggles, S.
1993 Historical Census Projects: Integrated Public Use Microdata Series. Social History Research Laboratory, University of Minnesota, Minneapolis.
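
The kind of custom tabulation described under Online Access (an unemployment rate for a narrowly defined subpopulation, compared with the overall rate) can be sketched in a few lines of Python. This is an illustration only, not CIESIN's software. The record layout, field names, and the function unemployment_rate are hypothetical, and the six records are invented solely so that the arithmetic can be followed.

```python
def unemployment_rate(records, select=lambda r: True):
    """Weighted unemployment rate for records passing `select`.

    Rate = weighted unemployed / weighted labor force, where the
    labor force is the employed plus the unemployed. Persons not
    in the labor force are excluded from the denominator.
    Returns None if no selected record is in the labor force.
    """
    unemployed = 0.0
    labor_force = 0.0
    for r in records:
        if not select(r):
            continue
        if r["status"] not in ("employed", "unemployed"):
            continue
        labor_force += r["weight"]
        if r["status"] == "unemployed":
            unemployed += r["weight"]
    return unemployed / labor_force if labor_force else None


# Invented microdata records (hypothetical fields). In a 1 percent
# sample, each record would carry a weight of about 100 persons.
records = [
    {"age": 20, "group": "Hispanic", "educ": 12, "status": "employed", "weight": 100},
    {"age": 22, "group": "Hispanic", "educ": 12, "status": "unemployed", "weight": 100},
    {"age": 19, "group": "black", "educ": 12, "status": "employed", "weight": 100},
    {"age": 23, "group": "black", "educ": 12, "status": "employed", "weight": 100},
    {"age": 40, "group": "white", "educ": 16, "status": "employed", "weight": 100},
    {"age": 21, "group": "black", "educ": 12, "status": "not_in_labor_force", "weight": 100},
]


def young_hs_graduates(r):
    """Hispanic or black, aged 18-24, exactly 12 years of schooling."""
    return (
        18 <= r["age"] <= 24
        and r["group"] in ("Hispanic", "black")
        and r["educ"] == 12
    )


subgroup_rate = unemployment_rate(records, select=young_hs_graduates)
overall_rate = unemployment_rate(records)
print(subgroup_rate, overall_rate)
```

Passing the population definition as a function keeps the rate calculation separate from the choice of subgroup, so the same routine serves for a metropolitan area, a state, or a comparison of 1980 with 1990, given records for each.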