Sharing Research Data

same occurrence is treated in more than one paper and from varying perspectives. As this report frequently points out, different participants in the research process have different and sometimes conflicting interests. Even the same individual may view data sharing differently at different times, depending on whether he or she is acting as a primary investigator or a subsequent analyst or, for example, whether the issue is the completion of a research project or the protection of respondent privacy.

BENEFITS OF DATA SHARING

That sharing data has benefits is manifestly clear and widely accepted. But a brief recounting of its benefits is useful, in particular in weighing them against costs. This section presents a brief summary of some of the major benefits.

A variety of terms are used here in connection with the sharing of data.2 A reanalysis studies the same problem as that investigated by the initial investigator; the same data base as that used by the initial investigator may or may not be used. If different, independently collected data are used to study the same problem, the reanalysis is called a replication. If the same data are used, the reanalysis is called a verification. In a secondary analysis, data collected to study one set of problems are used to study a different problem. Secondary analysis frequently, but not necessarily, depends on the use of multipurpose data sets. Data sharing is essential for all verifications and all secondary analyses; it may or may not be involved in replications.

Reinforcement of Open Scientific Inquiry

If all science were conducted according to an ideal, referred to by Robert Merton (1973) as the "ethos of science," then scientific findings would be made available to the entire scientific community. Since the purpose of this availability is to allow others to assess the merits of the research, the need for careful description of study procedures is implicit.
We believe that, in addition, the availability of the data for scrutiny and reanalysis should be part of the presentation of results. In the past, among the best investigators and with a journal practice open to extensive description, providing data was an honored tradition. Cavendish's classic paper on the density of the earth is a prime example (Cavendish, 1798).

Scientific inquiry must be open, and sharing data serves to make it so. Disputes among scientists are common; without the availability of data, the

2 The committee acknowledges the assistance of H.H. Hyman on terminology pertaining to data sharing.
diversity of analyses and conclusions is inhibited, and scientific understanding and progress are impeded.

Verification, Refutation, or Refinement of Original Results

When data are shared, they may be used in reanalyses that provide a direct check on reported results. In addition, supplementary or alternative analyses can be done to determine whether conclusions are robust to various assumptions. This type of verification can work to bolster the findings of the initial investigator. An attempted reanalysis, however, may expose errors or inconsistencies in the data that cast doubt on the validity of the findings. The latter was the case in the research of Ehrlich (1975) on the deterrent effect of capital punishment: several other investigators (Bowers and Pierce, 1975; Passell and Taylor, 1977; Klein, Forst, and Filatov, 1978; Brier and Fienberg, 1980) subsequently pointed out shortcomings in Ehrlich's analyses.3

Refinement of original results is also a possible outcome of data sharing. Alternative analyses can lead to better adjustment for background variables and to stronger inferences of effects of treatments in experimental or quasi-experimental studies.

Promotion of New Research Through Existing Data

Another form of reanalysis is testing the generality of research findings (see, for example, Smith and Rowe, 1979). Investigators need to compare analyses on different data sets across time or across locations in order to generalize findings about social phenomena. Existing data from several sources may be reexamined from a cross-temporal or international perspective.
Treiman (1977:xvi), for example, examined 85 occupational prestige studies from 53 countries and concluded that occupational evaluations are fundamentally the same throughout the world: he contended that "now, and for the foreseeable future, wide-ranging secondary analysis of existing data is the only way we will have of achieving a valid comparative sociology."

The same data that were gathered by researchers to answer one set of questions can be used by others to answer a new set. This utility especially applies to large-scale data collection. Mason, Taeuber, and Winsborough (1977) summarized ideas of several social scientists for new research based on public-use samples from the 1940 and 1950 censuses and from the Current Population Surveys since 1960.

3 The data for Ehrlich's research were shared in only one known instance; others had to reconstruct them.
Sometimes several different data files can be linked to create a new enlarged data base that allows researchers to develop and test new theories. For example, Albert Reiss, Jr., of Yale University, merged the quarterly collection tapes from the National Crime Survey to provide longitudinal information on victimization over several years. This new longitudinal data base allowed Reiss (1980) and Eddy, Fienberg, and Griffin (1981) to develop new models and analyses of criminal victimization that may improve data collection and reporting.

Encouraging More Appropriate Use of Empirical Data in Policy Formulation and Evaluation

In policy settings, the models and methods of analysis used for data are often shaped and structured by expectations associated with particular advocacy positions. When errors or incomplete analyses lead to policy conclusions that agree with those expected, the errors may go undetected, and the analyses remain incomplete. In an evaluation of programs for chronic juvenile offenders, Murray and Cox (1979) reported a large "suppression effect" of criminal behavior that results from incarceration. Their analyses purported to control for alternative explanations of this effect, such as mortality, maturation, and regression. Long before the report was published, it was used to support legislative changes in treatment of juvenile offenders in Illinois and other states.

Based on a reanalysis of the basic data, which was commissioned by the National Institute for Juvenile Justice and Delinquency Prevention, other researchers claimed that the original analyses were faulty and the observed effect could be attributed to other causes. Still others argued that the original and alternative analyses were flawed and that the basic data were of low quality and unsuitable as the basis for a policy decision.
If data sharing were anticipated, researchers would have greater motivation to plan studies carefully to avoid possible rejection of their data or analyses.

Some program evaluation experts have suggested that statistical analyses be carried out by independent teams of evaluators before a program evaluation report is prepared. Alternative analyses may not only confirm findings of the initial evaluators but also detect effects not found by them. The practice, of course, requires data sharing before publication. We believe that such independent reanalyses should be common practice, especially when important public policies may be affected.

Alternatives to complete analyses conducted independently are critical reviews of the analyses of the original investigator by other experts who have access to the data. An example is a review of the statistical methodology of the draft report, Public and Private Schools, by James Coleman et al. The Committee on National Statistics convened a meeting of experts to advise
Coleman on the strengths and adequacy of the sample and the analytical methods used for inferences in the report and to suggest further analysis and interpretation of the data (Straf, 1981). Coleman found the experience valuable and suggested that the Committee consider institutional procedures for review of reports relevant to public policy before they are publicly released.

Improvements of Measurement and Data Collection Methods

When the methods of data collection as well as the data from empirical investigations are scrutinized by scientists other than the original investigators, suggestions for improved measurement and collection methods often follow. For example, Turner and Krause (1978) compared allegedly equivalent measurements of public confidence in national institutions made by two survey organizations and found substantial discrepancies in levels of reported confidence and changes over time. Selected analyses of the data suggested that the differences were probably due not to technical aspects of the sample design but to differences in measurement techniques, questionnaire design, or field procedures.

Longitudinal studies have benefited from suggestions made by subsequent analysts. Recommendations from scientists who reanalyzed data from the National Crime Survey are partly responsible for current plans to redesign the survey. Two more examples are the national longitudinal surveys of labor force behavior, which are conducted by the Census Bureau for the Department of Labor and planned and analyzed by the Center for Human Resource Research at Ohio State University, and the various waves of interviewing for the negative income tax experiments undertaken in the late 1960s. In these three surveys, early availability of public-use tapes was planned, and comments and suggestions by other analysts were encouraged. The sharing of research data increases the likelihood of suggestions for improvements.
This feedback is of special value in continuing surveys, whether cross-sectional or longitudinal.

Development of Theoretical Knowledge and Knowledge of Analytic Technique

Wider data sharing with better documentation of data sets should contribute to better theories and analytic techniques. Ideas for constructively changing or refining concepts and methods would be obtained sooner and more frequently, and the interplay between theories and data would be stimulated if well-documented observations were generally at hand.

Some of these possibilities are illustrated in tests performed by Hildreth and Lu (1960) on 17 data sets that had been used by earlier authors to estimate
demand relations. A technique to allow for first-order serially correlated disturbances was applied to relations previously estimated by a least-squares fit. The results offered useful evidence of the importance of serial correlation, of the possibility of negative serial correlations, and of the inadequacy of routinely using first differences or trends; they also suggested the possibility of higher-order correlations in some cases.

Applying new theories to existing data may lead not only to new knowledge but also to improvements in future data collections. When existing data sets are not adequate for applying and testing new theories, the theories may suggest what kinds of data sets would be more useful. Wider data sharing combined with existing and developing computer technology creates opportunities for comparing results of various techniques on given data as well as results of a given technique on various data. With wider data sharing, more could be learned and in a more timely fashion (Hyman, 1972, 1975).

Encouragement of Multiple Perspectives

When data bearing on a variety of topics are generally available and well documented, researchers may find information important to their inquiries in data obtained by researchers in other disciplines. Using data from another discipline often proves to be stimulating, especially when it leads to direct contacts between the researchers involved, and significant influences on one field from another can be expected.

Users of previously collected data need to know more than just the mechanics of how information was gathered and processed. The concepts that the collectors tried to quantify and the relevant assumptions underlying their interpretations are important to users in judging the appropriateness of data for their purposes. Insofar as it is practical, these matters should be explained in the documentation.
Documentation, however, will not always be sufficient for this purpose, and a potential subsequent analyst may need to consult with those who collected the data or other scientists in the same discipline. The subsequent analyst may then learn some alternative viewpoints and approaches of the other discipline.

Initial investigators also have an interest in the results of secondary analysis of their data. When some of this analysis involves scientists from other disciplines, useful stimulation and exchanges of conceptual frameworks and techniques across fields can result.

Provision of Resources for Training in Research

The availability of a variety of carefully documented data sets can be a great asset to research training. Data on real phenomena provide interesting examples from which students can learn in two ways. First, the process of collecting the data can be studied with regard to accuracy, relevance to policy or scientific questions, and efficiency of design. Second, the data can be used as exercises in applying different analytic techniques, in drawing inferences, and in encouraging original approaches to analyses.

Multiple use of data sets can clearly reduce the number of data collections that are undertaken, saving the time and effort of respondents who furnish information as well as the time and money of researchers who gather it. In much social science research, expenses for data collection are the predominant research cost. Avoiding such expenses allows research funds to go further. Even when new data are needed, review of existing data and preliminary analyses may make for a more efficient collection plan.

Protection Against Faulty Data

One of the worst frustrations of scientists and decision makers is caused by a revelation or strong suspicion that information that was presumed correct and on which results, recommendations, or decisions were based is faulty. Reactions are particularly bitter when willful fabrication, falsification, or distortion of data is involved. The whole basis for applying knowledge and careful inquiry to decision making is negated. The waste of professional resources is serious, but the consequences of false conclusions or damaging decisions may be much worse. People may be hurt by misguided actions, and differences of opinion on public questions may be exacerbated. Public confidence in the research community will almost certainly be diminished.

Data sharing cannot eliminate these problems, but it could provide a definite, perhaps strong, preventive influence. Faulty data, whether fraudulent or due to inept collection or processing, are much more likely to be detected if studied by more than one analyst.
If several data sets relating to closely related phenomena can be compared, unexpected or unreasonable discrepancies should lead to careful reexaminations. The expectation that further analyses and comparisons will be conducted should discourage dishonest manipulations. More important, such expectation would encourage greater care in the original analysis.

Climate in Which Scientific Research Confronts Decision Making

The principal benefits that would result from wider data sharing are that science would be more efficiently advanced and more effectively applied to making decisions. Wider data sharing must, however, be carefully developed. Feasible arrangements for data sharing might lead to many improvements. Our discussions with a number of scientists and administrators indi-