Issues and Recommendations
INTRODUCTION
Data are the building blocks of empirical research, whether in the behavioral, social, biological, or physical sciences. To understand fully and extend the work of others, researchers often require access to the data on which that work is based. Yet many members of the scientific community are reluctant or unwilling to share their data even after publication of analyses of them. Sometimes this unwillingness results from the conditions under which data were gathered; sometimes it results from a desire to carry out further analyses before others do; and sometimes it results from the anticipated costs, in time or money or both.
The Committee on National Statistics believes that sharing scientific data with colleagues reinforces the practice of open scientific inquiry. Cognizant of the often substantial costs to the original investigator for sharing data, the committee seeks to foster attitudes and practices within the scientific community that encourage researchers to share data with others as much as feasible.
Some examples illustrate the benefits, problems, controversies, and other consequences of sharing research data.
Committee on National Statistics
Reanalysis of shared data may lead to a conflicting conclusion. Because an original investigator published his raw data on measurements of human cranial capacity by race and described his procedures and methods of summarization, reanalysis of the data was possible. A reanalysis more than 120 years later overturned the original investigator's conclusions (Gould, 1978).
Confidentiality may be breached by legally imposed data sharing. Despite promises of confidentiality to respondents, researchers may be in jeopardy of arrest if police or the courts request or demand data. A study headed by James Carroll at Syracuse University on the confidentiality of social science research sources and data identified many such cases (Carroll and Knerr, 1975); one was the Office of Economic Opportunity's New Jersey negative income tax experiment, in which a local prosecutor issued 14 subpoenas requesting the names of welfare families receiving excess payments (Kershaw and Fair, 1976).
When data are not shared, an investigator's results may have a greater influence on public policy than if the data are analyzed by others. An economist prepared a paper on the deterrent effect of capital punishment, in which he concluded that one execution prevents eight murders. A draft version of this paper was used by the Solicitor General of the United States as an appendix to the government's pro-capital punishment brief in a case before the Supreme Court. Detailed data were not available for reanalysis. Other researchers have now assembled what are believed to be virtually identical data sets, and many analysts believe the data do not support the deterrence hypothesis.
Marketing of biomedical research militates against data sharing. Several university researchers have refused to share with colleagues the exact details of how they did experiments that were reported in papers submitted for publication because such details might compromise the profit-making potential of their work.
Sharing proprietary data may be forbidden by the originator of the data. A distinguished professor of business is carrying out research based on data from a firm that not only does not want others to see the data, but is not even willing to be identified. The professor considers the research useful, but is disturbed because the conditions under which he obtained the data preclude the possibility of anyone verifying his statistical analyses.
These and other situations fuel an ongoing debate in the research community on what are appropriate principles and practices of data sharing.
Sharing Research Data
Issues in Data Sharing
The Committee on National Statistics convened a conference on sharing social science research data in October 1979, chaired by Clifford Hildreth (see Committee on National Statistics, 1980; see the appendix for a list of participants). The participants were in substantial agreement regarding the exigencies faced by social science researchers and how these often conflicted with goals of greater access to data. The issues they considered included whether there is ever justification for refusing or unduly postponing access to data; the impact on data access of data collectors' responsibility for maintaining the privacy of respondents and the confidentiality of records; the professional responsibility of researchers to promote access; and procedures under which basic data should be released to others.
The conference participants presented the Committee on National Statistics with the following conclusions:
1. Guidelines on data sharing need to be developed. Desirable practices may vary with the source of the data and whether the research is publicly or privately funded.
2. A variety of institutions could be helpful in promulgating guidelines for desirable practices. The institutions include professional associations and their journals, consortia for data archiving, and foundations and other organizations that fund research.
3. Government policy on access to data is important. Much social science research relies heavily on data provided by the government directly or indirectly through grants and contracts for research.
4. Many problems of access to data in the natural sciences are similar to those in the social sciences.
5. Standards for classifying, documenting, and archiving data would greatly facilitate access to data.
In response to the conclusions of the conference, this report suggests guidelines for appropriate sharing of data and ways in which government agencies and other institutions can encourage and foster such sharing.
Scope of the Report
The exploratory conference focused on the sharing of social science research data. Most people believe that natural scientists have fewer problems in sharing data than do social scientists. The need for shared data may be less acute for natural science experiments, which usually are replicable, a situation that occurs more rarely in the social sciences. Nonetheless, data-sharing problems have existed in the natural sciences that are really not much different from those in the social sciences, such as instances in which only some observations are reported rather than all.
Selective reporting of experimental results in the physical sciences is not uncommon. For example, Millikan's 1910 Science paper on the oil drop experiment (see Holton, 1978) gave results based on 27 observations, although 40 observations were available; the most extreme 13 values were dropped. Similarly, in a 1919 report to the Astronomical and Royal Societies on expeditions to test predictions of Einstein's general theory of relativity, Eddington chose not to mention the results of one complete set of measurements that produced a value for the deflection of starlight consistent with the Newtonian, rather than the Einstein, prediction (see Earman and Glymour, 1980).
Some data-sharing problems in the biomedical sciences are also similar to those in the social sciences: for example, problems associated with large-scale, controlled clinical trials closely resemble those associated with large-scale social surveys. For these reasons, and because of the interests of the Committee on National Statistics in areas such as clinical trials, public health, and environmental monitoring, this report looks beyond the social sciences and addresses the issues of data sharing more broadly. The emphasis of the report remains on problems and practices in the social and behavioral sciences, but occasional links and parallels to the natural and biomedical sciences are identified and pursued.
This report specifically does not address two kinds of research. The first is research with nonquantitative data. Researchers often depend on materials other than quantitative information, such as anthropological field notes, oral histories, photographs, or videotape records.
Problems of access to research archives in university libraries have occurred (see, for example, Halberstadt, 1982). Although such materials are research data, the principles and practices recommended in this report are not intended to cover them, primarily because their consideration was beyond the resources of the committee. This omission does not mean, however, that access to such research materials is not important or that this report may not help in clarifying relevant issues.
The second kind of research not specifically addressed is research pertaining to national security matters. Recently the National Security Agency has requested that some scientists who are not employed by the government submit their papers on the mathematical theory of codes to the agency for review prior to publication. The purpose of such reviews is to prevent the publication of information damaging to national security. One government spokesman has proposed that reviews be extended to fields such as computer hardware and software and crop projections (Hilts, 1982a, 1982b). Although prior review militates against free and open research, the Committee believed that to recommend guidelines for such review was beyond its scope. This report, however, notes the existence of such pressure affecting the environment in which data sharing occurs.
The sharing of research data occurs in many ways. Sometimes data are published as appendices to papers and books. Sometimes data are made available in response to requests from other investigators. More formal methods for exchange often involve archives and data libraries, which may be particularly appropriate for the massive data files from surveys and experiments. Careful documentation is important to facilitate data sharing. Poor documentation or its absence inhibits replication and thereby allows some researchers to make bolder claims than they otherwise might. This report pays special attention to the needs for and costs of good documentation, but the formal technical aspects of data archives and the documentation required to make data of use to others are not covered.
The principles and guidelines for data sharing in this report are addressed not only to researchers in academia and government but also to institutions that provide funds for research. Over the past 20 years, government agencies and private and public foundations have underwritten social science research to collect and analyze substantial bodies of data. Social science data collected by the government in particular have been analyzed extensively by many researchers. This report, however, does not treat the special case of transfer of large data sets (usually general-purpose statistics or data from administrative records) among different agencies of the federal government, although many of the findings and suggestions in the report may be applicable. Such transfers were not included in the scope of this study because they are governed by specific statutes and regulations.
This report summarizes some of the benefits and costs of sharing research data with qualitative statements based on judgment that is bolstered by anecdotal evidence.
Although quantitative estimates of benefits and costs are highly desirable, the committee unfortunately did not have the time or resources for assembling such estimates. Quantitative estimates of the benefits of data sharing are related to an assessment of the benefits of data generally, an issue that the committee has been and will continue exploring (National Research Council, 1976; Committee on National Statistics, 1980).
Parties to Scientific Research
Many different parties are involved in or affected by scientific research, from the initial investigator to the public. These parties have different, sometimes conflicting interests.
Initial investigators: scientists who first collect data for analysis. These scientists may work alone or in teams and in academic, commercial, nonprofit, or government settings. They have an interest in being the first to examine
and analyze their data and to publish results of their research.
Subsequent analysts: scientists who analyze one or more data sets collected by others, for purposes of verification of the original analysis as well as for analysis of new problems.¹ These scientists have an interest in obtaining data of others for analysis.
Scientific community: all scientists who engage in research. Their interest in the advancement of science through new knowledge is promoted by the sharing of data.
Agencies and foundations that fund research: public and private groups that give grants or contracts for research to be performed by others. Their interest is in advancing science rather than in commercial gain.
Organizations that conduct research: universities, nonprofit institutions, commercial organizations (such as biopharmaceutical concerns), individuals, and government agencies that conduct research, whether they use their own funds or are supported by others. Their interest in sharing data can be those of initial investigators, subsequent analysts, the scientific community, or any combination of them.
Respondents to surveys and participants in experiments: those who agree to participate in a survey or experiment, whether voluntarily or in return for remuneration or other direct benefit. Respondents have an interest in the protection of the confidentiality of information they have given, in limiting the invasion of their privacy, in reducing the time and effort required of them to participate in surveys and experiments, as well as in the advancement of science resulting from such investment of time and effort.
The public: society generally. The public interest is served by open, free, productive, and efficient science.
The different parties involved in or affected by scientific research have different and sometimes conflicting interests when it comes to issues of data sharing.
The report and the papers in this volume address the interests of these groups, and many of the committee's recommendations reflect a balancing of conflicting interests.
Occasionally in the report and frequently in the papers, cases are mentioned in which data were shared or in which unsuccessful attempts were made to obtain data from principal investigators. These cases are included to illustrate various aspects of data sharing: the benefits, the costs, the barriers. The cases are not included to assess blame on particular principal investigators or other parties. Sometimes an incomplete account is given; sometimes the
¹By this definition, subsequent analysts include secondary analysts. A definition of secondary analysis is provided by Hyman (1972:1): "extraction of knowledge on topics other than those which were the focus of the original surveys."