Committee on National Statistics

Act (H.R. 109) to tighten restrictions on exchange of information in such fields as computer technology (Kolata, 1981).

(3) It has also been proposed to have scientific work reviewed by federal agencies on a voluntary basis prior to publication. Such a voluntary review system is now in effect in the field of cryptanalysis. Although published unclassified data are exempt, researchers fear restraint of scholarly inquiry, and professional societies, among others, are objecting, since information presented at scientific meetings may not be exempt (Marshall, 1981).

The conflicting pressures of national security and open science have recently aroused much interest in the general press as well as in scientific circles. The National Academy of Sciences announced in March 1982 the appointment of a broadly based panel of senior policy makers and researchers to examine the relationship between university research and national security in light of the growing concern that foreign nations are gaining military advantages from American research. The panel's September 1982 report recommended guidelines that would allow government-funded, academically based scientific research to be performed without restriction, except for research in narrowly defined areas of technologies that could not justifiably be either classified or completely open (Committee on Science, Engineering, and Public Policy, 1982). In an assessment of policy developments 18 months after the panel report was issued, Wallerstein (1984) concluded that "the reach of restrictions either proposed or in force go considerably beyond the panel's recommendation." Since then, the Department of Defense has indicated that it would not further restrict publication of militarily sensitive but unclassified research: control of fundamental research in science and engineering at universities and federal laboratories is to be achieved through classification.
Some scientists fear, however, that more research will be classified (Goodwin, 1984).

CONCLUSIONS AND RECOMMENDATIONS

". . . the best security for the fidelity of mankind is to make their interest coincide with their duty."
Alexander Hamilton, The Federalist Papers, No. 72

Most scientific advances are not solely the result of separate, individual efforts. As society turns to science with ever more problems, solutions are interdisciplinary and require the contribution of many investigators. At the same time, scientists are becoming more specialized. Sharing data can provide opportunities for interdisciplinary approaches to problems and, even within the same discipline, the sometimes synergistic result of different people thinking about the same or similar problems.

Sharing Research Data

Because of the promise for eventual solutions to important problems, as well as the benefits of increased knowledge and understanding, society supports science. Sharing data offers efficient use of research funds by allowing further discoveries to be recovered from data that have already been collected at great expense and that otherwise would not be used further.

There are many other important benefits to science from sharing data. A primary one is that sharing data provides for further theories, methods, and results. Sharing data also tends to correct inadvertent error and to discourage fraud. But there are potential costs for an investigator who provides data to others: costs of time, money, and inconvenience; fears of possible criticism, whether justified or not; possible violations of trust by a breach of confidentiality; and forgoing recognition or profit from possible further discoveries.

In some circumstances initial investigators are required to share data in accordance with the rules of their employing institutions or the terms of their grants. In many cases, however, whether data are shared and the extent to which they are shared depend on the decisions of individual scientists. Professional societies, organizations that publish scholarly journals, research institutions, and foundations and other organizations that fund research can encourage, facilitate, and even reward the sharing of data, although they seldom prescribe the behavior of individual scientists.

These considerations led the Committee on National Statistics to make the following general recommendations.

Recommendation 1. Sharing data should be a regular practice.
The advantages of data sharing are sufficient to warrant considerable attention to ways to share data without imperiling privacy or breaching the confidentiality promised to data providers. We share the views of Jowell (1981:14):

Flaherty (1979, p. 307), in his definitive international survey of measures to enhance the confidentiality of microdata, concludes that an "ultimate goal of public policy in every country should be to encourage custodians to disseminate data and researchers to use it." As long as the individual is adequately protected, wider access to data will surely serve rather than threaten the interests of civil liberties and open government.

The Committee recommends a number of guidelines for researchers, for funding agencies, for professional journals, for research training institutions, and for other participants in research that should facilitate and encourage sharing data for research purposes.
Recommendations for Initial Investigators

When to Share Data

Data are collected in a variety of circumstances: in controlled laboratory experiments, by observation in the field, through interviews, from accumulations of records, or by combinations of these methods. In some cases, data to which access is desired may have developed through one investigator's efforts and be entirely at his or her disposal to share. In other cases, the nature of the data, promises of confidentiality, laws or regulations, contractual requirements, or proprietary rights may preclude or at least militate against sharing. In still other cases, raw data may be available to all (for example, from public records or from public-use tapes, which are samples of anonymous statistical data specifically designed for widespread research use), and the researcher's contribution may be in the compilation procedures and methods of analysis. In the latter instance, it is the edited and categorized data, an explanation of the analytical methods used, and documentation of how the data were handled to which access may be requested.

Analyzing data and reporting discoveries are clearly more glamorous tasks to many scientists than collecting data. The motivation of possible discoveries is needed even to contemplate data collection, and science is served well by this motivation. Thus, initial investigators are entitled to be the first to examine, summarize, and analyze their data. There may, however, be exceptions, for example, when data collection is a joint effort or when public funds are used to pay for data collection with the intent that the data be available to many in a timely manner. Although scientists surely deserve, in most cases, first claim to data compiled under their direction, the practice of withholding data until all possible analyses are exhausted is unnecessarily restrictive and too self-serving to advance science. A balance is needed.
Recommendation 2. Investigators should share their data by the time of publication of initial major results of analyses of the data except in compelling circumstances.

It should also be noted that, if data are made available when the results of research are submitted for publication, the submitted manuscript can be more carefully and more fully reviewed. The benefits of sharing data appreciably increase upon publication, since other researchers can then test the same and other theories and methods. We encourage researchers to make every effort to share data as soon as it is feasible.
Data Relevant to Public Policy

Scientists have a special responsibility to share data as quickly and as widely as possible when the data are or will become relevant to public policy. Withholding such data risks the use of wrong results or of ineffective analysis of important issues.

Recommendation 3. Data relevant to public policy should be shared as quickly and widely as possible.

This recommendation is not intended to support the public release of analyses prior to appropriate review.

Planning for Data Sharing as Part of Research

Researchers can more effectively share data if they keep that objective in mind in all stages of their research. Planning to share data from the outset not only helps achieve the goal of data sharing but also may improve the quality of the research. For example, adequate documentation of data helps initial investigators as well as subsequent analysts. Data files should include the unedited raw data as well as documentation on edits, handling of nonresponse, and similar problems (see Straf, 1981; Madow et al., 1983).

Not all data can be shared in a situation in which confidentiality must be preserved. For example, photographs, oral histories, detailed notes on interviews of well-known people, and some types of proprietary information are data that could not be shared if confidentiality is to be maintained. Some persons or organizations may be unique or come from such a small group that it may be impossible to share data and not identify them. There are, however, ways to share many types of data and still maintain confidentiality (see Campbell et al., 1975).

Recommendation 4. Plans for data sharing should be an integral part of a research plan whenever data sharing is feasible.

Researchers might benefit by first considering whether they could be subsequent analysts: data might already have been collected that are sufficiently useful to warrant forgoing new data collection.
Keeping Data Available

Part of a research plan should include maintaining the data for a reasonable period following the completion of research for possible use by subsequent analysts. Some data collections may be small or so specialized that only limited use by others can be expected, and the initial investigator can handle requests without undue burden. Other data sets may be of such general purpose and in such demand over a considerable period that the initial investigator may find it difficult or impossible to handle the requests of subsequent analysts. Particularly in the latter case, researchers might consider submitting data to an appropriate archive that not only would assume responsibility for much of the handling of data to be shared, but also would encourage further use of the data by bringing them to the attention of a wider community of researchers. Cataloging of machine-readable data files and citing such data in a standard way (Dodd, 1982) would also encourage further use.

Recommendation 5. Investigators should keep data available for a reasonable period after publication of results from analyses of the data.

Recommendations for Subsequent Analysts

It is neither practical nor equitable to expect initial investigators to pay all costs of transferring their data to others. It is reasonable to expect subsequent analysts to reimburse initial investigators at least for the extra costs involved in data transfer.

Recommendation 6. Subsequent analysts who request data from others should bear the associated incremental costs.

Recommendation 7. Subsequent analysts should endeavor to keep the burdens of data sharing on initial investigators to a minimum and explicitly acknowledge the contribution of the initial investigators.

Explicit acknowledgment of the initial investigators and their contributions would encourage data sharing.
Subsequent analysts who discover errors in data should inform the data collectors or the appropriate archive so that the data may be corrected for the use of others. Criticism of a data collection or analysis should be made in a professional manner. With few exceptions, it is desirable that subsequent analysts also inform initial investigators or data archives promptly of the results of new analyses, even those that are unrelated to the original analysis. This scientific courtesy may also help to avoid future duplications of efforts.

Recommendations to Institutions that Fund Research

A scientist is recognized and rewarded through the scientific community and its institutions. Researchers will have greater incentive to share data if the community and its institutions foster the idea that the practice advances science and is part of what is recognized as necessary and proper scientific behavior. We suggest that foundations, federal agencies, and other organizations that fund research provide encouragement and rewards for sharing data.

In many instances, funding organizations would be justified in requiring that data be shared. Government funding agencies, in particular, should require applicants to guarantee data sharing or to justify explicitly in their proposals why sharing would be inappropriate. Unless data sharing is a condition of a grant or contract, whether of public or private funds, applicants who have budgeted to share are at a disadvantage when costs are compared with the budgets of those who have not.

If plans to share data are given as much weight as the sample design, methods of analysis, and other aspects of proposed research in deciding on an award, researchers would then plan for sharing data at an early stage. A researcher might request funds to make important data available to others. In any case, he or she could be encouraged to describe in the application how the content and structure of the data would be documented, how invitations for subsequent analysis would be extended, and how requests for data could be honored at minimal cost. The referees of the research proposal could judge the importance of support for making the data available to others.
For research projects involving large data sets, investigators could request funds for a person with responsibility to document data files; update and correct data entries; produce data files for those who request them; consult with users on interpretations, limitations, and other important aspects of the data; and preserve the confidentiality of respondents. Even for small data sets, however, a funding organization that encourages reasonable standards for documentation will aid not only subsequent analysts, but also the initial investigators.

Funding organizations that require, in rules or by contracts, unnecessarily excessive protection of privacy and confidentiality hinder the sharing of data. Society benefits from the accessibility of data as well as from the protection of privacy and confidentiality. A reasonable balance between these often conflicting values cannot be achieved by exclusive attention to one.

When funding agencies anticipate that research results will be directly relevant to public policy, the agencies should be alert to the need for sharing data so that conclusions can be verified or contested through reanalysis. Federal funding organizations can ensure the availability of data for such uses by including in original contracts or grants a requirement that, on completion of research, data will be delivered to the sponsoring agency. The data would then be subject to the Freedom of Information Act.

Recommendation 8. Funding organizations should encourage data sharing by careful consideration and review of plans to do so in applications for research funds.

Initial investigators whose data sets prove to be of wide interest to subsequent analysts may not be in a position to manage and disseminate data to many others for a long time. Even if initial investigators are paid for the additional time and other costs involved, sharing data may impinge too severely on other scientific activities. Intermediate research archives have been developed in some fields to meet this problem (see Clubb, in this volume, for more details). Organizations funding large data collections that are expected or later found to be of considerable general interest should be alert to this problem. If existing data archives are not suitable or are inadequately funded, funding organizations should consider supporting appropriate ones.

Recommendation 9. Organizations funding large-scale, general-purpose data sets should be alert to the need for data archives and consider encouraging such archives where a significant need is not now being met.

Recommendations to Editors of Scientific Journals

The editorial policies of scientific journals have a significant effect on scientific practice, since the publication of research results in respected, refereed journals is one of the principal rewards of scientific research. Journal editors should adopt editorial policies designed to encourage data sharing.
Providing Access to Data for Peer Review

Access to data during the review process, a practice already in use by some journals, provides reviewers an opportunity to replicate the analysis and discover possible errors. Reviewers can use alternate assumptions or analytic models to test the robustness of authors' conclusions.

Recommendation 10. Journal editors should require authors to provide access to data during the peer review process.
Publishing Reanalyses and Secondary Analyses

If researchers know that reports of replications, whether confirmatory or not, and of secondary analyses will be welcomed under journal editorial policies, such research would be encouraged.

Recommendation 11. Journals should give more emphasis to reports of secondary analyses and to replications.

Giving appropriate credit to data collectors should serve to encourage others to share data as a matter of good scientific practice. Criticism of the original data collection should be factual, temperate, and made in the light of reasonable standards of data collection.

Recommendation 12. Journals should require full credit and appropriate citations to original data collections in reports based on secondary analyses.

Encouraging Accessibility to Data

It should be standard practice for small data sets to be published with the research reports that use them. For larger sets, the availability might be announced in the research report with an explanation of where the data may be obtained: from the journal editor, from an intermediate archive, from the original investigator, or elsewhere.

Recommendation 13. Journals should strongly encourage authors to make detailed data accessible to other researchers.

Recommendations to Other Institutions

Other participants in the scientific research process can promote data sharing. Academic institutions can exercise leadership in encouraging data sharing both in training future scientists and by example. Professional associations can also play a part, as can funding agencies and archives.

Providing Training for Sharing Data

Instruction and training on data-sharing policies and practices should be included in the education of many research scientists. Professional societies might organize meeting sessions or workshops on data sharing. The technical aspects of data sharing, especially documentation and archiving methods, should be taught in specialized courses either as a part of academic curricula or in continuing education programs. Instruction in data sharing should also include how to find and adapt existing data for research (Myers and Rockwell, 1984) and how to prepare data for secondary analysis (Fortune and McBee, 1984). In some disciplines, emphasis on sharing data could be a recognized part of graduate training.

Recommendation 14. Opportunities to provide training on data sharing principles and practices should be pursued and expanded.

Researchers should be encouraged to use data collected by others for scholarly research when appropriate. Actual data should be used in teaching whenever practical, a practice that depends on data being shared.

Reference Service for Social Science Data

A centralized reference service for computer-readable social science data would promote the use of data already collected. A start can be made with existing archives and with some federal statistical agencies. The Social Science Research Council (1983) has recently issued a compendium of brief descriptions of about 100 national data bases available for use in social science research. By allowing sufficient funds for adequate documentation of original studies and by funding research based on the use of shared data, funding agencies could foster the growth and efficient use of such a service. The National Science Foundation might take a leading role in promoting it.

Recommendation 15. A comprehensive reference service for computer-readable social science data should be developed.

Providing Recognition for Data Sharing

The scientific reward structure could be strengthened to achieve more sharing of data and more innovative subsequent analyses. In addition to our recommendations to journal editors, we suggest that academic institutions encourage data sharing by granting appropriate professional recognition to the data-sharing activities of teaching and research staff members in such matters as salary and promotion policies.

Recommendation 16. Institutions and organizations through which scientists are rewarded should recognize the contributions of appropriate data-sharing practices.