Summary

Advances in digital computing, communications, sensors, and storage technologies are revolutionizing nearly every area of scientific, engineering, and medical research. Today, researchers are employing sophisticated technologies to generate, analyze, and share data to address questions that were unapproachable just a few years ago. They are carrying out detailed simulations to guide theoretical approaches and to validate new experimental approaches. They are working in interdisciplinary and often international teams on complex integrative problems that require inputs from a multitude of perspectives. They are using data generated by others to augment their own data and sometimes to address problems that the original researchers could not have envisioned. Digital technologies have fostered a new world of research characterized by immense datasets, unprecedented levels of openness among researchers, and new connections among researchers, policy makers, and the public.

Even as these new capabilities are expanding the power and reach of research, they are raising complex issues for researchers, research institutions, research sponsors, professional societies, and journals. Digital technologies can complicate the process of verifying the accuracy and validity of research data, in part because of the enormous rate at which data can be generated and the intricate processing those data undergo. The high rate of innovation in digital technologies, a lack of standards, and issues such as privacy, national security, and possible commercial interests can inhibit the sharing of data, which can reduce the ability of researchers to verify results and build on previous research. Huge increases in the quantity of data being generated, combined with the need to move digital data between successive storage media and software environments as technologies evolve, are creating severe challenges in preserving data for long-term use. And these issues are not restricted to large-scale research
projects; they can be especially acute for the small-scale projects that continue to constitute the bulk of the research enterprise.

This report examines the consequences of the changes affecting research data with respect to three issues: integrity, accessibility, and stewardship. Because the detailed procedures and styles of research vary enormously from field to field, it is impossible to formulate specific recommendations for every field. Instead, for each of the three issues examined in this report, the authoring committee has developed a fundamental principle that applies in all fields of research regardless of the pace or nature of technological change. The report then explores the implications of these three central principles for the various components of the research enterprise.1

1 In this Summary, the principles appear in boldface type and the recommendations drawn from the principles are presented in italic type.

Developing the policies, standards, and infrastructure needed to ensure the integrity, accessibility, and stewardship of research data is a critically important task. It will require sustained effort on the part of all stakeholders in the research enterprise. The committee believes that the broad principles stated in this report provide the appropriate framework for this undertaking.

ENSURING THE INTEGRITY OF RESEARCH DATA

The fields of science, engineering, and medicine span the totality of physical, biological, and social phenomena. Research in all these fields is based on certain fundamental procedures and convictions. However, each research field has its own characteristic methods and scientific style. Consequently, research is too broad an enterprise to permit many generalizations about its conduct. One theme, however, threads through its many fields: the primacy of scrupulously recorded data.

Because the techniques that researchers employ to ensure the integrity (the truth and accuracy) of their data are as varied as the fields themselves, there are no universal procedures for achieving technical accuracy. The term “integrity of data” also has a structural meaning, related to the data’s preservation and presentation, which is the subject of Chapter 4. There are, however, broadly accepted practices for generating and analyzing research data. In most fields, for instance, experimental observations must be shown to be reproducible in order to be credible. Even this fundamental principle has exceptions: observations with a historical element, such as the explosion of a supernova or the growth of an epidemic, cannot be reproduced. Other general practices include checking and rechecking data to confirm their accuracy and validity, and submitting data and research results to peer review to ensure that the interpretation is valid. In addition, some practices may be employed only within specific fields, such as the use of double-blind clinical trials.

Many of the traditional methods for ensuring the integrity of data, whether universal or discipline specific, are being modified as digital technologies alter capabilities and procedures. Because of the huge quantities of data generated by digital technologies, an increasing fraction of the processing and communication of data is done by computers, sometimes with relatively little human oversight. If this processing is flawed or misunderstood, the conclusions can be erroneous. Documenting workflows, instruments, procedures, and measurements so that others can fully understand the context of data is a vital task, but it can be difficult and time-consuming. Furthermore, digital technologies can tempt those who are unaware of, or dismissive of, accepted practices in a particular research field to manipulate data inappropriately.

Several recent incidents and trends provided an impetus for this study, such as the challenge journals face in preventing inappropriate manipulation of digital images in submitted papers, and well-publicized, albeit rare, cases of research misconduct involving fabricated or manipulated data. Assessing the broad set of institutions, policies, and practices that have been put into place to prevent and detect research misconduct, including the fabrication or inappropriate manipulation of data, was beyond the scope of this study. Nevertheless, the committee recognizes that the advance of digital technologies presents special challenges to the individuals and institutions charged with ensuring responsible conduct in research. Because these individuals and institutions will continue to play a critical role in ensuring the integrity of research data, it is important that they adapt their procedures in order to function effectively in the digital age.

The most effective method for ensuring the integrity of research data is to ensure high standards for openness and transparency.
To the extent that data and other information integral to research results are provided to other experts, errors in data collection, analysis, and interpretation (intentional or unintentional) can be discovered and corrected. This requires that the methods and tools used to generate and manipulate the data be available to peers who have the background to understand that information. The traditional way of submitting data and results to the scrutiny of other researchers is through peer review, which allows a research community to judge the quality and validity of data and results before dissemination. Although traditional peer review practices remain essential for evaluating the importance and validity of research, it has become clear that they have limitations when it comes to ensuring that digital data have been appropriately collected, analyzed, and interpreted. Fortunately, it has also become clear that the advance of digital technologies is providing new opportunities to ensure data integrity through greater openness and transparency. The emergence and growth of accessible databases such as GenBank and the Sloan Digital Sky Survey illustrate these opportunities in widely disparate disciplines.2 Yet in many fields, a lack of technological infrastructure, cultural norms and expectations, and other factors act as barriers to openness and transparency.

2 Dennis A. Benson, Ilene Karsch-Mizrachi, David J. Lipman, James Ostell, and David L. Wheeler. 2006. “GenBank.” Nucleic Acids Research 34(Database):D16–D20. Available at http://nar.oxfordjournals.org/cgi/content/abstract/34/suppl_1/D16. See also Robert C. Kennicutt, Jr. 2007. “Sloan at five.” Nature 450:488–489.

The integrity of data in a time of revolutionary changes in research practice is too important to be taken for granted. Consequently, this report affirms the following general principle for ensuring the integrity of research data:

Data Integrity Principle: Ensuring the integrity of research data is essential for advancing scientific, engineering, and medical knowledge and for maintaining public trust in the research enterprise. Although other stakeholders in the research enterprise have important roles to play, researchers themselves are ultimately responsible for ensuring the integrity of research data.

This straightforward principle leads to several specific recommendations.

Recommendation 1: Researchers should design and manage their projects so as to ensure the integrity of research data, adhering to the professional standards that distinguish scientific, engineering, and medical research both as a whole and within their particular fields of specialization.

Some professional standards apply throughout research, such as the injunction never to falsify or fabricate data or plagiarize research results. These are fundamental to research and have been confirmed by leading organizations and codified in regulations.3 Other standards are relevant only within specific fields, such as requirements to conduct double-blind clinical trials. Researchers must adhere to both sets of standards if they are to maintain the integrity of research data, and they can adhere to professional standards only if they fully understand them.

3 National Academy of Sciences, National Academy of Engineering, and Institute of Medicine. 1992. Responsible Science: Ensuring the Integrity of the Research Process. Washington, DC: National Academy Press.

Recommendation 2: Research institutions should ensure that every researcher receives appropriate training in the responsible conduct of research, including the proper management of research data in general and within the researcher’s field of specialization. Some research sponsors provide support for this training and for the development of training programs.

Researchers, research institutions, research sponsors, professional societies, and journals all are responsible for creating and sustaining an environment that supports the efforts of researchers to ensure the integrity of research data.

In some cases, digital technologies are having such a dramatic effect on research practices that some professional standards affecting the integrity of
research data either have not yet been established or are in flux. The recent recognition of the inappropriate manipulation of digital images submitted in journal articles illustrates the need for the research enterprise to continue to set clear expectations for appropriate behavior and to communicate those expectations effectively.

Recommendation 3: The research enterprise and its stakeholders (research institutions, research sponsors, professional societies, journals, and individual researchers) should develop and disseminate professional standards for ensuring the integrity of research data and for ensuring adherence to these standards. In areas where standards differ between fields, it is important that differences be clearly defined and explained. Specific guidelines for data management may require reexamination and updating as technologies and research practices evolve.

Although all researchers should understand digital technologies well enough to be confident in the integrity of the data they generate, they cannot always be expected to take full advantage of new capabilities. In an increasing number of fields, professionals with expertise specifically in the generation, analysis, storage, or dissemination of data are playing an essential role in taking advantage of digital technologies and ensuring the integrity of research data.

Recommendation 4: Research institutions, professional societies, and journals should ensure that the contributions of data professionals to research are appropriately recognized. In addition, research sponsors should acknowledge that financial support for data professionals is an appropriate component of research support in an increasing number of fields.

ENSURING ACCESS TO RESEARCH DATA

Advances in knowledge depend on the open flow of information. Only if data and research results are shared can other researchers check the accuracy of the data, verify analyses and conclusions, and build on previous work. Furthermore, openness enables the results of research to be incorporated into socially beneficial goods and services and into public policies, improving the quality of life and the welfare of society.

Despite the many benefits arising from the open availability of research data and results, many data are not publicly accessible, or their release is delayed, for a variety of reasons. Data may be withheld because they are being used to generate a commercial product or service, because of confidentiality considerations, or because of national security concerns. Furthermore, in some fields it is acceptable for researchers to have a limited period of exclusivity in which the data are used only by the principal investigators and their immediate
associates. In areas of potential commercial applications, patenting considerations, contractual restrictions, and technological constraints also can limit or delay the accessibility of data.

Legitimate reasons may exist for keeping some data private or delaying their release, but the default assumption should be that research data, methods (including the techniques, procedures, and tools that have been used to collect, generate, or analyze data, such as models, computer code, and input data), and other information integral to a publicly reported result will be publicly accessible when results are reported, at no more than the cost of fulfilling a user request. This assumption underlies the following principle of accessibility:

Data Access and Sharing Principle: Research data, methods, and other information integral to publicly reported results should be publicly accessible.

Although this principle applies throughout research, in some cases the open dissemination of research data may not be possible or advisable. Granting access to research data prior to reporting results based on those data can undermine the incentives for generating the data. There might also be technical barriers, such as the sheer size of datasets, that make sharing problematic, or legal restrictions on sharing, as discussed in Chapter 3. Nevertheless, the main objective of the research enterprise must be to implement policies and promote practices that allow this principle to be realized as fully as possible. This principle has important implications for researchers.

Recommendation 5: All researchers should make research data, methods, and other information integral to their publicly reported results publicly accessible in a timely manner to allow verification of published findings and to enable other researchers to build on published results, except in unusual cases in which there are compelling reasons for not releasing data. In these cases, researchers should explain in a publicly accessible manner why the data are being withheld from release.

This principle may seem to apply only to publicly funded research, but a strong case can be made that much data from privately funded research should be made publicly available as well. Making such data available can produce societal benefits while also preserving the commercial opportunities that motivated the research.

As discussed earlier, differences in technological infrastructure, publication practices, data-sharing expectations, and other cultural practices have long existed between research fields. In some fields, aspects of this “data culture” act as barriers to access and sharing of data. With the growing importance of research results to certain areas of public policy, the rapid increase of interdisciplinary research that involves integration of data from different disciplines, and
other trends, it is important for fields of research to examine their standards and practices regarding data and to make these explicit.

Data accessibility standards generally depend on the norms of scholarly communication within a field. In many fields these norms are now in a state of flux. In some fields, researchers may be expected to disseminate data and conclusions more rapidly than is possible through peer-reviewed publications. Digital technologies are providing new ways to disseminate research results, for example, by making it possible to post draft papers on archival sites or to share software packages, databases, blogs, or other communications on personal or institutional Web sites. Data sharing is greatly facilitated when a field of research has standards and institutions in place that are designed to promote the accessibility of data.

Recommendation 6: In research fields that currently lack standards for sharing research data, such standards should be developed through a process that involves researchers, research institutions, research sponsors, professional societies, journals, representatives of other research fields, and representatives of public interest organizations, as appropriate for each particular field.

If researchers are to make data accessible, they need to work in an environment that promotes data sharing and openness.

Recommendation 7: Research institutions, research sponsors, professional societies, and journals should promote the sharing of research data through such means as publication policies, public recognition of outstanding data-sharing efforts, and funding.

Recommendation 8: Research institutions should establish clear policies regarding the management of and access to research data and ensure that these policies are communicated to researchers. Institutional policies should cover the mutual responsibilities of researchers and the institution in cases in which access to data is requested or demanded by outside organizations or individuals.

PROMOTING THE STEWARDSHIP OF RESEARCH DATA

Research data can be valuable for many years after they are generated. Data that led to initial insights can sometimes be used to generate new findings in the same or entirely different research fields. Existing data can be reanalyzed or combined with new data to verify published results or arrive at new conclusions. In some research areas, accessible databases have become essential parts of the research infrastructure, comparable to laboratories, research facilities, and computing devices and networks.

Maintaining high-quality and reliable databases can be costly, especially
over long time periods. Obviously not all data should be preserved, but deciding what to save and what to discard becomes more difficult as increasing quantities of data are generated. Because the future uses of data are difficult to predict, returns on investments in stewardship can be uncertain. Furthermore, in many fields of research, there is no consensus as to who should maintain large databases or who should bear the costs. These problems can be especially difficult for investigators involved in small projects, who can face great challenges in deciding which data will be useful, in documenting those data thoroughly for future uses, and in finding funds from limited budgets for data preservation. The value of data for long-term use suggests the following general principle for the stewardship of data:

Data Stewardship Principle: Research data should be retained to serve future uses. Data that may have long-term value should be documented, referenced, and indexed so that others can find and use them accurately and appropriately.

Curating data requires documenting, referencing, and indexing the data so that they can be used accurately and appropriately in the future. Data stewardship must start at the beginning of the project, not partway through or at the end of the project.

Recommendation 9: Researchers should establish data management plans at the beginning of each research project that include appropriate provisions for the stewardship of research data.

Because data without accompanying information about how they were derived can be useless, arranging for preserved data to be annotated so that they retain their long-term value is among the most important tasks for researchers establishing a data management plan. This recommendation is not meant to imply that individual researchers are responsible for ensuring indefinite preservation of their own data, but rather that they should ensure that data judged to have potential long-term value are prepared and transferred to the appropriate archives or repositories. Researchers should work in partnership with their institutions, sponsors, and fields to formulate and implement their plans.

Researchers need to participate in the development of policies and standards for data annotation, preservation, and long-term access. Data need not be annotated in such detail that nonspecialists can immediately use them, but guidelines should exist for the degree of expertise required to use a data collection. Researchers also need to develop procedures for error reporting, tracking, and correction. These policies and standards will vary greatly from field to field because they depend on the nature and potential uses of data. Nevertheless,
establishing such policies is the collective responsibility of the researchers in each field.

Recommendation 10: As part of the development of standards for the management of digital data, research fields should develop guidelines for assessing the data being produced in that field and establish criteria for researchers about which data should be retained.

Researchers need a supportive institutional environment to fulfill their responsibilities toward the stewardship of data.

Recommendation 11: Research institutions and research sponsors should study the needs for data stewardship by the researchers they employ and support. Working with researchers and data professionals, they should develop, support, and implement plans for meeting those needs.

The problem of paying for long-term stewardship of research data and other digital scholarly work is difficult, and solutions need to be developed over time. It is important that requirements for improved data management practices not be imposed as unfunded mandates. In the digital age, data management needs to be integrated into research program funding as an essential component of the conduct of research. Where appropriate, grant applications should include costs for data stewardship.

Many issues regarding the integrity, accessibility, and stewardship of research data are common across the research enterprise. Bodies that oversee multiple fields of research should disseminate lessons learned and help to foster interdisciplinary cooperation. Within the U.S. federal government, a recent report by the Interagency Working Group on Digital Data explores the needs for preservation and dissemination of publicly funded research data.4 At the nongovernmental level, the National Research Council recently established a new Board on Research Data and Information that will address emerging issues in the management, policy, and use of research data at the national and international levels.

4 Interagency Working Group on Digital Data. 2009. Harnessing the Power of Digital Data for Science and Society. Washington, DC: National Science and Technology Council, Executive Office of the President.