Prepublication copy, uncorrected proofs.

1 INTRODUCTION

Reproducibility and replicability are often cited as hallmarks of good science. Being able to reproduce the computational results of another researcher starting with the same data, and to replicate a previous study to test its results or inferences, both facilitate the self-correcting nature of science. A newly reported discovery may prompt retesting and confirmation, examination of the limits of the original result, and reconsideration, affirmation, or extension of existing scientific theory. However, reproducibility and replicability are not, in and of themselves, the end goals of science, nor are they the only way in which scientists gain confidence in new discoveries.

Concerns over reproducibility and replicability have been expressed in both scientific and popular media. A few years ago, a cover story in the Economist (2013) invited readers to learn "How Science Goes Wrong," and Richard Harris's popular book, Rigor Mortis, provides many examples of purported failures in science. An earlier essay by John Ioannidis in PLOS Medicine carried the provocative title, "Why Most Published Research Findings Are False" (Ioannidis, 2005). And just a few years ago, a large-scale replication study of psychological research reported that fewer than half of the studies were successfully replicated (Open Science Collaboration, 2015).

As these concerns about scientific research came to light, Congress responded with Section 116 of the American Innovation and Competitiveness Act of 2017. The act directed the National Science Foundation to engage the National Academies of Sciences, Engineering, and Medicine in a study to assess reproducibility and replicability in scientific and engineering research and to provide findings and recommendations for improving rigor and transparency in scientific research; see Box 1-1 for the full statement of task.
BOX 1-1
Statement of Task

The National Academies of Sciences, Engineering, and Medicine will assess research and data reproducibility and replicability issues, with a focus on topics that cross disciplines. The committee will:

1. provide definitions of "reproducibility" and "replicability" accounting for the diversity of fields in science and engineering;
2. assess what is known and, if necessary, identify areas that may need more information to ascertain the extent of the issues of replication and reproducibility in scientific and engineering research;
3. consider whether the lack of replicability and reproducibility impacts the overall health of science and engineering as well as the public's perception of these fields;
4. review current activities to improve reproducibility and replicability;
5. examine (a) factors that may affect reproducibility or replicability, including incentives, roles and responsibilities within the scientific enterprise, methodology and experimental design, and intentional manipulation, as well as (b) studies of conditions or phenomena that are difficult to replicate or reproduce;
6. consider a range of scientific methodologies as they explore research and data reproducibility and replicability issues; and
7. draw conclusions and make recommendations for improving rigor and transparency in scientific and engineering research, and identify and highlight compelling examples of good practices.

While the committee may consider what can be learned from past and ongoing efforts to improve reproducibility and replication in biomedical and clinical research, the recommendations in the report will focus on research in the areas of science, engineering, and learning that fall within the scope of the National Science Foundation.

In addressing the tasking above, the committee may consider the following questions:

- Using definitions of "reproducibility" and "replicability" endorsed by the committee, what does it mean to successfully reproduce or replicate a study in different fields? Which issues (e.g., pressures to publish, inadequate training) are common across all or most fields when there are failures to replicate results?
- What is the extent of the absence of reproducibility and replicability? Is there a framework that outlines the various reasons for lack of reproducibility and replicability of a study?
- What strategies have scientists employed other than replicating or reproducing findings to gain confidence in scientific findings (e.g., in situations where replication or reproduction is not possible, such as studies of ephemeral phenomena), and what are the advantages and shortcomings of those approaches?
- What cost-effective reforms could be applied? Where would they be best applied? What would their anticipated impact be?
Early in the process and throughout the study, scientific and engineering societies, communication experts, scientific tool developers, and other stakeholders will be engaged in the work of the committee as part of the data-gathering process. These same stakeholder groups will be tapped at the end of the study in the planned release event, to ensure a wide distribution of the report.

The National Academies appointed a committee of 13 experts to carry out this evaluation, representing a wide range of expertise and backgrounds: methodology and statistics; history and philosophy of science; science communication; behavioral and social sciences (including experts in the social and behavioral factors that influence the reproducibility and replicability of research results); earth and life sciences; physical sciences; computational science; engineering; academic leadership; journal editing; and industry experience in quality control. In addition, individuals with expertise pertaining to reproducibility and replicability of research results across a variety of fields were selected. Biographical sketches of the committee members are in Appendix A.
The committee held 12 meetings, beginning in December 2017 and ending in March 2018, to gather information for this study and prepare this report. At these meetings, the committee heard from scientific society presidents and their representatives, representatives from funding agencies, science editors and reporters from different media outlets, researchers across a variety of sciences and engineering, experts (scientific journal editors and researchers) focused on reproducibility and replicability issues, and those with international perspectives. The agendas of the committee's open meetings are in Appendix B.

The scope of the committee's task, reviewing reproducibility and replicability issues across science and engineering, is broad, and the time to conduct the study was limited. Therefore, the committee sought to identify high-level, common aspects of potential problems and solutions related to reproducibility and replicability of research results across scientific disciplines. The committee interpreted "engineering" to refer to engineering research rather than engineering practice, and "topics that cross disciplines" as topics that are broadly applicable to many disciplines rather than topics focused on intersections of two or more disciplines. The committee intends its findings, conclusions, and recommendations to be broadly applicable across many scientific and engineering disciplines; it was not able to deeply investigate any particular field of science. In assessing and examining the extent of replicability issues across science and engineering, the committee focused on identifying characteristics of studies that may be more susceptible to non-replicability of results.1

This report comprises seven chapters. Following this introduction:

Chapter 2 introduces concepts central to scientific inquiry and outlines how scientists accumulate scientific knowledge through discovery, confirmation, and correction.
Chapter 3 provides the committee's definitions of reproducibility and replicability; it highlights the scope and expression of the problems of non-reproducibility and non-replicability (Task 1 in our charge; see Box 1-1).

Chapter 4 focuses on the factors that contribute to the lack of reproducibility (Task 5(a)). In accordance with the committee's definitions (see Chapter 3), reproducibility relates strictly to computational reproducibility and non-reproducibility. Non-reproducibility can refer to the absence of adequate information to reconstruct the computed results or, in the presence of adequate information, to the failure to obtain the same result within the limits of computational precision. In this chapter, the committee assesses the extent of non-reproducibility and discusses the implications (Task 2).

Chapter 5 focuses on replicability and reviews the diverse issues that bear on non-replicability in scientific results. Replicability is a subtle and nuanced topic, ranging from efforts to repeat a previous study to studies that confirm or build on the results obtained or the inferences drawn from a previous study. This chapter reviews evidence to assess the extent of non-replicability (Tasks 2 and 5(a)).

Chapter 6 reviews efforts to improve reproducibility and to reduce unhelpful sources of non-replicability (Task 4).

1 Because the terms used to describe similar activities across science and engineering differ, the committee selected generic terms to describe the components of scientific work, and they are used consistently throughout the report: "study" refers to work on a specific scientific question; "results" refers to the output of a study but does not include conclusions that are derived based on the results.
Chapter 7 examines the larger context of how various fields of science validate new scientific knowledge. While reproducibility and replicability are important components in the ongoing task of validating new scientific knowledge, other approaches, such as syntheses of available evidence on a scientific question, predictive modeling, and convergent lines of evidence, are prominent features in a variety of sciences (Tasks 5(b) and 6). The chapter concludes with a focus on public understanding of and confidence in science (Task 3).

Instructive examples of good practices for improving rigor and transparency are highlighted throughout the report in boxes (Task 7).

Finally, in addition to Appendices A and B, noted above, Appendix C presents the committee's recommendations grouped by stakeholder, and Appendices D and E elaborate on specific aspects of the report. There is also an electronic archive of the set of background papers commissioned by the committee.2

2 The papers are available at: https://www.nap.edu/catalog/25303.