Reproducibility and replicability are often cited as hallmarks of good science. Being able to reproduce the computational results of another researcher starting with the same data, and to replicate a previous study to test its results or inferences, both facilitate the self-correcting nature of science. A newly reported discovery may prompt retesting and confirmation, examination of the limits of the original result, and reconsideration, affirmation, or extension of existing scientific theory. However, reproducibility and replicability are not, in and of themselves, the end goals of science, nor are they the only way in which scientists gain confidence in new discoveries.
Concerns over reproducibility and replicability have been expressed in both scientific and popular media. In 2013, a cover story in The Economist invited readers to learn “How Science Goes Wrong,” and Richard Harris’s popular 2017 book Rigor Mortis provided many examples of purported failures in science. An earlier essay by John Ioannidis in PLOS Medicine carried the provocative title, “Why Most Published Research Findings Are False” (2005). And recently, a large-scale replication study of psychological research reported that fewer than half of the studies were successfully replicated (Open Science Collaboration, 2015).
As these concerns about scientific research came to light, Congress responded with Section 116 of the American Innovation and Competitiveness Act of 2017. The act directed the National Science Foundation to engage the National Academies of Sciences, Engineering, and Medicine in a study to assess reproducibility and replicability in scientific and engineering research and to provide findings and recommendations for improving rigor and transparency in that research. See Box 1-1 for the full statement of task.
The National Academies appointed a committee of 15 experts to carry out this evaluation, representing a wide range of expertise and backgrounds: methodology and statistics, history and philosophy of science, science communication, behavioral and social sciences (including experts in the social and behavioral factors that influence the reproducibility and replicability of research results), earth and life sciences, physical sciences, computational science, engineering, academic leadership, journal editing, and industrial quality control. The committee also included individuals with expertise in the reproducibility and replicability of research results across a variety of fields. Biographical sketches of the committee members are in Appendix A.1
1 Two committee members resigned during the course of the study.
The committee held 12 meetings, beginning in December 2017 and ending in March 2018, to gather information for this study and prepare this report. At these meetings, the committee heard from scientific society presidents and their representatives, representatives from funding agencies, science editors and reporters from different media outlets, researchers across a variety of sciences and engineering, experts (e.g., scientific journal editors and researchers) focused on reproducibility and replicability issues, and those with international perspectives. The agendas of the committee’s open meetings are in Appendix B.
The scope of the committee’s task—to review reproducibility and replicability issues across science and engineering—is broad, and the time to conduct the study was limited. Therefore, the committee sought to identify high-level, common aspects of potential problems and solutions related to the reproducibility and replicability of research results across scientific disciplines. The committee interpreted “engineering” to refer to engineering research rather than engineering practice and “topics that cross disciplines” as topics that are broadly applicable to many disciplines rather than topics focused on intersections of two or more disciplines. The committee intends its findings, conclusions, and recommendations to be broadly applicable across many scientific and engineering disciplines, although it was not able to deeply investigate any particular field of science. In assessing the extent of replicability issues across science and engineering, the committee focused on identifying characteristics of studies that may be more susceptible to non-replicability of results.2

2 Because the terms used to describe similar activities across science and engineering differ, the committee selected generic terms to describe the components of scientific work, and these are used consistently throughout the report: “study” refers to work on a specific scientific question; “results” refers to the output of a study but does not include the conclusions derived from those results.
Following this introduction, this report comprises six chapters. Chapter 2 introduces concepts central to scientific inquiry and outlines how scientists accumulate scientific knowledge through discovery, confirmation, and correction.
Chapter 3 provides the committee’s definitions of reproducibility and replicability and highlights the scope and manifestations of the problems of non-reproducibility and non-replicability (see Task 1 in Box 1-1).
Chapter 4 focuses on the factors that contribute to a lack of reproducibility (see Task 5(a)). In accordance with the committee’s definitions (see Chapter 3), reproducibility here refers strictly to computational reproducibility. Non-reproducibility can refer to the absence of adequate information to reconstruct the computed results or, when adequate information is available, to a failure to obtain the same results within the limits of computational precision. In this chapter, the committee assesses the extent of non-reproducibility and discusses its implications (see Task 2).
Chapter 5 focuses on replicability and reviews the diverse issues that bear on non-replicability in scientific results. Replicability is a subtle and nuanced topic: replication efforts range from attempts to repeat a previous study to studies that confirm or build on the results obtained or the inferences drawn from a previous study. This chapter reviews evidence to assess the extent of non-replicability (see Tasks 2 and 5(a)).
Chapter 6 reviews efforts to improve reproducibility and reduce unhelpful sources of non-replicability (see Task 4).
Chapter 7 examines the larger context of how various fields of science validate new scientific knowledge. While reproducibility and replicability are important components in the ongoing task of validating new scientific knowledge, other approaches, such as syntheses of available evidence on a scientific question, predictive modeling, and convergent lines of evidence, are prominent features in a variety of sciences (see Tasks 5(b) and 6). The chapter concludes with a focus on public understanding and confidence in science (see Task 3).
Throughout the report, we highlight in boxes instructive examples of good practices for improving rigor and transparency (see Task 7).
Finally, in addition to Appendixes A and B, noted above, Appendix C presents the committee’s recommendations grouped by stakeholder, and Appendixes D and E elaborate on specific aspects of the report. There is also an electronic archive of the set of background papers commissioned by the committee.3