As the workshop progressed, the discussions transitioned from examining the current state of transparency in preclinical biomedical research to describing opportunities for action (see Box 6-1 for corresponding workshop objectives). Panelists offered their reflections on the workshop thus far and discussed potential stakeholder actions to harmonize guidelines and develop minimal reporting standards.
Benedict Kolber, associate professor at Duquesne University, shared his perspective on what transparent reporting means for reviewers of grants and manuscripts. Richard Nakamura, retired director of the Center for Scientific Review at the National Institutes of Health (NIH), discussed some of the opportunities to review research for reproducibility, and shared several points to keep in mind moving forward. Valda Vinson
discussed some of the challenges that journals face as a stakeholder promoting culture change. Franklin Sayre, STEM Librarian at Thompson Rivers University, emphasized the value of engaging research support staff, including librarians, in efforts to increase reproducibility. Melissa Rethlefsen, associate dean, George A. Smathers Libraries and Fackler Director, Health Science Center Libraries at the University of Florida, expanded on the discussion of librarians as partners in leveraging existing resources and driving change within institutions. Michael Keiser, assistant professor at the University of California, San Francisco, shared lessons from developing and testing machine learning models that could be applied to designing and implementing transparent reporting strategies. Steven Goodman discussed the Patient-Centered Outcomes Research Institute (PCORI) Methodology Standards as a case example of an effort to develop minimal standards for the design, conduct, analysis, and reporting of research and the limitations of checklists in changing behavior.
Benedict Kolber, Associate Professor, Duquesne University
“Transparency will be the legacy of this rigor, reproducibility, transparency movement,” Kolber said. Bad science will happen, and the key is to be transparent and honest about what was done. Moving toward better experimental design is important, he said, but reporting guidelines can be implemented to improve transparency now, regardless of how an experiment was designed. Kolber shared his perspective as an academic researcher and faculty member on what transparent reporting means for reviewers of grants and manuscripts.
Grant Reviewer for a Funder
Guidelines provided by funders to grant reviewers vary widely, Kolber said. He reiterated the point by Shai Silberberg that some review processes now require applicants to discuss the rigor of the data on which they are basing their proposal. As a grant reviewer, however, Kolber said he must often decide for himself how much weight to give to elements of rigor.
Kolber suggested a starting point could be for NIH to add a rigor attachment to grant applications that is similar to the authentication attachment. NIH requires grant applicants to attach a document describing how chemical and biological resources included in the proposal will be authenticated. This information is not taken into account in scoring, he
noted. He suggested that an attachment requiring discussion of the rigor of the experimental design could be added, and initially not included in scoring, to inform discussion of how grant reviewers could evaluate rigor in funding proposals.
Manuscript Peer Reviewer
As mentioned earlier, as transparency in reporting improves and more information is provided in manuscripts, the burden on the reviewers increases, Kolber said. “Reviewers are the last gatekeepers” of scientific quality, and being a reviewer has become increasingly difficult and time intensive as reviewers must apply checklists and review detailed methods. This is essential, but Kolber said that other mechanisms are needed to keep from overburdening peer reviewers.
One approach could be having separate reviewers for different sections. Kolber noted that having separate reviewers for statistics has been suggested. He said a separate reviewer could assess the methods against a checklist before the manuscript is sent to the other reviewers, allowing them to then focus on reviewing the rest of the content for what was done well and what might be missing.
Richard Nakamura, Former Director of the Center for Scientific Review, National Institutes of Health
Several factors have negatively impacted reproducibility in recent years, Nakamura said. As background, he said that after the congressional effort to double the total NIH budget over the course of 5 years1 ended in 2003, “all of science in the United States underwent somewhat of a recession.” As a result, grant success rates were low and cuts to grant funding were high. This meant, he explained, that researchers had less money for each study, and looked for ways to “cut corners.” In addition, he said that researchers continue to face “long and busy waits for research grants, protocol approval, and publication.” He also noted that there is “intense pressure” for both researchers and journal editors to improve performance metrics. For example, editors are often rewarded for actions that increase the impact factor of the journal.
1 See detailed information about NIH appropriations at https://www.nih.gov/about-nih/what-we-do/nih-almanac/appropriations-section-1 (accessed February 19, 2020).
Opportunities to Review Research for Reproducibility
Nakamura listed some of the opportunities to review research for reproducibility or for adherence to guidelines or checklists. One approach, he said, would be to redraft grant applications as protocols, which could then be judged for reproducibility, but this approach is not widely preferred by the scientific community. Another opportunity is during protocol review by an Institutional Review Board or an Animal Care and Use Committee. As discussed, however, there are concerns about the impact of increasing the burden on reviewers on the timeliness of approvals.
For the review of grant applications, a general strategy is to have the Principal Investigator commit to follow a set of guidelines (e.g., Consolidated Standards of Reporting Trials [CONSORT]). Another opportunity, Nakamura said, is to have grant reviewers evaluate the reproducibility or adherence to guidelines of the published papers cited in support of the proposed research. To understand the potential impact of this increased burden on reviewers, the Center for Scientific Review surveyed reviewers about the extent to which they look at the primary literature cited by grant applicants. Nakamura said that 90 percent of reviewers responded that they had checked the original papers cited. This suggests that the imposition would be minimal, he said, and researchers might be motivated to more carefully consider the rigor of the publications they cite in grant applications if they know reviewers are taking this into account.
Another opportunity for review of reproducibility and adherence to guidelines is, as discussed, peer and editor review for publication by journals. Nakamura agreed with the comments made that the responsibility does not rest solely with journals. Nonetheless, journal publishers, and particularly high-impact journal publishers, “play a critical role in ensuring that strong papers are the ones that get published,” he said.
Nakamura made several points to keep in mind in moving forward with developing minimum standards for reporting. First, he supported the use of guidelines and checklists and underscored the need to coordinate guidelines for efficiency, and to prioritize the most important checklist items as discussed by Silberberg. He also underscored the need to “keep funding and publication space available for exploratory, discovery, and replication studies.” Awarding funding only for protocols will impact exploration and creative ideas. He added that exploratory studies should be transparently reported so that the limitations are clear. Nakamura concurred with Macleod that the impact of interventions on the science ecosystem must be assessed. “Explicit measures of success” are needed, he said, such as workload, cost, and replicability of important findings.
Valda Vinson, Editor of Research, Science
Much of the workshop discussions focused on the need for culture change in scientific research, so Vinson reflected on the need for culture change within scientific publishing as well. Two decades ago, when Vinson was an associate editor, it was instilled in her that the scientific community set the standard and publishers upheld it. Journals did not lead, she said; they followed the norms set by the research community. However, as discussed by Brian Nosek (see Chapter 3), all stakeholders contribute to effecting cultural change. She said the research community and publishers need to “be very mindful of one another” in working collaboratively toward change.
In reflecting on the discussions thus far, Vinson highlighted the idea that science is cyclical and cumulative. Journals strive to publish those papers that they believe will allow science to move forward, Vinson said. The primary goal of a journal is “the communication of science to scientists.” She recalled that some of the discussions called for journals to change how they decide what to publish. If there is agreement that the overarching goal of a journal is to disseminate high-value scientific information to a broad readership, then a question for discussion, she said, is whether journals are publishing the right research. She also observed that exploratory and confirmatory research are often discussed in the context of one being better or worse than the other, and she suggested different terminology might also be needed for culture change.
Thinking specifically about papers published by her journal, Science, Vinson observed that additional studies done in response to a reviewer, as a condition of publication, are often underpowered and of lower quality. A resubmitted manuscript might have three figures showing data from well-powered in vitro studies, for example, and a fourth figure with new data from an underpowered in vivo study, added only because a reviewer said the paper should include in vivo data. The resubmission satisfies the reviewer’s requirement, but with weak data. Vinson said this type of culture change could evolve in the publishing community, but not without the same change within the research community (i.e., with support from reviewers and researchers); changes to the publishing culture should be made in partnership with the research community.
Franklin Sayre, STEM Librarian, Thompson Rivers University
Basic and clinical researchers are supported by a cadre of research support staff, including statisticians, computer scientists, librarians,
archivists, and others. Sayre shared his perspective as a science, technology, engineering, and mathematics (STEM) librarian supporting evidence-based medicine. He pointed out that many of the issues related to reproducibility involve “scholarly communication” (e.g., data sharing, checklists, preregistration, preprints, code sharing, incentives, metrics). The research support community and research libraries have expertise to contribute to the discussions on these issues.
As a STEM librarian, Sayre said that he regularly works with graduate students and postdoctoral fellows who are seeking guidance on how to implement a required checklist, or who are interested in designing reproducible research. He described his role as happening within a “black box” that sits among research policy, incentives, and infrastructure on one side, and reproducible, rigorous research on the other. He said research support staff and the work they do in that black box are often missing from the conversations about reproducibility.
Sayre considered why there has not been more uptake of rigorous and reproducible research methodologies. Guidelines and checklists are available, as well as tools and infrastructure, such as open source frameworks and data repositories. He said what may be needed is not more repositories, but rather, better funding and support for existing resources. Sayre noted that the researchers he has worked with often believe that using checklists early in the research process gives them confidence that they are not missing something that will impact their ability to publish. He suggested that one reason for the lack of uptake, as has been discussed, is the current incentive structure. Another reason is that designing reproducible research can be complicated as it may require knowledge and technical skill in areas of scholarly communication, such as programming, data sharing, data curation, research policy, checklists, guidelines, preregistration, and publishing issues.
Sayre also considered what lessons can be learned from the successful implementation of reporting guidelines such as Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) and CONSORT.2 He suggested that one commonality of successful guidelines is that they facilitate team science, bringing together investigators, collaborators, and research support staff, and sharing the burden.
Workshop participants previously raised the idea of creating a new profession to fill the black box, but Sayre pointed out that most institutions already have “a constellation of experts” who can advise on study design, statistical analysis, data management (e.g., curation, repositories, sharing), policies, and other elements of reproducible research. These
experts work within departments, computing centers, and libraries, for example. In closing, Sayre said that research support staff should establish an identity as a stakeholder group so they are included in the discussions about enabling reproducibility in biomedical research and can contribute to solutions.
Melissa Rethlefsen, Associate Dean, George A. Smathers Libraries, and Fackler Director, Health Science Center Libraries, University of Florida
“Institutions drive the publish or perish, funding or famine culture,” said Rethlefsen, and they play a role in changing that culture and promoting reproducibility of research. Although lack of reproducibility and transparency is of particular concern to the field of preclinical biomedical research, all disciplines, even the humanities, face problems with reproducibility of research.
Institutions should find ways to help researchers succeed, Rethlefsen said, and one approach may be to engage libraries and librarians as partners. As Sayre mentioned, librarians have expertise in scholarly communications and understand the research life cycle. Librarians are transdisciplinary, skilled at working with faculty, staff, and students in all disciplines, including researchers, educators, and clinicians. Many of the tools to support reproducible research are already available through institutional libraries, she said, such as institutional repositories, and support for data management and data curation. In addition, libraries are “natural partners” with other research resources such as the institutional Office of Research, Clinical and Translational Science Awards Program Hubs, high-performance computing centers, and biostatistics cores. Rethlefsen described two case examples that illustrate how librarians are helping to drive institutional change by serving as faculty members and by leveraging tools and services and supporting curricular integration, professional development, advocacy and outreach, and coalition building.
University of Utah
While working at the University of Utah, Rethlefsen became aware that the vice president for research was interested in the reproducibility of preclinical research and it was decided that the library would plan and host a research reproducibility conference in 2016. The conference explored ways in which the library could support reproducibility of research, including leveraging existing resources and relationships. For example, the library had partnered with the Center for Clinical and Translational
Science at the university to establish a systematic review core, and the library had supported an event raising awareness of sex and gender differences in research. The convergence of these and other resources (e.g., the Study Design and Biostatistics Core) enabled the library to support more rigorous research in general and to assist with addressing rigor and reproducibility in preparing grant applications. As awareness of the library’s resources for reproducibility grew, Rethlefsen said librarians were asked to teach classes, assist with lectures, and develop partnerships. For example, she said the library helped to establish the university’s first JupyterHub server to teach reproducible Python scripting. The library was asked to teach the reproducibility sessions of the DeCart summer program in biomedical data science and teaches part of the Research Administration Training Series.
Rethlefsen said that feedback after the 2016 conference indicated that stakeholders across disciplines were eager to connect in a neutral forum such as the library. This illustrates the importance of grassroots initiatives. The library continues to scale its efforts and has launched a Grand Rounds Reproducibility Series (a weekly lecture on reproducibility in research in different disciplines) and an interdisciplinary Research Reproducibility Coalition to push for policy change at the institutional level. A second Research Reproducibility Conference was held in 2018, designed specifically to teach researchers the skills needed for reproducible research, including working with reporting guidelines and minimum reporting standards.
University of Florida
At the University of Florida, where Rethlefsen currently works, she is deploying the same strategy to identify existing resources, establish partnerships, and drive change. One existing library resource is the Academic Research Consulting and Services group, which has a data management librarian, informatics and bioinformatics librarians, a clinical and translational science institute liaison librarian, and a research impact librarian. To more effectively support reproducibility and reduce the burden on researchers, the library is hiring new faculty, including a reproducibility librarian, and, in partnership with the university’s Clinical and Translational Science Institute, a systematic review librarian.
As before, Rethlefsen said, library faculty are also involved in teaching, curriculum development (e.g., rigor and reproducibility training as required by NIH training grants), and professional development (e.g., how to use Python, Open Science Framework, reporting guidelines). The library collaborated with Research Computing at the university to host a Research Bazaar, which is a worldwide event to promote digital literacy.
Planning is under way for a research reproducibility conference in 2020, she said, that will focus on best practices for education about research reproducibility.
In closing, Rethlefsen said there are existing resources and practices, some of which may be grassroots efforts, which can be leveraged by institutions. She emphasized that sustaining grassroots or “volunteer” efforts is challenging, and support from institutional leadership is needed for success.
Michael Keiser, Assistant Professor, University of California, San Francisco
Keiser shared his perspective on transparent reporting as an early career researcher, drawing on Platt’s systematic and transparent approach to science, which Platt termed “strong inference”—a model of inquiry that relies on alternative hypotheses rather than a single hypothesis to avoid bias (Platt, 1964). Keiser described an example of Platt’s approach, which, while systematic, allows for creativity and exploration (see Figure 6-1). The approach begins with devising a hypothesis and a set of alternative hypotheses—feasible and falsifiable statements that can be tested experimentally. The second step is to design one or more experiments to disprove or exclude one or more of the hypotheses. Platt’s third step is to conduct the experiments. This three-step process is then refined and repeated until only one hypothesis remains. Keiser cautioned that measurements (e.g., numbers, statistics, calculations) can be misleading depending on how they are framed, and that researchers risk mistaking correlation for causation. He emphasized that a hypothesis can never be proven or confirmed, but it can certainly be disproven.
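The iterative loop Platt described can be sketched in a few lines of Python. This is an illustrative abstraction only; the hypothesis names and the `run_experiment` callback are hypothetical placeholders, not anything presented at the workshop:

```python
# A minimal sketch of Platt's strong-inference loop: repeatedly design and
# run experiments that can exclude alternative hypotheses until one remains.
# Hypotheses and the experiment function are illustrative placeholders.

def strong_inference(hypotheses, run_experiment):
    """Iterate Platt's three steps until a single hypothesis survives."""
    candidates = set(hypotheses)
    while len(candidates) > 1:
        # Steps 2 and 3: design and conduct an experiment capable of
        # falsifying at least one remaining hypothesis, then drop
        # whatever the result excludes.
        excluded = run_experiment(candidates)
        if not excluded:
            break  # nothing could be excluded; the designs need refining
        candidates -= excluded
    return candidates

# Illustrative use: each round of experiments excludes one alternative.
hypotheses = ["H1", "H2", "H3"]
results = iter([{"H3"}, {"H2"}])
surviving = strong_inference(hypotheses, lambda c: next(results, set()))
print(surviving)  # → {'H1'}
```

The sketch also makes Keiser's caution concrete: the loop can only ever disprove hypotheses; the survivor is the one not yet excluded, never one that has been proven.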
With this as background, Keiser turned to the application of strong inference to machine learning. “We must be our own adversaries to the models we develop,” Keiser said. He described controls for use in computational sciences (including machine learning) that he said could be applied more broadly to data science and analysis (Chuang and Keiser, 2018).
First, Keiser argued that there is no black box for computational models. There are techniques to investigate computational models and, similar to other types of research, it is important to ask whether the model is logical. As an example, Keiser described one of his own studies using machine learning to detect the presence of different types of amyloid
plaques in the brains of deceased Alzheimer’s disease patients (Tang et al., 2019). Keiser explained how his team trained a neural network to rapidly classify plaques (e.g., diffuse or cored) based on image analysis. Keiser added that a preprint of the paper was posted on bioRxiv and the data were posted to the open access repository, Zenodo. This study had already been replicated by others using different datasets before the paper was accepted for publication.
When considering minimal reporting standards, Keiser suggested applying Platt’s strong inference approach when choosing scientific methods appropriate for a given problem. Transparent reporting should include information on the logic and reasoning that went into a study analysis, he said. Data science tools are already available to encode and share the relevant information, including preregistration through Registered Reports, software version control using Git, data deposition in repositories such as Zenodo, and analysis logic captured in Jupyter notebooks.
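The tools listed above can be chained into a minimal reproducible workflow. The following shell sketch is illustrative only (the repository, file names, and committer identity are hypothetical), and the Zenodo step happens through its web interface or GitHub integration rather than the command line:

```shell
# Illustrative sketch: version-control an analysis, then tag the exact
# revision used for a manuscript so it can be archived with a DOI.
set -e
mkdir -p plaque-analysis && cd plaque-analysis

# Track the analysis notebook and its pinned environment with Git.
git init -q .
touch analysis.ipynb environment.yml    # placeholder project files
git add analysis.ipynb environment.yml
git -c user.name="Demo" -c user.email="demo@example.org" \
    commit -q -m "Initial analysis notebook and pinned environment"

# Tag the revision that produced the manuscript's figures. The tagged
# release (plus the dataset) can then be deposited on Zenodo, which
# mints a citable DOI for the paper's data-availability statement.
git -c user.name="Demo" -c user.email="demo@example.org" \
    tag -a v1.0 -m "Code state used for the submitted manuscript"
git tag --list
```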
In closing, Keiser said researchers should be their own adversaries. Drawing on lessons from the field of cybersecurity, he explained that a “red team” is a group of good actors tasked with attacking digital infrastructure to test an organization’s defenses. Keiser suggested that the biomedical field could establish similar red teams within research groups or institutions, in which scientists perform regular checks on each other’s work; this, he added, could become a research support career path.
Steven Goodman, Professor of Medicine and Health Research and Policy and Co-Director of METRICS, Stanford University
Goodman briefly shared his perspective as a research educator on some of the critical gaps in the training of research scientists. Many laboratory scientists, early career as well as some senior investigators, have a limited understanding of the “basic elements and formal logic and purpose of experimental design,” he said, including blinding, randomization, sample size determination, and other aspects. Laboratory scientists often have limited training in the “foundations of statistical inference and the meaning of basic statistical summaries,” he continued. Reiterating his comment from earlier in the workshop, he said that doctoral students are often enrolled in advanced analysis courses without understanding the concepts covered in introductory courses. Many researchers do not understand the links among “the question, the design, the measurements, the conduct, the analysis, the inference, the conclusions, and the generalizations” in the chain of experimentation, he said. Lastly, he said that “virtually every gap in training or understanding is created or reinforced by the literature they read.” He asserted that it is extremely challenging to train new scientists to conduct rigorous science when that is not what they are seeing published in high-profile journals.
PCORI Methodology Standards
Goodman discussed the PCORI Methodology Standards as a case example of an effort to develop minimal standards for the design, conduct, analysis, and reporting of research. The law authorizing PCORI mandated the establishment of a Methodology Committee and the development of methodology standards for patient-centered outcomes research by the committee, with input from stakeholders and the public.3 The standards are used to assess the rigor of studies proposed in funding applications received by PCORI, and to monitor the conduct and reporting of funded studies, Goodman said.4 A total of 65 standards for patient-centered outcomes research were developed in 16 topic areas, including 5 cross-cutting areas and 11 for specific elements of research (see Box 6-2).
3 Further information about PCORI’s methodology research, including the PCORI Methodology Report, and the members of the Methodology Committee, is available at https://www.pcori.org/research-results/about-our-research/research-methodology (accessed November 20, 2019).
Goodman listed the four Standards for Preventing and Handling Missing Data (MD) and provided excerpts from the explanation of the second standard (PCORI, 2019):
- “MD-1: Describe methods to prevent and monitor missing data.”
- “MD-2: Use valid statistical methods to deal with missing data that properly account for statistical uncertainty due to missingness.… Estimates of treatment effects or measures of association should … account for statistical uncertainty attributable to missing data. Methods used for imputing missing data should produce valid confidence intervals and permit unbiased inferences.… Single imputation methods, such as last observation carried forward, baseline observation carried forward, and mean value imputation, are discouraged…” [emphasis Goodman].
- “MD-3: Record and report all reasons for dropout and missing data, and account for all patients in reports.”
- “MD-4: Examine sensitivity of inferences to missing data methods and assumptions and incorporate into interpretation.”
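The statistical point behind MD-2 can be made concrete with a small simulation. The sketch below is illustrative only (the data are simulated, not from any PCORI study): carrying the last observation forward fills in missing follow-up values, so the naive standard error is computed as if all patients had been observed, making the estimate look more precise than the observed data support:

```python
# Illustrative sketch of why single imputation such as last observation
# carried forward (LOCF) is discouraged: treating imputed values as
# observed data understates statistical uncertainty. Data are simulated.
import numpy as np

rng = np.random.default_rng(0)
n_patients, n_visits = 50, 4
data = rng.normal(loc=10.0, scale=2.0, size=(n_patients, n_visits))

# Half the patients drop out after the second visit (later visits missing).
data[:25, 2:] = np.nan

# Single imputation: carry each patient's last observed value forward.
locf = data.copy()
for j in range(1, n_visits):
    missing = np.isnan(locf[:, j])
    locf[missing, j] = locf[missing, j - 1]

# Naive standard error at the final visit, treating LOCF values as real,
# versus the standard error from only the actually observed values.
observed_final = data[~np.isnan(data[:, -1]), -1]
se_locf = locf[:, -1].std(ddof=1) / np.sqrt(n_patients)
se_observed = observed_final.std(ddof=1) / np.sqrt(observed_final.size)

print(f"SE treating LOCF values as data (n={n_patients}): {se_locf:.3f}")
print(f"SE from the {observed_final.size} observed values: {se_observed:.3f}")
```

The LOCF-based standard error comes out smaller, yielding tighter confidence intervals than the data warrant, which is exactly the invalid inference MD-2 warns against.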
“These are basic principles” and seem relatively “minimal and obvious,” Goodman said. However, they are not necessarily easy to assess. As an example, he challenged participants to consider exactly how they might assess compliance with the standard that reads, “Single imputation methods, such as last observation carried forward, baseline observation carried forward, and mean value imputation, are discouraged.” He added that assessing applicable standards can require “a fair amount of sophisticated judgment.”
The adherence of final reports to the PCORI Methodology Standards was evaluated and presented at the Eighth International Congress on Peer Review and Scientific Publication (Mayo-Wilson et al., 2017). None of the 31 final reports assessed had adhered to all of the standards, Goodman reported. He highlighted that “many reports did not use appropriate methods for handling missing data,” and “most reports examined heterogeneity with subgroup analyses, but few studies conducted confirmatory tests for heterogeneity.” This shows that simply having the standards in place was not sufficient, Goodman said. He observed that although PCORI “has substantial leverage and resources” as a funder, it still faces challenges in influencing practice. PCORI is now conducting a portfolio review of applications and final reports to determine if potential issues in final reports can be detected and prevented early. He added that it is much more difficult to implement true policy solutions that change practice than to develop technical solutions (i.e., standards).
Implications of a “Simple Checklist”
Goodman read excerpts from a 2009 commentary by Pronovost and colleagues on the interest in and implications of the checklist intervention Pronovost developed in 2006 to reduce central line infections in the Michigan Keystone ICU program.5 The checklist was hailed in the media as a simple solution to a serious patient safety problem. According to Pronovost and colleagues, however, “the mistake of the ‘simple checklist’ story is in the assumption that a technical solution (checklists) can solve an adaptive (sociocultural) problem” (Bosk et al., 2009, p. 444). Goodman emphasized two sections of the commentary for participants to reflect on as they considered the development and implementation of guidelines.
5 The original article describing the intervention is available at https://www.nejm.org/doi/full/10.1056/NEJMoa061115 (accessed November 20, 2019).
- “Widespread deployment of checklists without an appreciation for how or why they work is a potential threat to patients’ safety and high-quality care” (Bosk et al., 2009, p. 444).
- “Indeed, it would be a mistake to say there was one ‘Keystone checklist.’ There was not a uniform instrument, but rather, more than 100 versions” (Bosk et al., 2009, p. 445).
Goodman summarized that technical solutions (e.g., checklists, minimal reporting standards) can serve as reminders, but they are not sufficient for solving adaptive sociocultural problems and do not substitute for knowledge or understanding. In the absence of knowledge and understanding, enforcing minimal reporting standards may require significant effort and produce limited results. “Pressure and legitimacy need to be exerted at all levels, from funders, journals, regulators, and professional societies, but change has to occur on the ground level, and must include education and the means to operationalize it,” Goodman said. “Improving research practices must be driven by scientists reforming their own fields with the help of experts in rigor and reproducibility, impelled by institutional leadership, manifest by structures and metrics,” he added. He emphasized the importance of partnering with sociologists and organizational experts who study institutional and disciplinary change.
Increasing Rigor and Enhancing Transparency
Harvey Fineberg observed that some of the suggestions raised during the workshop discussions were specific to increasing scientific rigor, while others focused on enhancing transparency, and some covered the intersection of the two. He noted the need to keep both the distinctions and the connections between rigor and transparency in mind when discussing potential solutions for improving reproducibility and the roles of stakeholders, including researchers, institutions, funders, publishers/editors, and the larger scientific community.
Shai Silberberg said that, in his opinion, rigor and transparency are the same in the sense that transparency leads to rigor. Alexa McCray agreed and said that “transparency and rigor are two sides of the same coin” and that being transparent from the start, and transparent throughout, can reduce the burdens associated with reproducibility because transparency
facilitates assessment by the broader scientific community. The benefits of openness and transparency are discussed in the National Academies consensus study report Open Science by Design, McCray said (NASEM, 2018).
Arturo Casadevall agreed that, while transparency can promote rigor, the two concepts are distinct, as discussed by Fineberg. Highly rigorous science can be conducted in secrecy (i.e., without transparency), as might be done for military weapons research, for example. However, transparency can “promote rigor, independently of the tenets that define rigor,” he said.
Goodman countered that rigor and transparency are inextricable in that “we can’t trust science that we can’t see.” Science must be transparent to be convincing. Science that is rigorous, but not transparent, is often not reproducible or translatable and, in the absence of confirmation, does not lead to a consensus among scientists of what might be considered “fact” or “truth.”
Kolber observed that definitions for reproducibility and replicability had been discussed earlier in the workshop, but transparency as it applies to research reporting had not been fully defined. He encouraged participants to consider what would be required to produce a fully and transparently reported manuscript. Deborah Sweet suggested that involving trainees and postdoctoral fellows in the review process would be helpful because they are the scientists actually carrying out the laboratory experiments and are therefore best suited to determine whether a manuscript provides sufficient information to allow them to reproduce or replicate the study.
Addressing Underpowered In Vivo Studies
Thomas Curran asserted that it is “unethical to conduct a bad animal experiment.” He reiterated a point made several times during the workshop that researchers may add an underpowered animal study or use an inappropriate animal model in response to a request by a peer reviewer. He called on journal editors to intervene when reviewers ask for such studies. Vinson responded that journals do not intentionally publish animal studies that are underpowered or are done in inappropriate models, but editors rely on the reviewers, who are the experts. She observed that there is now increased awareness of this issue and that journals are implementing statistical reviews to establish thresholds for publication of such studies.
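To make concrete what “underpowered” means in this context, the sketch below estimates the number of animals needed per group to detect a given effect size in a two-group comparison. This example is illustrative only and was not presented at the workshop; it uses a standard normal-approximation formula for a two-sample test (Cohen's d, two-sided alpha, target power), implemented with only the Python standard library. The function name and defaults are the author's choices, not part of any workshop material.

```python
import math
from statistics import NormalDist

def n_per_group(effect_size: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate animals needed per group for a two-sample comparison.

    Uses the normal approximation:
        n = 2 * ((z_{1-alpha/2} + z_{power}) / d)^2
    where d is Cohen's standardized effect size. The exact t-test
    calculation gives slightly larger n; this is a lower-bound sketch.
    """
    z = NormalDist().inv_cdf          # inverse standard-normal CDF
    z_alpha = z(1 - alpha / 2)        # critical value for two-sided test
    z_beta = z(power)                 # quantile corresponding to target power
    return math.ceil(2 * ((z_alpha + z_beta) / effect_size) ** 2)

# A large effect (d = 0.8) still requires roughly 25 animals per group
# at 80% power; a moderate effect (d = 0.5) requires roughly 63.
print(n_per_group(0.8))  # → 25
print(n_per_group(0.5))  # → 63
```

The point of the arithmetic is that typical small in vivo cohorts (e.g., 5 to 10 animals per group) are only adequately powered for very large effects, which is why a study added hastily in response to a reviewer request is likely to be underpowered.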
Nosek suggested that one approach to addressing this issue could be for journals not to require additional in vivo studies for publication. Nosek suggested that this could be an opportunity to use Registered Reports, a publishing format in which protocols are provisionally accepted for publication, regardless of whether the results are positive or negative, provided the authors follow through with the registered methodology. Authors who are asked by reviewers to conduct additional experiments could submit a study design and protocol as a Registered Report, which the journal would review and commit to publish the results. This approach would help alleviate pressure on the researcher to generate a positive result for publication, Nosek said. Furthermore, he suggested that a journal could compare current practice to this approach to determine whether the intervention has an impact on reproducibility of results. Vinson added that this type of randomized assessment would likely require participation of more than one journal.
Considering Peer Review
Silberberg observed that publication in high-profile journals often requires that manuscripts include a host of different techniques to address a scientific question from multiple angles. He added that it is unrealistic to expect a single investigator to have such broad expertise, which is why many of the studies published in high-profile journals are collaborations. The result is that some authors may not fully understand all of the content in a given manuscript or may not be able to critically evaluate the contributions of other authors. More importantly, Silberberg continued, reviewers may not have the breadth of expertise to critically evaluate the entirety of a manuscript. He shared that during the National Institute of Neurological Disorders and Stroke stakeholder workshop held in 2012 (see Landis et al., 2012), participants discussed the approach of enlisting multiple reviewers with expertise in different domains. Another approach, he suggested, would be for journals to allow publication of manuscripts that are more narrowly focused. He described a case example in which a paper in a high-profile journal was retracted due to concerns about a single image in a panel of dozens. He postulated that the image, related to an animal experiment, may have been added in response to peer review.
Conducting Reproducibility and Replicability Studies
Kolber said investigators are focused on discovery, and “the idea of replicating another finding is not interesting.” He noted that small replication studies to bring a new model into the laboratory are common and are not generally published, even if they fail. Keiser added that, currently, if a researcher finds a problem in a published paper, they might contact the author about the disagreement, publish a commentary piece, or engage in other types of public back-and-forth discussion, all of which takes a lot of effort and a long time. Perhaps there could be support for finding problems of irreproducibility, somewhat similar to the “bug bounties” used to identify security vulnerabilities in technology products and services, he said. He suggested that training grants could cover attempts by trainees to reproduce studies in their field of research and could even require it as a way to enhance training in rigorous research.