Public policy on research misconduct, which has developed contentiously in the United States and a few other countries over the past thirty years, remains largely untested as to whether it yields clearly specific outcomes; alternative policies that might reach those outcomes remain unexamined. Each widely publicized case of research misconduct creates a new scandal, leading to questions about whether current regulation is effective or just, and whether it supports the progress of science.
—Barbara Redman (2013)
Synopsis: Research misconduct and detrimental research practices are addressed in several ways. Implementing standards and best practices, such as effective mentoring at the lab level, requirements for data and code sharing at the disciplinary level, and greater transparency in reporting results, can strengthen the self-correcting nature of science. Efforts to prevent misconduct and detrimental practices through education are described in Chapter 10. In the United States, uncovering, establishing, and responding to misconduct in publicly funded research mainly takes place within the context of the federal research misconduct policy. The current policy framework assigns specific responsibilities to institutions and to sponsoring agencies. While the current framework has achieved stability and effectiveness in ensuring that misconduct allegations involving federally funded work are investigated, there are gaps and inconsistencies. Other countries have different policy frameworks for addressing misconduct, which has implications for the United States due to the growing number of international research collaborations. Addressing detrimental research practices may involve even greater challenges than does addressing misconduct.
Chapter 4 provides a broad overview of how the U.S. policy framework for addressing research misconduct has evolved and describes some basic elements of that framework, most notably the federal definition of research misconduct.
Relevant international developments and policies are also covered. To assess the strengths and weaknesses of current approaches, it is necessary to explore how the policy framework operates in practice.
Several decades ago, a fairly widespread viewpoint among scientists was that federal policies to deal with research misconduct were not necessary, since misconduct was extremely rare and the self-correcting nature of science would ensure that any misconduct would be quickly discovered (Gunsalus, 1997). One basis for this viewpoint was the social cohesion of research fields and subfields at a time when researchers were much likelier to know each other than is the case today. In an environment where personal relationships and reputations play an important role in professional success and advancement, senior researchers have strong incentives to be effective mentors, since successful students would enhance their reputations. Likewise, if one’s current or former student is caught plagiarizing or fabricating data, the mentor or supervisor’s reputation will suffer.
While subfield communities still play a very important role in research, and misconduct certainly causes supervisors and collaborators to suffer embarrassment, it is unrealistic to think that these social forces are a sufficient deterrent to actual misconduct. As described in Chapter 3, the conditions in which less formal approaches to fostering integrity or to uncovering and addressing misconduct might have been expected to effectively protect the research record and the health of the research enterprise, to the extent that they ever existed, are certainly long gone.
Education in the responsible conduct of research (RCR) is another widely advocated mechanism for preventing research misconduct and detrimental practices. Chapter 10 features an extensive discussion of RCR education and what is and is not known about its effectiveness and benefits. The federal mandates related to RCR education reflect the logic that a significant percentage of research misconduct might be committed out of a lack of understanding, and that addressing this gap could prevent some misconduct. Certainly one can appreciate funding agency frustration with cases in which an early-career respondent claims that he or she was never taught that behavior such as copying large blocks of text from other work is wrong, and where the institution counters that it did train the respondent not to plagiarize. RCR education mandates might at least prevent or significantly reduce claims of ignorance as to the basic values and practices of science. At the same time, while most experts and practitioners believe that RCR education is necessary and worthwhile, perhaps particularly in discouraging detrimental research practices, the evidence of its effectiveness is limited; this issue is discussed in more detail in Chapter 10 and in Appendix C.
It is possible that RCR education has prevented some number of acts of research misconduct. However, the experience of the past several decades shows that it may be insufficient to rely on classroom or online education as the primary tool to address research misconduct.
As described in Responsible Science, before the mid-1980s, allegations of research misconduct were investigated and addressed by institutions. Institutions employed a variety of procedures, sometimes confidential, with no requirement to notify the sponsoring agency of the investigation or its results (NAS-NAE-IOM, 1992). Federal policies instituted since that time have required research institutions to report to sponsoring agencies when initial inquiries yield enough evidence to justify a full investigation. According to the current federal research misconduct policy:
Agencies and research institutions are partners who share responsibility for the research process. Federal agencies have ultimate oversight authority for Federally funded research, but research institutions bear primary responsibility for prevention and detection of research misconduct and for the inquiry, investigation, and adjudication of research misconduct alleged to have occurred in association with their own institution. (OSTP, 2000)
For misconduct allegations covering research sponsored by federal agencies such as the National Science Foundation (NSF) and the National Institutes of Health (NIH), institutions are responsible for notifying the cognizant offices—NSF’s Office of the Inspector General (NSF-OIG) for NSF-funded research, and the Department of Health and Human Services’ (HHS’s) Office of Research Integrity (ORI) for research funded by NIH and other Public Health Service (PHS) entities—when investigations are launched and when they conclude. Differences between NSF-OIG and ORI in how cases are addressed, as well as issues that arise for misconduct alleged in research sponsored by other agencies, will be explored in more detail below.
As described in Chapter 5, much remains unknown about the incidence of research misconduct and trends. For example, how many cases of misconduct go unreported and/or are not investigated is unknown. In addition, detailed information is lacking about the circumstances of many cases where misconduct has been found. For example, ORI posts summaries of the cases where “administrative actions were imposed due to findings of research misconduct” (ORI, 2015), and NSF-OIG provides a searchable database of closeout memos from all of its investigations, including research misconduct investigations (NSF-OIG, 2015). These case descriptions may not include information about how the misconduct was uncovered and other details that could be useful to institutions and other stakeholders seeking to improve approaches to preventing misconduct and to uncovering the misconduct that does occur. As a result, cases that have achieved enough notoriety to attract media reports tend to be the primary source of information about how research misconduct is uncovered and addressed.
Research misconduct cases regularly emerge in the current environment, including cases in which investigators have fabricated data underlying tens or even hundreds of publications over the course of lengthy careers. Dealing with allegations and
correcting the research record are significant activities for institutions, sponsors, journals, and other stakeholders.
Uncovering Research Misconduct
Examining how allegations arise and are dealt with provides a useful window into the system for addressing misconduct, making it clear that science’s broad tendency toward self-correction or other mechanisms such as traditional prepublication peer review cannot be relied on as the primary mechanisms for uncovering misconduct. Such examination also provides insights on some of the system’s weaknesses and some possible clues as to how the system might be strengthened.
The discovery of research misconduct often depends on good-faith whistleblowers who observe the wrongdoing and come forward to report it. Stroebe et al. (2012) examined 40 “notorious” cases of research misconduct from 1974 through 2012, defined as cases prominent enough to receive media attention and for which the mode of discovery could be ascertained from media or other reports; they found that about half were uncovered by whistleblowers. Only a few were uncovered through a failure to replicate or in the process of peer review. Several of the cases included in Appendix D are part of this sample.
Another analysis examined cases of research misconduct as well as cases of other misconduct—such as failure to follow rules governing human subjects or laboratory animal protection—and medical practice misconduct (e.g., undertaking unnecessary procedures) (DuBois et al., 2013b). Of the research misconduct cases analyzed, 28 percent involved a “failed attempt at reporting research misconduct (i.e., the wrongdoing continued for some time following an initial report).” A large percentage of research misconduct whistleblowers worked within the wrongdoer’s institution, including 23 percent who were subordinates.
While the actions of whistleblowers play a central role in uncovering research misconduct, particularly in cases of data fabrication, they are not the only way that misconduct is discovered and reported. For example, several technological methods of detecting misconduct have emerged over the past decade. A well-known example is software that detects and flags text overlap, which can then be checked for possible plagiarism. It is important to use such tools with judgment, since standard equations and citations can trigger flags that do not represent plagiarism. According to one report, scientific publishers that began screening submissions on a trial basis in 2010 found that, depending on the journal, 6 to 23 percent of articles had to be rejected due to unacceptable levels of text duplicated from other articles (Butler, 2010). Software has also been developed to detect the inappropriate manipulation of image data, a form of data falsification that has emerged regularly in the life sciences (Rossner, 2006).
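The text-overlap idea behind such screening can be illustrated with a toy sketch. This is not any vendor's actual algorithm; commercial services compare submissions against large databases and handle paraphrase, quotation, and boilerplate far more carefully. A minimal word n-gram ("shingle") comparison, with all names and thresholds being illustrative assumptions, might look like:

```python
# Toy illustration of text-overlap screening: shingle each document into
# word n-grams and measure the Jaccard similarity of the two sets.
# A high score flags a pair of documents for human review; it does not
# by itself establish plagiarism.

def shingles(text: str, n: int = 3) -> set:
    """Return the set of lowercase word n-grams ("shingles") in text."""
    words = text.lower().split()
    return {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}

def overlap_score(doc_a: str, doc_b: str, n: int = 3) -> float:
    """Jaccard similarity of the two documents' n-gram sets (0.0 to 1.0)."""
    a, b = shingles(doc_a, n), shingles(doc_b, n)
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)

# Hypothetical methods-section sentences sharing standard phrasing:
submitted = "the protein was expressed in E. coli and purified by affinity chromatography"
published = "the protein was expressed in E. coli and purified using size exclusion"
print(f"overlap: {overlap_score(submitted, published):.2f}")  # prints: overlap: 0.54
```

The example also shows why naive flagging produces false positives: the two sentences overlap heavily only because they share routine methods boilerplate, which real tools must whitelist.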
In addition, Uri Simonsohn of the University of Pennsylvania has developed a methodology that enables the detection of fabricated or falsified data
through the analysis of datasets (Shea, 2012; Simonsohn, 2013). The methodology uses statistical analysis to determine the probability that a given large dataset was generated by an experiment as opposed to being fabricated. Using this methodology, Simonsohn uncovered data fabrication by University of Michigan psychology professor Lawrence Sanna and by Erasmus University (Netherlands) psychology professor Dirk Smeesters. Methods for analyzing clinical data to detect fabrication predated Simonsohn’s efforts, having appeared in the late 1990s (Buyse et al., 1999; Lock et al., 2001). Statcheck is a new tool, developed at Tilburg University in The Netherlands, that can check the reported statistical results in articles for consistency (Epskamp and Nuijten, 2016).
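The consistency-checking idea behind a tool like statcheck can be sketched briefly. Statcheck itself parses published articles and handles t, F, chi-square, and correlation tests; the hedged toy version below recomputes a two-sided p-value only for a reported z statistic, since Python's standard library exposes the normal distribution through math.erfc. The function names and the rounding rule are illustrative assumptions, not statcheck's implementation:

```python
import math

def p_from_z(z: float) -> float:
    """Two-sided p-value for a z statistic, via the standard normal CDF."""
    return math.erfc(abs(z) / math.sqrt(2.0))

def consistent(reported_z: float, reported_p: float, decimals: int = 3) -> bool:
    """True if the reported p matches the recomputed p after rounding."""
    return round(p_from_z(reported_z), decimals) == round(reported_p, decimals)

# A correctly reported result: z = 1.96 corresponds to p of about 0.050.
print(consistent(1.96, 0.050))  # True
# An inconsistent report, e.g., a typo or a p-value copied from another test:
print(consistent(1.96, 0.005))  # False
```

An inconsistency flagged this way is not evidence of misconduct by itself; it simply identifies reported statistics that merit a closer look, which is how such checks feed into the correction process described in this chapter.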
Technological approaches to detecting research misconduct are not foolproof. For example, inventive fabricators might be able to devise ways of defeating statistical analysis of datasets. Still, the existence and use of these tools is encouraging. Improving on them and building new approaches to detecting misconduct will rely heavily on improved transparency throughout the research process, particularly in the availability of data and code. For example, the effectiveness of Simonsohn’s method depends on access to data, and on the fact that fabricated data will differ from data generated by an experiment in discernible ways. The importance of increasing transparency is a key theme of this chapter and underlies several of the committee’s recommendations intended to prevent or reduce misconduct and detrimental research practices (DRPs) and to more effectively detect the misconduct that does occur.
As discussed above and in Chapter 5, the failure to replicate work has historically not been a primary means for uncovering misconduct, for several reasons. First, while there are incentives for extending and building on previous work, the incentives to replicate work that has already been reported are weak. In addition, the standards of some fields for sharing data and methods may not currently be robust enough to ensure that all the information necessary to replicate or validate a study is provided. In the biosciences, in particular, it may be difficult to account for a variety of nuances that may be important in result replication.
However, in recent years there have been several cases where doubts or suspicions about groundbreaking or otherwise newsworthy results appeared almost immediately, leading either to the findings quickly falling apart or to more thorough investigations. For example, two papers by Haruko Obokata of Japan’s RIKEN research institute and an international group of coauthors, reporting that mature somatic cells could be reprogrammed into pluripotent stem cells by exposure to an acid bath, were published by Nature in 2014 (Obokata, 2014a,b). Within a few weeks, outside researchers who were unsuccessful in replicating or extending the work were questioning the results (Cyranoski, 2014b). A RIKEN investigation found that Obokata had intentionally falsified data (Ishii et al., 2014) (see Appendix D). In a second example, a paper published in Science in late 2014 purported to show that canvassers could be highly successful in changing the minds of voters opposed to same-sex marriage, in many cases with a single conversation (LaCour
and Green, 2014). The paper was subsequently retracted after replication efforts failed and one of the authors, a graduate student in political science at the University of California, Los Angeles, admitted to destroying the raw data, leading to an investigation (McNutt, 2015).
One analysis of retractions indicates that the frequency of retractions is positively associated with a journal’s impact factor—that is, that more prestigious journals have to retract articles at a higher rate (Fang et al., 2012). This may be partly due to the perceived risk-reward balance for potential fabricators (the higher rewards that come from publishing in a high-prestige journal lead to stronger incentives to cheat). In addition, articles in high-prestige journals will generally receive greater attention and scrutiny, implying that misconduct is more likely to be discovered. Perhaps, in some cases, pressure to expedite publication means that corners are cut in the review and publication processes. These journals may also be more sensitive to the need for timely retractions and have greater resources to investigate issues. The cases from recent years are somewhat encouraging in illustrating that self-correction in science can work where the community has sufficient information and where outside researchers have strong incentives to replicate and extend the work.
The extent of research misconduct that is never uncovered, reported, or investigated is unknown by definition. Chapter 5 discusses the existing evidence on the incidence of research misconduct. Surveys of researchers on their own behavior and on the behavior of colleagues whom they have observed or heard about generate much higher estimates of the incidence of research misconduct than is reflected in the findings of research misconduct investigations reported by NSF-OIG and ORI. Over time, surveys have become more sophisticated in addressing issues that would tend to inflate the reported incidence of misconduct, such as possible multiple counting of the same incident by different respondents. According to one assessment, the majority of misconduct cases are not reported (Titus et al., 2008). The number or percentage of research misconduct cases that are not investigated cannot be pinpointed; nevertheless, it is important to try to understand as much about these cases as possible.
In addition to not knowing the true incidence of research misconduct, the circumstances and outcomes of research misconduct cases that may be reported or detected but may not be officially investigated remain largely unknown. Yet some useful information does exist. For example, there are anecdotal accounts by journal editors of what they have done when they, their peer reviewers, or outside whistleblowers have raised concerns and suspicions about submitted work (White, 2005). These accounts illustrate what can happen at different points in the process to forestall an investigation, other than the journal receiving a clarification or additional information that allays the suspicion. They also illustrate
some of the reasons why the journal peer review and editorial processes are not as effective in uncovering misconduct as might be expected or hoped for.
For example, the guidelines of the Committee on Publication Ethics specify that journal editors should “inform institutions if they suspect misconduct by their researchers, and provide evidence to support these concerns” (Wager and Kleinert, 2012). While journal editors are not equipped to actually perform investigations themselves (Wager, 2015a), journals are advised to go to the authors first and to contact the institution if the response is inadequate. In cases where suspicions have been raised, editors may not believe that they have sufficient evidence to go to the institution, and this determination may depend on the experience and attitudes of the editor. One editor reported having been hesitant to raise suspicions with institutions early in his career, unless he had compelling evidence of misconduct, but had become less hesitant over time (Smith, 2006).
In cases where the journal editor goes to the institution but the institution does not reply, the editor has no way of knowing whether the institution undertook a preliminary inquiry, proceeded to a full investigation, or took no action at all. In such cases, journal editors are advised to be persistent and to contact the funder or national research integrity office if the institutional response is inadequate. In some countries, institutions have no formal responsibility to investigate research misconduct or report it to sponsors, and there may be no national research integrity organization. In these cases the journal may have little or no ability to put pressure on institutions to respond, or even, after rejecting the article, to prevent the author from submitting it to another journal with less rigorous editorial practices. Journals might notify institutions that do not respond to credible concerns and allegations that future submissions from researchers affiliated with the institution will not be considered for publication until the issue is addressed. Obviously, more prestigious journals are likely to find greater success with this approach than less prestigious ones. Success may also depend on which institutional official has been notified. Journals might also refrain from such an approach because it punishes the innocent along with the guilty.
Journals also have the responsibility to respond to institutional requests to retract fabricated or falsified work, and they sometimes fail in this responsibility. Retractions and related issues are discussed in Chapter 5 (Wager, 2015b).
A 2015 report examined 57 published clinical trials undertaken over the period 1998–2013 in which Food and Drug Administration (FDA) inspections of clinical trial sites had found significant evidence of one or more problems, such as protocol violations, inadequate record keeping, and failure to protect patient safety (Seife, 2015; Steinbrook and Redberg, 2015). Significant evidence for falsification or submission of false information was found in 39 percent of the trials. However, of the 78 publications resulting from the 57 clinical trials, only 3 mentioned the problems that had been discovered in the FDA inspections. It is not clear from the inspection documents or the publications how the problems were communicated to institutions and whether inquiries or investigations of research
misconduct or violations of human subjects protection regulations were ever performed. The cases where evidence of falsification was found potentially represent examples of research misconduct being uncovered “in the act,” so to speak, prior to publication, and not being investigated, with the results published as if nothing had happened. In 2012 the FDA published new regulations strengthening its ability to disqualify clinical investigators who falsify data or commit other violations (HHS, 2012). The existence of these regulations does not necessarily ensure that findings of FDA inspections that appear to justify research misconduct inquiries or investigations on the part of institutions are followed up. To be sure, the legal authorities and implementing regulations that govern how FDA exercises its responsibilities have evolved separately from the federal research misconduct policy and related regulations, so it is not surprising that they might be out of sync in some areas. As this discussion indicates, not much is known about how these policy frameworks interact in practice and what changes or adjustments might be needed.
Blogs, Websites, and Community Postpublication Review
Over the past few years, several blogs and websites have emerged that focus on research misconduct and related issues. The best known of these efforts is the blog Retraction Watch, which was launched by two science journalists in 2010 (Oransky and Marcus, 2010). The blog’s authors expressed several goals in starting it, such as gaining a better understanding of the scientific process, serving as an informal repository and notification site for retractions, providing information to journalists seeking to uncover research misconduct, and evaluating the performance of journals. Retraction Watch has gained a wide readership within the research enterprise.
The effectiveness and impact of Retraction Watch have not been formally evaluated, but it is plausible to argue that the blog has advanced the goals that the authors set out for it. For example, the specific mechanisms by which retractions are communicated and retraction notices are maintained by journals are not standardized. Since retractions have not traditionally been widely publicized, prior to the emergence of Retraction Watch it was possible that an individual retraction might not be noticed immediately by other researchers in the field or by other journals that have published work by the authors in question. This could delay examination of other work by those authors and correction of the literature. In those cases where authors have fabricated or falsified data in multiple papers, having a report of a retraction appear in Retraction Watch can accelerate this process of examining the researchers’ broader body of work. It might be possible to look at cases that emerged before and after the advent of Retraction Watch in order to establish or quantify possible effects. Other issues related to retractions are discussed further below and in Chapter 5.
Besides drawing greater attention to issues and barriers in publication practices that may delay retractions or prevent a clear explanation of the cause, Retraction Watch has highlighted some of the information deficits around research misconduct and detrimental research practices, such as a lack of data on some issues that cover all disciplines. For example, much of the recent literature that examines retractions relies on searches of retraction notices in PubMed, which focuses on the biomedical literature and does not comprehensively cover the physical sciences (e.g., Fang et al., 2012).
Several web-based initiatives have aimed at facilitating the discussion of suspicious publications and uncovering research misconduct. For example, the website Science-Fraud.org was operated by Paul Brookes, a medical professor at the University of Rochester, during 2012 (Couzin-Frankel, 2013). The site provided a forum for reporting and discussing suspicious images in published work. Brookes and the contributors to the site operated anonymously, and Brookes claims that information provided on the site led to 16 retractions and 47 corrections (Pain, 2014). However, Brookes shut the site down in early 2013 after his identity was revealed in an e-mail sent to his university and many of the researchers whose work was questioned on Science-Fraud.org. The strident tone of the website, which went beyond raising questions about published work to accusing researchers of misconduct, opened Brookes to threats of legal action.
Another example is the website PubPeer, which provides a forum for commenters to critique published work and is moderated anonymously. Content made available on PubPeer has also led to corrections and retractions. In 2014, Fazlul Sarkar, a cancer researcher at Wayne State University, sued several PubPeer commenters on his papers, claiming that their posts constituted defamation and caused him to lose a job offer. Sarkar sought identifying information on the commenters from PubPeer via subpoena (Servick, 2015). It was later revealed that a tipster who had raised concerns and issues regarding numerous journal articles with editors over the years and goes by the pseudonym “Clare Francis” was one of the PubPeer commenters on Sarkar’s work (Oransky, 2015). While Francis’s communications have sometimes led to retractions or corrections, journal editors have also asserted that some of the tips did not actually uncover mistakes or wrongdoing and that investigating them wasted time (Grens, 2013b).
The phenomenon of websites such as PubPeer and whistleblowers such as Clare Francis raises questions about the role of anonymous whistleblowers and about how the community, and journals in particular, should treat such accusations and concerns. The topic of knowingly false allegations is discussed below. Journal editors need to exercise judgment in evaluating the credibility of expressions of concern and accusations they receive, and anonymity deprives them of important information for making that evaluation. However, the desire for anonymity on the part of whistleblowers is also understandable, particularly in cases where exposure of their identity could open them to retaliation. How can science best encourage experts to develop and share information that may reveal research misconduct without also encouraging the spread of meritless
accusations and personal attacks? Can journals and agencies do more to provide tools and information that speed the correction of the scientific record? One interesting experiment is PubMed Commons, a forum for postpublication peer review where commenters have to reveal their identities.
A recent analysis of the role of social media and other nontraditional communications in several recent episodes in chemistry provides an optimistic view of the potential for these methods and tools to strengthen the self-correcting tendencies of science:
The existence and vigorous participation of these forums in analyzing, challenging, and enhancing dialogue about the chemical literature and the human elements in research raise interesting questions with which the chemical community will have to grapple for the foreseeable future. Given the nature of transformational change over generations, it is also reasonable to predict that the younger generation which has grown up in the milieu of the breakthrough technology of the Internet will adapt and respond much more quickly to the changing norms of research and review discussed above. (Jogalekar, 2015)
Investigating Misconduct and Taking Corrective Action
As discussed above, the U.S. federal research misconduct policy and its implementation in agency regulations place the primary responsibility for investigating research misconduct allegations on research institutions (HHS, 2005; NSF, 2002; OSTP, 2000). For extramural research funded by NSF and NIH, institutions are generally responsible for undertaking an initial inquiry into allegations to determine whether a full investigation is warranted, for notifying the agencies when such investigations are initiated, and for providing the agencies with the investigation reports, findings, and recommended actions for review when investigations conclude. The U.S. federal policy specifies that a “preponderance of the evidence” standard be used to determine whether research misconduct has occurred, meaning that a finding requires the evidence to show that misconduct is more likely than not to have occurred. The agencies evaluate the investigation reports, decide whether additional information is needed, and, in cases where they find that research misconduct has occurred, determine the remedies to be imposed.
NSF-OIG and ORI differ in several ways in how policies related to inquiries and investigations are implemented through their respective regulations. For example, NSF-OIG can perform inquiries and investigations itself when it chooses to or when an institution requests that it do so, since its authority comes from the Inspector General Act of 1978 (P.L. 95-452, 5 U.S.C. App.). ORI was created by the NIH Revitalization Act of 1993 (P.L. 103-43). ORI does not have the authority to perform its own investigations, although its staff assists institutions in their investigations and reviews the resulting reports. ORI may recommend that HHS undertake its own investigation. HHS requires institutions that receive PHS funding to keep an assurance on file with ORI specifying that they have policies and procedures in place that comply with HHS regulations and that they follow their own policies, or to file a Small Organization Statement if they lack the resources necessary to provide an assurance. Institutions also need to file an annual report to ORI to keep their assurances active. During 2011, 6,714 assurances were on file with ORI, including 425 from foreign institutions (ORI, 2012). NSF-OIG has no requirement similar to ORI’s assurance program. There are also differences in the processes used for appealing research misconduct findings, with HHS specifying a more formal appeals framework than NSF. Details of NSF and HHS policies are contained in their implementing regulations, cited above, and on their websites.
Interactions between NSF-OIG, ORI, and institutions related to investigations go beyond formal oversight and reporting requirements. Both offices regularly send speakers to conferences and events to share information about their programs. In addition, ORI undertakes programs to train institutional research integrity officers (RIOs) and maintains a Rapid Response for Technical Assistance program to help institutions with advice, referrals, and assistance with forensic tools related to investigations. NSF-OIG can also provide advice and, as mentioned above, has the authority to undertake investigations itself.
As discussed in Chapter 5, the number of research misconduct inquiries and investigations has increased in recent years. For example, ORI received 423 allegations of research misconduct in 2012, far above the average of 198 received over the years 1992–2007 (ORI, 2012). A more recent annual total of 342 allegations for 2013 may indicate that the number of allegations is leveling off or even declining somewhat (ORI, 2014). For NSF-OIG, the number of allegations investigated grew from 45 in 2004 to 75 in 2014, and the number of research misconduct findings by NSF grew from 2 in 2004 to 20 in 2014 (NSF-OIG, 2015).
Information about the operation and performance of the inquiry and investigation systems overseen by NSF-OIG and ORI is available from several sources, such as the semiannual reports of NSF-OIG and the annual reports of ORI. NSF-OIG has made available a searchable database of case closeout memoranda, including memoranda from research misconduct cases and other types of cases that NSF-OIG investigates, such as financial fraud related to grants (NSF-OIG, 2015). ORI posts summaries of completed cases that have resulted in findings of research misconduct on its webpage (ORI, 2015). Media reports of specific, notable cases are another source of information, but media coverage of investigation issues, and the release of actual investigation reports, tends to occur only when something has gone wrong with an institutional response. Only a limited amount of research has been done on institutional policies and capabilities.
In addition to these sources of information, this study benefited from briefings by agency officials (see Appendix B) and from responses to follow-up requests for information and clarification about specific issues. Undertaking a
comprehensive assessment of institutional and agency capabilities and performance related to research misconduct investigations would require a focused effort and access to a significant amount of information that is not currently available outside the institutions and agencies themselves. Nevertheless, it is possible to identify several issues where there is sufficient information to develop findings and recommend improved approaches, or at least to raise questions for future study and analysis.
Different Approaches to Plagiarism
As noted in Chapter 4, the U.S. federal research misconduct policy defines plagiarism as “the appropriation of another person’s ideas, processes, results, or words without giving appropriate credit” (OSTP, 2000). Differences between NSF-OIG and ORI in their approaches to plagiarism raise the questions of whether the unified federal definition of misconduct is really “unified” and whether the two approaches to implementation should be harmonized.
While both agencies state that they exclude “authorship disputes” as possible cases of misconduct, it appears that they draw the boundary between plagiarism and authorship disputes in different places. For example, ORI explains its policy as follows:
Many allegations of plagiarism involve disputes among former collaborators who participated jointly in the development or conduct of a research project, but who subsequently went their separate ways and made independent use of the jointly developed concepts, methods, descriptive language, or other product of the joint effort. The ownership of the intellectual property in many such situations is seldom clear, and the collaborative history among the scientists often supports a presumption of implied consent to use the products of the collaboration by any of the former collaborators.
For this reason, ORI considers many such disputes to be authorship or credit disputes rather than plagiarism. Such disputes are referred to PHS agencies and extramural institutions for resolution. (ORI, 1994)
The treatment of a case of apparent plagiarism from several years ago involving PHS-funded research raises questions about the implications of these differences in implementation. In 2011, postdoctoral fellow Heather Kling and her professor Karen Norris accused two other researchers at the University of Pittsburgh, Jay Kolls and Mingquan Zheng, of claiming credit for work that Kolls became aware of while serving on Kling’s dissertation committee (Roth and Schackner, 2013). Kolls and Zheng applied for two federal grants and attempted to patent Kling and Norris’s finding of a “vaccine against a lung disease known as pneumocystis,” representing it as their own work (Roth, 2014). Arthur Levine, dean of the University of Pittsburgh School of Medicine, found Kolls and Zheng guilty of research misconduct; however, a faculty committee reduced
the finding to research impropriety, stating that it was “difficult to determine who first developed the idea” (Roth, 2014). Norris and Kling’s lawsuit against Kolls and Zheng argued “that all the key lab work on the potential vaccine was carried out in the Norris lab,” and suggested that Kolls’s position as the head of a well-funded children’s hospital may have played a role in the decision (Roth, 2014). Norris and Kling were added to the pending patent application, while Kolls and Zheng were subsequently removed (Roth, 2015). It is not clear whether or how the case was reported to ORI. Later communication between the university’s research integrity officer and Norris indicated that ORI had “not taken an interest in the past in disagreements between investigators at the same institution” (Rosenberg, 2011). The university’s Tenure and Academic Freedom Committee raised a number of concerns with how the allegations were handled (TAFC, 2013).
NSF-OIG appears to be more open than ORI to considering allegations of plagiarism against former collaborators, including allegations involving collaborators with significant power differentials, such as senior investigators and graduate students or postdoctoral fellows (see, e.g., NSF-OIG, 2013). A review of the closeout memos from NSF-OIG investigations dealing with allegations of “intellectual theft” makes clear that plagiarism is more difficult to establish in these cases than in plagiarism cases involving simple copying of text.
These apparent differences in policy implementation contribute to a different mix of case types handled by the two agencies. NSF-OIG’s largest category is plagiarism, while ORI’s is data fabrication and falsification (Resnik, 2013). Contributing to this disparity is the fact that ORI handles significantly more fabrication and falsification allegations than NSF-OIG does. Changing ORI’s approach to match that of NSF-OIG might have implications for the total number of cases handled by ORI. A more focused assessment of the two approaches, as well as those of other agencies, with access to more information than was available to this committee, would be needed to determine what specific changes are warranted.
It is important to recognize the potential damage of maintaining a perception that a researcher performing NIH-supported work can use the work of a student or of another researcher at the same institution without permission or credit with near impunity, as apparently happened in the Kolls case described above, but would be investigated for misconduct for taking the same actions on NSF-funded work. Such inconsistency could contribute to a sense that norms and practices are not firm and clear. Again, a great deal of information is lacking, but the implications of the case are not encouraging.
Institutional Capabilities and Performance
Since research institutions bear the primary responsibility for investigating research misconduct in the current U.S. system, their effectiveness in fulfilling this responsibility plays a significant part in determining how well the process of uncovering and investigating misconduct works overall. Effectively undertaking inquiries and investigations includes a number of important elements, such as collecting and sequestering hard drives and other physical evidence; gaining necessary information from interviews with complainants, respondents, and others; observing confidentiality and due process protections for respondents; and ensuring that whistleblowers are not retaliated against. The Ryan Commission (discussed in Chapter 4) recommended that institutions have processes that are “accessible from multiple entry points,” “overseen by individuals or by committees whose members are free from bias and conflict of interest,” “based on independent investigation,” “overseen by bodies that are separated in their investigatory and adjudicatory functions,” “balanced in advocacy,” “capable of preventing retaliation against participants,” and “open” to the extent possible (Commission on Research Integrity, 1995). As noted above, the available information about institutional capabilities and performance is fairly limited. Still, some themes and lessons emerge from the information that is available.
Unevenness in institutional policies and capacity to investigate and address research misconduct allegations is an important challenge examined by the committee. As discussed in Chapter 4, institutions use a variety of definitions of research misconduct for internal purposes, even as they use the federal definition for the purpose of reporting misconduct to federal sponsors (Resnik et al., 2015). Differences in policies have been documented in other areas. For example, a 2000 survey of 156 institutions whose policies had been approved by ORI found that many institutional policies did not explicitly require researchers who encountered research misconduct to report it (CHPS Consulting, 2000).2 A 2010 survey of medical schools and medical school researchers, a somewhat different target population from the 2000 survey, found that about one-third of the institutional policies do not explicitly require reporting of misconduct (Bonito et al., 2010). Many of the medical school policies do not contain clear guidance on the information that should be included in a research misconduct allegation. Most are clear about the particular institutional official or position that should receive the allegation. Almost all the medical school policies also have provisions for avoiding conflicts of interest, and most address the need to protect respondent and complainant rights.
A survey of medical researchers undertaken as part of the study of institutional policies showed that a majority are at least somewhat familiar with institutional and federal policies toward misconduct (Bonito et al., 2010). However, most made at least one error in going through a list of behaviors and identifying which ones constituted misconduct and which ones did not. Current information on the institutional policies for the full range of U.S. research institutions
2 This is not to imply that the committee believes that this requirement should be in institutional policies. It was an issue of interest to ORI, and the responses illustrate institutional differences. Whether concerns are actually reported may have more to do with whether multiple entry points and other systems are in place to encourage reporting than with a requirement in the institution’s policy.
and how well those policies are understood by administrators, faculty, students, postdoctoral fellows, and others who could be affected would be helpful input to those working to assess and improve institutional performance.
Another salient aspect of an institution’s capacity to investigate and address research misconduct allegations is the experience and ability of the institutional officials—including administrators as well as faculty members serving on investigation committees—responsible for implementing the institution’s policies. Faculty investigation committees play a crucial role in overseeing investigations. At the same time, having competent and knowledgeable administrators is necessary to ensure that the committee has the necessary expertise and that other aspects of the investigation, such as evidence sequestration and documentation of interviews, are performed correctly. For some institutional officials tasked with these responsibilities but whose backgrounds and experience are primarily in research, this can pose a challenge. They may not have deep expertise in handling the complex administrative issues that can be encountered in research misconduct investigations (Gunsalus, 1998b).
A briefing by an ORI official during this study described the various ways that investigations can go wrong and provided anonymized examples (Garfinkel, 2012). For example, if relevant institutional personnel are not adequately trained in proper sequestration procedures for notebooks and data, sequestration of evidence may be inadequate or untimely. Institutional officials turn over and, given the low incidence of reported cases of misconduct, rarely gain experience in conducting the complex reviews required to resolve allegations of research misconduct. They generally carry myriad other duties, and research misconduct investigations can be very time-consuming and costly (Michalek et al., 2010). Institutional standing committees might not have the appropriate expertise to evaluate allegations in certain fields, leading to poor analysis and mistaken findings. Interviews may be poorly conducted or poorly documented. Investigation reports might fail to include sufficient evidence or rationale for findings. New allegations uncovered during the course of the investigation might not be followed up properly.
ORI also sponsored several surveys of research integrity officers aimed at learning more about the knowledge and preparedness of these institutional officials. In the first survey, the results of which were reported in 2009, RIOs were asked to complete an online survey recording the actions they would take in response to three scenarios that involved, respectively, sequestering evidence, protecting a researcher who had made allegations, and coordinating their own actions with those of the institutional review board (IRB) (Bonito et al., 2009). The responses were compared with model responses developed with two expert consultants. Several results of this survey were disquieting. For example, 97 percent of the respondents to the online survey identified fewer than half of the potentially appropriate actions for the three scenarios that had been given by the expert consultants. This indicates that a potentially significant proportion of RIOs
were not adequately prepared to fulfill their responsibilities. In addition, length of tenure in their positions was not positively correlated with greater knowledge, meaning that RIOs were not, on average, becoming more knowledgeable over time. Having received specialized training, such as ORI’s RIO “boot camp” seminars, was associated with greater knowledge. Another result of the survey was that less than one-fifth of respondents were formally designated as an RIO or compliance officer by their institutions. Many of those who were performing the functions of an RIO had other responsibilities in areas such as grants management.
A second survey of RIOs commissioned by ORI focused on how the RIOs interacted with those making allegations of misconduct (Greene et al., 2011). ORI originally wanted to survey those who had made allegations but found that, under the current interpretation of regulations protecting confidentiality, it was not possible to identify, locate, and survey the complainants of closed cases. The survey involved interviews with 102 RIOs. They were asked whether they discussed four key topics with complainants: “the resolution process, anonymity and confidentiality, institutional responsibilities, and potential adverse consequences” of coming forward with an allegation. Less than half reported discussing all four topics with those considering making an allegation. The report pointed out that discussing these topics with potential complainants would be helpful because many of those who come forward with research misconduct allegations have reported experiencing retaliation or other adverse consequences (Lubalin et al., 1995). One recommendation of the 2011 survey report was that ORI encourage RIOs to use a prepared script or other memory aid to ensure that all four topics are covered.
Another problem that arises in some research misconduct cases and their handling by institutions is delay in reaching the inquiry and investigation stages. As noted earlier in this chapter, a 2013 analysis of 120 well-known cases of research misconduct found that there was a failed attempt to report misconduct in 28 percent of the cases (DuBois et al., 2013b). Publicly available details of several notable cases of research misconduct provide additional insights. For example, in the translational omics case at Duke University, the system failed at several points (see Appendix D). At the laboratory level, Joseph Nevins did not thoroughly check the data reported by Anil Potti until Potti’s misrepresentations in his resume were uncovered and publicized, even though the data had been questioned by experts over the course of several years (CBS News, 2012). At the institutional level, the department did not perform a thorough audit of the data after a graduate student raised concerns and asked that his name be taken off articles that would be submitted based on the work. This graduate student was assigned to another lab. When Duke ultimately launched an investigation sometime later, the external investigation committee was not given all the relevant information, a circumstance that was probably at least partly responsible for the committee recommending that clinical trials based on the work should continue.
Two factors present in the Duke case are sometimes seen in other cases
where the launch of an inquiry or investigation is delayed. First, at the outset, the concerns and evidence of internal and outside researchers were brought forward not as formal allegations of misconduct but, rather, as concerns and questions about possible errors. Second, the researcher whose work was being questioned was closely associated with a high-prestige researcher. As is seen in other contexts, such as financial or political misconduct, officials may have biases that filter how they hear concerns or that make them reluctant to make or aggressively pursue allegations of wrongdoing against powerful people in their own organization or against people closely associated with them. The researchers raising the concerns or questions may hesitate to move forward to a formal allegation, and for a time the absence of a formal allegation may override the suspicions that an impartial assessment of the evidence would raise among institutional officials. The path of least resistance might be to continue to delay action.
Additional information about how institutions address research integrity issues more broadly has emerged via administration of the Survey of Organizational Research Climate (SOURCE), reported as part of the Project on Scholarly Integrity (PSI) undertaken by the Council of Graduate Schools and in other contexts (CGS, 2012). A Research Integrity Inventory Survey was also administered as part of PSI.3 SOURCE was developed partly in response to the recommendations contained in the report Integrity in Scientific Research (IOM-NRC, 2002; Thrush et al., 2007). SOURCE’s questions focus on institutional resources to foster responsible conduct, policies and regulations, subunit (i.e., departmental) norms, advisor-advisee relations, and integrity inhibitors and expectations (Martinson et al., 2013). The final report of the PSI gave aggregated results of the six institutions that administered SOURCE as part of the project, and a subsequent article authored by several of the participants in the PSI has reported more detailed results for a subset of three of the participating institutions (Wells et al., 2014). SOURCE indices have also been shown to correlate with a broad range of research-related behaviors (Crain et al., 2013). SOURCE is available to institutions that wish to utilize it, and institutions can also contract with Ethics CORE (nationalethicscenter.org/sorc) at the University of Illinois to administer it and compile the data.
The coverage of institutional investigations also affects the ability of journals to correct the research record and of sponsors to take corrective action. Do institutions have an obligation to investigate a researcher’s work beyond the specific publications or proposals that are subject to allegations of research misconduct? In an international example, the three Dutch universities where Diederik Stapel was educated and employed came together and investigated all the work that he produced in his career, from his PhD dissertation onward (Levelt et al., 2012). The resulting report, which was published in its entirety and translated into
3 Additional information on these resources is available at www.scholarlyintegrity.org/ShowContent.aspx?id=400#.
English, is a significant contribution to social psychology and to the broader understanding of research misconduct. In other cases, institutions may limit their investigations to the work that is the subject of allegations because of resource constraints, barriers to involving other institutions, or other reasons.
One important conclusion from this discussion of institutional capabilities and performance is that significant gaps exist in the information available to institutions, and to the rest of the research enterprise, about how allegations are handled, what challenges arise, and how successful institutions are in ensuring effective performance. Several items in the institutional best practices checklist discussed in Chapter 9 are aimed at filling this information deficit at the institutional level. The occasional surveys supported by ORI have shed light on important aspects of institutional responses, but additional research to assess aggregate trends and needs could yield valuable insights that would enable the entire system of investigating research misconduct in the United States to operate more effectively.
Taking Corrective Actions
Corrective actions taken in response to research misconduct findings can take several forms. The employing institution should notify journals that have published articles based on fabricated or falsified data so that the articles can be retracted. The institution will determine whether to send a letter of reprimand to the guilty researcher, suspend him or her, or terminate employment. Federal actions can be taken if the research in question was performed with federal government support. An agency can suspend or terminate the award, institute requirements that the researcher’s actions be supervised, or debar the researcher from receiving support or from participating in agency review or advisory activities, either for a period of time or in perpetuity (OSTP, 2000).
Both NSF-OIG and ORI processes have avenues for appeal available, and when these are exhausted, accused researchers can go to court. Several examples reported in the press in recent years involved researchers who appealed research misconduct findings and had some measure of success in having penalties overturned or reduced (Cossins, 2013; Kuta, 2014).
As noted above, agencies differ in their implementation of federal policy. ORI’s policies and procedures in handling investigations generally involve more formal requirements than do those of NSF-OIG. For example, ORI publishes on its website the names of all researchers found guilty of research misconduct. NSF-OIG does not publish names; the public can ascertain only whether a particular researcher has been debarred or suspended (the outcome in about 25 percent of NSF’s cases). The names of those who have been debarred or suspended are entered into the System for Award Management, a public database (www.sam.gov/). Although the database is searchable, one needs to enter a name to perform a search. It is therefore a useful tool for a university that might be hiring a faculty member or an agency checking whether a grant applicant has been debarred or suspended, but it will not generate a list of researchers who have been found guilty of misconduct, and the entries do not state the reason for debarment or suspension. Researchers can be debarred for misappropriating award funds and for other causes in addition to research misconduct.
One consequence of this difference in implementation between agencies is that a researcher who is found guilty of misconduct when performing NSF-funded research is unlikely to suffer from public disclosure during that researcher’s subsequent professional life unless the case was reported in the media, while an NIH-funded offender will certainly be exposed. Since the disclosure itself represents a significant consequence—perhaps the most significant consequence—this difference in policy implementation between NSF-OIG and ORI in fact constitutes a clear disparity in the severity of corrective actions (see discussion of the relatively new Department of Veterans Affairs policy below). Research exploring the long-term consequences of being found guilty of misconduct and having that judgment publicly disclosed found that while many offenders left research, 43 percent of those who had been in academia and could be traced still had academic research jobs some years later (Redman and Merz, 2008). Some efforts have been made to develop educational programs aimed at rehabilitating researchers who have been accused of misconduct (Cressey, 2013; DuBois et al., 2016).
The continued existence of this disparity is problematic both for individual researchers, who are held to different standards of accountability based on their sources of funding, and for the research institutions employing the researchers, which must implement policies for such a heterogeneous aggregate of researchers. A federal effort to bring about greater consistency between agencies in the implementation of the research misconduct policy could address this issue, as could other initiatives such as reconciling differences in the handling of plagiarism allegations. However, it is not obvious how implementation should be made consistent. The appropriate approach might depend on what one sees as the ultimate goals of corrective action. Both approaches—public reporting and maintaining anonymity—have positive and negative aspects (Parrish, 2005). Should those found guilty of research misconduct have their research careers ended, or are there some cases where errant researchers can be rehabilitated? Should younger researchers be treated differently from those with more experience? What are the risks to future research and the potential damage to future collaborators in cases where the identity of those found guilty is not disclosed? Is it possible to craft an approach where those found guilty of the most egregious offenses are exposed, while those whose misconduct is less consequential, particularly younger researchers, are not? Policy makers and members of the research enterprise differ on these questions. One provision of the America COMPETES Reauthorization Act passed by the House of Representatives in 2015 would have required NSF-OIG to make the names of “principal investigators” public in cases of misconduct, effectively harmonizing implementation around ORI’s current
practices. This provision was not included in the version of the bill that was ultimately passed by both houses of Congress and signed by President Obama in 2016 (American Innovation and Competitiveness Act of 2016).
A complicating factor in efforts to harmonize the approaches of federal agencies is that institutions have different policies and approaches to identifying employees who commit various types of offenses, including research misconduct. For example, some institutions do not normally publicize the fact that researchers have been found guilty of misconduct, and may make investigation reports publicly available only in response to Freedom of Information Act requests. The University of Kansas has taken a different approach, occasionally publishing “public censure” items in its employee newsletter in response to cases of research misconduct and other prohibited behavior (University of Kansas, 2013).
In addition to the administrative actions that can be taken by institutions and research funding agencies, researchers who commit misconduct can face criminal prosecution under certain circumstances. For example, in 2015, Iowa State University researcher Dong-Pyou Han was prosecuted and convicted for fabricating and falsifying data in HIV vaccine trials (Reardon, 2015). The prosecution occurred after the institution had completed its investigation and ORI had issued its findings and administrative actions, and after Iowa Senator Charles Grassley called attention to the case. Han received a sentence of 57 months in prison and $7.2 million in fines. Over the years, several other notable cases of misconduct have led to prosecutions, including that of anesthesiologist Scott Reuben (discussed above), although such cases are unusual (Bornemann-Cimenti et al., 2015). Decisions on whether to prosecute depend on the likelihood of success and on how misconduct cases rank against other demands on available prosecutorial resources. It is important to note that the standard of proof in criminal prosecutions—proof “beyond a reasonable doubt”—is significantly higher than the “preponderance of the evidence” standard that prevails in institutional research misconduct investigations and federal oversight of these investigations. In recent years, there have been calls for research misconduct to be treated as a crime more frequently than it has been up to now (Smith, 2013).
Finally, researchers who commit misconduct may face civil liability, and institutions may face civil penalties if they are negligent in their oversight or responses. One avenue for pursuing such penalties is the False Claims Act, which allows the federal government to recover damages and penalties from those who make false claims on the government (Kalb and Koehler, 2002).
Research Misconduct and Other Regulatory Frameworks
The implementation of policies to address research misconduct by federal agencies and institutions can sometimes be intertwined with and affected by other regulatory frameworks that govern certain types or aspects of research. Although reviewing these issues and related evidence contributes to an understanding of
some of the tasks and challenges facing agencies and institutions, developing solutions or new approaches is largely outside the scope of this study.
The most obvious example of regulatory intertwining concerns the regulations designed to protect the human subjects of research in clinical trials and other settings. The basis of federal policy on human subjects protection is the “Common Rule,” codified at 45 CFR Part 46 (“Protection of Human Subjects”), which covers research supported by federal agencies or subject to federal regulation, such as privately funded clinical trials that are subject to oversight by FDA. Institutions performing research on human subjects are required to undertake a prospective ethical review of proposed research through a standing institutional review board or other mechanism, ensure that human research subjects provide “informed consent” to participation in the research, and promptly report any unanticipated risks or failure to comply with regulations during the course of the research.
A 2014 report outlines the differences between the research misconduct and human subjects protection regulations and explains the complexities and challenges that can arise for institutions as they seek to comply with both (Bierer and Barnes, 2014). For example, fabrication or falsification of data in a federally funded research project involving human subjects may trigger fact-finding and enforcement processes under both sets of regulations. In general, the research misconduct regulations are more detailed and specific for investigation procedures (including opportunities for appeal), confidentiality requirements, standards of proof, and other issues than are the human subjects protection regulations. Examples of issues and questions that may arise in cases that fall under both sets of regulations include how to provide an IRB with access to data that have been sequestered as part of a research misconduct investigation and what weight (if any) a research misconduct investigation should give to an IRB finding that allegations of noncompliance with Common Rule standards have not been substantiated (Bierer and Barnes, 2014).
Additional issues arise in connection with reporting and information flows between officials working to ensure human subjects protection and those responsible for investigating research misconduct allegations. As illustrated by the discussion above of clinical trials in which an FDA investigation found significant problems such as data falsification, yet the research was published with no indication of a problem and no research misconduct finding was recorded, there appear to be shortcomings in how information flows between the two regulatory and compliance systems.
Another area where human subjects protection regulations overlap with research misconduct regulations is education, since human subjects protection is one of the nine core areas of responsible conduct of research (RCR) education as defined by NIH (NIH, 2009). RCR education is discussed further in Chapter 10.
Starting in 2011, the Department of Health and Human Services embarked on a process of revising the human subjects protection regulations, an effort that was under way during most of the time this study was being conducted (HHS, 2011a). In 2015 a Notice of Proposed Rulemaking was published describing the major changes proposed to the Federal Policy for the Protection of Human Subjects. These include changes to the rules on informed consent, such as revisions to consent forms and provisions for research subjects to give “broad” consent for secondary research, and changes to the oversight system, through “making the level of review more proportional to the seriousness of the harm or danger to be avoided” (HHS, 2016). It is unclear how the resulting changes will affect agencies and institutions as they seek to manage areas of overlap between the research misconduct and human subjects regulatory frameworks.
Another area of regulation that has some relationship with research misconduct policies involves the requirements for disclosing and managing possible financial conflicts of interest in research. Conflicts-of-interest reporting is currently in flux; existing policies are not uniform across agencies, and compliance generally does not raise issues of overlap with research misconduct policy. Policies and regulations aimed at ensuring that financial conflicts of interest do not adversely affect research or skew results have been changing in recent years, partly in response to research showing that conflicts-of-interest reporting can have perverse effects by providing a “strategic reason and moral license” to exaggerate advice, and the impacts on how research misconduct is addressed may change in the future (Cain et al., 2005, 2011; Koch and Schmidt, 2010; Loewenstein et al., 2011). For example, HHS revised its policies toward financial conflicts of interest in PHS-funded research in 2011, changing some of the reporting requirements for researchers and institutions (NIH, 2011). In NSF-funded research, institutions are required to certify as part of the proposal process that they have a policy covering conflicts of interest and that the proposed research complies with that policy (NSF, 2014). The National Academies’ report Optimizing the Nation’s Investment in Academic Research recommends a federal government–wide financial conflicts-of-interest policy that differentiates “requirements for financial interest disclosure and management for research that does and does not involve human subjects” in an effort to reduce the time and cost burdens of multiple existing policies (National Academies of Sciences, Engineering, and Medicine, 2016). As discussed in Chapter 4, some countries treat failure to disclose potential conflicts of interest as a form of research misconduct.
Conflicts-of-interest regulations are largely outside the scope of this study and so, in this report, addressing potential conflicts of interest in research is treated as an issue to be addressed through best practices, as discussed in Chapter 9.
Correcting the Research Record: Journals and Retractions
One important aspect of addressing research misconduct is correcting the published research record through the retraction of journal articles. Retractions are discussed in several other places in this report, including Chapter 5, which examines the significant increase in retractions in recent years and the extent to which retraction statistics are a useful proxy for the incidence of misconduct (Grieneisen and Zhang, 2012). While retractions are no longer rare, they remain relatively unusual. Here, it is important to note that retracting articles is not always a consistent or straightforward process and to identify issues that might be addressed by journals or by other stakeholders.
One way to gain insight into retractions is to examine a case in which a researcher was found to have committed misconduct and a number of his or her articles needed to be reanalyzed and possibly retracted. The response of journals to the finding of research misconduct against University of Vermont obesity and aging researcher Eric Poehlman is one such case (Sox and Rennie, 2006). Of the 10 articles identified by ORI as being based on fabricated or falsified data, 4 had still not been retracted more than a decade after the finding of misconduct (McCook, 2015). The reasons individual papers were not retracted vary, with several having been subject only to a correction notice.
Although journals are not equipped to investigate allegations of research misconduct, they may have strong evidence of misconduct developed through the use of software that detects image manipulation or through other technological tools. In the absence of a finding of misconduct or a request by an institution to retract an article, a journal might hesitate to move forward. Some retractions “can involve unavoidable delays of years because of some combination of the complexity of the science, disputes between co-authors, the need to await outcomes of lengthy investigations, and disputes over these proceedings” (Nature, 2014). In the absence of an institutional finding, a journal may be concerned that citing misconduct as the cause of a retraction would open the door to a libel suit or other legal action, although it is unclear if such legal action has ever been taken by an author (Wager, 2015b).
Another issue that arises with retractions is that retracted work may continue to be cited. For example, a recent analysis of 25 retracted papers by Scott Reuben found that, 5 years after the retractions, nearly half the papers were still being cited, with most of the citations not mentioning that the work had been retracted (Bornemann-Cimenti et al., 2015).
Chapter 9 discusses best practices that should be adopted by journals in the area of retractions. Technological tools that allow researchers to identify the publisher-maintained version of an article and the development of master databases of retractions will likely reduce the phenomenon of retracted work being cited in the future.
Other Issues, Gaps, and Inconsistencies
Privately Funded and International Research
As mentioned in the Chapter 4 discussion of research misconduct definitions, the federal research misconduct policy applies only to federally funded research. The federal requirement that institutions report investigations and their results to funders does not apply to privately funded research, including research supported by international sponsors. Institutional policies generally do not distinguish between funding sources in specifying how allegations should be handled, but there appears to be no evidence on whether institutions make such distinctions in practice. The results of these investigations may not be made public, so it is not possible to track incidence or trends at the aggregate level. However, some cases of misconduct in which work needs to be retracted do become publicly known.
There are several notable cases where misconduct in privately funded research has been investigated and addressed. One example is the data fabrication and falsification by Jan Hendrik Schön of Bell Laboratories (Goodstein, 2010). Results of his seemingly groundbreaking research on semiconductor materials were published in a number of prestigious journals, mainly between 2000 and 2002. In early 2002, other researchers within Bell Labs and outside began to raise questions about Schön’s work, and Bell Labs set up a committee to investigate. The committee released its report later that year, finding that Schön had committed scientific misconduct (Beasley et al., 2002). The report served as the basis for retraction of numerous papers. The committee stated that there was no evidence that any of Schön’s coauthors were aware of or involved with the misconduct, noting that in only a few cases had coauthors had any involvement in fabricating the devices in question, designing or performing the experiments, observing the reported phenomena, or collecting or analyzing the data. The Schön case raises the question of whether coauthors bear responsibility for reviewing or confirming the work of their collaborators; this issue has appeared in several high-profile cases since that time, such as the stem cell case of Hwang Woo-suk (see Appendix D).
Sabotage as Falsification
Chapter 4 contains a discussion of whether cases where researchers sabotage the experiments of others or abscond with vital data should be considered research misconduct. In at least one case, ORI has treated sabotage of experiments as research misconduct: Vipul Bhrigu, a postdoctoral researcher at the University of Michigan Medical School, was found to have tampered with the experiments of Heather Ames, a graduate student in his lab, causing false results to enter the research record (HHS, 2014a; Maher, 2010). The tampering was captured on video, and Bhrigu was also convicted of malicious destruction of personal property (Maher, 2010).
In another incident reported in the media, Polloneal Jymmiel Ocbina, a postdoc at Yale, was videotaped tampering with the zebrafish experiments of another postdoc, Magdalena Koziol, and left Yale without being charged with a crime (Enserink, 2014). Koziol later sued Yale and her supervisor, Antonio Giraldez, for not allowing her to speak about the case to sponsors in explaining why she had not made more progress in her work.
It is well established that tampering with data and experiments to obtain false-positive results constitutes falsification. The Bhrigu case established a precedent, and conditions, under which tampering intended to cause another researcher to obtain false-negative results also constitutes falsification. Examining how institutions treat cases of sabotaged experiments and absconded data, perhaps through a survey or other mechanism, might therefore be a useful way to ensure greater consistency in federal agencies’ implementation of the research misconduct policy.
Issues Raised by the Policies and Practices of Federal Agencies Other Than ORI and NSF-OIG
Much of the discussion of policies and policy issues in this report focuses on ORI and NSF-OIG, which oversee the handling of research integrity issues by grantees of the Department of Health and Human Services (the bulk of these being grantees of the National Institutes of Health) and the National Science Foundation, respectively. Looking at a few statistics shows why these agencies have a disproportionate importance in the implementation of federal research misconduct policy. NSF and HHS account for about 80 percent of the federal research and development funding that is provided to academic and private nonprofit organizations (NSB, 2014b), and authors affiliated with academic and private nonprofit organizations account for about 80 percent of the research articles published by U.S. authors, with authors affiliated with industry, federal agencies, and federally funded research and development centers accounting for most of the rest (NSB, 2016). NSF and HHS clearly play leading roles in federal support for research that results in published articles.
Despite the lack of federal government-wide statistics or reporting on research misconduct investigations and findings, the available evidence indicates that NSF-OIG and ORI account for the vast majority of total activity. Also, federal agencies other than ORI and NSF-OIG do not appear to produce regular public reports on how many investigations have been launched and their resolution, as ORI and NSF-OIG do. As described below, agencies follow a variety of approaches toward making information about research misconduct investigations public.
Despite the understandable focus on NSF-OIG and ORI, other federal agencies that perform and/or support research are also obliged to implement the federal research misconduct policy. These agencies may face different challenges
depending on whether their research programs are mainly intramural or extramural and on other factors. Also, in some cases the handling and resolution of misconduct allegations affecting research supported or performed by other agencies have led to questions or controversy. Although a detailed review of how all agencies are implementing the research misconduct policy is beyond the scope of the study, examining several examples serves to illustrate that efforts to assess and improve performance by federal agencies would contribute to fostering research integrity within the federal government and beyond.
Part of the context is the scientific integrity initiative that the Obama administration undertook during its first term, described in Chapter 3. Executive branch agencies were instructed to develop policies to ensure the credibility of government research and prevent bias in how science is used in policy making. As part of the initiative, some agencies reviewed their existing policies. For example, a 2010 review at the Department of the Interior (DOI) found no comprehensive scientific integrity policy at the department level, although an earlier effort to develop a policy and code of conduct implementing the 2000 federal research misconduct policy had failed (DOI, 2010). In 2007, one of DOI’s constituent agencies, the U.S. Geological Survey, issued a scientific integrity policy that implemented the 2000 federal research misconduct policy. Following up on the 2010 review, DOI developed a comprehensive department-wide policy that was implemented in 2011 and updated in 2014 (DOI, 2014).
DOI agencies such as the U.S. Geological Survey and the Fish and Wildlife Service perform intramural research and support extramural research. Investigations of possible research misconduct and other breaches of scientific integrity are overseen by scientific integrity officers appointed by DOI’s constituent agencies. DOI also posts summary results of the research misconduct investigations that it has undertaken and concluded since 2011 (DOI, 2015). The most controversial DOI scientific integrity cases of recent years have revolved around establishing and reporting the scientific basis for agency policies and positions, rather than around fabrication, falsification, or plagiarism. For example, disputes have emerged, and investigations of alleged breaches in integrity have been undertaken, over the development and presentation of the scientific evidence used to predict the impacts of such actions as building the proposed Keystone XL pipeline and removing dams from the Klamath River. A case of data falsification at a USGS laboratory that the agency investigated and confirmed, as described in a later report of the DOI Office of Inspector General, illustrates that research misconduct may occur in research performed at government laboratories (DOI-OIG, 2016).
The Department of Defense (DOD) is an important performer and sponsor of research. DOD issued a directive in 2004 that delegated to component agencies the responsibility for developing and implementing procedures to foster research integrity, including procedures for addressing allegations of research misconduct (DOD, 2004). The DOD directive also defines standards and requirements for
those procedures, referring to the definitions set out in the 2000 federal policy. In response to the federal scientific integrity initiative of 2010, DOD developed a separate policy that covers the utilization of science in policy making, media relations, and other issues distinct from addressing research misconduct.
A research misconduct investigation concluded in 2007 shows that challenging issues may arise in connection with addressing misconduct allegations in DOD-sponsored research (Godfrey, 2007). In that case, an engineering team from the Massachusetts Institute of Technology’s Lincoln Laboratory that evaluated a 1998 ballistic missile defense flight test was accused of research misconduct. The investigation was delayed for several years when DOD refused to allow access to classified information deemed essential to undertaking the investigation. Once access was granted, the investigation proceeded, resulting in a finding that research misconduct had not occurred and exoneration of the Lincoln Lab authors, Ming-Jer Tsai and Charles Meins (Godfrey, 2007). In addition to summarizing the investigation, the final report contains suggestions for improvements in conducting future investigations.
The Department of Veterans Affairs (VA) undertakes a large program of clinical and discovery research, budgeted at $1.8 billion for fiscal 2015, combining the VA’s own dedicated research budget, medical care support, other federal resources, and nonfederal resources (VA, 2015). The VA’s Program for Research Integrity Development and Education oversees training and credentialing in areas related to human subjects protection. The VA also has detailed policies and procedures for dealing with research misconduct allegations, with the most recent version being issued in early 2014 (VA, 2014). These policies and procedures were reviewed and revised prior to being reissued, with a number of substantive changes introduced to clarify roles, improve procedures for conducting inquiries and investigations, and harmonize VA’s policies with the Public Health Service policies implemented by ORI (Bannerman, 2014).
Research integrity officers are appointed at all VA facilities with an active research program. Depending on how the processes of conducting the initial inquiry, undertaking the investigation, reviewing the report, adjudicating the result, and overseeing any appeal proceed in specific cases, there are defined roles for the director of the facility where misconduct has been alleged, the director of the Veterans Integrated Service Network (VISN) that includes the facility, the VA Office of Research Oversight, the VA Office of General Counsel, and the VA Under Secretary for Health.
One interesting aspect of the revised VA research misconduct policy is that it includes specific provisions for publication of findings, which the previous policy lacked:
Publication of Final Findings of Research Misconduct. For all findings of research misconduct adjudicated by a VISN Director and upheld by the Under Secretary for Health on appeal, if any, VA may publish the respondent’s name, the respondent’s current or former VA position, a detailed summary of the findings, and the corrective actions imposed, in any venue deemed appropriate. Such venues include, but are not limited to, Government exclusionary lists (if relevant), the Federal Register, ORO’s Web site, other VA publications, and media outlets. VA may also provide the information referenced in this paragraph to the respondent’s current employer and academic affiliates, as well as other entities whose notification would be necessary to implement a corrective action (e.g., journal editorial boards). NOTE: In those cases where there is a determination that the extent of the research misconduct is significant and/or the possible or actual consequences of the research misconduct are significant, it is considered to be in the interests of both VA and the scientific community to publish final findings of research misconduct. (VA, 2014)
This approach to publishing investigation results differs from those of NSF-OIG and ORI discussed above. The policy allows, but does not require, the names, findings, and corrective actions related to misconduct to be published, preserving discretion for the agency.
The U.S. Department of Energy (DOE) is also a significant sponsor of research. Much of the research that DOE supports is performed at its National Laboratories and user facilities, most of which are managed and operated by contractors. DOE’s research misconduct policy was adopted in 2005 and specifies that research misconduct allegations should be referred to “the DOE Element responsible for the contract or financial assistance agreement” (10 CFR Parts 600 and 733; 48 CFR Parts 935, 952, and 970). The policy also specifies that the DOE element in question should consult with the DOE Office of the Inspector General (OIG), which can decide whether to investigate the allegation itself. If DOE-OIG declines to investigate, the allegation is referred to the contractor or grantee. The requirements for contractors and grantees regarding research misconduct investigations are covered in more detail in DOE’s contracting regulations (48 CFR Chapter 9). The contractor or grantee is primarily responsible for adjudication and determination of corrective actions, although DOE reserves the right to take additional action.
Questions about DOE’s policies were raised in connection with an investigation of an anonymous allegation against a research group at Oak Ridge National Laboratory (Reich, 2011). In that case, the lab’s investigation found that Stephen Pennycook’s group had not fabricated or falsified data. In the aftermath of this case, questions were raised in a Freedom of Information Act lawsuit by Nature reporter Eugenie Samuel Reich about whether DOE’s oversight of research misconduct investigations by contractors and grantees was adequate, and whether DOE should consider establishing a new organization focused on performing such oversight (Reich, 2011). A 2014 audit of DOE’s management of research misconduct investigations reported that around 30 research misconduct allegations had been received by DOE’s Office of Science and the National Laboratories between 2009 and 2013 (DOE-OIG, 2014). It is unclear how many of these allegations proceeded from the inquiry stage to an investigation. DOE-OIG audited
the responses to 21 allegations and found that they were addressed appropriately. However, DOE-OIG found several cases where requirements to report allegations to the OIG or the contracting officer were not followed and others where contractors did not follow their own misconduct investigation procedures. The report recommended that DOE’s Office of Science “provide additional education and guidance on the procedures and responsibilities for conducting research misconduct allegation reviews to Department officials, laboratories, and financial assistance recipients” (DOE-OIG, 2014).
The Environmental Protection Agency (EPA) also performs research and supports extramural work. Its policy on addressing research misconduct allegations made against EPA employees and contractors was adopted in 2003 and specifies investigative and reporting requirements (EPA, 2006). As is the case in several of the other agency examples, the agency’s Office of the Inspector General has an important role in overseeing responses to allegations, including the authority to step in and undertake an investigation under certain circumstances. Research misconduct is also discussed in EPA’s scientific integrity policy, which was adopted in 2014 (EPA, 2012). This newer policy does not replace or amend the procedures for responding to allegations but does identify a new position within EPA, the Scientific Integrity Official, who is responsible for working to promote scientific integrity within EPA.
Addressing Detrimental Research Practices

The concept of detrimental research practices (DRPs) and specific examples of them are discussed in Chapter 4. Chapter 5 describes the negative impacts of DRPs on the research enterprise in terms of misallocated financial resources and wasted effort; the sum total of these negative impacts may be greater than the harm done by research misconduct. Some detrimental research practices related to authorship that do not constitute misconduct, such as honorary authorship, are discussed below in a section focused on authorship issues and challenges.
The discussion in Chapter 5 also explains how some detrimental research practices are implicated in the reproducibility problem, the finding that an alarmingly high percentage of reported results in certain fields cannot be replicated. These practices include misleading statistical analysis that falls short of falsification, incomplete reporting of results that leads to misrepresentation of findings, and failure to retain or share the data and other information (such as code) underlying reported results. Several specific cases also show that DRPs are closely connected with research misconduct. Tolerance of DRPs in certain fields, as embodied in the policies of journals and sponsors as well as in accepted practices at the laboratory level, can delay or prevent the discovery of misconduct.
To the extent that standards can be improved and tolerance for DRPs can be lowered or eliminated, fabrication and falsification of data will be uncovered more easily and quickly in many cases. Beyond improving the efficiency with which research in these fields produces reliable knowledge, higher standards and improved practices will make it more difficult to build long careers on fraudulent work, as Stapel and Reuben did (see Chapter 5). We can expect some, perhaps many, researchers inclined in that direction to be deterred. Discouraging, reducing, and eliminating DRPs will support and strengthen the effective operation of science’s self-correcting tendencies.
An example from high-energy physics illustrates the value of good research practices in reporting results, identifying and correcting errors, and confirming findings. The apparent discovery in 2002 of pentaquarks, short-lived particles made up of five quarks, quickly led to a number of confirmatory reports (Chalmers, 2015). Previous theoretical work had predicted the existence of pentaquark states. However, subsequent efforts to replicate these results at a higher level of sensitivity failed and appeared to show that pentaquarks do not exist. Most recently, researchers analyzing data from an experiment at the Large Hadron Collider at CERN appear to have confirmed the existence of pentaquarks (Chalmers, 2015). This episode shows the value of reporting results and the underlying information so that others can confirm and extend the findings, and it serves as a reminder that science often proceeds through twists and turns in the accumulation of reliable knowledge.
A widely reported 2011 article claiming that bacteria could grow without phosphorus by substituting arsenic illustrates the value of postpublication community review in identifying problems with work that are unrelated to misconduct (Wolfe-Simon et al., 2011). The article was criticized immediately and refuted by later work (Kaufman, 2012).
However, as discussed in Chapter 5, current standards and practices in particular fields may not be adequate to counteract widespread lack of rigor in study design, bias in selecting data or publishing results, and other errors. Developing appropriately high standards in research and ensuring their wide adoption are complex tasks requiring the contributions of various stakeholders with different perspectives and incentives. The heightened attention that the reproducibility problem has recently attracted provides an opportunity to make progress.
Better awareness and recognition that a problem exists at the level of specific fields and disciplines, and communication of this awareness to institutions and investigators, can be an important starting point. A well-designed replication effort can provide insights into the nature and possible scope of problems. A recently published effort to reproduce 100 studies published in three psychology journals is a valuable demonstration along these lines (OSC, 2015). The replication effort was undertaken as an open, global collaboration that involved contacting the original authors for materials, asking them to review the replication study protocols, publicly registering the protocols, and publicly archiving the replication materials and data (Aarts et al., 2015). The result was that 36 percent of the replication efforts yielded significant results, versus 97 percent of the original studies, and the effects found in the replications averaged half the magnitude of the originals. The effort also found that original results from cognitive psychology were more robust than those from social psychology. While noting caveats and uncertainties in interpreting the results, the authors’ summary offered important insights into irreproducibility in psychology and its likely sources:
More generally, there are indications of cultural practices in scientific communication that may be responsible for the observed results. Low-power research designs combined with publication bias favoring positive results together produce a literature with upwardly biased effect sizes. This anticipates that replication effect sizes would be smaller than original studies on a routine basis—not because of differences in implementation but because the original study effect sizes are affected by publication and reporting bias, and the replications are not. Consistent with this expectation, most replication effects were smaller than original results, and reproducibility success was correlated with indicators of the strength of initial evidence, such as lower original P values and larger effect sizes. This suggests publication, selection, and reporting biases as plausible explanations for the difference between original and replication effects. The replication studies significantly reduced these biases because replication preregistration and pre-analysis plans ensured confirmatory tests and reporting of all results. (OSC, 2015)
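The mechanism the summary describes, low-powered studies filtered by a significance threshold, can be illustrated with a short simulation. The parameters here (a true standardized effect of 0.2 and 20 subjects per group) are illustrative assumptions, not figures from the OSC study:

```python
import math
import random
import statistics

random.seed(0)

TRUE_EFFECT = 0.2   # assumed true standardized effect; illustrative only
N_PER_GROUP = 20    # small samples mean low statistical power
N_STUDIES = 2000

def simulate_study():
    """Run one two-group study; return (observed effect, significant?)."""
    treatment = [random.gauss(TRUE_EFFECT, 1.0) for _ in range(N_PER_GROUP)]
    control = [random.gauss(0.0, 1.0) for _ in range(N_PER_GROUP)]
    observed = statistics.mean(treatment) - statistics.mean(control)
    # Approximate z test, treating both group variances as known and equal to 1
    z = observed / math.sqrt(2.0 / N_PER_GROUP)
    return observed, z > 1.96  # "publishable" only if significantly positive

all_effects, published_effects = [], []
for _ in range(N_STUDIES):
    effect, significant = simulate_study()
    all_effects.append(effect)
    if significant:  # publication bias: only positive, significant results appear
        published_effects.append(effect)

print(f"true effect:              {TRUE_EFFECT:.2f}")
print(f"mean effect, all studies: {statistics.mean(all_effects):.2f}")
print(f"mean effect, published:   {statistics.mean(published_effects):.2f}")
```

Because only the significant draws survive the filter, the mean “published” effect lands well above the true value, while the mean across all studies does not. This mirrors the summary’s point that replications, which face no such filter, routinely find smaller effects than the original studies.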
Strengthening Standards and Ensuring Transparency
Detrimental research practices and some amount of failure to reproduce research results are not new problems. When the research enterprise was smaller and researchers in specific fields were more likely to know each other, personal communications about irreproducible work could be shared privately (Begley and Ioannidis, 2015). This undoubtedly still occurs, although such informal knowledge that certain work is unreliable may not be widely shared. As the enterprise has grown larger and competition has become more intense, the incentive to publish more articles has become stronger. In some of the specific examples described in this report, there appeared to be little or no checking of data at the laboratory or institutional levels, raising the question of whether ineffective supervision is widespread in certain fields and institutions. Funders and journals may not insist that researchers make data, code, and other information underlying results available. In combination, these factors may create environments where publication bias and selection bias can go relatively unchecked and influence reported work.
Another important point discussed in Chapter 5 is that some false results will and should continue to appear in the normal course of science. Introducing practices aimed at reducing the irreproducibility rate to zero across all fields would be counterproductive and impose significant costs.
Clearly, improving transparency is a key element of any solution.
Chapter 8 is devoted to a discussion of best practices for researchers, research institutions, journals, sponsors, and societies. Much of the best practices discussion is related to improving transparency. Broad principles related to transparency in such areas as sharing data should be observed as widely as possible across all fields; these principles are the focus of several recommendations in Chapter 10. As discussed in Chapter 3, information technologies have become much more important across most research fields in the past two decades, but the utilization of these new tools has outpaced the ability of some fields and disciplines to develop standards and practices that will ensure a level of transparency consistent with fostering integrity and reproducibility.
How should fields go about developing new standards and ensuring that they are followed? One recent article encouraged disciplines to develop detailed case studies on selected nonreproducible publications with the goal of “deriving general principles for improving science in each field” (Alberts et al., 2015). One historical example of a field where DRPs were once widely tolerated is human language technology (HLT), which includes areas such as automated speech recognition and machine translation (Liberman, 2012). A public demonstration at Georgetown University in 1954 of a system that translated several Russian sentences into English encouraged the belief that the most significant barriers to machine translation had been overcome, yet the Georgetown system had a small vocabulary and a limited number of grammar rules, and did not represent a true scientific advance (Hutchins, 1982). After this demonstration, HLT received significant federal funding, but by the mid-1960s there was little to show for it. The systems produced during this period could generate an impressive demonstration but performed poorly in real-world use, with output requiring extensive human post-editing. A negative evaluation of the potential of the field led federal agencies to largely end support for HLT research for almost two decades (NAS-NRC, 1966). When the Defense Advanced Research Projects Agency renewed support for HLT in the mid-1980s, a number of steps were taken to ensure that research produced clear, usable results. The results of all funded projects had to be judged against a well-defined, objective evaluation metric, developed and applied by the neutral National Bureau of Standards (now the National Institute of Standards and Technology), on shared datasets, with the results of the evaluation revealed to the sponsor and the other investigators (Liberman, 2012).
Although some HLT investigators complained at first about this “common task structure,” the field quickly embraced it, strengthening its research culture as a result. The common task structure created a positive feedback loop that accelerated progress. Error rates have declined by a roughly fixed percentage each year, with advances mainly taking the form of incremental improvements. The sharing and reuse of data have become central to research practices in HLT. Advances in the field have led to products that are widely used today, such as Apple’s Siri and Google Translate.
In recent years, there have been a number of positive developments related to ensuring quality and reproducibility at the broad level of the research enterprise
as well as in specific fields and disciplines (Nature, 2015b). Experts have made the case that integrity, quality, reproducibility, and the credibility of research are strongly interconnected:
If science is to enhance its capacities to improve our understanding of ourselves and our world, protect the hard-earned trust and esteem in which society holds it, and preserve its role as a driver of our economy, scientists must safeguard its rigor and reliability in the face of challenges posed by a research ecosystem that is evolving in dramatic and sometimes unsettling ways. (Alberts et al., 2015)
While it would take considerable space to list or describe all the recent and ongoing efforts, it is worth identifying a few significant initiatives. A 2012 workshop identified key requirements for methodological reporting in animal studies aimed at improving the predictability and quality of preclinical animal studies, such as sample size estimation, whether the animals were randomized and how, and data handling (Landis et al., 2012). In 2013, Nature introduced a checklist that is “intended to prompt authors to disclose technical and statistical information in their submissions, and to encourage referees to consider aspects important for research reproducibility” (Nature, 2013). In biomedical research, the EQUATOR (Enhancing the QUAlity and Transparency of health Research) Network (http://www.equator-network.org) is an international initiative that promotes reporting standards aimed at ensuring transparency and reliability.
Efforts to address the issue of sharing clinical trial data have also gained momentum in recent years (IOM, 2015). For clinical trials, sharing data at the time of publication remains aspirational, and there may be good reasons to wait for a specified period before opening up the data and metadata for sharing (IOM, 2015). Several recent proposals indicate that consensus is building around a recommended maximum of 6 months after publication by which data should be shared (IOM, 2015; Taichman et al., 2016).
The Center for Open Science, the group that was responsible for the recent effort to replicate psychology results discussed above, has also developed a set of Transparency and Openness Promotion (TOP) guidelines that it has put forward for consideration and possible adoption by journals (Nosek et al., 2015). The TOP guidelines include eight standards, with each standard comprising three levels that are intended to encourage movement toward greater transparency and openness over time. Two of the standards are intended to reward researchers for open practices by establishing citation standards for data, code, and research materials and by establishing conditions under which the journals will publish replication studies. Four of the standards specifically define openness through the research process in design standards, research materials, data sharing, and analytic methods. The final two standards cover preregistration of studies and analysis plans that are aimed at clarifying the distinction between research intended to confirm hypotheses and research intended to generate hypotheses. The TOP guidelines have already attracted an impressive list of signatory journals, including a number
of journals from outside of psychology and even general scientific journals such as Science and PLOS ONE (COS, 2015).
The examples of biomedical research, social psychology, HLT, and high-energy physics highlight the importance to research quality and integrity of reproducibility of results and the availability of data, code, and other information necessary for replication. Disciplines and fields have traditionally had a wide variety of cultures and practices related to data (NAS-NAE-IOM, 2009a). In Chapter 3, the problems caused by resistance to sharing of data and code in climate science were described. Even in some areas of computational science, where the value of transparency would appear to be obvious, there are significant barriers to reproducibility, including routine withholding of code and data on sponsored research (Liberman et al., 2012).
The efforts of the Center for Open Science and others raise the possibility that fields and disciplines can establish and implement higher standards that define today’s commonly tolerated DRPs as unacceptable and provide checks and incentives to reduce the occurrence of those practices to a level far below what exists today. Progress on this front will help to foster research integrity as well as improve the quality of research across a range of fields and disciplines.
Nature and Scope of the Problem
As discussed in other parts of this report, published papers are the currency of science. Through such papers, science is communicated, critiqued, and assessed. The number and quality of published articles credited to a scientist, especially peer-reviewed articles, are major criteria for promotion and tenure, and so have a powerful impact on scientific careers. Authorship designates who is willing to take responsibility for an article and who bears responsibility for the work in case of error or allegations of misconduct. Authorship credit is therefore an integral part of the scientific enterprise as a professional system.
Chapter 3 discusses how changes in the research environment, such as technological advances that have transformed many aspects of performing and reporting research, the growing importance of collaborative and interdisciplinary research, and the globalization of research, are affecting authorship practices and conventions. Several of the most difficult challenges to research integrity involve authorship abuses, particularly authorship credit misallocations/misappropriations (B. C. Martinson and Z. Master, personal communication, July 27, 2015). As discussed in Chapter 4, plagiarism is one category of authorship credit misallocation that is included in the definition of research misconduct by the U.S. federal government and by most other countries. For the most part, other categories of authorship credit misallocation are considered detrimental research practices for
the purposes of this report. This section will describe some of the most pressing challenges related to authorship and research integrity and consider the advantages and disadvantages of alternative approaches to addressing them.
Authorship can be misused in several ways. Gift, guest, or honorary authorship involves listing an author who made no substantive contribution to the research reported. For example, researchers may add the name of a prominent researcher to a paper in the belief that it will increase its odds of being accepted by a prestigious journal. Gift authorship can happen with or without the knowledge or permission of the researcher being “honored.” When the gift author had no role in the conducting or writing of the article, listing his or her name is a misallocation of credit. In cases where work is fabricated or falsified, questions are raised about the responsibilities of coauthors whose contributions may or may not have merited authorship. The stem cell case at Seoul National University and the University of Pittsburgh, described in Appendix D, discusses these issues.
A senior scientist may demand or be granted an authorship designation for a “specialized” service such as providing biological materials or specimens, helping to secure funding for the research, or serving as head of the laboratory or department where the research is undertaken. Insistence by a scientist in a position of authority that he or she be listed as an author on all papers submitted to journals by subordinates, including articles in which the senior scientist has played no direct role, is known as “coercive authorship.”
As data and code sharing become part of the usual practice of science, reuse of these scholarly outputs is increasingly common. The expectation is that the use or reuse of data and/or code produced by another researcher will be appropriately cited. Such recognition rewards the producers of data and code while allowing others to improve, extend, and build on these objects. It is inappropriate to condition data or code reuse on coauthorship when there is no other contribution to the paper. This is a coercive practice that slows the advancement of science when other mechanisms, such as citation, are in place to reward data and code contributors. The practice of conditioning data use on coauthorship is more widespread in some disciplines than in others but should not exist in any discipline. This is separate from, and not to be confused with, a data or code contributor who is or becomes part of the research team and collects novel data or builds code for the purposes of a research project or series of projects. Coercive authorship occurs when coauthorship is conditioned on the use of data or code from a previous or different project, where citation should be the only expectation for downstream use.
Another detrimental authorship practice is unacknowledged or “ghost” authorship, in which researchers who have made a substantial contribution to a research article are not listed as authors. Not all unacknowledged authorship fits into this category. For example, reporting someone else’s research results as one’s own without designating that person as an author and without his or her knowledge is a form of plagiarism. A professional writer whose only involvement in the research is participation in writing the paper is not considered to be an author in most contexts, but many journals require that professional writers be acknowledged.
A problematic form of ghost authorship arises when researchers who are directly involved in all phases of the research are not acknowledged (Fugh-Berman, 2010). For example, a pharmaceutical company may finance and undertake research that supports a non-FDA-approved use of one of its products, prepare the paper, and recruit prominent medical researchers to sign on as authors. The corporate support and industry authors may not be disclosed. In some cases, the listed academic authors will have had some involvement with the research, but sometimes they do not. In these latter cases, ghost authorship also becomes a type of honorary authorship.
While the immediate motivation for this form of ghostwriting is to hide the financial interest of the sponsor and ghost authors in the work, it has also been associated with other detrimental research practices such as selective reporting and suppression of some findings. In the Paxil case described in Appendix D, data falsification was admitted by the sponsor and ghost authors but denied by the listed authors. If data are falsified or the reported results are misleading in such clinical studies and the listed authors are not able to vouch for the integrity of the data or results, using the study as a basis for treating patients may present serious health and safety risks.
In addition to the Paxil case, several other examples of alleged ghostwriting that involved other alleged detrimental research practices led to legal consequences for both medical industry sponsors and ghostwriters (Feeley, 2012; Fugh-Berman, 2010). In one case, documents were released showing that Pfizer’s Wyeth Pharmaceutical Company had not disclosed its role in preparing journal articles supporting the use of Prempro, a hormone drug, and recruiting academic authors (Fugh-Berman, 2010). By 2012, Pfizer had paid $896 million to settle only about half of the cases alleging that Prempro had caused cancer (Feeley, 2012). In addition to Paxil and Prempro, ghostwriting has “been documented in the promotion of ‘Fen-phen’, Neurontin, Vioxx and Zoloft” (Fugh-Berman, 2010). The companies that produce these drugs have paid millions to billions of dollars in lawsuit settlements.
This form of ghostwriting has been condemned as an “example of fraud” and “a disturbing violation of academic integrity standards, which form the basis of scientific reliability” (Bosch and Ross, 2012; Stern and Lemmens, 2011). The practice is not currently equated with plagiarism and so is not within ORI’s power to regulate. Bosch and Ross (2012) suggest that ORI include ghostwriting in its definition of research misconduct so that it can be investigated and addressed under the federal research misconduct policy. The International Committee of Medical Journal Editors (ICMJE, 2015) established criteria against which to determine appropriate assignment of biomedical authorship and recommends that those who do not meet all of the criteria only be listed in the acknowledgments
sections. The Committee on Publication Ethics (COPE, 2011) also recommends that specific rules be implemented to prevent ghostwriting, which is explicitly defined as misconduct in its guidelines. The pharmaceutical industry itself has promulgated guidelines for clinical trials that specify adherence to the ICMJE authorship criteria (PhRMA, 2014).
All of the authorship abuses described above undermine research integrity. Even when the research that is reported is correct and of high quality, inaccurate and misleading authorship designations can lead to misallocation of credit, rewards, and future resources. They can damage the conduct of science if, for example, authorship credit without deep knowledge or skill in the science involved helps promote an honorary author to a position of authority. They can also obscure responsibility for reported work and make it more difficult to address other forms of misconduct, such as data fabrication. Indeed, there is evidence that engaging in authorship credit misrepresentation increases the risk that researchers will engage in research misconduct later (B.C. Martinson and Z. Master, personal communication, July 27, 2015). Several cases discussed in Appendix D, including the Paxil case and the stem cell case at Seoul National University and the University of Pittsburgh, involve authorship issues.
Over the past several decades, surveys and meta-analyses have shed light on how prevalent inaccurate and misleading authorship designations are. A 2011 meta-analysis of research on authorship found that an average of 29 percent of respondents had experienced some problems with misuse of authorship (Marusic et al., 2011). An international survey of authors of articles published in six general medical journals in 2008 found that 21 percent of papers had honorary and/or ghost authors, down from 29 percent in 1996 (Wislar et al., 2011). Both the 2011 and 1996 surveys used the ICMJE definition of authorship (to be discussed in more detail below). Almost two-thirds of the 2011 respondents resided in the United States or Canada, with most of the rest residing in Europe. Even if other fields have a much lower incidence of authorship misrepresentation than biomedical research, the overall incidence would be disturbingly high, since biomedical research constitutes a large fraction of overall research funding and publishing.
More recent work presented at a scientific meeting and reported in the media found significantly higher rates of guest and ghost authorship than the results cited above (Jaschik, 2015).
Addressing Authorship Credit Misrepresentation
Stakeholders in the research enterprise widely recognize that more vigorous efforts are needed to reduce and ultimately eliminate authorship credit misrepresentation. In recent years, a number of journals and professional groups such as the Council of Science Editors, COPE, and ICMJE have updated and clarified their authorship criteria to prohibit honorary and ghost authorship. Journals also
are adopting practices such as author contribution statements and requirements that all coauthors independently approve an article as mechanisms to discourage inaccurate authorship designation. In a 2009 report, the Institute of Medicine called on academic medical centers and teaching hospitals to prohibit medical ghostwriting (IOM, 2009).
A 2012 editorial in Science called for renewed attention to the problem of honorary authorship and advocated that more journals adopt the use of author contribution statements (Greenland and Fontanarosa, 2012). The editorial also called on research institutions to combat honorary authorship more directly and proactively, pointing out that institutions such as Washington University in St. Louis define honorary authorship as misconduct in their policies (Washington University, 2009). For example, junior researchers need to know whom to notify and the appropriate procedures to follow when they are coerced into listing a noncontributing coauthor.
Several alternative approaches might be considered to address this challenge. One would be to treat some forms of authorship credit misrepresentation in addition to plagiarism as research misconduct. A footnote in the 1992 Responsible Science report states that “it is possible that some extreme cases of noncontributing authorship may be regarded as misconduct because they constitute a form of falsification” (NAS-NAE-IOM, 1992). Responsible Science also noted that, in 1989, a Public Health Service annual report of its activities to address research misconduct included several abuses of authorship as examples of misconduct, such as “preparation and publication of a book chapter listing co-authors who were unaware of being named as co-authors” and “engaging in inappropriate authorship practices on a publication and failure to acknowledge that data used in a grant application were developed by another scientist.” It should be noted that this formulation predated the 2000 federal policy on research misconduct. In 1989, the PHS definition of research misconduct was “fabrication, falsification, plagiarism, or other serious deviations from commonly accepted research practices.” None of the specific terms was further defined.
Authorship misrepresentation other than plagiarism is clearly not included in the definition of falsification specified in the current U.S. federal research misconduct policy (OSTP, 2000). A change in the definition of falsification would be needed for inaccurate or misleading authorship designations to be treated as research misconduct by the federal government.
Implementation of such a change would face a number of practical obstacles. To begin with, although the authorship standards of COPE, the Council of Science Editors, and ICMJE are widely respected, disciplines vary widely in authorship standards and practices. For example, ICMJE defines authors as those who have fulfilled the following criteria: (1) substantial contributions to the conception or design of the work; or the acquisition, analysis, or interpretation of data for the work; (2) drafting the work or revising it critically for important intellectual content; (3) final approval of the version to be published; and (4) agreement
to be accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved (ICMJE, 2013a). However, in research fields involving work on complex instruments and the generation of large amounts of data, it is possible to imagine circumstances where articles are published in which no one qualifies as an author according to the ICMJE criteria. The same circumstances might imply author credit misrepresentation in one field and acceptable practice in another. This would make it difficult to develop a workable definition of falsification that could be applied in a consistent way.
Professional disputes and legal allegations over the denial of rightful authorship credit have become a growing issue within the research enterprise. While academic theft is a serious transgression, it may be difficult to determine how, or from whom, an idea originated. There are numerous examples, both inside and outside of academia, of researchers, often postdocs and junior scientists, showing that their research had been published without crediting them as authors or even without their knowledge. However, there are also instances in which graduate students or junior scientists perform research with a mentor who developed the same research idea years earlier. In 1995 a graduate student, Pamela Berge, won more than $1 million in a lawsuit claiming academic theft against her mentors; however, it was later revealed that the research had been ongoing for several years before Berge entered the laboratory, and the verdict was overturned (Woolston, 2002). Clear communication and discussion of how authorship roles are to be determined at the outset of research may avoid later disputes over authorship credit.
Another practical difficulty in addressing authorship credit misrepresentation other than plagiarism through the research misconduct policy framework involves the sheer scale of the phenomenon. Suppose that the study cited above is correct and more than 20 percent of biomedical research articles have honorary and/or ghost authors (Wislar et al., 2011). There are roughly 50,000 biomedical articles published by U.S. authors per year (NSB, 2012). If current practices were to continue, therefore, roughly 10,000 additional incidents of research misconduct would occur each year in just one discipline. While these incidents would certainly not all be reported or investigated, even 2,000 to 3,000 additional cases per year is more than an order of magnitude greater than the current combined number of cases handled by NSF-OIG and ORI per year, which itself reflects substantial recent increases. Expanding the scope of the federal research misconduct definition in this way might therefore require significant additional resources for ORI, NSF-OIG, and perhaps other agencies.
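The back-of-envelope arithmetic in this paragraph can be sketched as follows. The article count and misrepresentation rate are the rough figures cited above (NSB, 2012; Wislar et al., 2011); the reporting rates are purely illustrative assumptions, not estimates from any source.

```python
# Rough estimate of additional misconduct cases per year if honorary and
# ghost authorship were treated as research misconduct. Input figures are
# the approximate values cited in the text; reporting rates are hypothetical.

articles_per_year = 50_000        # approx. U.S.-authored biomedical articles/year (NSB, 2012)
misrepresentation_rate = 0.21     # share with honorary and/or ghost authors (Wislar et al., 2011)

potential_incidents = articles_per_year * misrepresentation_rate
print(f"Potential new incidents per year: {potential_incidents:,.0f}")

# Even if only a fraction of incidents were ever reported and investigated,
# the caseload would dwarf current ORI and NSF-OIG volumes:
for reporting_rate in (0.05, 0.10, 0.25):
    cases = potential_incidents * reporting_rate
    print(f"At a {reporting_rate:.0%} reporting rate: {cases:,.0f} cases/year")
```

Even the lowest illustrative reporting rate yields several hundred cases per year in a single discipline, which is the scale problem the text describes.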
Also, since the federal misconduct policy only applies to federally funded research, as discussed above, a change in interpretation of the research misconduct definition would not address honorary, coercive, or ghost authorship in purely privately funded research except as an exemplar and spur to raise standards across
the board. The problem of ghostwriting discussed above, for example, largely concerns research that is funded by companies.
An alternative approach to reducing and ultimately eliminating authorship credit misrepresentation would rely on identifying best practices for researchers, institutions, sponsors, and journals, and encouraging that these stakeholders accelerate adoption of these practices. For example, at the disciplinary level, societies and journals could work to update and specify their authorship standards. Sponsors and journals could more actively discourage ghost and guest authorship. A pathway toward strengthening authorship standards is discussed in Chapter 8. Chapter 9 discusses best practices, and Chapter 10 covers findings and recommendations addressing these issues.
As discussed above, whistleblowers are a critical source of information that leads to uncovering and investigating research misconduct. Those accused of misconduct or others at their institutions often retaliate against whistleblowers—according to one survey of research misconduct whistleblowers, around 70 percent experienced some negative consequences, including more than 20 percent who lost their positions (Lubalin et al., 1995). The falsified grant application case (Appendix D) illustrates the vulnerability of whistleblowers, even in situations where there was no retaliation on the part of the accused or the institution. Providing effective protection for whistleblowers is a key element in addressing research integrity going forward (Kornfeld, 2012).
What policies and practices toward research misconduct whistleblowers are needed at the institutional level? Institutions may have policies protecting whistleblowers, although it is not clear how many actually do. Even where policies exist, it may be difficult to effectively implement protections without a strong commitment from the institution. It is often not clear (and difficult to prove) whether difficulties experienced by whistleblowers are retaliation as a direct result of making an allegation. The “tone at the top” is also very important in determining how whistleblowers are treated (Gunsalus, 1993). Chapter 9 discusses best practices in institutional policies and practices in this area, including the commitment to maintain multiple anonymous mechanisms for reporting suspicions and allegations.
Federal policies have an impact as well. As discussed in Chapter 4, under the pre-2000 federal definitions of research misconduct, retaliation against a whistleblower or other obstruction of a research misconduct investigation could be pursued by NSF-OIG or ORI under the “other serious deviation” clause. Under the current definition, the federal oversight agencies may refer allegations of whistleblower retaliation to the institution, but have no further recourse after the institution makes its report, even if they believe that there are problems. By
contrast, NSF-OIG or ORI can send back an inadequate institutional report on fabrication, falsification, and plagiarism, or in NSF-OIG’s case, take over the investigation itself.
While including whistleblower retaliation as an element in the research misconduct definition is an option, there are other federal policy options that appear to be more straightforward and potentially more effective. One option would be to create standards for institutions as part of the research misconduct policy, without making whistleblower retaliation part of the misconduct definition. HHS published proposed standards for protecting research misconduct whistleblowers in November 2000 (HHS, 2000). These standards followed up on draft guidance developed by the Ryan Commission (Commission on Research Integrity, 1995). The standards were never implemented.
Another option would be to extend federal whistleblower protections to those who make allegations of research misconduct outside the federal government. This approach has actually been implemented. The American Recovery and Reinvestment Act of 2009 (P.L. 111-5, 123 Stat. 115, 516) required institutions receiving research support to have whistleblower protection policies in place and specified multiple mechanisms for reporting research misconduct allegations (including to funding agency officials and members of Congress). It should be possible to examine the experience with the act and evaluate whether implementation of these protections created any difficulties for institutions and whether this was an effective approach. Congress has the option to extend those provisions to all federal research.
In this connection, the problem of knowingly making false allegations of research misconduct deserves attention. Very little is known about the incidence of such allegations and how they are resolved. A researcher might be motivated to make a false allegation out of a desire for competitive advantage if the accused and accuser were working in the same area of research, because of commercial or political interests, personal animus, or mental illness. Bad-faith whistleblowers may have a financial incentive to make a claim; under the False Claims Act, individuals are able to sue on behalf of the U.S. government if they have “evidence of fraud against federal programs or contracts” and receive a percentage of any funds recovered (NWC, 2016).
Some personal testimony is available that provides guidance on the steps that should be taken by researchers who are falsely accused (Goldenring, 2010). Knowingly false allegations of research misconduct are damaging in that they impair the work of the accused and his or her collaborators and impose costs on the institutions, journals, and others required to investigate them. In addition to protecting good-faith whistleblowers, preliminary inquiries and investigations certainly need to protect the accused; all claims should be examined by experts in the relevant field. Like retaliation against good-faith whistleblowers, knowingly making false accusations is a form of other misconduct for the purposes of this discussion.
Even whistleblowers acting in good faith may not be very sympathetic figures, and they may alienate colleagues and administrators. Apprehension about possible retaliation is certainly reasonable and can be expected to deter those who observe misconduct from coming forward.