Organized science has always relied on the willingness of researchers to share their results, allowing others to test and build on their work. According to the Royal Society (2012), “open communication and deliberation sit at the heart of scientific practice.” The digital revolution of the past several decades has greatly expanded the scope and benefits of openness by making it possible for researchers to share and access scientific articles, the data underlying reported results, the methods used to generate and analyze data such as computer code, and other products of research. Openness increases transparency and reliability, facilitates more effective collaboration, accelerates the pace of discovery, and fosters broader and more equitable access to scientific knowledge and to the research process itself.
Many consider the 2002 launch of the Budapest Open Access Initiative (BOAI)—which called for free and open online access to the scientific literature—to mark the formal beginning of the open access movement (BOAI, 2002). In the years since, the emphasis of this movement has broadened from its original focus on open access to articles, and has come to include data, code, and other research products. What we know today as open science comprises both principles (transparency, reuse, participation, accountability, etc.) and practices (open publications, data-sharing, citizen science, etc.) (Open Science Training Handbook, 2018).
Open science stands at an important inflection point. A new generation of information technology (IT) tools and services holds the potential of further revolutionizing scientific practice. For example, the ability to automate the process of searching and analyzing linked articles and data can reveal patterns that would escape human perception, making the process of generating and testing hypotheses faster and more efficient. In order to have maximum impact, these tools and services need to be utilized as part of an open science ecosystem that spans institutional, national, and disciplinary boundaries.
Yet, despite the significant progress that has been made to create that ecosystem, today’s science is not completely open. Most scientific articles are only available on a subscription basis (European Commission, 2018a). Sharing data, code, and other research products is becoming more common, but is still not routinely done across all disciplines (Figshare, 2017). Limitations and barriers to more rapid progress include an academic culture and researcher incentives that can work against open science, insufficient infrastructure and training, issues related to data privacy and national security, disciplinary differences in the nature of research and treatment of data, and the economic structure of the scholarly communications market.
Research enterprise stakeholders around the world are making substantial efforts to facilitate and expedite the transition to open science. The European Commission has made the creation of a European Open Science Cloud one of its policy priorities (European Commission, 2018a). Other private and public funders, such as the Bill & Melinda Gates Foundation and UK Research and Innovation (the coordinating body for the United Kingdom’s public research councils), have adopted policies to support open science. Science International (2015) assessed the “boundaries of openness” and proposed 12 principles to guide the practice and practitioners of open data, while the Académie des Sciences (France), German National Academy of Sciences Leopoldina, and the Royal Society (2016, 2017) jointly issued statements on scientific publications and good practice.
There is a growing worldwide consensus in the scientific community that the transition to open science, particularly in relation to digital data and code, can best be achieved by the establishment of a globally interoperable research infrastructure. A number of evolving projects around the globe, such as the Global Open (GO) FAIR Initiative, which originated in Europe, focus on involving all networked initiatives, research disciplines, and interested Member States of the European Union to make research data findable, accessible, interoperable, and reusable (FAIR) (GO FAIR, 2018). The international Research Data Alliance (RDA) and other groups have similar goals for international scientific data management. Societies, scholarly communicators, and the library community are also adopting policies and launching initiatives aimed at fostering a transition to open science.
In the United States, the Office of Science and Technology Policy (OSTP) issued a memorandum in 2013 instructing all federal agencies that spend more than $100 million per year on research and development to “develop a plan to support increased public access to the results of research funded by the Federal Government” (OSTP, 2013, p. 5). It also directs agencies to review options and needs for data repositories in areas of research they support and to require “all extramural researchers receiving Federal grants and contracts for scientific research and intramural researchers [to] develop data management plans” (OSTP, 2013, p. 5). The memo requires investigators to specify appropriate data management processes and options for long-term data access and preservation. A full text of the 2013 memo is provided in Appendix C. Federal policy has also been developed for nondigital scientific collections in a 2014 memorandum that is included as Appendix D.
Agencies developed and are implementing plans responding to the 2013 memo. For example, the National Institute of Standards and Technology (NIST) is participating in the National Data Service project, the National Institutes of Health (NIH) has proposed a Data Commons for biomedical research data, and the National Science Foundation (NSF) has launched the Open Knowledge Network. Overall, implementation of the OSTP memo is uneven across agencies (Kriesberg et al., 2017). A 2017 report by the Association of American Universities and the Association of Public and Land-Grant Universities points out the need for agencies to set clear, consistent requirements and for agencies and universities to work together more closely in order to avoid a situation where standards and solutions are fragmented and not interoperable (AAU-APLU, 2017).
Recognizing the importance of accelerating progress toward open science, the Laura and John Arnold Foundation requested that the National Academies of Sciences, Engineering, and Medicine (the National Academies) undertake a study on identifying and addressing the challenges of broadening access to the results of scientific research. The committee was tasked with focusing on how to move toward open science as the default for scientific research results, with specific recommendations to be implemented (see Box 1-1 for the full statement of task). While the working definition of open science provided by the sponsor of the study is described in Box 1-1, the committee envisions that open science aims to ensure the open availability and usability of scholarly publications, the data that result from scholarly research, and the methodology, including code or algorithms, that was used to generate those data. Openness and sharing of information are fundamental to the progress of science and to the effective functioning of the research enterprise. In addition, although some of the analysis and discussion in the report is relevant to the humanities or other research-based disciplines outside of science and engineering, openness as it relates to those disciplines is not explicitly addressed.
In undertaking this task, the committee builds on previous National Academies work on related issues. For example, the National Academies’ first authoritative statement on research data issues and supporting openness came in 1985 (NRC, 1985). The 1997 report Bits of Power: Issues in Global Access to Scientific Data assessed a global perspective on open science and data in the natural sciences and identified strengths and challenges in the European community. In 2003, Sharing Publication-Related Data and Materials: Responsibilities of Authorship in the Life Sciences examined key principles and recommended that open data should be the default approach for biologists, including sharing data, software, and materials related to their publications in scholarly journals (NRC, 2003). The 2009 report, Ensuring the Integrity, Accessibility, and Stewardship of Research Data in the Digital Age, called on researchers to make all research data and methods publicly accessible in a timely manner (NAS-NAE-IOM, 2009). As copyright issues are closely linked to open publications, Copyright in the Digital Era: Building Evidence for Policy (2013a) called on federal agencies and foundations to support a broad range of empirical research studies to contribute to the comprehensive review of U.S. copyright law.
The National Academies have also made significant contributions to issues related to massive data and data sharing. Frontiers in Massive Data Analysis (2013b) provides perspective on generating, using, sharing, and analyzing massive amounts of data. The Institute of Medicine’s Sharing Clinical Trial Data: Maximizing Benefits, Minimizing Risk (2015) offers detailed analysis and recommendations on the responsible sharing of clinical trial data. Most recently, several consensus reports were released in 2017 relating to open science. Fostering Integrity in Research includes several recommendations responding to integrity and reproducibility concerns. Recommendation seven in the report states, “federal funding agencies and other research sponsors should allocate sufficient funds to enable the long-term storage, archiving, and access of datasets and code necessary for the replication of published findings” (NASEM, 2017b, p. 8). A final report that lays out a vision for future data science education was issued in 2018, and an expert committee is assessing research and data reproducibility and replicability issues as this report is written in 2018.
In discussing its approach to the task, the committee acknowledged that it was not asked to examine whether or not open science is good, but, rather, how to move it forward in ways that are beneficial to the scientific community. To accomplish its task, the committee held four 2-day face-to-face meetings to gather information from experts and develop findings and recommendations. Several virtual meetings were also held. As part of its evidence-gathering process, the committee organized a 1-day public symposium in September 2017 to explore specific examples of open science and discussed a range of challenges focusing on stakeholder perspectives. During the symposium, the committee heard speakers from professional journals, the private sector, philanthropic organizations, federal agencies, academic libraries, the research community, and scientific societies, who spoke on challenges, drivers, and progress toward an open science enterprise. The committee reviewed a large body of written material on open science concerns, including literature that informed the committee on how specific solutions in policy, infrastructure, incentives, and requirements could facilitate open science.
This report is organized into an introduction, four topical chapters, a chapter that frames and discusses the committee’s findings and recommendations, and appendixes. Elements one, two, and three of the committee’s task are mostly addressed in Chapters 2 and 3. Elements four, five, six, and seven are largely addressed in Chapters 4, 5, and 6.
Broadening Access to the Results of Scientific Research
Chapter 2 introduces the origins and significance of open science while analyzing advantages and motivations for and barriers to open science. Open science typically refers to the entire process of conducting science, including the collaborative underpinnings of the scientific enterprise. The contemporary focus on openness in science is spurred by opportunities for sharing knowledge via the Internet, and delivers multiple benefits to society: the right of taxpayers to gain access to the results of publicly funded research and the ability of researchers and non-researchers alike to retrieve, scrutinize, and build directly on the work of investigators around the world. Although the statement of task does not call on the committee to establish that open science constitutes a superior approach, the evidence that exists so far is presented and discussed.
As for barriers, infrastructure issues, such as policy, architecture, placement, and cost, become more complex as networked computers become the standard mode of scientific communication and much of scientific performance. Finally, the open science initiative is challenged to acknowledge disciplinary differences and to avoid unintended but potentially harmful violations of privacy, intellectual property, and national security.
The State of Open Science
Chapter 3 describes the general state and current approaches to open science, focusing on open publications and open data. As part of the committee’s task, the chapter includes illustrative examples drawn from the disciplines of biomedical sciences, economics, astronomy and astrophysics, and earth sciences, along with other examples from outside of those disciplines.
While the research enterprise makes steady progress toward open science, it must navigate a complex environment of socio-political, economic, and practical challenges. Individual universities must develop their own access policies, although by now there are many successful models to guide the way.
With regard to articles, methods of publication have proliferated, featuring increasing use of preprints (which have not yet been published in a journal) and open access journals (which are freely available online to readers). The economic power of for-profit publishers persists, largely because many authors prefer to publish in journals that are considered to be the most prestigious in their fields and because many for-profit journals do not charge the authors themselves, relying instead on subscription revenue. In moving toward open publications, the community must consider not only when to adopt new open models, but also how to transition from the current mixed environment of closed and open models.
In the case of data, the committee and other experts expect data to become a dominant resource in the open science ecosystem. If this expectation is borne out, questions of lifecycle, reproducibility, compliance, and sharing need to be addressed by all stakeholders. These include differing data-sharing publication practices in big science and especially in long tail/small science. Researchers with collections of physical objects, such as geological and paleontological objects, are pressed to consider not only their physical collection and data management plans, but also accessibility, reuse, and other issues common to all data collections. As the open science movement advances at a global level, it brings a critical need to foster international cooperation.
A Vision for Open Science by Design
Chapter 4 describes how open science can be implemented “by design” by defining open science by design as a set of principles and practices that fosters openness throughout the entire research life-cycle. The reader is invited to imagine a world of complete open publication, where all steps of the research process are findable, accessible, interoperable, and reusable (FAIR). This chapter explores the steps by which a researcher can access published ideas, build on them through data mining and other techniques, find and make use of concepts and methods in the existing literature, develop new hypotheses or methods, seek funding for an original pilot study, and publish results in appropriate venues. The chapter also discusses the need for enabling technologies and strengthening training for open science by design.
Transitioning to Open Science by Design
Chapter 5 discusses the legal frameworks and the context for realizing open science by design shaped by the policies and requirements of research funders. The chapter also identifies possible options and transition pathways to open science by design, including paying for open science, mandates, community-based initiatives, changes in the business environment, and possible short- and long-term options.
Accelerating Progress to Open Science by Design
In Chapter 6, recent recommendations from other organizations are reviewed, including the AAU-APLU report discussed above (AAU-APLU, 2017) and the European Open Science Cloud (EOSC) Declaration of October 2017 (European Commission, 2017a). All such recommendations call out the need for developing new infrastructure and tools that support open science and open data. The report concludes with the committee’s own findings, recommendations, and implementation actions specifying agencies, universities, or other organizations to guide stakeholder efforts to foster open science by design.
The intended audiences for this report include researchers, universities, private and nonprofit organizations, information science communities such as publishers and journal editors, scientific societies, the philanthropic community, and federal agencies interested in improving access to the results of scientific research. In other words, this report provides specific policy and practice options for all stakeholders, not just federal scientific agencies, to move toward open science as the default for the research they support. The committee hopes that the report will help these audiences better understand the possible barriers and facilitators, desirable data policies, and infrastructure needed to implement open science.