1 Introduction
Organized science has always relied on the willingness of researchers to share their results, allowing others to test and build on their work. According to the Royal Society (2012), "open communication and deliberation sit at the heart of scientific practice." The digital revolution of the past several decades has greatly expanded the scope and benefits of openness by making it possible for researchers to share and access scientific articles, the data underlying reported results, the methods used to generate and analyze data such as computer code, and other products of research. Openness increases transparency and reliability, facilitates more effective collaboration, accelerates the pace of discovery, and fosters broader and more equitable access to scientific knowledge and to the research process itself.
Many consider the 2002 launch of the Budapest Open Access Initiative (BOAI), which called for free and open online access to the scientific literature, to mark the formal beginning of the open access movement (BOAI, 2002). In the years since, the emphasis of this movement has broadened from its original focus on open access to articles, and has come to include data, code, and other research products. What we know today as open science comprises both principles (transparency, reuse, participation, accountability, etc.) and practices (open publications, data-sharing, citizen science, etc.) (Open Science Training Handbook, 2018).
Open science stands at an important inflection point. A new generation of information technology (IT) tools and services holds the potential of further revolutionizing scientific practice. For example, the ability to automate the process of searching and analyzing linked articles and data can reveal patterns that would escape human perception, making the process of generating and testing hypotheses faster and more efficient.
To have maximum impact, these tools and services need to be used as part of an open science ecosystem that spans institutional, national, and disciplinary boundaries.
Yet, despite the significant progress that has been made to create that ecosystem, today's science is not completely open. Most scientific articles are only available on a subscription basis (European Commission, 2018a). Sharing data, code, and other research products is becoming more common, but is still not routinely done across all disciplines (Figshare, 2017). Limitations and barriers to more rapid progress include an academic culture and researcher incentives that can work against open science, insufficient infrastructure and training, issues related to data privacy and national security, disciplinary differences in the nature of research and treatment of data, and the economic structure of the scholarly communications market.
Open Science by Design: Realizing a Vision for 21st Century Research
Research enterprise stakeholders around the world are making substantial efforts to facilitate and expedite the transition to open science. The European Commission has made the creation of a European Open Science Cloud one of its policy priorities (European Commission, 2018a). Other private and public funders, such as the Bill & Melinda Gates Foundation and UK Research and Innovation (the coordinating body for the United Kingdom's public research councils), have adopted policies to support open science. Science International (2015) assessed the "boundaries of openness" and proposed 12 principles to guide the practice and practitioners of open data, while the Académie des Sciences (France), German National Academy of Sciences Leopoldina, and the Royal Society (2016, 2017) jointly issued statements on scientific publications and good practice.
There is a growing worldwide consensus in the scientific community that the transition to open science, particularly in relation to digital data and code, can best be achieved by the establishment of a globally interoperable research infrastructure. A number of evolving projects around the globe, such as the Global Open (GO) FAIR Initiative, which originated in Europe, focus on involving all networked initiatives, research disciplines, and interested Member States of the European Union to make research data findable, accessible, interoperable, and reusable (FAIR) (GO FAIR, 2018).
The international Research Data Alliance (RDA) and other groups have similar goals for international scientific data management. Societies, scholarly communicators, and the library community are also adopting policies and launching initiatives aimed at fostering a transition to open science.
In the United States, the Office of Science and Technology Policy (OSTP) issued a memorandum in 2013 instructing all federal agencies that spend more than $100 million per year on research and development to "develop a plan to support increased public access to the results of research funded by the Federal Government" (OSTP, 2013, p. 5). It also directs agencies to review options and needs for data repositories in areas of research they support and to require "all extramural researchers receiving Federal grants and contracts for scientific research and intramural researchers [to] develop data management plans" (OSTP, 2013, p. 5). The memo requires investigators to specify appropriate data management processes and options for long-term data access and preservation. The full text of the 2013 memo is provided in Appendix C. Federal policy has also been developed for nondigital scientific collections in a 2014 memorandum that is included as Appendix D.
Agencies developed and are implementing plans responding to the 2013 memo. For example, the National Institute of Standards and Technology (NIST) is participating in the National Data Service project, the National Institutes of Health (NIH) has proposed a Data Commons for biomedical research data, and the National Science Foundation (NSF) has launched the Open Knowledge Network. Overall, implementation of the OSTP memo is uneven across agencies (Kriesberg et al., 2017). A 2017 report by the Association of American Universities and the Association of Public and Land-Grant Universities points out the need for agencies to set clear, consistent requirements and for agencies and universities to work together more closely in order to avoid a situation where standards and solutions are fragmented and not interoperable (AAU-APLU, 2017).
CONTEXT FOR THE STUDY
Recognizing the importance of accelerating progress toward open science, the Laura and John Arnold Foundation requested that the National Academies of Sciences, Engineering, and Medicine (the National Academies) undertake a study on identifying and addressing the challenges of broadening access to the results of scientific research. The committee was tasked with focusing on how to move toward open science as the default for scientific research results, with specific recommendations to be implemented (see Box 1-1 for the full statement of task).
While the working definition of open science provided by the sponsor of the study is described in Box 1-1, the committee envisions that open science aims to ensure the open availability and usability of scholarly publications, the data that result from scholarly research, and the methodology, including code or algorithms, that was used to generate those data. Openness and sharing of information are fundamental to the progress of science and to the effective functioning of the research enterprise. In addition, although some of the analysis and discussion in the report is relevant to the humanities or other research-based disciplines outside of science and engineering, openness as it relates to those disciplines is not explicitly addressed.
In undertaking this task, the committee builds on previous National Academies work on related issues.
For example, the National Academies' first authoritative statement on research data issues and supporting openness came in 1985 (NRC, 1985). The 1997 report Bits of Power: Issues in Global Access to Scientific Data assessed a global perspective on open science and data in the natural sciences and identified strengths and challenges in the European community. In 2003, Sharing Publication-Related Data and Materials: Responsibilities of Authorship in the Life Sciences examined key principles and recommended that open data should be the default approach for biologists, including sharing data, software, and materials related to their publications in scholarly journals (NRC, 2003). The 2009 report, Ensuring the Integrity, Accessibility, and Stewardship of Research Data in the Digital Age, called on researchers to make all research data and methods publicly accessible in a timely manner (NAS-NAE-IOM, 2009). As copyright issues are closely linked to open publications, Copyright in the Digital Era: Building Evidence for Policy (2013a) called on federal agencies and foundations to support a broad range of empirical research studies to contribute to the comprehensive review of U.S. copyright law.
BOX 1-1 Committee Statement of Task
Wide access to scientific research results has proven to be an important tool for accelerating scientific progress. An ad hoc committee under the Board on Research Data and Information (BRDI) will conduct a study on the challenges of broadening access to the results of scientific research, described as "open science." Open science is defined, for the purposes of this study, as public access (i.e., no charge for access beyond the cost of an internet connection) to scholarly articles resulting from research projects, the data that support the results contained in those articles, computer code, algorithms, and other digital products of publicly funded scientific research, so that the products of this research are findable, accessible, interoperable, and reusable (FAIR), with limited exceptions for privacy, proprietary business claims, and national security. This study focuses on how to move toward open science as the default for scientific research results and includes the following tasks:
1. Provide a cursory overview of the extent to which scientific and engineering disciplines currently practice open science;
2. Identify the barriers to and facilitators of open science, such as cultural norms, incentives, service provider business models, policies, available infrastructure, education/training, and formal and informal data management processes, and illustrate these barriers and facilitators in at least four scientific disciplines from the biological sciences, social sciences, physical sciences, and earth sciences;
3. Describe how policies and practices of participants in the research enterprise, such as funders, publishers, journal editors, research institutions, scientific societies, researchers, service providers, and the private sector, are affecting open science;
4. Recommend specific solutions in policy, infrastructure, incentives and requirements that would facilitate open science;
5. Identify existing implementations of these solutions occurring in individual disciplines that could be extended to other disciplines (e.g., preprints), and demonstrations of proofs-of-concept that need to be brought to scale (e.g., preregistration for basic and preclinical research);
6. For potential solutions with no existing demonstrations, identify practical implementation steps, policies, and appropriate stakeholder roles to develop solutions;
7. Provide specific policy and practice options for Federal science agencies to move toward open science as the default for the research they support.
The committee will produce a consensus report with findings and recommendations that address these issues, with the majority of the focus on solutions that move the research enterprise toward open science.
The National Academies have also made significant contributions to issues related to massive data and data sharing. Frontiers in Massive Data Analysis (2013b) provides perspective on generating, using, sharing, and analyzing massive amounts of data. The Institute of Medicine's Sharing Clinical Trial Data: Maximizing Benefits, Minimizing Risk (2015) offers detailed analysis and recommendations on the responsible sharing of clinical trial data.
Most recently, several consensus reports were released in 2017 relating to open science. Fostering Integrity in Research includes several recommendations responding to integrity and reproducibility concerns. Recommendation seven in the report states, "federal funding agencies and other research sponsors should allocate sufficient funds to enable the long-term storage, archiving, and access of datasets and code necessary for the replication of published findings" (NASEM, 2017b, p. 8). A final report that lays out a vision for future data science education was issued in 2018, and an expert committee is assessing research and data reproducibility and replicability issues as this report is written in 2018.
STUDY PROCESS
In discussing its approach to the task, the committee acknowledged that it was not asked to examine whether or not open science is good, but, rather, how to move it forward in ways that are beneficial to the scientific community. To accomplish its task, the committee held four 2-day face-to-face meetings to gather information from experts and develop findings and recommendations. Several virtual meetings were also held. As part of its evidence-gathering process, the committee organized a 1-day public symposium in September 2017 to explore specific examples of open science and discussed a range of challenges focusing on stakeholder perspectives.
During the symposium, the committee heard speakers from professional journals, the private sector, philanthropic organizations, federal agencies, academic libraries, the research community, and scientific societies, who spoke on challenges, drivers, and progress toward an open science enterprise. The committee reviewed a large body of written material on open science concerns, including literature that informed the committee on how specific solutions in policy, infrastructure, incentives, and requirements could facilitate open science.
STRUCTURE OF THE REPORT
This report is organized into an introduction, four topical chapters, a chapter that frames and discusses the committee's findings and recommendations, and appendixes. Elements one, two, and three of the committee's task are mostly addressed in Chapters 2 and 3. Elements four, five, six, and seven are largely addressed in Chapters 4, 5, and 6.
Broadening Access to the Results of Scientific Research
Chapter 2 introduces the origins and significance of open science while analyzing advantages and motivations for and barriers to open science. Open science typically refers to the entire process of conducting science, including the collaborative underpinnings of the scientific enterprise. The contemporary focus on openness in science is spurred by opportunities for sharing knowledge via the Internet, and delivers multiple benefits to society: the right of taxpayers to gain access to the results of publicly funded research and the ability of researchers and non-researchers alike to retrieve, scrutinize, and build directly on the work of investigators around the world. Although the statement of task does not call on the committee to establish that open science constitutes a superior approach, the evidence that exists so far is presented and discussed.
As for barriers, infrastructure issues, such as policy, architecture, placement, and cost, become more complex as networked computers become the standard mode of scientific communication and much of scientific performance. Finally, the open science initiative is challenged to acknowledge disciplinary differences and to avoid unintended but potentially harmful violations of privacy, intellectual property, and national security.
The State of Open Science
Chapter 3 describes the general state and current approaches to open science, focusing on open publications and open data. As part of the committee's task, the chapter includes illustrative examples drawn from the disciplines of biomedical sciences, economics, astronomy and astrophysics, and earth sciences, along with other examples from outside of those disciplines.
While the research enterprise makes steady progress toward open science, it must navigate a complex environment of socio-political, economic, and practical challenges. Individual universities must develop their own access policies, although by now there are many successful models to guide the way. With regard to articles, methods of publication have proliferated, featuring increasing use of preprints (which have not yet been published in a journal) and open access journals (which are freely available online to readers). The economic power of for-profit publishers persists, largely because many authors prefer to publish in journals that are considered to be the most prestigious in their fields and because many for-profit journals do not charge the authors themselves, relying instead on subscription revenue. In moving toward open publications, the community must consider not only when to adopt new open models, but also how to transition from the current mixed environment of closed and open models.
In the case of data, the committee and other experts expect data to become a dominant resource in the open science ecosystem. If this expectation is borne out, questions of lifecycle, reproducibility, compliance, and sharing need to be addressed by all stakeholders. These include differing data-sharing publication practices in big science and especially in long tail/small science. Researchers with collections of physical objects, such as geological and paleontological objects, are pressed to consider not only their physical collection and data management plans, but also accessibility, reuse, and other issues common to all data collections. As the open science movement advances at a global level, it brings a critical need to foster international cooperation.
A Vision for Open Science by Design
Chapter 4 describes how open science can be implemented "by design" by defining open science by design as a set of principles and practices that fosters openness throughout the entire research life-cycle. The reader is invited to imagine a world of complete open publication, where all steps of the research process are findable, accessible, interoperable, and reusable (FAIR). This chapter explores the steps by which a researcher can access published ideas, build on them through data mining and other techniques, find and make use of existing concepts and methods in the existing literature, develop new hypotheses or methods, seek funding for an original pilot study, and publish results in appropriate venues. The chapter also discusses the need for enabling technologies and strengthening training for open science by design.
Transitioning to Open Science by Design
Chapter 5 discusses the legal frameworks and the context for realizing open science by design shaped by the policies and requirements of research funders. The chapter also identifies possible options and transition pathways to open science by design, including paying for open science, mandates, community-based initiatives, changes in the business environment, and possible short- and long-term options.
Accelerating Progress to Open Science by Design
Recent recommendations from other organizations are reviewed, including the AAU-APLU report discussed above (AAU-APLU, 2017) and the European Open Science Cloud (EOSC) Declaration of October 2017 (European Commission, 2017a). All such recommendations call out the need for developing new infrastructure and tools that support open science and open data. The report concludes with the committee's own findings, recommendations, and implementation actions specifying agencies, universities, or other organizations to guide stakeholder efforts to foster open science by design.
The intended audiences for this report include researchers, universities, private and nonprofit organizations, information science communities such as publishers and journal editors, scientific societies, the philanthropic community, and federal agencies interested in improving access to the results of scientific research. In other words, this report provides specific policy and practice options for all stakeholders, not just federal scientific agencies, to move toward open science as the default for the research they support. The committee hopes that the report will help these audiences better understand the possible barriers and facilitators, desirable data policies, and infrastructure that would be required to implement open science.