Open Science by Design: Realizing a Vision for 21st Century Research

Summary

Openness and sharing of information are fundamental to the progress of science and to the effective functioning of the research enterprise. The advent of scientific journals in the 17th century helped power the Scientific Revolution by allowing researchers to communicate across time and space, using the technologies of that era to generate reliable knowledge more quickly and efficiently. Harnessing today's stunning, ongoing advances in information technologies, the global research enterprise and its stakeholders are moving toward a new open science ecosystem. Open science aims to ensure the free availability and usability of scholarly publications, the data that result from scholarly research, and the methodologies, including code or algorithms, that were used to generate those data.

BENEFITS AND MOTIVATIONS

The research enterprise has already made significant progress toward open science and is realizing a number of benefits, with the expectation that these will expand in the future:

• Rigor and reliability. New standards for data and code sharing in fields such as biomedical research and psychology are making it easier for researchers to reproduce and replicate reported work, strengthening scientific rigor and reliability.
• Ability to address new questions. Open science allows researchers to bring data and perspectives from multiple fields to bear on their work, opening up new areas of inquiry and expanding the opportunities for interdisciplinary collaboration.
• Faster and more inclusive dissemination of knowledge. The proportion of scientific articles that are openly available is increasing, which accelerates the process of disseminating research and building on results. Open publication also allows broader, more inclusive participation in research and expands the possibilities of productive research collaboration within the United States and around the world.
• Broader participation in research. Large-scale projects in fields such as astronomy and ecology are utilizing open data and expanding opportunities for citizen scientists to contribute to scientific advances.
• Effective use of resources. Reuse of data in fields such as clinical research facilitates the aggregation of multiple studies for meta-analysis and allows for more effective testing of new hypotheses.
• Improved performance of research tasks. New tools such as electronic lab notebooks enable more accurate recording of research workstreams and automate various data curation tasks.
• Open publication for public benefit. The belief that the broader public should have access to publicly funded research and its benefits provides an additional strong rationale for open science. In the case of publicly funded research, the ultimate sponsor is the taxpayer. The public benefits from open science as new knowledge is utilized more rapidly to improve health, protect environmental quality, and deliver new products and services.

BARRIERS AND LIMITATIONS

The benefits of open science are accruing to researchers themselves, research sponsors, research institutions, disciplines, and scholarly communicators. Yet despite the significant progress made in recent years toward creating an open science ecosystem, science today is not completely open. Most scientific articles are only available on a subscription basis. Sharing data, code, and other research products is becoming more common, but is still not routinely done across all disciplines. Several important barriers remain, as well as limitations on the extent and speed with which open science can be realized. These include:

• Costs and infrastructure. There are significant remaining cost barriers to widespread implementation of open publication and open data. New technological and institutional infrastructure within specific disciplines and across disciplines needs to be developed.
• Structure of scholarly communications. Most publications are still only available on a subscription basis, and some potential pathways to open publication may disrupt the current scholarly communications ecosystem, including scientific society publishers, or may disadvantage early career researchers, researchers working in the developing world, or those in institutions with fewer resources.
• Lack of supportive culture, incentives, and training. Open practices such as preparing datasets and code for sharing and making preprints available are not generally rewarded and may even be discouraged by current incentive and reward systems. This may have the unintended consequence of disadvantaging early career researchers.
• Privacy, security, and proprietary barriers to sharing. Sharing data, code, and other research products is becoming more common, but barriers related to ensuring patient confidentiality and the protection of national security information exist in some domains. Proprietary research also presents barriers. Ultimately, some parts of the research enterprise may not be open.
• Disciplinary differences. The nature of research and practices surrounding treatment of data and code differ by discipline and even within a discipline. The size of datasets and the nature of some data may prevent immediate, complete sharing. Safeguards to prevent misuse or misrepresentation of data will be needed.

ABOUT THE STUDY

In 2017, the National Academies of Sciences, Engineering, and Medicine launched a study aimed at overcoming barriers and moving toward open science as the default approach across the research enterprise. The Laura and John Arnold Foundation provided financial support for the study. The authoring committee, established under the Board on Research Data and Information, met in person four times and held several virtual meetings to gather information from experts and develop findings and recommendations. As part of its evidence-gathering process, the committee organized a 1-day public symposium in September 2017 to explore specific examples of open science and discuss a range of challenges, focusing on stakeholder perspectives. The committee also reviewed a large body of written material on open science concerns, including literature that informed the committee on how specific solutions in policy, infrastructure, incentives, and requirements could facilitate open science. The committee was not asked to examine whether or not open science is good, but, rather, how to move it forward in ways that are beneficial to the scientific community. Also, issues related to the research use of data generated in other contexts (e.g., social media data) are not considered. The statement of task is available in Chapter 1.

AN INFLECTION POINT

The open science movement stands at an important inflection point. A new generation of information technology tools and services holds the potential of further revolutionizing scientific practice. For example, the ability to automate the process of searching and analyzing linked articles and data can reveal patterns that would escape human perception, making the process of generating and testing hypotheses faster and more efficient. These tools and services will have maximum impact when used within an open science ecosystem that spans institutional, national, and disciplinary boundaries.

At the same time, a number of organizations around the world are adopting new policies and launching new initiatives aimed at fostering open science. Public and private research funders such as the Bill & Melinda Gates Foundation, the European Commission (EC), and the Wellcome Trust have introduced mandates and support systems to ensure that the results of the research they support are open. Publishers are adopting openness frameworks and strengthening requirements to ensure that the data and methods underlying articles are available. In the United States, federal agencies have developed and implemented policies based on 2013 and 2014 memoranda from the White House's Office of Science and Technology Policy aimed at increasing public access to the results of research funded by the federal government.

OPEN SCIENCE BY DESIGN

The central aim of this study is to provide guidance to the research enterprise and its stakeholders as they build strategies for achieving open science and take the next steps. In order to frame the issues and possible actions, the committee developed the concept of open science by design, defined as a set of principles and practices that fosters openness throughout the entire research life cycle (Figure S-1).

The researcher is at the center of the concept of open science by design. From the very beginning of the research process, the researcher both contributes to open science and takes advantage of the open science practices of other members of the research community. The overarching principle of open science by design is that research conducted openly and transparently leads to better science.

The vision of open science by design suggests that all phases of the research process provide opportunities for assessing and improving the reliability and efficacy of scientific research. The concept visualized in Figure S-1 can be further described as follows:

• Provocation: explore or mine open research resources and use open tools to network with colleagues. Researchers have immediate access to the most recent publications and have the freedom to search archives of papers, including preprints, research software code, and other open publications, as well as databases of research results, all without charge or other barriers. Researchers use the latest database and text mining tools to explore these resources, to identify new concepts embedded in the research, and to identify where novel contributions can be made. Robust collaborative tools are available to network with colleagues.
FIGURE S-1 Phases of Open Science by Design in the research life cycle. SOURCE: Committee generated.

• Ideation: develop and revise research plans and prepare to share research results and tools under FAIR principles. Researchers and their collaborators develop and revise their research plans, collect preliminary data from publicly available data repositories, and conduct a pilot study to test their new methods on the existing data. When applying for research funding, they develop the required data management plans, stating where data, workflow, and software code will be available for use by other researchers under FAIR (Findable, Accessible, Interoperable, Reusable) principles. In addition, in some cases, they may decide to pre-register their research plans and protocols in an open repository.
• Knowledge generation: collect data, conduct research using tools compatible with open sharing, and use automated workflow tools to ensure accessibility of research outputs. Researchers collect data, using tools that automate formatting and curation tasks to ensure that digital datasets are interoperable and documented. In the case of physical samples and specimens, such as rocks, ice core samples, or tissue samples, researchers develop concrete plans to archive these according to disciplinary best practices. With the availability of open software, the researcher can document approaches to cleaning and preparing data for analysis in an electronic research notebook.
• Validation: prepare data and tools for reproducibility and reuse and participate in replication studies. Researchers use open data techniques to analyze, interpret, and validate findings. They may present their preliminary findings at conferences and refine their methods based on relevant comments and critiques. They may deposit their initial working paper in a preprint server and revise the paper based on the open peer review afforded by the service. They prepare their data in standard formats according to disciplinary standards and describe both data and analytical code in optimal ways for reuse and replication.
• Dissemination: use appropriate licenses for sharing research outputs and report all results and supporting information (data, code, articles, etc.). Researchers select the best venue for open publication of their work, including articles, data, code, and other research products. They revise and, in some cases, substantially improve their work based on the comments of the peer reviewers. Upon acceptance and before final submission of their work, they select a public copyright license, such as the GNU General Public License for software or a Creative Commons license for other works, including scholarly articles.
• Preservation: deposit research outputs in FAIR archives and ensure long-term access to research results. Researchers deposit the final peer-reviewed articles in an openly accessible archive as required by their research funders. They deposit their research data and software in one or more data archives, with clear and persistent links among the article, data, and software. These FAIR data are then used by other researchers in the provocation phase of their own work.

The committee's concept of open science by design is by necessity general and idealized. Some discipline-specific nuances cannot be captured in such a broad concept. For example, there are fields where preregistration may not make sense or add value. Other challenges arise from the size or complexity of data. One important and emerging type of data is the very large dataset that captures extremely rare, time-sensitive events. Subtleties in these data and their generation may not be readily captured without detailed knowledge of how the data were collected.

Also, and importantly, open science by design is intended as a framework to empower the researcher. As expressed in other National Academies work, the principle for openness of data and other information underlying reported results is that they should be available no later than the time of publication, or when the researcher is seeking to gain credit for the work (NRC, 2003, 2009). For journal publication, any sharing prior to the point of final publication is up to the researcher, who is in full control of the decision of when to share. The committee believes that as open science by design becomes the norm, researchers will find that they benefit from sharing and collaborating early in the research process.

ACCELERATING PROGRESS

Achieving open science will require persistent, coordinated actions on the part of research enterprise stakeholders. The committee has developed findings, recommendations, and implementation actions based on its review and synthesis of the information gathered throughout the course of the study. The complete set of findings is contained in Chapter 6 with the recommendations and implementation actions.

Building a Supportive Culture

The specific ways in which cultural barriers to open science operate vary significantly by field or discipline. Overuse and misuse of bibliographic metrics such as the Journal Impact Factor in the evaluation of research and researchers is one important "bug" in the operation of the research enterprise that has a detrimental effect across disciplines. The perception and/or reality that researchers need to publish in certain venues in order to secure funding and career advancement may lock researchers into traditional, closed mechanisms for reporting results and sharing research products. These pressures are particularly strong for early career researchers.

Initiatives such as the San Francisco Declaration on Research Assessment seek to achieve broad buy-in on the part of stakeholders to move toward evaluation systems that use other methodologies. Concrete actions, such as the National Institutes of Health (2017a) decision to encourage investigators to use and cite interim research products such as preprints in seeking funding, can have a beneficial effect.

Continued effort by stakeholders, working internationally and across disciplinary boundaries, is needed to change evaluation practices and introduce other incentives so that the cultural environment of research better supports and rewards open practices.

Recommendation One

Research institutions should work to create a culture that actively supports Open Science by Design by better rewarding and supporting researchers engaged in open science practices. Research funders should provide explicit and consistent support for practices and approaches that facilitate this shift in culture and incentives.
Implementation Actions

• Universities and other research institutions should explicitly reward the effort needed to make science open by design.
• Universities and other research institutions should partner with federal agencies in developing innovative approaches to assessing the impact of research in ways that include the impact of open science outputs. This should include, but is not limited to, the development of metrics for assessing the impact of interim research products such as preprints, with a view toward comparing those with existing methods for measuring impact.
• Universities and other research institutions should move toward evaluating published data and other research products in addition to published articles as part of the promotion and tenure process. Archived data should be valued, just as the publications that result from them are valued.
• Researchers should make full use of the many opportunities that are available for making their research products openly available, and they should include that information in their curriculum vitae so that they can be appropriately credited and rewarded.
• In fields where this is not already common practice, research funders should encourage and reward the use of data and other research products that are available in publicly accessible databases.
• Universities and other research institutions should encourage and reward studies that focus on the replication and reproducibility of published research. Such studies should be published and made openly available.

Training for Open Science by Design

The report discusses several initiatives that emphasize training in open science and reproducibility. The emergence of data science as a recognized interdisciplinary field has highlighted the need for new educational content and approaches related to data (NASEM, 2018a). Several federal agencies require that students or trainees supported by grants receive training in the responsible conduct of research, or RCR (NASEM, 2017b). Training and education that cover issues such as open science and reproducibility would complement the existing focus of RCR education and orient these programs toward supporting both research integrity and quality.
Recommendation Two

Research institutions and professional societies should train students and other researchers to implement open science practices effectively and should support the development of educational programs that foster Open Science by Design.

Implementation Actions

• Universities should provide training in best practices for open science and data stewardship as part of the regular curriculum in graduate and postgraduate education and should expect these practices as a default in all onboarding/orientation processes of universities, including new student orientation, new faculty orientation, library orientations, and lab training. Course curricula should be developed and implemented to complement domain-specific courses that support open science by design.
• Research funders should support the development of training programs in the principles and practices of open science by design. Federal agencies should require this training as part of all federally funded graduate training grants (e.g., NSF research traineeships and NIH training grants) to foster open science by design.
• Library and information science schools, professional societies, and other interested organizations should develop course curricula and offer courses in the principles and practices of open science by design.
• Research funders and professional societies should create programs or contests that seek the creative and innovative integration and (re)use of open data for new and impactful research.
• The private sector and other interested parties should create innovative educational tools for open science principles and practices.

Ensuring Long-Term Preservation and Stewardship

The issues and challenges related to preservation and stewardship of research products, particularly data, code, and other non-article products, are considered in several places in the report. On the one hand, some of the technical and cost barriers to long-term data stewardship are falling, as tools for automated metadata tagging and classification become more widely used and data storage becomes cheaper over time. At the same time, the outputs of research continue to grow in volume and complexity, meaning that significant additional resources will still be required. For example, one important and emerging type of data is the very large dataset that captures extremely rare, time-sensitive events. Subtleties in these data and their generation may not be readily captured without detailed knowledge of how the data were collected.
Developing and sustaining the infrastructure required for long-term stewardship of research products will present a continuing challenge. This report does not contain a detailed cost estimate and timeline for meeting these needs. Yet several of the immediate priorities and initial steps do not, in themselves, require the expenditure of significant resources. Research communities can start by developing guidelines and criteria for determining what data and other research products should be preserved and for how long. Clearly, not everything needs to be preserved. Federal agencies that require data management plans in grant applications can better clarify guidance for compliance expectations and institutional responsibilities. The work of developing necessary standards and policies on the part of stakeholders will enable effective planning of new infrastructure and associated financing.

It is also important that approaches are flexible enough to adapt and change over time. The size and complexity of data in many fields are changing rapidly, so that the solutions that are effective today might not be effective in a few years. At the same time, we have seen new tools and platforms continue to emerge that allow researchers to address challenges that were previously intractable.

Recommendation Three

Research funders and research institutions should develop the policies and procedures to identify the data, code, specimens, and other research products that should be preserved for long-term public availability, and they should provide the resources necessary for the long-term preservation and stewardship of those research products.

Implementation Actions

• Research institutions, professional societies and research funders should work together to develop selection guidelines and long-term stewardship best practices for the most valuable community datasets and other research products.
• Federal agencies should, consistent with the 2013 and 2014 Office of Science and Technology Policy (OSTP, 2013, 2014) memoranda for expanding public access to the results of federally funded research, continue to develop and standardize requirements for research products planning, management, reporting, and stewardship.
• Private research funders who have not already done so should adopt approaches compatible with those developed for publicly funded research products planning, management, reporting, and stewardship.
• Researchers should describe the plan for dissemination and stewardship of their research products with some specificity, consistent with the standardized sponsor requirements described above, including where their research products will be made publicly available and for what period of time.
• Research funders and research institutions should work together to resource and provide the infrastructure needed for long-term preservation, stewardship, and community control of research products.
This infrastructure could be supported through direct costs or through an earmarked percentage of each funded grant.

Facilitating Data Discovery, Reuse, and Reproducibility

As progress toward open science by design continues, it is important that the community adhere to the ultimate goal of achieving the availability of research products under open principles. Utilizing advanced machine learning tools in analyzing datasets or literature, for example, will facilitate new insights and discoveries. Ensuring FAIR access should be a key consideration in deciding how to build repositories and other new resources.

As is the case with ensuring long-term stewardship, new standards should be developed by funders in collaboration with research institutions and researchers. Fields and disciplines that do not already have well-developed standards and practices for making research products available under FAIR principles will need time and help to create them. Where meeting new standards imposes costs, funders should make the necessary resources available, thereby avoiding the imposition of unfunded mandates. Specific actions enabling a transition need to be developed in a transparent manner and should avoid disrupting researchers and their work to the extent possible.

Recommendation Four

Funders that support the development of research archives should work to ensure that these are designed and implemented according to the FAIR data principles. Researchers should seek to ensure that their research products are made available according to the FAIR principles and state with specificity any exceptions based on legal and ethical considerations.

Implementation Actions

• Researchers should preferentially use open repositories that have been designed for interoperability and ease of discovery.
• Research funders should work to ensure that research products are available in repositories that allow for bulk transfer of digital objects to developers or users of automated discovery and analysis tools.
• Researchers and research funders should require that research products designated for long-term preservation and stewardship are assigned persistent unique digital identifiers.
• Professional societies and research funders should support efforts to network and federate existing repositories for improved discoverability.
• Research funders should continue to support the development of methods and tools that improve the interoperability of heterogeneous data. Metadata schemes, commonly accepted workflows for the processing and analysis of data, and other standards should be developed and used for improved data discovery.
• Research funders should commission an independent assessment of the state of university and federal data archives. The assessment should address how the FAIR principles have or have not been adhered to and make recommendations for improving accessibility to distributed or federated archives.
Developing New Approaches to Fostering Open Science by Design

There is a great deal of activity on the part of public and private research funders, research institutions, commercial and nonprofit publishers, community-organized groups, and others aimed at preparing for and shaping a future research enterprise characterized by open science. Significant progress has been made, but a great deal of work needs to be done before open science by design is a reality. The committee focused on the choices facing U.S. organizations and institutions, realizing that the transition to open science by design is inherently a global process.

Effective dissemination will remain central to the advance of knowledge in the emerging open science era. Considerable resources are devoted to the publication of research results, many of them flowing to for-profit publishing companies or to nonprofit scientific societies. Many scientific societies generate surpluses through their publishing activities that support their professional ecosystems, and some would be severely challenged by some approaches to implementing open publication. At the same time, research institutions are currently experiencing difficulty in absorbing the steady increases in subscription rates of recent years.

Although scientific journals and articles will likely continue to play important roles for the foreseeable future, it is clear that the institutions and practices that support the dissemination of research will continue to evolve. Fully open publications are immediately accessible to all researchers at no cost and are available to all researchers under a copyright license that permits them to perform text and data mining or other productive reuses of the literature without the need for any negotiations or further permissions. While some subscription publishers have begun to offer researchers some forms of access for text and data mining and other productive reuses, their terms of access usually impose some restrictions on reuse.

The past several decades have seen the printed journal eclipsed by online distribution of research results. Datasets and other non-article research products will be increasingly valued and become a more significant focus of dissemination efforts. New venues for disseminating research have emerged and will continue to appear and grow.

The future evolution of research dissemination should be shaped by the changing needs of researchers and the broader enterprise, including the need to ensure openness. Issues of cost and sustainability should be considered from the standpoint of researchers. In developing new policies and support structures, research funders and research institutions should favor dissemination approaches that are responsive to community needs, and they should be transparent about their practices and costs.

Certain approaches to implementing open publication have the potential to affect the research ecosystem in significant ways, with differential impacts on different stakeholders. For example, a system that strongly favors publication approaches based on the payment of article processing charges would favor established researchers and wealthy institutions over early career researchers and institutions with fewer resources. In planning new policies and transitions, it will be necessary to anticipate differential impacts to the extent possible, consider ways of avoiding these, and build in evaluative and corrective mechanisms to address unanticipated consequences.

Public and private funders have made significant contributions to fostering open science to this point. They should continue to support initiatives that accelerate progress, and evaluate and revise their policies as needed.

Recommendation Five

The research community should work together to realize Open Science by Design to advance science and help science better serve the needs of society.

Implementation Actions

• The federal government should revisit and update its open science policy, which is expressed in the 2013 and 2014 OSTP memoranda.
• Funders, institutions, and researchers should align policies and incentives to realize open publication, including rights-retention provisions.
• Research funders should support the establishment of a consortium of research community stakeholders to develop additional concrete methods for implementing open science by design.
• Professional societies, individually and collectively, should work to transition from current publication strategies to new ones that foster open science by design.
• Journal editors should work with publishers to transition from current business models to new ones that foster open science by design.
• Research funders should explore innovative means to support the transition from subscription-based systems to new publication strategies that enable open science by design.
• Librarians should work together with other members of the research community to promote and implement open science by design.
• The research community should develop tools and other applications that depend on the long-term availability of open research products, thereby providing new sources of revenue for the private sector, enhancing the value of research products, and leading to an acceleration of scientific progress.