Openness and sharing of information are fundamental to the progress of science and to the effective functioning of the research enterprise. The advent of scientific journals in the 17th century helped power the Scientific Revolution by allowing researchers to communicate across time and space, using the technologies of that era to generate reliable knowledge more quickly and efficiently. Harnessing today’s stunning, ongoing advances in information technologies, the global research enterprise and its stakeholders are moving toward a new open science ecosystem. Open science aims to ensure the free availability and usability of scholarly publications, the data that result from scholarly research, and the methodologies, including code or algorithms, that were used to generate those data.
BENEFITS AND MOTIVATIONS
The research enterprise has already made significant progress toward open science, and is realizing a number of benefits, with the expectation that these will expand in the future:
- Rigor and reliability. New standards for data and code sharing in fields such as biomedical research and psychology are making it easier for researchers to reproduce and replicate reported work, strengthening scientific rigor and reliability.
- Ability to address new questions. Open science allows researchers to bring data and perspectives from multiple fields to bear on their work, opening up new areas of inquiry and expanding the opportunities for interdisciplinary collaboration.
- Faster and more inclusive dissemination of knowledge. The proportion of scientific articles that are openly available is increasing, which accelerates the process of disseminating research and building on results. Open publication also allows broader, more inclusive participation in research and expands the possibilities of productive research collaboration within the United States and around the world.
- Broader participation in research. Large-scale projects in fields such as astronomy and ecology are utilizing open data and expanding opportunities for citizen scientists to contribute to scientific advances.
- Effective use of resources. Reuse of data in fields such as clinical research is facilitating the aggregation of multiple studies for meta-analysis and allowing more effective testing of new hypotheses.
- Improved performance of research tasks. New tools such as electronic lab notebooks enable more accurate recording of research workstreams and automate various data curation tasks.
- Open publication for public benefit. The belief that the broader public should have access to publicly funded research and its benefits provides an additional strong rationale for open science. In the case of publicly funded research, the ultimate sponsor is the taxpayer. The public benefits from open science as new knowledge is utilized more rapidly to improve health, protect environmental quality, and deliver new products and services.
BARRIERS AND LIMITATIONS
The benefits of open science are accruing to researchers themselves, research sponsors, research institutions, disciplines, and scholarly communicators. Yet despite the significant progress made in recent years toward creating an open science ecosystem, science today is not completely open. Most scientific articles are only available on a subscription basis. Sharing data, code, and other research products is becoming more common, but is still not routinely done across all disciplines. Several important barriers remain, as well as limitations on the extent and speed with which open science can be realized. These include:
- Costs and infrastructure. There are significant remaining cost barriers to widespread implementation of open publication and open data. New technological and institutional infrastructure within specific disciplines and across disciplines needs to be developed.
- Structure of scholarly communications. Most publications are still only available on a subscription basis, and some potential pathways to open publication may disrupt the current scholarly communications ecosystem, including scientific society publishers, or may disadvantage early career researchers, researchers working in the developing world, or those in institutions with fewer resources.
- Lack of supportive culture, incentives and training. Open practices such as preparing datasets and code for sharing and making preprints available are not generally rewarded and may even be discouraged by current incentive and reward systems. This may have the unintended consequence of causing a disadvantage to early career researchers.
- Privacy, security, and proprietary barriers to sharing. Sharing data, code, and other research products is becoming more common, but barriers related to ensuring patient confidentiality and the protection of national security information exist in some domains. Proprietary research also presents barriers. Ultimately, some parts of the research enterprise may not be open.
- Disciplinary differences. The nature of research and practices surrounding treatment of data and code differ by discipline and even within a discipline. The size of datasets and the nature of some data may prevent immediate, complete sharing. Safeguards to prevent misuse or misrepresentation of data will be needed.
ABOUT THE STUDY
In 2017, the National Academies of Sciences, Engineering, and Medicine launched a study aimed at overcoming barriers and moving toward open science as the default approach across the research enterprise. The Laura and John Arnold Foundation provided financial support for the study. The authoring committee, established under the Board on Research Data and Information, met in person four times and held several virtual meetings to gather information from experts and develop findings and recommendations. As part of its evidence-gathering process, the committee organized a 1-day public symposium in September 2017 to explore specific examples of open science and to discuss a range of challenges, focusing on stakeholder perspectives. The committee also reviewed a large body of written material on open science concerns, including literature that informed the committee on how specific solutions in policy, infrastructure, incentives, and requirements could facilitate open science. The committee was not asked to examine whether or not open science is good, but, rather, how to move it forward in ways that are beneficial to the scientific community. Also, issues related to the research use of data generated in other contexts (e.g., social media data) are not considered. The statement of task is available in Chapter 1.
AN INFLECTION POINT
The open science movement stands at an important inflection point. A new generation of information technology tools and services holds the potential of further revolutionizing scientific practice. For example, the ability to automate the process of searching and analyzing linked articles and data can reveal patterns that would escape human perception, making the process of generating and testing hypotheses faster and more efficient. These tools and services will have maximum impact when used within an open science ecosystem that spans institutional, national, and disciplinary boundaries.
At the same time, a number of organizations around the world are adopting new policies and launching new initiatives aimed at fostering open science. Public and private research funders such as the Bill & Melinda Gates Foundation, the European Commission (EC), and the Wellcome Trust have introduced mandates and support systems to ensure that the results of the research they support are open. Publishers are adopting openness frameworks and strengthening requirements to ensure that the data and methods underlying articles are available. In the United States, federal agencies have developed and implemented policies based on 2013 and 2014 memoranda from the White House’s Office of Science and Technology Policy aimed at increasing public access to the results of research funded by the federal government.
OPEN SCIENCE BY DESIGN
The central aim of this study is to provide guidance to the research enterprise and its stakeholders as they build strategies for achieving open science and take the next steps. In order to frame the issues and possible actions, the committee developed the concept of open science by design, defined as a set of principles and practices that fosters openness throughout the entire research life cycle (Figure S-1).
The researcher is at the center of the concept of open science by design. From the very beginning of the research process, the researcher both contributes to open science and takes advantage of the open science practices of other members of the research community. The overarching principle of open science by design is that research conducted openly and transparently leads to better science. The vision of open science by design suggests that all phases of the research process provide opportunities for assessing and improving the reliability and efficacy of scientific research. The concept visualized in Figure S-1 can be further described as follows:
- Provocation: explore or mine open research resources and use open tools to network with colleagues. Researchers have immediate access to the most recent publications and have the freedom to search archives of papers, including preprints, research software code, and other open publications, as well as databases of research results, all without charge or other barriers. Researchers use the latest database and text mining tools to explore these resources, to identify new concepts embedded in the research, and to identify where novel contributions can be made. Robust collaborative tools are available to network with colleagues.
- Ideation: develop and revise research plans and prepare to share research results and tools under FAIR principles. Researchers and their collaborators develop and revise their research plans, collect preliminary data from publicly available data repositories, and conduct a pilot study to test their new methods on the existing data. When applying for research funding, they develop the required data management plans, stating where data, workflow, and software code will be available for use by other researchers under FAIR (Findable-Accessible-Interoperable-Reusable) principles. In addition, in some cases, they may decide to pre-register their research plans and protocols in an open repository.
- Knowledge generation: collect data, conduct research using tools compatible with open sharing, and use automated workflow tools to ensure accessibility of research outputs. Researchers collect data, using tools that automate formatting and curation tasks to ensure that digital datasets are interoperable and documented. In the case of physical samples and specimens, such as rocks, ice core samples, or tissue samples, researchers develop concrete plans to archive these according to disciplinary best practices. With the availability of open software, the researcher can document approaches to cleaning and preparing data for analysis in an electronic research notebook.
- Validation: prepare data and tools for reproducibility and reuse and participate in replication studies. Researchers use open data techniques to analyze, interpret, and validate findings. They may present their preliminary findings at conferences and refine their methods based on relevant comments and critiques. They may deposit their initial working paper in a preprint server and revise the paper based on the open peer review afforded by the service. They prepare their data in standard formats according to disciplinary standards and describe both data and analytical code in optimal ways for reuse and replication.
- Dissemination: use appropriate licenses for sharing research outputs and report all results and supporting information (data, code, articles, etc.). Researchers select the best venue for open publication of their work, including articles, data, code, and other research products. They revise and, in some cases, substantially improve their work based on the comments of the peer reviewers. Upon acceptance and before final submission of their work, they select a public copyright license, such as the GNU General Public License for software or a Creative Commons license for other works, including scholarly articles.
- Preservation: deposit research outputs in FAIR archives and ensure long-term access to research results. Researchers deposit the final peer-reviewed articles in an openly accessible archive as required by their research funders. They deposit their research data and software in one or more data archives, with clear and persistent links among the article, data, and software. These FAIR data are then used by other researchers in the provocation phase of their own work.
The committee’s concept of open science by design is by necessity general and idealized. Some discipline-specific nuances cannot be captured in such a broad concept. For example, there are fields where preregistration may not make sense or add value. Other challenges arise from the size or complexity of data. An important and emerging type of data are the very large datasets that capture extremely rare, time-sensitive events. Subtleties in these data and their generation may not be readily captured without detailed knowledge of how the data were collected.
Also, and importantly, open science by design is intended as a framework to empower the researcher. As expressed in other National Academies work, the principle for openness of data and other information underlying reported results is that they should be available no later than the time of publication, or when the researcher is seeking to gain credit for the work (NRC, 2003, 2009). For journal publication, any sharing prior to the point of final publication is up to the researcher, who is in full control of the decision of when to share. The committee believes that as open science by design becomes the norm, researchers will find that they benefit from sharing and collaborating early in the research process.
Achieving open science will require persistent, coordinated actions on the part of research enterprise stakeholders. The committee has developed findings, recommendations, and implementation actions based on its review and synthesis of the information gathered throughout the course of the study. The complete set of findings is contained in Chapter 6 with the recommendations and implementation actions.
Building a Supportive Culture
The specific ways in which cultural barriers to open science operate vary significantly by field or discipline. Overuse and misuse of bibliographic metrics such as the Journal Impact Factor in the evaluation of research and researchers is one important “bug” in the operation of the research enterprise that has a detrimental effect across disciplines. The perception and/or reality that researchers need to publish in certain venues in order to secure funding and career advancement may lock researchers into traditional, closed mechanisms for reporting results and sharing research products. These pressures are particularly strong for early career researchers.
Initiatives such as the San Francisco Declaration on Research Assessment seek to achieve broad buy-in on the part of stakeholders to move toward evaluation systems that use other methodologies. Concrete actions, such as the National Institutes of Health (2017a) decision to encourage investigators to use and cite interim research products such as preprints in seeking funding, can have a beneficial effect.
Continued effort by stakeholders, working internationally and across disciplinary boundaries, is needed to change evaluation practices and introduce other incentives so that the cultural environment of research better supports and rewards open practices.
Research institutions should work to create a culture that actively supports Open Science by Design by better rewarding and supporting researchers engaged in open science practices. Research funders should provide explicit and consistent support for practices and approaches that facilitate this shift in culture and incentives.
- Universities and other research institutions should explicitly reward the effort needed to make science open by design.
- Universities and other research institutions should partner with federal agencies in developing innovative approaches to assessing the impact of research in ways that include the impact of open science outputs. This should include, but is not limited to, the development of metrics for assessing the impact of interim research products such as preprints, with a view toward comparing those with existing methods for measuring impact.
- Universities and other research institutions should move toward evaluating published data and other research products in addition to published articles as part of the promotion and tenure process. Archived data should be valued, just as the publications that result from them are valued.
- Researchers should make full use of the many opportunities that are available for making their research products openly available, and they should include that information in their curriculum vitae so that they can be appropriately credited and rewarded.
- In fields where this is not already common practice, research funders should encourage and reward the use of data and other research products that are available in publicly accessible databases.
- Universities and other research institutions should encourage and reward studies that focus on the replication and reproducibility of published research. Such studies should be published and made openly available.
Training for Open Science by Design
The report discusses several initiatives that emphasize training in open science and reproducibility. The emergence of data science as a recognized interdisciplinary field has highlighted the need for new educational content and approaches related to data (NASEM, 2018a).
Several federal agencies require that students or trainees supported by grants receive training in the responsible conduct of research, or RCR (NASEM, 2017b). Training and education that covers issues such as open science and reproducibility would complement the existing focus of RCR education and orient these programs toward supporting both research integrity and quality.
Research institutions and professional societies should train students and other researchers to implement open science practices effectively and should support the development of educational programs that foster Open Science by Design.
- Universities should provide training in best practices for open science and data stewardship as part of the regular curriculum in graduate and postgraduate education and should expect these practices as a default in all onboarding/orientation processes of universities, including new student orientation, new faculty orientation, library orientations, and lab training. Course curricula should be developed and implemented to complement domain-specific courses that support open science by design.
- Research funders should support the development of training programs in the principles and practices of open science by design. Federal agencies should require this training as part of all federally funded graduate training grants (e.g., NSF research traineeships and NIH training grants) to foster open science by design.
- Library and information science schools, professional societies, and other interested organizations should develop course curricula and offer courses in the principles and practices of open science by design.
- Research funders and professional societies should create programs or contests that seek the creative and innovative integration and (re)use of open data for new and impactful research.
- The private sector and other interested parties should create innovative educational tools for open science principles and practices.
Ensuring Long-Term Preservation and Stewardship
The issues and challenges related to preservation and stewardship of research products, particularly data, code, and other non-article products, are considered in several places in the report. On the one hand, some of the technical and cost barriers to long-term data stewardship are falling, as tools for automated metadata tagging and classification become more widely used and data storage becomes cheaper over time. At the same time, the outputs of research continue to grow in volume and complexity, meaning that significant additional resources will still be required. For example, an important and emerging type of data are the very large datasets that capture extremely rare, time-sensitive events. Subtleties in these data and their generation may not be readily captured without detailed knowledge of how the data were collected.
Developing and sustaining the infrastructure required for long-term stewardship of research products will present a continuing challenge. This report does not contain a detailed cost estimate and timeline for meeting these needs. Yet several of the immediate priorities and initial steps do not, in themselves, require the expenditure of significant resources. Research communities can start by developing guidelines and criteria for determining what data and other research products should be preserved and for how long. Clearly, not everything needs to be preserved. Federal agencies that require data management plans in grant applications can better clarify guidance for compliance expectations and institutional responsibilities. The work of developing necessary standards and policies on the part of stakeholders will enable effective planning of new infrastructure and associated financing.
It is also important that approaches are flexible enough to adapt and change over time. The size and complexity of data in many fields are changing rapidly, so that the solutions that are effective today might not be effective in a few years. At the same time, we have seen new tools and platforms continue to emerge that allow researchers to address challenges that were previously intractable.
Research funders and research institutions should develop the policies and procedures to identify the data, code, specimens, and other research products that should be preserved for long-term public availability, and they should provide the resources necessary for the long-term preservation and stewardship of those research products.
- Research institutions, professional societies and research funders should work together to develop selection guidelines and long-term stewardship best practices for the most valuable community datasets and other research products.
- Federal agencies should, consistent with the 2013 and 2014 Office of Science and Technology Policy (OSTP, 2013, 2014) memoranda for expanding public access to the results of federally funded research, continue to develop and standardize requirements for research products planning, management, reporting, and stewardship.
- Private research funders who have not already done so should adopt approaches compatible with those developed for publicly funded research products planning, management, reporting, and stewardship.
- Researchers should describe the plan for dissemination and stewardship of their research products with some specificity, consistent with the standardized sponsor requirements described above, including where their research products will be made publicly available and for what period of time.
- Research funders and research institutions should work together to resource and provide the infrastructure needed for long-term preservation, stewardship, and community control of research products. This infrastructure could be supported through direct costs or through an earmarked percentage of each funded grant.
Facilitating Data Discovery, Reuse, and Reproducibility
As progress toward open science by design continues, it is important that the community adhere to the ultimate goal of achieving the availability of research products under open principles. Utilizing advanced machine learning tools in analyzing datasets or literature, for example, will facilitate new insights and discoveries. Ensuring FAIR access should be a key consideration in deciding how to build repositories and other new resources.
As is the case with ensuring long-term stewardship, new standards should be developed by funders in collaboration with research institutions and researchers. Fields and disciplines that do not already have well-developed standards and practices for making research products available under FAIR principles will need time and help to create them. Where meeting new standards imposes costs, funders should make the necessary resources available, thereby avoiding the imposition of unfunded mandates. Specific actions enabling a transition need to be developed in a transparent manner and should avoid disrupting researchers and their work to the extent possible.
Funders that support the development of research archives should work to ensure that these are designed and implemented according to the FAIR data principles. Researchers should seek to ensure that their research products are made available according to the FAIR principles and state with specificity any exceptions based on legal and ethical considerations.
Implementation Actions
- Researchers should preferentially use open repositories that have been designed for interoperability and ease of discovery.
- Research funders should work to ensure that research products are available in repositories that allow for bulk transfer of digital objects to developers or users of automated discovery and analysis tools.
- Researchers and research funders should require that research products designated for long-term preservation and stewardship are assigned persistent unique digital identifiers.
- Professional societies and research funders should support efforts to network and federate existing repositories for improved discoverability.
- Research funders should continue to support the development of methods and tools that improve the interoperability of heterogeneous data. Metadata schemes, commonly accepted workflows for the processing and analysis of data, and other standards should be developed and used for improved data discovery.
- Research funders should commission an independent assessment of the state of university and federal data archives. The assessment should address how the FAIR principles have or have not been adhered to and make recommendations for improving accessibility to distributed or federated archives.
Developing New Approaches to Fostering Open Science by Design
There is a great deal of activity on the part of public and private research funders, research institutions, commercial and nonprofit publishers, community-organized groups and others aimed at preparing for and shaping a future research enterprise characterized by open science. Significant progress has been made, but a great deal of work needs to be done before open science by design is a reality. The committee focused on the choices facing U.S. organizations and institutions, realizing that the transition to open science by design is inherently a global process.
Effective dissemination will remain central to the advance of knowledge in the emerging open science era. Considerable resources are devoted to the publication of research results, much of them flowing to for-profit publishing companies or to nonprofit scientific societies. Many scientific societies generate surpluses through their publishing activities that support their professional ecosystems, and some would be severely challenged by some approaches to implementing open publication. At the same time, research institutions are currently experiencing difficulty in absorbing the steady increases in subscription rates of recent years.
Although scientific journals and articles will likely continue to play important roles for the foreseeable future, it is clear that the institutions and practices that support the dissemination of research will continue to evolve. Fully open publications are immediately accessible to all researchers at no cost and are available to all researchers under a copyright license that permits them to perform text and data mining or other productive reuses of the literature without the need for any negotiations or further permissions. While some subscription publishers have begun to offer researchers some forms of access for text and data mining and other productive reuses, their terms of access usually impose some restrictions on reuse.
The past several decades have seen the printed journal eclipsed by online distribution of research results. Datasets and other non-article research products will be increasingly valued and become a more significant focus of dissemination efforts. New venues for disseminating research have emerged and will continue to appear and grow.
The future evolution of research dissemination should be shaped by the changing needs of researchers and the broader enterprise, including the need to ensure openness. Issues of cost and sustainability should be considered from the standpoint of researchers. In developing new policies and support structures, research funders and research institutions should favor dissemination approaches that are responsive to community needs, and they should be transparent about their practices and costs.
Certain approaches to implementing open publication have the potential to affect the research ecosystem in significant ways, with differential impacts on different stakeholders. For example, a system that strongly favors publication approaches based on the payment of article processing charges would favor established researchers and wealthy institutions over early career researchers and institutions with fewer resources. In planning new policies and transitions, it will be necessary to anticipate differential impacts to the extent possible, consider ways of avoiding these, and build in evaluative and corrective mechanisms to address unanticipated consequences.
Public and private funders have made significant contributions to fostering open science to this point. They should continue to support initiatives that accelerate progress, and evaluate and revise their policies as needed.
The research community should work together to realize Open Science by Design to advance science and help science better serve the needs of society.
- The federal government should revisit and update its open science policy, which is expressed in the 2013 and 2014 OSTP memoranda.
- Funders, institutions, and researchers should align policies and incentives to realize open publication, including rights-retention provisions.
- Research funders should support the establishment of a consortium of research community stakeholders to develop additional concrete methods for implementing open science by design.
- Professional societies—individually and collectively—should work to transition from current publication strategies to new ones that foster open science by design.
- Journal editors should work with publishers to transition from current business models to new ones that foster open science by design.
- Research funders should explore innovative means to support the transition from subscription-based systems to new publication strategies that enable open science by design.
- Librarians should work together with other members of the research community to promote and implement open science by design.
- The research community should develop tools and other applications that depend on the long-term availability of open research products, thereby providing new sources of revenue for the private sector, enhancing the value of research products, and leading to an acceleration of scientific progress.