The benefits of open science are accruing to researchers themselves, research sponsors, research institutions, disciplines, and scholarly communicators. Yet, despite significant progress toward creating an open science ecosystem, today’s science is not completely open. Most scientific articles are only available on a subscription basis. Sharing data, code, and other research products is becoming more common, but is still not routinely done across all disciplines. Barriers to more rapid progress include an academic culture and researcher incentives that can work against open science, insufficient infrastructure and training, issues related to data privacy and national security, and the economic structure of the scholarly communications market.
Open science also needs to overcome less defined sources of skepticism, which it can only do by proving its value to the research enterprise over time. Many important transformations and innovations in the history of science, and in history more broadly, have been opposed at first because of difficulty in quantifying or even imagining the benefits. For example, much of the biomedical research community was strongly opposed to the Human Genome Project when it was first proposed, believing that it diverted resources from more valuable investigator-driven work (Palca, 1992). The project and its impact look much different in hindsight. Today’s advances in biomedical research, and many other fields such as archaeology, would not be imaginable without genomic mapping and analysis. Also, researchers who are used to a framework where they are accountable to colleagues, to their disciplines, and to their institutions may be uneasy with open science’s implication that they are or should be accountable to the broader public.
The open science movement stands at an important inflection point. A new generation of information technology tools and services holds the potential of further revolutionizing scientific practice. For example, the ability to automate the process of searching and analyzing linked articles and data can reveal patterns that would escape human perception, making the process of generating and testing hypotheses faster and more efficient. These tools and services will have maximum impact when used within an open science ecosystem that spans institutional, national, and disciplinary boundaries. At the same time, a number of organizations around the world are adopting new policies and launching new initiatives aimed at fostering open science.
The vision of open science by design presented in this report seeks to enable the large population of stakeholders to move more rapidly toward open science as the default condition for the research they support. These stakeholders include the researchers themselves, universities, private and nonprofit organizations, publishers and journal editors, scientific societies, the philanthropic community, and federal agencies. Despite the barriers that must still be overcome to implement open science, the momentum of the movement toward open science is generally apparent, and strategies for accelerating access have been outlined by many members of the scientific community. To help accelerate this progress further, the committee has reviewed several recent recommendations, including those of a report by the Association of American Universities (AAU) and Association of Public and Land-grant Universities (APLU) and the European Open Science Cloud (EOSC) Declaration, before developing an action statement for specific stakeholders.
AAU-APLU Public Access Working Group Report
A joint working group on public access convened by the AAU and APLU released a report in November 2017 that provides recommendations and summarizes actions for federal agencies and universities to advance public access to data in a sustainable manner. The report recognizes that a significant culture shift at universities and among their faculty is required, in addition to carefully crafted new federal policies and investment in data infrastructure that support open access (AAU-APLU, 2017). The report also suggests, “by committing to a set of shared principles and minimal levels of standardization across institutions and agencies, we can help minimize costs, enhance interoperability between institutions and disciplines, and maximize the control institutions can exert over how they ensure access to publicly funded scholarship” (AAU-APLU, 2017, p. 1).
Internationally, the European Commission released the EOSC Declaration in October 2017 calling on all scientific stakeholders to endorse and commit to the principles of the declaration by 2020. The declaration, which emerged as a result of the EOSC Summit held in June 2017, recognizes the challenges of data-driven research in pursuing excellent science; grants the vision of European Open Science as widely inclusive of all disciplines and Member States in the long term; and confirms the implementation of the EOSC as a process based on constant learning and mutual alignment (EC, 2017a). Regarding data culture, it notes that “only a considerable cultural change will enable long-term reuse for science and for innovation of data created by research activities: no disciplines, institutions or countries must be left behind” (EC, 2017a, p. 1).
The Committee on Toward an Open Science Enterprise has developed the following set of findings and recommendations based on its review and synthesis of the information gathered throughout the course of the study. Each recommendation is the focus of a section that includes a discussion of relevant issues drawing on other parts of the report and a set of findings. Each of the five recommendations is followed by implementation actions specifying agencies, universities, or other organizations to guide stakeholder efforts to foster open science by design.
Building a Supportive Culture
Transparency, openness, and reproducibility are readily recognized as vital features of science. When asked, most scientists embrace these features as disciplinary norms and values. Therefore, one might expect that these valued features would be routine in daily practice. Yet, a growing body of evidence suggests that this is not the case.
The actual and anticipated benefits of open science include more reliable knowledge, more rapid and creative generation of results, and broader and more inclusive participation in the research process. Significant barriers to wider and quicker adoption of open practices include the incentives and underlying cultural assumptions that operate in many fields.
The specific ways in which cultural barriers to open science operate vary significantly by field or discipline. Overuse and misuse of bibliographic metrics such as the Journal Impact Factor in the evaluation of research and researchers is one important “bug” in the operation of the research enterprise that has a detrimental effect across disciplines, as explained in Chapter 2. The perception and/or reality that researchers need to publish in certain venues in order to secure funding and career advancement may lock researchers into traditional, closed mechanisms for reporting results and sharing research products. These pressures are particularly strong for early career researchers.
Initiatives such as the San Francisco Declaration on Research Assessment seek to achieve broad buy-in on the part of stakeholders to move toward evaluation systems that use other methodologies. Concrete actions, such as the National Institutes of Health (2017a) decision to encourage investigators to use and cite interim research products such as preprints in seeking funding, can have a beneficial effect.
Continued effort by stakeholders, working internationally and across disciplinary boundaries, is needed to change evaluation practices and introduce other incentives so that the cultural environment of research better supports and rewards open practices.
- The culture of academia does not adequately reward and support researchers engaged in open science practices.
- University tenure and promotion committees give credit for journal publications, but rarely give explicit credit to investigators who make their publications and data openly available for use by the broader community and thus do not incentivize such practices.
- There are increasing opportunities for authors to make their research products openly available. Many high-quality open access journals exist. An increasing number of high-quality open access publishers are supported by philanthropy and host institutions and offer fee waivers to authors in case of economic hardship (Shieber, 2009; Lawson, 2015). There are even peer-reviewed open access publishers that charge a nominal article processing charge or none at all. The Directory of Open Access Journals can be searched to find appropriate journals (DOAJ, 2018). Many journal publishers do not prohibit prospective authors from depositing their initial manuscripts in preprint servers. Most journal publishers do not prohibit authors from posting their accepted articles on their personal websites or depositing them in their university’s open access repository. Most federal agencies require deposit of federally funded research results in public repositories.
- Journal articles are currently the primary method for summarizing and sharing scientific results, and the journal’s impact factor plays a large role in the assessment of academic achievement. In the digital age, while the journal framework may well continue for branding and content integration purposes, compiling articles into journals is no longer a requirement for broad distribution.
Research institutions should work to create a culture that actively supports Open Science by Design by better rewarding and supporting researchers engaged in open science practices. Research funders should provide explicit and consistent support for practices and approaches that facilitate this shift in culture and incentives.
- Universities and other research institutions should explicitly reward the effort needed to make science open by design.
- Universities and other research institutions should partner with federal agencies in developing innovative approaches to assessing the impact of research in ways that include the impact of open science outputs. This should include, but is not limited to, the development of metrics for assessing the impact of interim research products such as preprints, with a view toward comparing those with existing methods for measuring impact.
- Universities and other research institutions should move toward evaluating published data and other research products in addition to published articles as part of the promotion and tenure process. Archived data should be valued, just as the publications that result from them are valued.
- Researchers should make full use of the many opportunities that are available for making their research products openly available, and they should include that information in their curriculum vitae so that they can be appropriately credited and rewarded.
- In fields where this is not already common practice, research funders should encourage and reward the use of data and other research products that are available in publicly accessible databases.
- Universities and other research institutions should encourage and reward studies that focus on the replication and reproducibility of published research. Such studies should be published and made openly available.
Training for Open Science by Design
The importance of training for open science by design is discussed in several places in the report, particularly Chapter 4. Initiatives such as the European Union’s FOSTER project and the Berkeley Initiative for Transparency in the Social Sciences (BITSS) have emphasized training in open science and reproducibility. The emergence of data science as a recognized interdisciplinary field has highlighted the need for new educational content and approaches related to data (NASEM, 2018a).
Several federal agencies require that students or trainees supported by grants receive training in the responsible conduct of research, or RCR (NASEM, 2017b). Training and education that covers issues such as open science and reproducibility would complement the existing focus of RCR education and orient these programs toward supporting both research integrity and quality.
- Few academic institutions provide formal training and education in the principles and practices of open science.
- The university library community has an important role to play in the promulgation and support of open science principles and practices.
- Federal training programs, while requiring training in the responsible conduct of research, do not explicitly require training in the many aspects of open science principles and practices.
Research institutions and professional societies should train students and other researchers to implement open science practices effectively and should support the development of educational programs that foster Open Science by Design.
- Universities should provide training in best practices for open science and data stewardship as part of the regular curriculum in graduate and postgraduate education and should include these practices by default in all university onboarding and orientation processes, including new student orientation, new faculty orientation, library orientations, and lab training. Course curricula should be developed and implemented to complement domain-specific courses that support open science by design.
- Research funders should support the development of training programs in the principles and practices of open science by design. Federal agencies should require this training as part of all federally funded graduate training grants (e.g., NSF research traineeships and NIH training grants) to foster open science by design.
- Library and information science schools, professional societies, and other interested organizations should develop course curricula and offer courses in the principles and practices of open science.
- Research funders and professional societies should create programs or contests that seek the creative and innovative integration and (re)use of open data for new and impactful research.
- The private sector and other interested parties should create innovative educational tools for open science principles and practices.
Ensuring Long-Term Preservation and Stewardship
The issues and challenges related to preservation and stewardship of research products, particularly data, code, and other nonarticle products, are considered in several places in the report. On the one hand, some of the technical and cost barriers to long-term data stewardship are falling, as tools for automated metadata tagging and classification become more widely used and cloud storage becomes cheaper over time. At the same time, the outputs of research continue to grow in volume and complexity, meaning that significant additional resources will still be required. In addition, ensuring preservation and long-term stewardship, particularly beyond the time period specified by the grant, requires standards and institutional capabilities that need to be developed by stakeholders and updated over time.
- Ensuring long-term preservation and stewardship of data and other research products requires a commensurate long-term commitment of resources.
- Public access to data and scientific collections created with federal support is required by federal agencies, but the infrastructure and funding to store, curate, and preserve data, code, samples, and other research products are not necessarily available.
- Although some of the technical and cost barriers to large-scale data storage are falling, the outputs of research continue to grow in volume and complexity, meaning that significant additional resources will still be required. Significant cultural and institutional barriers also remain.
- The library community, including archivists, curators, and other information scientists, plays an important role in effecting long-term preservation and stewardship.
- Scientific disciplines vary in the extent to which data and other research products are shared and archived.
- Not all data and other research products should be preserved for the long term, and most research communities do not have well-defined criteria for determining what data and physical collections should be preserved and for what length of time. The rise of interdisciplinary research implies that data preservation criteria should consider possible use outside of the discipline in which the research was originally conducted.
- Most federal agencies require a data management plan as part of grant applications, although there is insufficient guidance on compliance expectations and institutional responsibilities.
- Developing and sustaining the infrastructure required for long-term stewardship of research products will present a continuing challenge. The work of developing necessary standards and policies on the part of stakeholders will enable effective planning of new infrastructure and associated financing.
- Approaches should be flexible enough to adapt and change over time. The size and complexity of data in many fields are changing rapidly, so that the solutions that are effective today might not be effective in a few years. At the same time, we have seen new tools and platforms continue to emerge that allow researchers to address challenges that were previously intractable.
Research funders and research institutions should develop the policies and procedures to identify the data, code, specimens, and other research products that should be preserved for long-term public availability, and they should provide the resources necessary for the long-term preservation and stewardship of those research products.
- Research institutions, professional societies, and research funders should work together to develop selection guidelines and long-term stewardship best practices for the most valuable community datasets and other research products.
- Federal agencies should, consistent with the 2013 and 2014 Office of Science and Technology Policy (OSTP, 2013, 2014) memoranda for expanding public access to the results of federally funded research, continue to develop and standardize requirements for research products planning, management, reporting, and stewardship.
- Private research funders who have not already done so should adopt approaches compatible with those developed for publicly funded research products planning, management, reporting, and stewardship.
- Researchers should describe the plan for dissemination and stewardship of their research products with some specificity, consistent with the standardized sponsor requirements described above, including where their research products will be made publicly available and for what period of time.
- Research funders and research institutions should work together to resource and provide the infrastructure needed for long-term preservation, stewardship, and community control of research products. This infrastructure could be supported through direct costs or through an earmarked percentage of each funded grant.
Facilitating Data Discovery, Reuse, and Reproducibility
As progress toward open science by design continues, it is important that the community adhere to the ultimate goal of achieving the availability of research products under FAIR (findable, accessible, interoperable, reusable) principles. Open science under FAIR principles has the potential to deliver benefits to those researchers and disciplines that are participating, which will help make the case for supporting openness. Utilizing advanced machine learning tools in analyzing datasets or literature, for example, will facilitate new insights and discoveries. Ensuring FAIR access should be a key consideration in deciding how to build repositories and other new resources.
As is the case with ensuring long-term stewardship, new standards should be developed by funders in collaboration with research institutions and researchers. Fields and disciplines that do not already have well-developed standards and practices for making research products available under FAIR principles will need time and help to create these. Where meeting new standards imposes costs, funders should make the necessary resources available. Open science will be realized more quickly and effectively by avoiding the imposition of unfunded mandates. Specific actions enabling a transition need to be developed in a transparent manner and should avoid disrupting researchers and their work to the extent possible.
- It is difficult to determine how much data (open or otherwise) are generated through federally sponsored research projects and where they can be found. Without this information, it is difficult to plan agency or budgetary data strategies.
- For certain types of data in several disciplines (e.g., computational biology, genomics, proteomics), papers cannot be submitted to major journals unless the relevant data have already been deposited in an open domain repository. This has facilitated the discovery and reuse of data as well as the reproducibility of research. At the same time, this has happened in only a small number of fields.
- It is difficult to discover datasets and code through search, making the “findable” part of the FAIR principles challenging.
- There is considerable variation among different disciplines for what constitutes ethical practices in the publication and usage of open data.
- Public access to research data is not sufficient to ensure usability and enable reuse. Uncurated data are often difficult to use. Data curation, management, and stewardship allow for optimal discovery, reuse, and validation of the results of scientific research.
- The value of open data depends heavily on the proper usage of such data, which in turn relies on a proper understanding of how the data were generated and organized. Disciplinary differences are considerable, and some very large and complex datasets require considerable knowledge and expertise to use effectively.
- For most researchers, the amount of the relevant published literature is beyond the human capacity to gather, read, and analyze without the assistance of automated discovery and analytical tools. Such tools are in development, but that development is impeded by the lack of ready access to the entire corpus of published scientific research by tool developers.
- Open access publications are legally available for all, although not all open access publishers make their content readily available for bulk transfer to tool developers or users of text and data mining tools.
- Subscription publishers have varying policies concerning the availability and use of their publications for text and data mining, with the largest publishers making this content available only under the terms of a negotiated license agreement.
- Open access to the data and metadata, along with the code used to generate and/or interpret those data, supports reproducibility, replicability, and the reliability of reported results.
Funders that support the development of research archives should work to ensure that these are designed and implemented according to the FAIR data principles. Researchers should seek to ensure that their research products are made available according to the FAIR principles and state with specificity any exceptions based on legal and ethical considerations.
- Researchers should preferentially use open repositories that have been designed for interoperability and ease of discovery.
- Research funders should work to ensure that research products are available in repositories that allow for bulk transfer of digital objects to developers or users of automated discovery and analysis tools.
- Researchers and research funders should require that research products designated for long-term preservation and stewardship are assigned persistent unique digital identifiers.
- Professional societies and research funders should support efforts to network and federate existing repositories for improved discoverability.
- Research funders should continue to support the development of methods and tools that improve the interoperability of heterogeneous data. Metadata schemes, commonly accepted workflows for the processing and analysis of data, and other standards should be developed and used for improved data discovery.
- Research funders should commission an independent assessment of the state of university and federal data archives. The assessment should address how the FAIR principles have or have not been adhered to and make recommendations for improving accessibility to distributed or federated archives.
Developing New Approaches to Fostering Open Science by Design
As the report discusses in Chapters 3 and 5, there is a great deal of activity on the part of public and private research funders, research institutions, commercial and nonprofit publishers, community-organized groups, and others aimed at preparing for and shaping a future research enterprise characterized by open science. Significant progress has been made, but a great deal of work needs to be done before open science by design is a reality. The committee focused on the choices facing U.S. organizations and institutions, realizing that the transition to open science by design is inherently a global process.
Chapter 5 describes a number of issues, a few possible scenarios, and options for action. The recent AAU-APLU report emphasizes the need for federal and other research sponsors to clarify requirements. In addition, revisiting federal policies supporting open science would allow for approaches to be modified and updated. Specific actions enabling a transition need to be developed in a transparent manner and should avoid disrupting researchers and their work to the extent possible.
The research enterprise is at an important point in the transition to open science, where research sponsors, both public and private, have an opportunity to shape the future through their investments.
- Significant progress in open science practices has been made in recent years, but the majority of research products are not open, and very little research output meets the FAIR guidelines.
- Many, though not all, research funder policies are moving toward open science principles and practices.
- Infrastructure for open science is being designed and deployed, although with variation across fields of study.
- Disciplinary preprint servers, such as arXiv, RePEc, and bioRxiv, have successfully provided an open platform to post prepublication versions of manuscripts at no charge. These platforms have had an important positive effect on these disciplines.
- Open publications and open data provide an opportunity for the private sector and others to develop useful products for researchers and other communities.
- The current subscription-based business model for many publishers conflicts with the goal of immediate open access to publications and data.
- Article processing charges are a possible replacement for subscription fees as a business model, but they also have limitations, since the payment of the charges will still be a burden on some part of the ecosystem and will fall unevenly on different stakeholders.
- Certain approaches to implementing open publication have the potential to affect the research ecosystem in significant ways, with differential impacts on different stakeholders. In planning new policies and transitions, it will be necessary to anticipate differential impacts to the extent possible, consider ways of avoiding these, and build in evaluative and corrective mechanisms to address unanticipated consequences.
The research community should work together to realize Open Science by Design to advance science and help science better serve the needs of society.
- The federal government should revisit and update its open science policy, which is expressed in the 2013 and 2014 OSTP memoranda.
- Funders, institutions, and researchers should align policies and incentives to realize open publication, including rights-retention provisions.
- Research funders should support the establishment of a consortium of research community stakeholders to develop additional concrete methods for implementing open science by design.
- Professional societies—individually and collectively—should work to transition from current business models to new ones that foster open science by design.
- Journal editors should work with publishers to transition from current business models to new ones that foster open science by design.
- Research funders should explore innovative means to support the transition from subscription-based systems to new publication strategies that enable open science by design.
- Librarians should work together with other members of the research community to promote and implement open science by design.
- The research community should develop tools and other applications that depend on the long-term availability of open research products, thereby providing new sources of revenue for the private sector, enhancing the value of research products, and leading to an acceleration of scientific progress.