Ensuring the Integrity, Accessibility, and Stewardship of Research Data in the Digital Age

Appendix B
Relevant National Academy of Sciences, National Academy of Engineering, Institute of Medicine, and National Research Council Reports

On Being a Scientist: Responsible Conduct in Research, Third Edition (2009)
Committee on Science, Engineering, and Public Policy
Synopsis: Describes the ethical responsibilities of researchers, using case studies. Treatment of data is one of the topics covered. Provides an overall framework for responsible research practices that underlies this study’s discussion on ensuring the integrity of data.

Models in Environmental Regulatory Decision Making (2007)
Committee on Models in the Regulatory Decision Process, National Research Council
Synopsis: Examines the use of models by the Environmental Protection Agency in the regulatory process, and recommends a life-cycle management approach to developing, testing, and revising models. Developing environmental regulations relies on both data and models. Principles outlined in the report, such as the importance of peer review and of providing accurate descriptions of a model’s assumptions, are analogous to this study’s principles for providing access to data and metadata.

Environmental Data Management at NOAA: Archiving, Stewarding, and Access (2007)
Committee on Archiving and Accessing Environmental and Geospatial Data at NOAA, National Research Council
Synopsis: The National Oceanic and Atmospheric Administration (NOAA) collects, manages, and disseminates a wide range of climate, weather, ecosystem, and other environmental data used by scientists, engineers, resource managers, policy makers, and others in the United States and around the world. The increasing volume and diversity of NOAA’s data holdings, which
include everything from satellite images of clouds to the stomach contents of fish, and a large number of users present NOAA with substantial data management challenges. The report offers nine general principles for effective environmental data management, along with a number of guidelines on how the principles could be applied at NOAA. The principles and guidelines developed for NOAA are consistent with the accessibility and stewardship principles laid out in this study, and represent an example of how they apply to an agency with significant data management responsibilities in the earth sciences. The description of NOAA’s data management challenges also illustrates the challenges of providing access and stewardship for large, heterogeneous datasets.

Science and Security in a Post 9/11 World (2007)
Committee on a New Government-University Partnership for Science and Security
Synopsis: Explores various aspects of science and security, including access to data and movement of students and researchers across borders. Upholds the principle that the results of unclassified basic research should not be restricted.

Surface Temperature Reconstructions for the Last 2,000 Years (2006)
Committee on Surface Temperature Reconstructions for the Last 2,000 Years, National Research Council
Synopsis: Examines the use of proxy evidence from multiple sources to reconstruct surface temperatures. In addition to its main conclusions about the reliability of multiproxy reconstructions, the report points out the differences in approaches to data availability in the fields covered, and notes that open access to data and methods will improve public confidence in the results of this research.
Reaping the Benefits of Genomic and Proteomic Research: Intellectual Property Rights, Innovation, and Public Health (2006)
Committee on Intellectual Property Rights in Genomic and Protein Research and Innovation, National Research Council
Synopsis: Explores intellectual property (IP) issues related to genomic and protein research, identifies areas where emerging practices in patenting and sharing data or research resources might impede research, and recommends steps that federal agencies, research institutions, and companies should take to prevent IP protections from impeding future breakthroughs. Access to and sharing of research data are addressed in several recommendations.

Improving Business Statistics Through Interagency Data Sharing: Summary of a Workshop (2006)
Caryn Kuebler and Christopher Mackie, Rapporteurs, Steering Committee for the Workshop on the Benefits of Interagency Business Data Sharing, National Research Council
Synopsis: Describes the benefits of greater sharing of business and other data among federal agencies, the barriers (mainly the need to maintain confidentiality), and possible approaches. Covers issues of data access relevant to economics and other social sciences.

Expanding Access to Research Data: Reconciling Risk and Opportunities (2005)
Panel on Data Access for Research Purposes, National Research Council
Synopsis: Focuses on expanded access to microdata from studies conducted by federal statistical agencies under pledges of confidentiality. Describes barriers to data access that are common in the social sciences, and develops approaches to overcoming them.

Building an Electronic Records Archive at the National Archives and Records Administration (NARA): Recommendations for a Long-Term Strategy (2005)
Committee on Digital Archiving and the NARA, National Research Council
Synopsis: Develops a comprehensive long-term strategy for how the NARA should approach archiving digital data. Many of the issues and barriers identified in the report, and the recommended strategies for addressing them, are relevant to a wide range of research fields and organizations charged with stewardship of research data.

Improving Data to Analyze Food and Nutrition Policies (2005)
Panel on Enhancing the Data Infrastructure in Support of Food and Nutrition Programs, Research, and Decision Making, National Research Council
Synopsis: Examines existing data sources used to support policy making and policy evaluation in food and nutrition programs. Recommends steps to strengthen the data infrastructure in this area. A good example of an end-use-motivated inventory of open and proprietary data sources.
Electronic Scientific, Technical, and Medical Journal Publishing and Its Implications: Report of a Symposium (2004)
Committee on Electronic Scientific, Technical, and Medical Journal Publishing and Its Implications and Committee on Science, Engineering and Public Policy, The National Academies
Synopsis: Summarizes a symposium that considered the changing digital environment for scholarly publishing.

Licensing Geographic Data and Services (2004)
Committee on Licensing Geographic Data and Services, National Research Council
Synopsis: Addresses the growing practice whereby federal agencies license geographic data from private vendors for their own use and for the use of outside researchers. Provides guidelines for when and under what circumstances agencies should enter such agreements, and describes complementary strategies, such as creation of a National Commons and Marketplace in Geographic Data, to maximize access to data for research and other uses. A careful examination of a field where access to private data is necessary for the advance of research. These guidelines may become applicable to other fields in the future.

Seeking Security: Pathogens, Open Access, and Genome Databases (2004)
Committee on Genomics Databases for Bioterrorism Threat Agents, National Research Council
Synopsis: Examines the security implications of access to genomic data, concluding that continued open access to genomic data is the best approach. Recommends that professional societies educate researchers about the risks of research results being misused. An example of a field in which open access is the best approach to ensuring security.

Sharing Publication-Related Data and Materials: Responsibilities of Authorship in the Life Sciences (2003)
Committee on Responsibilities of Authorship in the Biological Sciences, National Research Council
Synopsis: The publication of experimental results and sharing of research materials related to those results have long been key elements of the life sciences. Over time, standard practices have emerged from communities of life scientists to facilitate the presentation and sharing of different types of data and materials. But recently a concern has emerged that, in practice, publication-related data and materials are not always readily available to the research community. This report finds that the life sciences community does possess commonly held ideas and values about the role of publication in the scientific process.
Those ideas define the responsibilities of authors and underpin the development of community standards: practices for sharing data, software, and materials adopted by different disciplines of the life sciences to facilitate the use of scientific information and ensure its quality. The report is a very clear and thorough exploration of standards and expectations for making data accessible in an important field. The principles developed—that authors are required to make data available as a quid pro quo for publication, that authors are obligated to provide data and other materials in a form on which scientists can build further with research, and that all members of the scientific community have equal responsibility for upholding community standards—are consistent with those recommended by this study, and represent something of a “gold standard” that other fields might try to emulate.
Government Data Centers: Meeting Increasing Demands (2003)
Committee on Coping with Increasing Demands on Government Data Centers, National Research Council
Synopsis: Describes the increasing demands on government data centers that store and provide access to environmental data, and technical approaches to ensure effective operation in the future. In the earth and environmental sciences, the federal government has a major responsibility for the stewardship of data. Provides an overview of the issues and makes recommendations for technical approaches that might be used by the centers and users. These approaches might have relevance to other fields.

Ensuring the Quality of Data Disseminated by the Federal Government: Workshop Report (2003)
Committee on Ensuring the Quality of Government Information, National Research Council
Synopsis: Summarizes discussion at a series of workshops involving agencies and researchers to discuss implementation of the Data Quality Act. Provides background on the Data Quality Act, which is an important part of the policy context for this study’s discussion of the integrity and accessibility of data.

The Role of Scientific and Technical Data and Information in the Public Domain: Proceedings of a Symposium (2003)
Julie M. Esanu and Paul F. Uhlir, Editors, National Research Council
Synopsis: Papers from a symposium on how the scientific community can maintain and expand the public domain for scientific and technical data and information. The papers explore many aspects of the intellectual property environment for research.

Access to Research Data in the 21st Century: An Ongoing Dialogue Among Interested Parties, Report of a Workshop (2002)
Science, Technology, and Law Panel, National Research Council
Synopsis: A workshop on issues related to the Data Access Act (the Shelby Amendment), which was adopted in 2000.
Points out that peer review does not detect fraud or substitute for the judgment of the scientific community as a whole; it provides advice to a journal editor about the importance of the findings and whether the reported evidence supports the author’s claims. Illustrates the barriers to making data available, particularly in fields where data can be used to identify individuals. Also illustrates the pros and cons of various approaches to ensuring the accessibility of data, including that of the Data Access Act, which is modeled on the Freedom of Information Act.
Toward New Partnerships in Remote Sensing: Government, the Private Sector, and Earth Science Research (2002)
Steering Committee on Space Applications and Commercialization, National Research Council
Synopsis: Much of the remote sensing data needed for earth sciences research are now provided by private sector entities, and are made available to the federal government and university researchers through various licensing agreements and partnership arrangements. The report evaluates these arrangements and makes recommendations for how they should be structured in order to best advance science. The report explores intellectual property issues involved when private sector data are obtained for use in government and university environments. The principles developed might be useful for other fields where data generated by the private sector might be utilized to advance research.

Integrity in Scientific Research: Creating an Environment That Promotes Responsible Conduct (2002)
Committee on Assessing Integrity in Research Environments, National Research Council, Institute of Medicine
Synopsis: Provides a high-level view on research integrity and how it can be promoted. Much of the focus is on institutional approaches to education and self-assessment. Consistent with this study’s findings and recommendations on institutional responsibility.

Geoscience Data and Collections: National Resources in Peril (2002)
Committee on the Preservation of Geoscience Data and Collections, National Research Council
Synopsis: Describes the importance of geoscience data and collections and the challenges of stewardship. Develops criteria for prioritizing geoscience data and collections to be preserved, and recommends a specific strategy for doing so. A case study of the tension between devoting resources to creating new data and preserving existing data.
A good example of how criteria can be developed on a disciplinary basis for making these judgments.

Assessment of the Usefulness and Availability of NASA’s Earth and Space Science Mission Data (2002)
Task Group on the Usefulness and Availability of NASA’s Space Mission Data, National Research Council
Synopsis: Calls on NASA to devote more resources and management attention to data stewardship, including ensuring compatibility with parallel data efforts such as the National Virtual Observatory. Earth and space science examples illustrating the importance of data reuse.
Preparing for the Revolution: Information Technology and the Future of the Research University (2002)
Panel on the Impact of Information Technology on the Future of the Research University, National Research Council
Synopsis: Broad overview of information technology changes and their implications for the research university. Calls attention to the institutional role in preserving and disseminating knowledge, including data.

Transforming Remote Sensing Data into Information and Applications (2001)
Steering Committee on Space Applications and Commercialization, National Research Council
Synopsis: Examines possibilities for applying remote-sensing data to new applications and the implications for policy. Illustrates the value of data reuse while also recognizing that developing new applications may carry considerable costs. Points out the lack of standard data protocols and formats as a barrier to using data for new applications.

Issues for Science and Engineering Researchers in the Digital Age (2001)
Office of Special Projects, National Research Council
Synopsis: A broad overview of how information technology is transforming science and engineering research, and the implications for researchers. Highlights the importance of ensuring the quality of digital data and the challenges of stewardship.

Resolving Conflicts Arising from the Privatization of Environmental Data (2001)
Committee on Geophysical and Environmental Data, National Research Council
Synopsis: Defines appropriate spheres for the public and private sectors in the growing field of environmental data. Recommends that the public sector continue to collect and synthesize data, and provide such data at no more than the marginal cost of reproduction with no usage restrictions. The private sector would focus on value-added distribution and specific observational systems.
Improving the Collection, Management, and Use of Marine Fisheries Data (2000)
Ocean Studies Board, National Research Council
Synopsis: Describes the current system of data collection, management, and use in the marine fisheries field, and recommends improvements. Illustrates the growing need to work across sectors to improve data quality and stewardship in a “small science” field that is highly relevant to policy.
Bioinformatics: Converting Data to Knowledge: Workshop Summary (2000)
A Workshop Summary by Robert Pool and Joan Esnayra, Board on Biology, National Research Council
Synopsis: Summary of a workshop on data issues related to bioinformatics. Illustrates how the growing availability of data is transforming science and engineering.

Improving Access to and Confidentiality of Research Data: Report of a Workshop (2000)
Christopher Mackie and Norman Bradburn, Editors, National Research Council
Synopsis: Explores the challenges of improving access to data with confidentiality restrictions. These challenges cut across several fields.

The Digital Dilemma: Intellectual Property in the Information Age (2000)
Committee on Intellectual Property Rights in the Emerging Information Infrastructure, National Research Council
Synopsis: In-depth examination of copyright issues, including those related to digital archiving, in the wake of the Digital Millennium Copyright Act. Relevant to the changing environment for scientific publishing, an important aspect of the context for this study, as well as the role of libraries.

A Question of Balance: Private Rights and Public Interest in Scientific and Technical Databases (1999)
Committee for a Study on Promoting Access to Scientific and Technical Data for the Public Interest, National Research Council
Synopsis: Describes the importance of scientific and technical databases in research, and standard practices for production, dissemination, and use of data in federal, nonprofit, and commercial contexts. Develops principles and guidelines for agencies, research institutions, and investigators. Explores various proposals, current at the time of the study, for creating new intellectual property protection for noncopyrightable databases, along with the pros and cons of these proposals.
The European Union had recently created such protection. Several of the principles and guidelines are consistent with this study, including: (1) scientific and technical data owned or controlled by the government should be made available for use by not-for-profit and commercial entities alike on a nonexclusive basis and should be disseminated to all users at no more than the marginal cost of reproduction and distribution, whenever possible; (2) federal funding agencies should require university and other not-for-profit researchers or their employing institutions that use federal funds, wholly or in substantial part, in creating databases not to grant exclusive rights to such databases when
submitting them for publication or for incorporation into other databases. Also provides a good overview of intellectual property issues related to data. Data itself is not copyrightable, and there are significant limitations on copyrighting databases. The policy context has not changed much since the time of this report, as the United States and other nations have not followed the European Union in creating new intellectual property protection for databases.

Finding the Path: Issues of Access to Research Resources (1999)
Committee on Federal Policy for Access to Research Resources, National Research Council
Synopsis: This conference summary describes issues affecting access to a variety of research resources in the life sciences, including data and databases, materials, software, and so forth. Provides background on data access issues in the life sciences. The recommendations are largely superseded by Sharing Publication-Related Data and Materials (2003).

Assuring Data Quality and Validity in Clinical Trials for Regulatory Decision Making: Workshop Report (1999)
Jonathan R. Davis, Vivian P. Nolan, Janet Woodcock, and Ronald W. Estabrook, Editors, Institute of Medicine
Synopsis: Describes the process for assuring the integrity of clinical trial data and suggests improvements. Background to the issues of clinical trials data discussed in this study.

Bits of Power: Issues in Global Access to Scientific Data (1997)
Committee on Issues in the Transborder Flow of Scientific Data, National Research Council
Synopsis: Outlines the needs for access to data in the physical, astronomical, geological, and biological sciences. Characterizes the legal, economic, policy, and technical factors and trends that have an influence on access to data by the scientific community. Identifies and analyzes the barriers to international access to scientific data.
Recommends approaches that could help overcome those barriers. The two key challenges are the increasing quantities, varieties, dissemination modes, and interdisciplinary relevance of data, and increasing legal and economic restrictions on publicly funded data. States the principle that “full and open access to scientific data should be adopted as the international norm for the exchange of scientific data derived from publicly funded research. The public-good interests in the full and open access to and use of scientific data need to be balanced against legitimate concerns for the protection of national security, individual privacy, and intellectual property.” This study would extend this principle somewhat, to include private-sector-funded data on which published research results are based.
Responsible Science: Ensuring the Integrity of the Research Process (1992)
Committee on Science, Engineering, and Public Policy
Synopsis: Broad overview and guidance on how the research enterprise should ensure research integrity. The principles and approaches developed in this study still underlie the definitions, standards, and policies related to ensuring responsible research and dealing with misconduct.

Sharing Research Data (1985)
Stephen E. Fienberg, Margaret E. Martin, and Miron L. Straf, Editors, Committee on National Statistics, National Research Council
Synopsis: Explores advantages of and barriers to sharing social sciences data. Early exploration of the idea of asking researchers to provide a data dissemination plan in their proposals, including “the time of release of data, the means by which the data would be made available and preserved for long-term use, the technical form in which data would be released, the supporting documentation that would accompany the data, what forms of access to confidential or other sensitive data would be provided, and an assessment of the policy relevance and broad research value of the data.”