




THE FUTURE OF SCIENTIFIC KNOWLEDGE DISCOVERY IN OPEN NETWORKED ENVIRONMENTS
Summary of a Workshop

Paul F. Uhlir, Rapporteur

Board on Research Data and Information
Policy and Global Affairs

THE NATIONAL ACADEMIES PRESS
Washington, D.C.
www.nap.edu

THE NATIONAL ACADEMIES PRESS
500 Fifth Street, NW
Washington, DC 20001

NOTICE: The project that is the subject of this report was approved by the Governing Board of the National Research Council, whose members are drawn from the councils of the National Academy of Sciences, the National Academy of Engineering, and the Institute of Medicine. The members of the committee responsible for the report were chosen for their special competences and with regard for appropriate balance.

This study was supported by the National Science Foundation under Grant No. 1042078. This report was prepared as an account of work sponsored by an agency of the United States government. Neither the United States government nor any agency thereof, nor any of their employees, makes any warranty, express or implied, or assumes any legal liability or responsibility for the accuracy, completeness, or usefulness of any information, apparatus, product, or process disclosed, or represents that its use would not infringe privately owned rights. Reference herein to any specific commercial product, process, or service by trade name, trademark, manufacturer, or otherwise does not necessarily constitute or imply its endorsement, recommendation, or favoring by the United States government or any agency thereof. Any opinions, findings, conclusions, or recommendations expressed in this publication are those of the author(s) and do not necessarily reflect the views of the organizations or agencies that provided support for the project.

International Standard Book Number-13: 978-0-309-26791-5
International Standard Book Number-10: 0-309-26791-9

Additional copies of this report are available from the National Academies Press, 500 Fifth Street, NW, Room 360, Washington, DC 20001; (800) 624-6242 or (202) 334-3313; http://www.nap.edu.

Copyright 2012 by the National Academy of Sciences. All rights reserved.

Printed in the United States of America

The National Academy of Sciences is a private, nonprofit, self-perpetuating society of distinguished scholars engaged in scientific and engineering research, dedicated to the furtherance of science and technology and to their use for the general welfare. Upon the authority of the charter granted to it by the Congress in 1863, the Academy has a mandate that requires it to advise the federal government on scientific and technical matters. Dr. Ralph J. Cicerone is president of the National Academy of Sciences.

The National Academy of Engineering was established in 1964, under the charter of the National Academy of Sciences, as a parallel organization of outstanding engineers. It is autonomous in its administration and in the selection of its members, sharing with the National Academy of Sciences the responsibility for advising the federal government. The National Academy of Engineering also sponsors engineering programs aimed at meeting national needs, encourages education and research, and recognizes the superior achievements of engineers. Dr. Charles M. Vest is president of the National Academy of Engineering.

The Institute of Medicine was established in 1970 by the National Academy of Sciences to secure the services of eminent members of appropriate professions in the examination of policy matters pertaining to the health of the public. The Institute acts under the responsibility given to the National Academy of Sciences by its congressional charter to be an adviser to the federal government and, upon its own initiative, to identify issues of medical care, research, and education. Dr. Harvey V. Fineberg is president of the Institute of Medicine.

The National Research Council was organized by the National Academy of Sciences in 1916 to associate the broad community of science and technology with the Academy's purposes of furthering knowledge and advising the federal government. Functioning in accordance with general policies determined by the Academy, the Council has become the principal operating agency of both the National Academy of Sciences and the National Academy of Engineering in providing services to the government, the public, and the scientific and engineering communities. The Council is administered jointly by both Academies and the Institute of Medicine. Dr. Ralph J. Cicerone and Dr. Charles M. Vest are chair and vice chair, respectively, of the National Research Council.

www.national-academies.org

THE FUTURE OF SCIENTIFIC KNOWLEDGE DISCOVERY IN OPEN NETWORKED ENVIRONMENTS: A National Workshop

Steering Committee

John Leslie King, Chair, William Warner Bishop Collegiate Professor of Information, University of Michigan
Hal Abelson, Professor, Massachusetts Institute of Technology
Francine Berman, Vice President of Research, Rensselaer Polytechnic Institute
Bonnie Carroll, President, Information International Associates
Michael Carroll, Professor, American University, Washington College of Law
Alyssa Goodman, Professor, Harvard University
Sara Graves, Director, Information Technology and Systems Center, and University Professor of Computer Science, University of Alabama in Huntsville
Michael Lesk, Professor, Rutgers University
Gilbert Omenn, Professor, University of Michigan

Project Staff

Paul F. Uhlir, Board Director, The National Academies
Daniel Cohen, Program Officer, The National Academies [on detail from Library of Congress]
Cheryl Levey, Senior Program Associate, The National Academies

Board on Research Data and Information Membership

Michael Lesk, Chair (until 11/2011), Rutgers University
Roberta Balstad, Vice Chair (until 11/2011), Columbia University
Francine Berman, Co-Chair, Rensselaer Polytechnic Institute
Clifford Lynch, Co-Chair, Coalition for Networked Information
Maureen Baginski (until 11/2011), Serco
Laura Bartolo, Kent State University
R. Steven Berry (until 11/2011), University of Chicago
Christine Borgman (until 11/2011), University of California, Los Angeles
Philip Bourne, University of California, San Diego
Norman Bradburn (until 11/2011), University of Chicago
Henry Brady, University of California, Berkeley
Mark Brender, GeoEye Foundation
Bonnie Carroll, Information International Associates
Michael Carroll, American University, Washington College of Law
Sayeed Choudhury, The Johns Hopkins University
Keith Clarke, University of California, Santa Barbara
Paul A. David, Stanford University
Kelvin Droegemeier, University of Oklahoma
Clifford Duke, Ecological Society of America
Barbara Entwisle, University of North Carolina
Stephen Friend, Sage Bionetworks
Michael Goodchild (until 11/2011), University of California, Santa Barbara
Alyssa Goodman (until 11/2011), Harvard University
Margaret Hedstrom, University of Michigan
Michael Keller (until 11/2011), Stanford University
Alexa T. McCray, Harvard Medical School
Michael R. Nelson (until 11/2011), Georgetown University
Daniel Reed (until 11/2011), Microsoft Research, Microsoft Inc.
Alan M. Title, Lockheed Martin Advanced Technology Center
Ann J. Wolpert, Massachusetts Institute of Technology
Cathy H. Wu (until 11/2011), University of Delaware and Georgetown University Medical Center

Board on Research Data and Information Staff

Paul F. Uhlir, Board Director
Daniel Cohen, Program Officer [on detail from Library of Congress]
Subhash Kuvelker, Senior Program Officer
Cheryl Levey, Senior Program Associate

Preface and Acknowledgments

Digital technologies and networks are now part of everyday work in the sciences and have significantly enhanced access to and use of scientific data, information, and literature. They offer the promise of accelerating the discovery and communication of knowledge, both within the scientific community and in the broader society, as scientific data and information are made openly available online.

The phrase "scientific knowledge discovery in open networked environments" is subject to many definitions. For purposes of this project, the focus was on computer-mediated or computational scientific knowledge discovery, taken broadly as any research process enabled by digital computing technologies. Such technologies may include data mining, information retrieval and extraction, artificial intelligence, distributed grid computing, and others. These technological capabilities support computer-mediated knowledge discovery, which some believe is a new paradigm in the conduct of research. The emphasis was primarily on digitally networked data rather than on the scientific, technical, and medical literature. The meeting also focused mostly on the advantages of knowledge discovery in open networked environments, although some of the disadvantages were raised as well.

The workshop brought together a set of stakeholders in this area for intensive and structured discussions. The purpose was not to make a final declaration about the directions that should be taken, but to further the examination of trends in computational knowledge discovery in open networked environments, based on the following questions and tasks:

1. Opportunities and Benefits: What are the opportunities over the next 5 to 10 years associated with the use of computer-mediated scientific knowledge discovery across disciplines in the open online environment? What are the potential benefits to science and society of such techniques?

2. Techniques and Methods for Development and Study of Computer-mediated Scientific Knowledge Discovery: What are the techniques and methods used in government, academia, and industry to study and understand these processes, the validity and reliability of their results, and their impact inside and outside science?

3. Barriers: What are the major scientific, technological, institutional, sociological, and policy barriers to computer-mediated scientific knowledge discovery in the open online environment within the scientific community? What needs to be known and studied about each of these barriers to help achieve the opportunities for interdisciplinary science and complex problem solving?

4. Range of Options: Based on the results obtained in response to items 1-3, define a range of options that can be used by the sponsors of the project, as well as other similar organizations, to obtain and promote a better understanding of the computer-mediated scientific knowledge discovery processes and mechanisms for openly available data and information online across the scientific domains. The objective of defining these options is to improve the activities of the sponsors (and other similar organizations) and the activities of researchers that they fund externally in this emerging research area.

The first day of the 2-day meeting consisted primarily of invited expert speakers, who addressed tasks 1-3. It was followed on the second day by a workshop that drew on the expertise of the invitees to address task 4, building on the first day's discussions of tasks 1-3. The slides presented by the speakers at the meeting are posted on the National Academy of Sciences' Board on Research Data and Information Web site, and the entire meeting was webcast.1

This report has been prepared by the workshop rapporteur as a factual summary of what occurred at the workshop. The committee's role was limited to planning and convening the workshop. The views contained in the report are those of individual workshop participants and do not necessarily represent the views of all workshop participants, the steering committee, or the National Academies.

It can be argued that too much time has passed since the meeting took place for its results to provide timely insight. In fact, the interval between the events reported here and the publication of this report has allowed the issues to be assessed with more care.

We are grateful to the National Science Foundation (NSF) for support of this project under NSF Grant Number 1042078.

This volume has been reviewed in draft form by individuals chosen for their technical expertise, in accordance with procedures approved by the National Research Council's Report Review Committee. The purpose of this independent review is to provide candid and critical comments that will assist the institution in making its published report as sound as possible and to ensure that the report meets institutional standards for quality. The review comments and draft manuscript remain confidential to protect the integrity of the process.
We wish to thank the following individuals for their review of this report: Sayeed Choudhury, Johns Hopkins University; Stephen Hilgartner, Cornell University; Michael Kurtz, Harvard University; Robert McDonald, Indiana University; Mark Parsons, University of Colorado; Jack Stankovic, University of Virginia; and Katherine Strandburg, New York University.

Although the reviewers listed above have provided constructive comments and suggestions, they were not asked to endorse the content of the individual papers. Responsibility for the final content of the papers rests with the individual authors.

1 Available at http://sites.nationalacademies.org/PGA/brdi/PGA_060424.

We would especially like to recognize the contributions of Daniel Cohen, on assignment to the National Academies from the U.S. Library of Congress, who assisted with the editing and production of the manuscript. Cheryl Levey of the board staff also helped with the review process and the preparation of this volume. Finally, we would like to thank Raed Sharif for his editorial support in completing this manuscript.

John Leslie King, Steering Committee Chair
Paul F. Uhlir, Project Director

Contents

1. Opening Session
   Introduction (John Leslie King)
   Opening Remarks by Project Sponsors (Alan Blatecky, Sylvia Spengler)
   Keynote Address: An Overview of the State of the Art (Tony Hey)
   Discussion

2. Experiences with Developing Open Scientific Knowledge Discovery in Research and Applications
   Case Studies
      International Online Astronomy Research (Alberto Conti)
      Integrative Genomic Analysis (Stephen Friend)
      Geoinformatics: Linked Environments for Atmospheric Discovery (Sara Graves)
   Implications of the Three Scientific Knowledge Discovery Case Studies: The User Perspective
      International Online Astronomy Research, c. 2011 (Alyssa Goodman)
      Integrative Genomic Analysis (Joel Dudley)
      Geoinformatics (Mohan Ramamurthy)
   Discussion

3. How Might Open Online Knowledge Discovery Advance the Progress of Science?
   Technological Factors (Session Chair: Hal Abelson)
      Interoperability, Standards, and Linked Data (James Hendler)
      National Technological Needs and Issues (Deborah Crawford)
      Discussion
   Sociocultural, Institutional, and Organizational Factors (Session Chair: Michael Lesk)
      Sociocultural Dimensions (Clifford Lynch)
      Institutional Factors (Paul Edwards)
      Discussion
   Policy and Legal Factors (Session Chair: Michael Carroll)
      Legal Aspects (Michael Madison)
      Knowledge Discovery in Open Networked Environments: Some Policy Issues (Gregory A. Jackson)
      Discussion
   How Can We Tell? What Needs to Be Known and Studied to Improve Potential for Success? (Session Chair: Francine Berman)
      Introduction (Francine Berman)
      An Academic Perspective (Victoria Stodden)
      A Government Perspective (Walter L. Warnick)
      Discussion

4. Summary of Workshop Results from Day One and Discussion of Additional Issues
   Introduction (Bonnie Carroll)
   Opportunities and Benefits for Automated Scientific Knowledge Discovery in Open Networked Environments (Puneet Kishor)
   Techniques and Methods for Development and Study of Automated Scientific Knowledge Discovery (Alberto Pepe)
   Barriers to Automated Scientific Knowledge Discovery in Open Networked Environments (Alberto Pepe)
   Range of Options for Further Research (Puneet Kishor)

5. Appendix: Workshop Agenda