THE FUTURE OF SCIENTIFIC KNOWLEDGE DISCOVERY IN
OPEN NETWORKED ENVIRONMENTS
Summary of a Workshop
Paul F. Uhlir, Rapporteur
Board on Research Data and Information
Policy and Global Affairs
THE NATIONAL ACADEMIES PRESS
Washington, D.C.
www.nap.edu
THE NATIONAL ACADEMIES PRESS 500 Fifth Street, NW Washington, DC 20001
NOTICE: The project that is the subject of this report was approved by the Governing Board of the
National Research Council, whose members are drawn from the councils of the National Academy of
Sciences, the National Academy of Engineering, and the Institute of Medicine. The members of the
committee responsible for the report were chosen for their special competences and with regard for
appropriate balance.
This study was supported by the National Science Foundation under Grant No. 1042078. This
report was prepared as an account of work sponsored by an agency of the United States
government. Neither the United States government nor any agency thereof, nor any of their
employees, makes any warranty, express or implied, or assumes any legal liability or
responsibility for the accuracy, completeness, or usefulness of any information, apparatus,
product, or process disclosed, or represents that its use would not infringe privately owned rights.
Reference herein to any specific commercial product, process, or service by trade name,
trademark, manufacturer, or otherwise does not necessarily constitute or imply its endorsement,
recommendation, or favoring by the United States government or any agency thereof. Any
opinions, findings, conclusions, or recommendations expressed in this publication are those of
the author(s) and do not necessarily reflect the views of the organizations or agencies that
provided support for the project.
International Standard Book Number-13: 978-0-309-26791-5
International Standard Book Number-10: 978-0-309-26791-9
Additional copies of this report are available from the National Academies Press, 500 Fifth Street, NW, Room
360, Washington, DC 20001; (800) 624-6242 or (202) 334-3313; http://www.nap.edu.
Copyright 2012 by the National Academy of Sciences. All rights reserved.
Printed in the United States of America
The National Academy of Sciences is a private, nonprofit, self-perpetuating society of distinguished scholars engaged in
scientific and engineering research, dedicated to the furtherance of science and technology and to their use for the general
welfare. Upon the authority of the charter granted to it by the Congress in 1863, the Academy has a mandate that requires it to
advise the federal government on scientific and technical matters. Dr. Ralph J. Cicerone is president of the National Academy of
Sciences.
The National Academy of Engineering was established in 1964, under the charter of the National Academy of Sciences, as a
parallel organization of outstanding engineers. It is autonomous in its administration and in the selection of its members, sharing
with the National Academy of Sciences the responsibility for advising the federal government. The National Academy of
Engineering also sponsors engineering programs aimed at meeting national needs, encourages education and research, and
recognizes the superior achievements of engineers. Dr. Charles M. Vest is president of the National Academy of Engineering.
The Institute of Medicine was established in 1970 by the National Academy of Sciences to secure the services of eminent
members of appropriate professions in the examination of policy matters pertaining to the health of the public. The Institute acts
under the responsibility given to the National Academy of Sciences by its congressional charter to be an adviser to the federal
government and, upon its own initiative, to identify issues of medical care, research, and education. Dr. Harvey V. Fineberg is
president of the Institute of Medicine.
The National Research Council was organized by the National Academy of Sciences in 1916 to associate the broad community
of science and technology with the Academy's purposes of furthering knowledge and advising the federal government.
Functioning in accordance with general policies determined by the Academy, the Council has become the principal operating
agency of both the National Academy of Sciences and the National Academy of Engineering in providing services to the
government, the public, and the scientific and engineering communities. The Council is administered jointly by both Academies
and the Institute of Medicine. Dr. Ralph J. Cicerone and Dr. Charles M. Vest are chair and vice chair, respectively, of the
National Research Council.
www.national-academies.org
THE FUTURE OF SCIENTIFIC KNOWLEDGE DISCOVERY IN OPEN
NETWORKED ENVIRONMENTS: A National Workshop
Steering Committee
John Leslie King, Chair
William Warner Bishop Collegiate Professor of Information
University of Michigan
Hal Abelson
Professor, Massachusetts Institute of Technology
Francine Berman
Vice President of Research, Rensselaer Polytechnic Institute
Bonnie Carroll
President, Information International Associates
Michael Carroll
Professor, American University, Washington College of Law
Alyssa Goodman
Professor, Harvard University
Sara Graves
Director, Information Technology and Systems Center
University Professor of Computer Science
University of Alabama in Huntsville
Michael Lesk
Professor, Rutgers University
Gilbert Omenn
Professor, University of Michigan
Project Staff
Paul F. Uhlir
Board Director
The National Academies
Daniel Cohen
Program Officer
The National Academies
[on detail from Library of Congress]
Cheryl Levey
Senior Program Associate
The National Academies
Board on Research Data and Information Membership
Michael Lesk, Chair (until 11/2011)
Rutgers University
Roberta Balstad, Vice Chair (until 11/2011)
Columbia University
Francine Berman, Co-Chair
Rensselaer Polytechnic Institute
Clifford Lynch, Co-Chair
Coalition for Networked Information
Maureen Baginski (until 11/2011)
Serco
Laura Bartolo
Kent State University
R. Steven Berry (until 11/2011)
University of Chicago
Christine Borgman (until 11/2011)
University of California, Los Angeles
Philip Bourne
University of California, San Diego
Norman Bradburn (until 11/2011)
University of Chicago
Henry Brady
University of California, Berkeley
Mark Brender
GeoEye Foundation
Bonnie Carroll
Information International Associates
Michael Carroll
American University, Washington College of Law
Sayeed Choudhury
The Johns Hopkins University
Keith Clarke
University of California, Santa Barbara
Paul A. David
Stanford University
Kelvin Droegemeier
University of Oklahoma
Clifford Duke
Ecological Society of America
Barbara Entwisle
University of North Carolina
Stephen Friend
Sage Bionetworks
Michael Goodchild (until 11/2011)
University of California, Santa Barbara
Alyssa Goodman (until 11/2011)
Harvard University
Margaret Hedstrom
University of Michigan
Michael Keller (until 11/2011)
Stanford University
Alexa T. McCray
Harvard Medical School
Michael R. Nelson (until 11/2011)
Georgetown University
Daniel Reed (until 11/2011)
Microsoft Research, Microsoft Inc.
Alan M. Title
Lockheed Martin Advanced Technology Center
Ann J. Wolpert
Massachusetts Institute of Technology
Cathy H. Wu (until 11/2011)
University of Delaware and Georgetown University Medical Center
Board on Research Data and Information Staff
Paul F. Uhlir
Board Director
Subhash Kuvelker
Senior Program Officer
Daniel Cohen
Program Officer
[on detail from Library of Congress]
Cheryl Levey
Senior Program Associate
Preface and Acknowledgments
Digital technologies and networks are now part of everyday work in the sciences
and have significantly enhanced access to and use of scientific data, information, and literature.
They offer the promise of accelerating the discovery and communication of knowledge, both
within the scientific community and in the broader society, as scientific data and information are
made openly available online.
The phrase "scientific knowledge discovery in open networked environments" is subject
to many definitions. For purposes of this project, the focus was on computer-mediated or
computational scientific knowledge discovery, taken broadly as any research processes enabled
by digital computing technologies. Such technologies may include data mining, information
retrieval and extraction, artificial intelligence, distributed grid computing, and others. These
technological capabilities support computer-mediated knowledge discovery, which some believe
is a new paradigm in the conduct of research.
The emphasis was primarily on digitally networked data, rather than on the scientific,
technical, and medical literature. The meeting also focused mostly on the advantages of
knowledge discovery in open networked environments, although some of the disadvantages were
raised as well.
The workshop brought together a set of stakeholders in this area for intensive and
structured discussions. The purpose was not to make a final declaration about the directions that
should be taken, but to further the examination of trends in computational knowledge discovery
in open networked environments, based on the following questions and tasks:
1. Opportunities and Benefits: What are the opportunities over the next 5 to 10 years
associated with the use of computer-mediated scientific knowledge discovery across disciplines
in the open online environment? What are the potential benefits to science and society of such
techniques?
2. Techniques and Methods for Development and Study of Computer-mediated
Scientific Knowledge Discovery: What are the techniques and methods used in government,
academia, and industry to study and understand these processes, the validity and reliability of
their results, and their impact inside and outside science?
3. Barriers: What are the major scientific, technological, institutional, sociological, and
policy barriers to computer-mediated scientific knowledge discovery in the open online
environment within the scientific community? What needs to be known and studied about each
of these barriers to help achieve the opportunities for interdisciplinary science and complex
problem solving?
4. Range of Options: Based on the results obtained in response to items 1-3, define a
range of options that can be used by the sponsors of the project, as well as other similar
organizations, to obtain and promote a better understanding of the computer-mediated scientific
knowledge discovery processes and mechanisms for openly available data and information
online across the scientific domains. The objective of defining these options is to improve the
activities of the sponsors (and other similar organizations) and the activities of researchers that
they fund externally in this emerging research area.
The first day of the 2-day meeting consisted primarily of invited expert speakers, who
addressed tasks 1-3. This was followed immediately by a workshop on the second day to
leverage the expertise of the invitees to address task 4, based on the discussions of tasks 1-3 on
the first day. The slides presented by the speakers at the meeting are posted on the National
Academy of Sciences' Board on Research Data and Information Web site and the entire
meeting was webcast.1
This report has been prepared by the workshop rapporteur as a factual summary of what
occurred at the workshop. The committee's role was limited to planning and convening the
workshop. The views contained in the report are those of the individual workshop participants
and do not necessarily represent the views of all workshop participants, the steering committee,
or the National Academies.
It can be argued that too much time has passed since the meeting took place, and that
the results of this effort are not timely enough to provide insight. In fact, the elapsed time
between the events reported here and this report has provided time to assess the issues with
more care.
We are grateful to the National Science Foundation (NSF) for support of this project
under NSF Grant Number 1042078. This volume has been reviewed in draft form by
individuals chosen for their technical expertise, in accordance with procedures approved by the
National Research Council's Report Review Committee. The purpose of this independent
review is to provide candid and critical comments that will assist the institution in making its
published report as sound as possible and to ensure that the report meets institutional standards
for quality. The review comments and draft manuscript remain confidential to protect the
integrity of the process.
We wish to thank the following individuals for their review of this report: Sayeed
Choudhury, Johns Hopkins University; Stephen Hilgartner, Cornell University; Michael Kurtz,
Harvard University; Robert McDonald, Indiana University; Mark Parsons, University of
Colorado; Jack Stankovic, University of Virginia; and Katherine Strandburg, New York
University.
Although the reviewers listed above have provided constructive comments and
suggestions, they were not asked to endorse the content of the individual papers. Responsibility
for the final content of the papers rests with the individual authors.
1 Available at http://sites.nationalacademies.org/PGA/brdi/PGA_060424.
We would especially like to recognize the contributions of Daniel Cohen, on
assignment to the National Academies from the U.S. Library of Congress, who assisted with
the editing and the production of the manuscript. Cheryl Levey of the board staff also helped
with the review process and the preparation of this volume. Finally, we would like to thank
Raed Sharif for his editorial support in completing this manuscript.
John Leslie King
Steering Committee Chair
Paul F. Uhlir
Project Director
Contents
1. Opening Session
Introduction
John Leslie King
Opening Remarks by Project Sponsors
Alan Blatecky
Sylvia Spengler
Keynote Address: An Overview of the State of the Art
Tony Hey
Discussion
2. Experiences with Developing Open Scientific Knowledge Discovery in Research and Applications
Case Studies
International Online Astronomy Research
Alberto Conti
Integrative Genomic Analysis
Stephen Friend
Geoinformatics: Linked Environments for Atmospheric Discovery
Sara Graves
Implications of the Three Scientific Knowledge Discovery Case Studies: The User Perspective
International Online Astronomy Research, c. 2011
Alyssa Goodman
Integrative Genomic Analysis
Joel Dudley
Geoinformatics
Mohan Ramamurthy
Discussion
3. How Might Open Online Knowledge Discovery Advance the Progress of Science?
Technological Factors
Session Chair: Hal Abelson
Interoperability, Standards, and Linked Data
James Hendler
National Technological Needs and Issues
Deborah Crawford
Discussion
Sociocultural, Institutional, and Organizational Factors
Session Chair: Michael Lesk
Sociocultural Dimensions
Clifford Lynch
Institutional Factors
Paul Edwards
Discussion
Policy and Legal Factors
Session Chair: Michael Carroll
Legal Aspects
Michael Madison
Knowledge Discovery in Open Networked Environments: Some Policy Issues
Gregory A. Jackson
Discussion
How Can We Tell? What Needs to Be Known and Studied to Improve Potential for Success?
Session Chair: Francine Berman
Introduction
Francine Berman
An Academic Perspective
Victoria Stodden
A Government Perspective
Walter L. Warnick
Discussion
4. Summary of Workshop Results from Day One and Discussion of Additional Issues
Introduction
Bonnie Carroll
Opportunities and Benefits for Automated Scientific Knowledge Discovery in Open Networked Environments
Puneet Kishor
Techniques and Methods for Development and Study of Automated Scientific Knowledge Discovery
Alberto Pepe
Barriers to Automated Scientific Knowledge Discovery in Open Networked Environments
Alberto Pepe
Range of Options for Further Research
Puneet Kishor
5. Appendix: Workshop Agenda