Bioinformatics

Converting Data to Knowledge

A Workshop Summary by

Robert Pool, Ph.D.

and

Joan Esnayra, Ph.D.

Board on Biology

Commission on Life Sciences

National Research Council

NATIONAL ACADEMY PRESS
Washington, D.C.







NATIONAL ACADEMY PRESS
2101 Constitution Avenue
Washington, D.C. 20418

NOTICE: The project that is the subject of this report was approved by the Governing Board of the National Research Council, whose members are drawn from the councils of the National Academy of Sciences, the National Academy of Engineering, and the Institute of Medicine. The members of the committee responsible for the report were chosen for their special competences and with regard for appropriate balance.

This report has been prepared with funds provided by the Department of Energy, grant DEFG02-94ER61939, and the National Cancer Institute, contract N01-OD-4-2139.

ISBN 0-309-07256-5

Additional copies are available from the National Academy Press, 2101 Constitution Ave., NW, Box 285, Washington, DC 20055; 800-624-6242 or 202-334-3313 in the Washington metropolitan area; Internet <http://www.nap.edu>.

Copyright 2000 by the National Academy of Sciences. All rights reserved.

Printed in the United States of America.

THE NATIONAL ACADEMIES

National Academy of Sciences
National Academy of Engineering
Institute of Medicine
National Research Council

The National Academy of Sciences is a private, nonprofit, self-perpetuating society of distinguished scholars engaged in scientific and engineering research, dedicated to the furtherance of science and technology and to their use for the general welfare. Upon the authority of the charter granted to it by the Congress in 1863, the Academy has a mandate that requires it to advise the federal government on scientific and technical matters. Dr. Bruce M. Alberts is president of the National Academy of Sciences.

The National Academy of Engineering was established in 1964, under the charter of the National Academy of Sciences, as a parallel organization of outstanding engineers. It is autonomous in its administration and in the selection of its members, sharing with the National Academy of Sciences the responsibility for advising the federal government. The National Academy of Engineering also sponsors engineering programs aimed at meeting national needs, encourages education and research, and recognizes the superior achievements of engineers. Dr. William A. Wulf is president of the National Academy of Engineering.

The Institute of Medicine was established in 1970 by the National Academy of Sciences to secure the services of eminent members of appropriate professions in the examination of policy matters pertaining to the health of the public. The Institute acts under the responsibility given to the National Academy of Sciences by its congressional charter to be an adviser to the federal government and, upon its own initiative, to identify issues of medical care, research, and education. Dr. Kenneth I. Shine is president of the Institute of Medicine.

The National Research Council was organized by the National Academy of Sciences in 1916 to associate the broad community of science and technology with the Academy's purposes of furthering knowledge and advising the federal government. Functioning in accordance with general policies determined by the Academy, the Council has become the principal operating agency of both the National Academy of Sciences and the National Academy of Engineering in providing services to the government, the public, and the scientific and engineering communities. The Council is administered jointly by both Academies and the Institute of Medicine. Dr. Bruce M. Alberts and Dr. William A. Wulf are chairman and vice chairman, respectively, of the National Research Council.

PLANNING GROUP FOR THE WORKSHOP ON BIOINFORMATICS: CONVERTING DATA TO KNOWLEDGE

DAVID EISENBERG, University of California, Los Angeles, California
DAVID J. GALAS, Keck Graduate Institute of Applied Life Sciences, Claremont, California
RAYMOND L. WHITE, University of Utah, Salt Lake City, Utah

Science Writer

ROBERT POOL, Tallahassee, Florida

Staff

JOAN ESNAYRA, Study Director
JENNIFER KUZMA, Program Officer
NORMAN GROSSBLATT, Editor
DEREK SWEATT, Project Assistant

Acknowledgments

The steering committee acknowledges the valuable contributions to this workshop of Susan Davidson, University of Pennsylvania; Richard Karp, University of California, Berkeley; and Perry Miller, Yale University. In addition, the steering committee thanks Marjory Blumenthal and Jon Eisenberg, of the NRC Computer Science and Telecommunications Board, for helpful input.

BOARD ON BIOLOGY

MICHAEL T. CLEGG, Chair, University of California, Riverside, California
JOANNA BURGER, Rutgers University, Piscataway, New Jersey
DAVID EISENBERG, University of California, Los Angeles, California
DAVID J. GALAS, Darwin Technologies, Seattle, Washington
DAVID V. GOEDDEL, Tularik, Inc., San Francisco, California
ARTURO GOMEZ-POMPA, University of California, Riverside, California
COREY S. GOODMAN, University of California, Berkeley, California
CYNTHIA J. KENYON, University of California, San Francisco, California
BRUCE R. LEVIN, Emory University, Atlanta, Georgia
ELLIOT M. MEYEROWITZ, California Institute of Technology, Pasadena, California
ROBERT T. PAINE, University of Washington, Seattle, Washington
RONALD R. SEDEROFF, North Carolina State University, Raleigh, North Carolina
ROBERT R. SOKAL, State University of New York, Stony Brook, New York
SHIRLEY M. TILGHMAN, Princeton University, Princeton, New Jersey
RAYMOND L. WHITE, University of Utah, Salt Lake City, Utah

Staff

RALPH DELL, Acting Director (until August 2000)
WARREN MUIR, Acting Director (as of August 2000)

COMMISSION ON LIFE SCIENCES

MICHAEL T. CLEGG, Chair, University of California, Riverside, California
FREDERICK R. ANDERSON, Cadwalader, Wickersham and Taft, Washington, D.C.
PAUL BERG, Stanford University, Stanford, California
JOANNA BURGER, Rutgers University, Piscataway, New Jersey
JAMES CLEAVER, University of California, San Francisco, California
DAVID EISENBERG, University of California, Los Angeles, California
NEAL L. FIRST, University of Wisconsin, Madison, Wisconsin
DAVID J. GALAS, Keck Graduate Institute of Applied Sciences, Claremont, California
DAVID V. GOEDDEL, Tularik, Inc., San Francisco, California
ARTURO GOMEZ-POMPA, University of California, Riverside, California
COREY S. GOODMAN, University of California, Berkeley, California
JON W. GORDON, Mount Sinai School of Medicine, New York, New York
DAVID G. HOEL, Medical University of South Carolina, Charleston, South Carolina
BARBARA S. HULKA, University of North Carolina, Chapel Hill, North Carolina
CYNTHIA J. KENYON, University of California, San Francisco, California
BRUCE R. LEVIN, Emory University, Atlanta, Georgia
DAVID M. LIVINGSTON, Dana-Farber Cancer Institute, Boston, Massachusetts
DONALD R. MATTISON, March of Dimes, White Plains, New York
ELLIOT M. MEYEROWITZ, California Institute of Technology, Pasadena, California
ROBERT T. PAINE, University of Washington, Seattle, Washington
RONALD R. SEDEROFF, North Carolina State University, Raleigh, North Carolina
ROBERT R. SOKAL, State University of New York, Stony Brook, New York
CHARLES F. STEVENS, The Salk Institute for Biological Studies, La Jolla, California
SHIRLEY M. TILGHMAN, Princeton University, Princeton, New Jersey
RAYMOND L. WHITE, University of Utah, Salt Lake City, Utah

Staff

WARREN MUIR, Executive Director

Preface

In 1993 the National Research Council's Board on Biology established a series of forums on biotechnology. The purpose of the discussions is to foster open communication among scientists, administrators, policy-makers, and others engaged in biotechnology research, development, and commercialization. The neutral setting offered by the National Research Council is intended to promote mutual understanding among government, industry, and academe and to help develop imaginative approaches to problem-solving. The objective, however, is to illuminate issues, not to resolve them. Unlike study committees of the National Research Council, forums cannot provide advice or recommendations to any government agency or other organization. Similarly, summaries of forums do not reach conclusions or present recommendations, but instead reflect the variety of opinions expressed by the participants. The comments in this report reflect the views of the forum's participants as indicated in the text.

For the first forum, held on November 5, 1996, the Board on Biology collaborated with the Board on Agriculture to focus on intellectual property rights issues surrounding plant biotechnology. The second forum, held on April 26, 1997, and also conducted in collaboration with the Board on Agriculture, was focused on issues in and obstacles to a broad genome project with numerous plant and animal species as its subjects. The third forum, held on November 1, 1997, focused on privacy issues and the desire to protect people from unwanted intrusion into their medical records. Proposed laws contain broad language that could affect biomedical and clinical research, in addition to the use of genetic testing in research.

After discussions with the National Cancer Institute and the Department of Energy, the Board on Biology agreed to run a workshop under the auspices of its forum on biotechnology titled “Bioinformatics: Converting Data to Knowledge” on February 16, 2000. A workshop planning group was assembled, whose role was limited to identifying agenda topics, appropriate speakers, and other participants for the workshop. The topics covered were database integrity, curation, interoperability, and novel analytic approaches.

At the workshop, scientists from industry, academe, and federal agencies shared their experiences in the creation, curation, and maintenance of biologic databases. Participation by representatives of the National Institutes of Health, National Science Foundation, US Department of Energy, US Department of Agriculture, and the Environmental Protection Agency suggests that this issue is important to many federal bodies.

This document is a summary of the workshop and represents a factual recounting of what occurred at the event. The authors of this summary are Robert Pool and Joan Esnayra, neither of whom was a member of the planning group.

This workshop summary has been reviewed in draft form for accuracy by individuals who attended the workshop and others chosen for their diverse perspectives and technical expertise in accordance with procedures approved by the NRC's Report Review Committee. The purpose of this independent review is to assist the NRC in making the published document as sound as possible and to ensure that it meets institutional standards.

We wish to thank the following individuals, who are neither officials nor employees of the NRC, for their participation in the review of this workshop summary:

Warren Gish, Washington University School of Medicine
Anita Grazer, Fairfax County Economic Development Authority
Jochen Kumm, University of Washington Genome Center
Chris Stoeckert, Center for Bioinformatics, University of Pennsylvania

While the individuals listed above have provided many constructive comments and suggestions, it must be emphasized that responsibility for the final content of this document rests entirely with the authors and the NRC.

Joan Esnayra
Study Director

Contents

THE CHALLENGE OF INFORMATION
    An Explosion of Databases
    A Workshop in Bioinformatics

CREATING DATABASES
    Four Elements of a Database
    Database Curation
    The Need for Bioinformaticists

BARRIERS TO THE USE OF DATABASES
    Proprietary Issues
    Disparate Terminology
    Interoperability

MAINTAINING THE INTEGRITY OF DATABASES
    Error Prevention
    Error Correction
    The Importance of Trained Curators and Annotators
    Data Provenance
    Database Ontology
    Maintaining Privacy

CONVERTING DATA TO KNOWLEDGE
    Data Mining
    International Consortium for Brain Mapping

SUMMARY

Appendixes
    A Agenda
    B Participant Biographies

Dedication

This report is dedicated to the memory of Dr. G. Christian Overton for his vision and pioneering contributions to genomic research.
