Mathematics and Computer Science Panel Summary
The Mathematics and Computer Science panel was composed of scientists and educators selected for their expertise in mathematics, computer science, or biology, especially knowledge of connections between the fields. The panel met on March 15-16, 2001, at Boston University to discuss how to integrate that kind of knowledge into the undergraduate education of future biomedical researchers.
EXPERTISE OF MEMBERS OF THE PANEL
Nancy Kopell is W.G. Aurelio Professor of Mathematics and Science and co-director of the Center for BioDynamics (a multidisciplinary center for biology, mathematics, and engineering) at Boston University. Her research includes the mathematics of self-organizing systems (both physical and biological); currently she is focusing on dynamics of the nervous system, especially rhythmic activity associated with cognition and motor control. She was awarded a MacArthur Fellowship and is a member of the National Academy of Sciences. She received a bachelor’s degree from Cornell University and her PhD from the University of California at Berkeley.
Robert Blystone is professor of biology at Trinity University in San Antonio, Texas. His research is on the nitration of thermally stressed tissue, computer reconstruction of tissue, and educational issues related to quantitative learning in biology. He is a Fellow of the American Association for the Advancement of Science and has been named the Scott Professor for Teaching and the Piper Professor of Texas. He serves on the education committee of the American Society for Cell Biology and the college committee of the National Association of Biology Teachers. He has participated in many conferences, workshops, and panels on education, the most recent being the Mathematical Association of America’s study of Mathematics Education Reform in Biology and Chemistry. He teaches courses on Biological Visualization, Developmental Biology, and Organismal Structure & Function. He received a BS in biological sciences from the University of Texas at El Paso and an MA and PhD in zoology from the University of Texas at Austin.
Louis Gross is director of the Institute for Environmental Modeling and professor of ecology and evolutionary biology and professor of mathematics at the University of Tennessee in Knoxville. His research interests include mathematical ecology, computational ecology, quantitative training for life science students, photosynthetic dynamics, and parallel computation for ecological models. He was the co-director of courses and workshops on mathematical ecology held by the International Centre for Theoretical Physics in Trieste, Italy, between 1986 and 2000. He has organized two NSF-sponsored workshops on quantitative curriculum development for life science students. In 1999 he taught an NSF Chautauqua Course entitled Life Science Education: Preparing Fearless Biologists. At Tennessee he teaches courses on Mathematical Ecology, Mathematical Modeling and Evolutionary Theory, and Basic Concepts in Ecology. He received his BS degree from Drexel University and his PhD in Applied Mathematics from Cornell University.
Richard Karp is senior research scientist at the International Computer Science Institute in Berkeley, California, and professor of computer science and adjunct professor of molecular biotechnology at the University of Washington in Seattle. He has done research on NP-completeness, fast parallel algorithms, string matching, and, most recently, computational biology. His current research is on the application of algorithms, combinatorial mathematics, and probability to problems in genomics. He is particularly interested in physical mapping, in the analysis of genome sequencing strategies, and in the application of algorithms to the study of gene expression. He is a member of the National Academy of Sciences and the National Academy of Engineering. He was awarded the Fulkerson Prize in Discrete Mathematics, Lanchester Prize in Operations Research, ACM Turing Award, and the U.S. National Medal of Science. He is a member of the National Advisory Board for Computer Professionals for Social Responsibility. He received a Distinguished Teaching Award from the UC Berkeley Academic Senate. His teaching at the University of Washington includes Algorithms in Molecular Biology. He has bachelor’s and PhD degrees from Harvard University.
Eric Lander is director of the Whitehead Institute and professor of biology at the Massachusetts Institute of Technology. His research interests include human, mouse, and population genetics, and computational methods in biology. He was awarded a MacArthur Fellowship and is a Fellow of the American Association for the Advancement of Science. He is a member of the National Academy of Sciences and the Institute of Medicine. He was on the NRC Committee on the Mathematical Sciences in Genome and Protein Structure Research, which produced the report Calculating the Secrets of Life. He has taught courses on mathematics, statistics, and economics, and developed new courses on bidding and bargaining, artificial intelligence, and on science-based businesses. He was awarded MIT’s Baker Memorial Prize for Excellence in Undergraduate Teaching. He received his bachelor’s degree from Princeton University and his PhD in Mathematics from Oxford University.
Markus Meister is professor of molecular and cellular biology at Harvard University. His research is in the field of systems neuroscience, specifically using the retina to understand how large systems of neurons represent and process information. He has been a Pew Scholar, NSF Presidential Faculty Fellow, Lucille P. Markey Scholar, and Fellow of the Helen Hay Whitney Foundation, and is a member of the Stiftung Maximilianeum and the Studienstiftung des Deutschen Volkes of Germany. He teaches graduate and undergraduate students in the Molecular and Cellular Biology Program; his courses include Experimental Neuroscience and Function of Neural Systems. He received his PhD from the California Institute of Technology.
Alan Perelson is head of the Los Alamos National Laboratory’s Theoretical Biology and Biophysics Division. His research interests include mathematical and theoretical biology with an emphasis on problems in immunology and virology. He has taught courses in the biophysics field at UC-Berkeley, Brown University, and the Ecole Normale Superieure in Paris. He is on the Board of Governors for the Institute of Mathematics and its Applications at the University of Minnesota. He is a member of the Science Board, head of the Theoretical Immunology Program, and an external professor of the Santa Fe Institute. He is a past president of the Society for Mathematical Biology. He was awarded an NIH Research Career Development Award. He serves on the Springer-Verlag editorial board responsible for textbooks in biomathematics. He received his bachelor’s degree in life science and electrical engineering from the Massachusetts Institute of Technology and a PhD in biophysics from the University of California at Berkeley.
Louise Ryan is professor of biostatistics at Harvard School of Public Health. Her research is on statistical methods related to environmental health research and risk assessment. She is a Fellow of the American Statistical Association and of the International Statistical Institute. She received the Spiegelman Award from the American Public Health Association. She is currently a co-editor of Biometrics and president of the Eastern North American Region of the International Biometric Society. She has served on advisory boards for several government agencies, including the National Toxicology Program and the Environmental Protection Agency, as well as NRC committees on toxicological effects of mercury and arsenic in drinking water. She teaches graduate courses at the Harvard School of Public Health and is the program director for an Initiative for Minority Student Development Grant, which supports summer internships and predoctoral training. In addition, she is the director of the Summer Program in Biostatistics at Harvard School of Public Health, which targets undergraduate math majors from underrepresented minority groups. She received the Harvard School of Public Health Mentoring Award in 2000. She received her PhD from Harvard University.
DeWitt Sumners is the Robert O. Lawton Distinguished Professor of Mathematics at Florida State University and co-director of the Program in Mathematics and Molecular Biology. His current research projects include DNA topology and analyzing the function of the human brain. He specializes in knot theory and applications of topology to molecular biology and polymer configuration, both in theory development and computational simulation. He serves on the NRC Board on Mathematical Sciences and was on the Committee on Mathematical Challenges from Computational Chemistry. He teaches undergraduates at Florida State, including introductory calculus courses. He has presented to Congress on Calculating the Secrets of Life: Mathematics and Medicine. He received his bachelor’s degree in physics from Louisiana State University and PhD in mathematics from the University of Cambridge.
Consultant to the committee:
Charles Peskin is professor of mathematics at the Courant Institute of Mathematical Sciences, New York University. He has done extensive mathematical and numerical analysis of physiological problems, particularly in cardiac fluid dynamics and the study of the heart’s architecture. His research interests include the application of mathematics and computing to problems arising in medicine and biology, fluid dynamics of the heart, and molecular machinery within biological cells. He is a winner of the NYU Alumni Association’s Great Teacher Award, a MacArthur Fellowship, the James H. Wilkinson Prize in Numerical Analysis and Scientific Computing, and the New York City Mayor’s Award for Excellence in Science and Technology. He teaches a freshman honors seminar in computer simulation. He received his bachelor’s degree in engineering and applied physics from Harvard University and his PhD in physiology from Yeshiva University.
REPORT OF THE MATHEMATICS AND COMPUTER SCIENCE PANEL
It was the unanimous view of the mathematics and computer science panel that there need to be major revisions in the education of scientists working on cutting-edge biology questions. Biology is changing from a purely experimental science done at the bench to one in which large databases of information and quantitative models play a significant role in the day-to-day life of a research biologist. As the roles of the 30,000 or so genes in the human genome begin to unfold, new means will be needed to understand the interactions between gene products that lead to the coordinated activities of the cell. Gene networks, metabolic networks, neural networks, and cell signaling are all terms bandied about, and they reflect the need for biologists to view and understand the coordinated activities of large numbers of components of the complex systems underlying life. While today’s students learn about the importance of in vitro models, students in 2010 should be prepared for doing in silico (or computer) experiments, which may be as commonplace as today’s in vitro experimental systems. To prepare for this sea change in activities, undergraduate biology majors who plan to pursue a research career need to be educated in a more quantitative manner. The panel felt that no one curriculum should be mandated because students interested in different areas of biology could benefit from different courses, even at the undergraduate level. Furthermore, it made a distinction between the “quantitative biologist,” who works at the interface of math/computer science and biology, and the “research biologist,” who needs familiarity with a range of mathematical and computational ideas without necessarily being expert. Thus, the panel felt that flexibility in offerings is more advisable than a fixed curriculum.
The panel suggested that all biology majors, not just future biomedical researchers, should be exposed to and develop a conceptual understanding of rate of change, modeling, equilibria and stability, structure of a system, interactions among components, data and measurement, stochasticity, visualizing, and algorithms. More details on these concepts are in Chapter 2. In addition, future biomedical researchers should graduate with the ability to do in-depth analysis in a subset of the listed topics. Beyond these content recommendations, the panel recommended early exposure to quantitative ideas, via a reorganization of the first-year biology course to introduce a variety of quantitative concepts in the context of biological themes. A similar approach for upper-level students would integrate quantitative ideas into courses such as genomics, ecology, and neurobiology. One mechanism for doing this is to offer a standard course in a “quantitatively intensive” version, analogous to the “writing intensive” courses offered by some schools. The quantitatively intensive version would involve more credit hours and could be taught by a different faculty member as a seminar or a laboratory. The panel also recommended that research opportunities in quantitative biology be encouraged and funded.
The panel recommended curricular changes beyond the biology department. Many biology majors would benefit from new courses in the department of mathematics. The panel proposed a new sequence designed to condense many of the existing undergraduate mathematics courses into three or four semesters. Computer science courses designed for biology students would also be beneficial. Changes in the curriculum will not be enough to produce the desired cohort of quantitatively trained biologists; it will also be necessary to help train the teachers of these future scientists, and to provide both the teachers and the students with appropriate teaching materials. This will require funding to produce, publicize, and/or adapt this material.
The panel felt that students interested in biology are opting to major in other disciplines because they do not feel quantitatively challenged in the traditional biology courses. The panel would like to attract these students by offering a quantitative track within the biology major. In a track designed for a quantitative biologist, a student might take one year of standard calculus, which many students now take in high school; one semester of linear algebra; one semester of statistics; a one-semester course on ordinary differential equations that includes some numerical work, possibly with packages such as Matlab; and one course on discrete mathematics tailored toward genome problems. These courses could be standard math classes and thus not add a burden on the biology department, although the discrete mathematics course could be of the type listed below for research biologists. Such a track can be flexibly designed for students with different educational goals. Some students may wish to pursue a career in biology that involves the development and analysis of models, databases, etc. Such students would require more education in math, computer science, and physical science than the wet-lab biologist who needs to be familiar with quantitative reasoning but who will not be creating new quantitative tools and analyses. Clearly, it is a difficult task to identify the needs of students in the earliest days of their entry into the study of biology, but if appropriate choices are available, students will self-select the track that fits their capabilities and interests.
In a track for research biologists, new courses are needed. The level of mathematics in this track would not be as great as for the quantitative biologist. However, more emphasis would need to be placed on motivating mathematics and statistics and showing how they are used. Rather than doing standard calculus, linear algebra, and differential equations, a one-year course on mathematics for biologists should be designed. This course should be based on biological examples and include methods of solving problems, but with more emphasis on standard packages, e.g. Matlab and Mathematica, than a course for mathematics majors or quantitative biologists. In addition, a second course (one semester or a year) encompassing ideas of genomics, bioinformatics, statistics and probability, discrete mathematics, the use of databases, tools for searching databases, and some introduction to programming or writing scripts should be implemented.
Students in either track would benefit from the opportunity to do basic research. The National Science Foundation Division of Mathematical Sciences (NSF DMS) funds a collection of REU (Research Experiences for Undergraduates) summer programs each year. Each program has 10-15 students in residence at a university for six weeks or so of intensive mathematics; the students individually and collectively attack mathematics research problems. It would be possible to run REU programs in mathematics departments that were geared for biology undergraduates with the cooperation of the local university biology department and wet laboratory access. One scenario would be for the students to take a math-modeling course and learn to use canned software packages of interest in modern biology in the morning, and work in a wet biology lab in the afternoon. Ideally, the mathematics, calculation, and visualization techniques they learn in the math department would be applicable toward the analysis of the data they would be generating in the wet lab. Perhaps the funding for such biologically oriented REU summer programs would come jointly from the mathematics and biology directorates at the NSF (and perhaps the NIH).
Interdisciplinary Modeling Courses
The panel advocated the teaching of interdisciplinary courses in modeling, both at the introductory level and the advanced level. Interdisciplinary courses are distinguished by several characteristics. First, they are intended for a mixed audience that covers a spectrum from students who think of themselves as primarily biological to students who think of themselves as primarily mathematical or computational (and including, of course, students who already think of themselves as primarily interdisciplinary). Having students with different majors enrolled in the same course is an asset. It encourages discussion across traditional disciplinary boundaries and it is useful for students to see familiar material in an unfamiliar light. This will happen in such a course across the whole spectrum of students, although different aspects of the course will be familiar/unfamiliar to different students. Students with different backgrounds will be able to help each other with different aspects of the course material. The panel felt that it was often advantageous to organize interdisciplinary courses around the biological material. Mathematical/computational methods should be taught, but on a need-to-know basis. The emphasis should not be on the methods per se, but rather on how the methods elucidate the biology. The goal should be to see biology in a whole new light as a result of the mathematical/computational approach to the subject. Interdisciplinary courses can be taught either by individuals or by interdisciplinary teams. The people who teach such courses should, either individually or collectively, have actual experience in the application of mathematics or computation to biological problems. If team teaching is used, great care must be taken to avoid fragmentation of the course into separate modules on biology, mathematics, and computation. The point of the course should be to bring the mathematics and computation to bear on the biology.
Where in the undergraduate curriculum should interdisciplinary courses appear? Ideally, they should appear in both the first and last year of the curriculum. The purpose of the first-year course would be to provide strong motivation by showing how useful mathematics and computing can be in biology. The student thus motivated would then go on to learn more mathematics and computing, as well as biology, in the middle years of the curriculum. The interdisciplinary course in the last year would then put all that knowledge to work in an integrated way. The first-year course is more difficult to design because of the limited knowledge of the student. One possible format is a “whet the appetite” course that could be given to a relatively large class of freshmen. The goal would not be to provide in-depth, hands-on experience, but rather to expose students to the broad range of interesting work that can be done at the interface of biology and mathematics. A series of speakers would present case studies on a wide range of topics such as various aspects of genomics, environmental science, medical statistics, computational biology, mathematical biology, toxicology, or risk assessment. The twin purposes of such a course would be for biology students to see that mathematics and computation can play an important role in their work, and for mathematics and computer science students to see the potential for applying quantitative methods (statistics, applied mathematics, computer science) to biology and medicine. Because not all schools would have faculty with the expertise to run such a course, they might need to rely on a series of outside visitors. In any case, it would probably run more like a colloquium series than an actual course.
At the opposite extreme, a second possible format is the first-year seminar. This seminar could be devised so that students do hands-on computer simulations of biological phenomena. Such a course would meet alternately in a classroom and a computer laboratory. Class time would be devoted to the exposition of mathematical models in biology, and of methods for studying the behavior of such models by computer. The computer lab would be a hands-on experience in which students work individually or in small groups on computing projects. Each research team would report on its work to the class as a whole. Suitable topics at this level may be drawn from physiology (blood circulation, gas exchange in the lung, control of cell volume, electrical activity of neurons, renal countercurrent mechanism, muscle mechanics) or population biology (epidemic and endemic disease, ecological dynamics, population genetics, evolution). Mathematical models would either involve systems of algebraic equations (accessible with high school mathematics) or ordinary differential equations (made tractable and understandable via Euler’s method without any formal course in differential equations required). Simulations involving random numbers can also be done with only an intuitive introduction to probability and the use of a random number generator. A computer language such as Matlab makes it easy to write programs that implement Euler’s method (and other similar methods), and also provides easy access to graphical output, including animations. Black-box software that solves differential equations should be avoided because it short-changes the educational value of seeing how the problem is actually being solved.
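As an illustration of how little machinery Euler’s method requires, the following is a minimal sketch in Python (chosen here for illustration; the panel mentions Matlab) of an epidemic model of the kind listed under population biology. The choice of the standard SIR (susceptible, infected, recovered) model, the function name euler_sir, and all parameter values are assumptions made for this example and are not taken from the panel’s discussion.

```python
def euler_sir(beta, gamma, s0, i0, r0, dt, steps):
    """Integrate the SIR epidemic model with forward Euler steps.

    ds/dt = -beta*s*i,  di/dt = beta*s*i - gamma*i,  dr/dt = gamma*i
    s, i, r are fractions of the population; beta and gamma are
    illustrative transmission and recovery rates (assumed values).
    """
    s, i, r = s0, i0, r0
    history = [(0.0, s, i, r)]
    for n in range(1, steps + 1):
        new_infections = beta * s * i  # flow from susceptible to infected
        recoveries = gamma * i         # flow from infected to recovered
        # Euler's method: new value = old value + dt * rate of change
        s, i, r = (s - dt * new_infections,
                   i + dt * (new_infections - recoveries),
                   r + dt * recoveries)
        history.append((n * dt, s, i, r))
    return history


# Hypothetical parameters: 1% of the population initially infected.
trajectory = euler_sir(beta=0.5, gamma=0.1, s0=0.99, i0=0.01, r0=0.0,
                       dt=0.1, steps=1000)
peak_time, _, peak_infected, _ = max(trajectory, key=lambda row: row[2])
print(f"peak infected fraction {peak_infected:.3f} at t = {peak_time:.1f}")
```

Because every update is written out explicitly, students can see that each step simply adds the current rate of change times the time step, which is the educational point the panel makes about avoiding black-box solvers.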
The senior-level interdisciplinary course could reprise many of the same topics at a different level of sophistication. Where the first-year course might have considered only point neurons, for example, the senior course might consider spatially distributed neurons, thus moving up mathematically from ordinary to partial differential equations. Again, numerical methods provide a path to understanding without a formal course in partial differential equations. Besides the use of more advanced methods, the senior-level course should be characterized by a greater emphasis on original research projects conducted by the students. The projects in this course would be similar to senior theses, but would be done at least in some cases by teams of students, and in all cases in the context of a group of like-minded students, engaged in similar interdisciplinary efforts, to whom the work would eventually be reported.
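To suggest how numerical methods can carry students from ordinary to partial differential equations, here is a minimal Python sketch of an explicit finite-difference scheme for a passive, spatially distributed membrane. The equation used (a leaky diffusion equation, dV/dt = D d²V/dx² - V/tau, with the voltage held at zero at both ends), the function name simulate_cable, and all parameter values are simplifying assumptions for illustration; they are not physiological values and are not drawn from the panel’s text.

```python
def simulate_cable(n_points=51, dx=0.1, dt=0.001, steps=500,
                   diffusion=1.0, leak=1.0):
    """Explicit finite-difference solution of dV/dt = D*V_xx - leak*V.

    The cable is cut into n_points compartments; each interior one is
    updated from its own value and its two neighbors, so the scheme is
    Euler's method applied at every point in space.
    """
    r = diffusion * dt / dx ** 2
    if r > 0.5:
        raise ValueError("time step too large for a stable explicit scheme")
    v = [0.0] * n_points
    v[n_points // 2] = 1.0  # initial depolarization at the midpoint
    for _ in range(steps):
        new_v = v[:]  # end points keep their value of 0 (fixed boundary)
        for k in range(1, n_points - 1):
            laplacian = v[k - 1] - 2.0 * v[k] + v[k + 1]
            new_v[k] = v[k] + r * laplacian - dt * leak * v[k]
        v = new_v
    return v


profile = simulate_cable()
print("voltage at the midpoint after spreading:", profile[len(profile) // 2])
```

The inner loop is the same update rule as in an ordinary differential equation model, repeated at each spatial location with a coupling term to the neighbors, which is one way a senior course could build on the first-year material.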
Competency and Expertise in Computer Science
The panel recommended that all biology students receive instruction in computer science. It is useful to distinguish three levels of aspiration concerning the role of computer science in undergraduate biology education.
Fluency with Information Technology
The goal is to prepare biology students to use information technology today and to adapt to changes in information technology in the future.
They should acquire an understanding of how computers work, basic programming skills, and fluency in using networks and databases. A course at this level should include simple programming assignments. To give it a biological accent, there should be laboratory experiences using MEDLINE, GenBank, and other biological databases, as well as physiological and ecological simulations. One assignment might ask students to use computer searches to track down all known information about a given gene and the protein it encodes, including both structure and function. This would involve exploring the internal structure of the gene (exons, introns, promoter, transcription factor binding sites); the regulatory control of the gene; sequence homologs of the gene and the protein; the structure and function of the protein; gene interaction networks and metabolic pathways involving the protein; and interactions of the protein with other proteins and with small molecules.
The NRC report Being Fluent with Information Technology lays out the structure and objectives of such a course in detail, but is not oriented specifically toward biologists.
Capability in Program Design for Computational Biology and Genomics Applications
A course at this level provides the minimal skills required to be an effective computer user within a computationally oriented biology research team. A good example is a course by Adam Arkin at Berkeley. His course introduces students to structured software development and selected principles of computer science, with applications in computational biology and allied disciplines. The principal language used for instruction is Java, with a course module on Perl. Examples and tutorials are drawn from problems in computational biology. The course requires one significant programming project, preferably biologically oriented.
Capability in Developing Software Tools for Use by the Biology Community
A foundation for reaching this level is provided by courses in discrete mathematics, data structures, and algorithms. According to the student’s interests, these could be followed by courses in database management systems, information systems, software engineering, computer graphics, or computer simulation techniques. Biologists could select courses that teach the design and specification of database and information systems, not merely their internal structure.
Graph theory and combinatorics are at the heart of many of the successful applications of mathematics and computer science in high-throughput genomics research (microarray chips) and rational drug design; the panel believes that this interface will continue to grow in importance. Computational geometry and the ability to describe, visualize, and computationally compare complicated surfaces in space will become an important area in proteomics and computational medicine.
Teaching Materials and Faculty Development
Standard texts need to be either revised or replaced by more quantitative texts. The texts for most courses in elementary discrete mathematics are not especially exciting. They are filled with definitions but do not challenge the students with interesting problems. More exciting courses are taught by Stephen Rudich at Carnegie Mellon and by Alistair Sinclair and Umesh Vazirani at Berkeley (CS 70). Terry Speed at Berkeley has developed an introductory statistics course in which the motivating examples are drawn from genomics.
It should be possible to develop courses in discrete mathematics, probability, and algorithms that emphasize applications in biology. It is particularly easy to find motivating examples from genomics and genetics, since those subjects are inherently combinatorial and probabilistic. As one example, sequence alignment is an ideal vehicle for introducing dynamic programming. Graph theory can be linked to sequencing by hybridization. Pedigree analysis and the design of genetic crosses abound with combinatorial puzzles. Probability can be illustrated through the analysis of sequencing and mapping strategies or pooling designs. Fred Roberts at Rutgers has done excellent work in this area (The Scientist 9, July 10, 1995).
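The remark that sequence alignment is an ideal vehicle for dynamic programming can be made concrete with a short sketch. The Python function below computes the score of the best global alignment of two sequences by filling in a table of subproblem scores (the approach usually credited to Needleman and Wunsch). The function name and the scoring values for match, mismatch, and gap are assumptions chosen for illustration, not values suggested by the panel.

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-2):
    """Score of the best global alignment of sequences a and b.

    score[i][j] holds the best score for aligning the first i letters
    of a with the first j letters of b. Scoring values are illustrative.
    """
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    # Aligning a prefix against nothing costs one gap per letter.
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap
    for i in range(1, rows):
        for j in range(1, cols):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + pair,  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,       # a[i-1] opposite a gap
                score[i][j - 1] + gap,       # b[j-1] opposite a gap
            )
    return score[-1][-1]


print(global_alignment_score("ACGT", "AGT"))
```

Each table entry depends only on three previously computed neighbors, so the work grows with the product of the two sequence lengths rather than with the enormous number of possible alignments; that contrast is the central idea a discrete mathematics or algorithms course would draw out.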
There are not many programs designed specifically to impart quantitative literacy to biology faculty. Some existing programs target other audiences, such as quantitative training of K-12 students, high school teachers, predoctoral students, and postdoctoral students. Joe Rosenstein at Rutgers and Maria Klawe at the University of British Columbia are very active in developing such programs. In addition, the Keck Center in Houston, with sponsorship from NSF and NLM, runs a computational biology training program for predoctoral and postdoctoral fellows. A summer short course might be an appropriate vehicle for enhancing the quantitative literacy of undergraduate biology faculty; NSF sponsors such short courses in other fields. The computer science component of the program might be modeled after Adam Arkin’s programming course described above. Terry Speed’s statistics course (also described above) might be a model for the statistics component.