Educating Statisticians for the 21st Century
H. Jean Thiebaux
National Science Foundation
Here I ask you to think of "the customer" as the future of science and to consider a case for a course of study in science, mathematics, and philosophy as offering the best undergraduate preparation for some future statisticians—including those who may find the most exciting work in the coming century.
It is important that statisticians be partners in the future of science: partners in setting and following agendas as well as in analyzing the data fallout. This speaks to the benefit of statistical progress as well as to the benefit of other disciplines of science. Research in statistics and research in the companion areas of scientific enquiry are mutually supportive and progress together in their development.
Statistics may be thought of as the art of persuading the universe to yield information about itself. It provides the logical structure for scientific inference; for this reason, non-statisticians frequently think of statistics only in an immediate serving capacity. However, I want to entertain another point of view, namely, one that places the continuing development of our discipline as a major item on a bigger scientific agenda.
I will describe the developmental goal of statistics as the creation of a complete and unambiguous formalism, a filter, for taking in data received from the universe and giving back assignments of relative credibilities to possible truths about the universe. We can, and some of us do, work on this as a mathematical abstraction. However, it is intrinsically bound to developmental goals of other sciences. Optimally, statistics and its companion, probability, are full partners in the progression of scientific understanding. And what the customer needs from the statistician/probabilist is refinement of the filter. If we focus on a particular broad transect of current scientific activity, say, all the research dedicated to understanding and predicting global change, we can see that "refinement of the filter" means extending its capacity to planetary-scale systems, whether they are biological, geophysical, social, or political systems.
We ask and seek an answer to the question, What does this customer requirement mean for the education of future statisticians? History offers guideposts in seeking an answer, in the lives and work of the Bernoullis, Gauss, Galton, and Fisher. These prominent figures from the archives of statistics are all recognized for their considerable contributions to the creation of the filter. The patterns of their work support the conclusion that the future of science (as a customer of statistics) will be best served by statisticians who are solidly grounded in traditional science, mathematics, and philosophy.
We can trace the roots of our discipline back at least to the 1600s, to the Bernoulli family and three generations of scientist-mathematicians whose work encompassed physics, chemistry, astronomy, biology, and engineering, as well as mathematics. We are likely to know their name best, however, for their work in the theory of probability. My objective in citing them is to point to the breadth of their interests and acquaintance with science and to conjecture that the ground-breaking contributions the Bernoullis made in probability theory were inspired by questions that, during the more than 100 years of their work, were at the forefront of scientific enquiry (Bell, 1937).
NOTE: These are personal opinions of the author and do not represent National Science Foundation policy.
In chronological order, the next on the list is Karl Friedrich Gauss, a prolific contributor to scientific research. Among his many accomplishments, this scientist-mathematician developed a basis for present-day multiple linear regression and spatial objective analysis. However, he did far more than lay a foundation for statistical analysis techniques that are based on what we now call the normal distribution. He is rightly named as one of the giants in the intellectual history of science. Gauss possessed extraordinary breadth of vision and insight, which he applied to the study of relationships among elements of deterministic systems. From these grew his formulations for estimating parameters from observations that are encumbered by uncertainty. In fact, the work that led to his formulation of least-squares estimation (in 1795) was directed toward the determination of planetary orbits, the scientific objective that inspired developments in statistical analysis of enormous scientific impact.
Francis Galton is next, for his description of the index of correlation in 1888 (Galton, 1890). Galton's work confronted puzzles of heredity, where these were confounded by influences of the nurturing environment. His discovery gave scientific understanding and mathematical expression to observed frequencies of biological characteristics (Stigler, 1989).
Finally we make note of the many contributions of R. A. Fisher, a man who studied quantum theory in combination with his studies of the theory of error. From early in his education, Fisher too was seriously interested in the biological theory of evolution and genetics; and much of his work in the analysis of variance, for which he is best known, was done in direct connection with agricultural field experiments. See, for instance, Box (1987).
All of the above eminent figures in the history of statistics worked in scientific arenas, and their contributions to statistics were inspired by the need for structured inference in the sciences that fascinated them.
Consider, now, that the last of these major contributors worked at the beginning of the 20th century, and that there is a critical difference in the intellectual environments, then and now. Those who have been cited worked at times in the history of science when the perimeter of scientific understanding and formalism was much smaller. In those historic times, from 70 to 300 years ago, it was possible for one mind to be fully immersed in a process of scientific discovery and to know what was known about all its aspects. Scientific questions, and the intellectual energy of quests for their answers, inspired and nourished the earliest development of our field—by individual scientist-mathematicians. Although it is interesting to speculate about that relationship with scientific enquiry and tempting to envy its participants, it is no longer possible for a single mind to grasp the full complexity of many scientific questions.
From where we stand on the historic time line and look to the future, the perspective and opportunities are quite different. Now the "leading edge of discovery" is the edge of a much expanded scientific arena, following a century of technology-driven research. And now, many scientific projects are undertaken by research teams, and each research team is made up of several scientists. Team members will share scientific objectives and a common scientific language, even though they may come from different disciplines. Thus the single mind of great vision has been replaced by a collective mind of suitably broad expertise.
What does a participating statistician need to bring to be a full partner, to further develop and apply the filter? Basic equipment should include a language of science and enough experience within scientific disciplines to appreciate and respect the methods and minds that are working to expand the knowledge bases of these disciplines.
Let us now turn directly to the topic of this symposium: "modern interdisciplinary university statistics education." As you might guess, I am going to rewrite that in terms appropriate to my customer, namely, as "education of future statisticians for interdisciplinary research." However, I am not going to presume to propose a curriculum, even for those future statisticians who will be scientific collaborators. Nonetheless, I do wish to raise some fresh possibilities for discussion. What the foregoing has to say about academic preparation is that we can do a great favor to students who believe they are headed toward interdisciplinary research by encouraging them to study science in parallel with mathematics, philosophy, and (maybe) a little statistics, in pregraduate years. The science content should best fit the student's true curiosity, and the mathematics should support future work on the filter.
For the "future of science" customer, with the statistician/probabilist as a full scientific partner, the complete education program will culminate in a PhD in statistics. Right at the top, that means that the student will complete a full complement of course requirements in statistical theory and methods in graduate school. With this as a "given," the one negative I am going to point out is my belief that the future of a creative, collaborative researcher is not optimally served by narrowing course concentration in undergraduate years solely to statistics courses. A broad perspective with solid grounding in the sciences will serve far better.
Now I am going to change tack a bit and support what I have said by anticipating what may be ahead for collaborative statisticians of the future. There will be the excitement of working at the boundary of science, developing probability and statistics in conjunction with investigations of our physical and biological habitat. And there will be energy and inspiration from sharing explorations of the frontiers of science:
from gaining a scientific perspective,
from accepting the challenge of representing physical and biological systems with mathematical abstraction, and
from comparing its implications with subsequent measurements and observations.
These are privileged rewards of true partnership in interdisciplinary research.
Let me give this substance by describing a particular area. My choice is one that is receiving a lot of attention at present, because it concerns our ability to survive in our physically limited, global environment. It goes under the broad title of "global change research," and it involves a spectrum of critical elements of global sustainability. Among these are population distribution, marine and agricultural productivity, and energy requirements and resources.
Uncertainties in all areas of global change research stem from the limits of resolution of the information provided by global systems, whether they are geophysical, chemical, biological, or social/political. The scientific process of understanding such highly complex systems couples information from recent observations with established theory, for refinement of theory as a scientific goal, and for refinement of predictability as a societal goal. Thus, a key to reducing the uncertainties is to expand the limits of information resolution, through the filter.
The amount of time-dated data that has been collected and archived from Earth observing systems is truly immense, and the information contained in it is highly complex. Its volume and complexity magnify the challenge to data management and to the structure of scientific inference. In fact, the objectives of scientific inference should provide the framework for management of the data that is dedicated to scientific inquiry. Thus, the challenges fall within the forum where the scientific programs concerned with the environment come together with statistical science. Specifically, they are defined by the need for crafting filter algorithms that will output predictive estimates and statistical decisions from vast amounts of detailed (but noisy) input.
Classical statistical science has dealt with data sets that could be conceptually isolated. This is inadequate for estimation and inference in the larger framework of global change. High-resolution analyses aimed at reducing uncertainty in the inferences of global change research, which can be achieved through refinement of the filter, require bold new work. This work must take the true spatial and time coherencies of global systems as fundamental givens, and apply the philosophy of scientific inference in the creation of new statistical analysis techniques. Thus the requirement is for strong, focused, interdisciplinary teams dedicated to this mission.
This is a critical time in the history of the human use of our planet. Both scientifically and politically, the time is right to take on this task of refining the filter. There are inherent risks because the way ahead is unmapped, and big steps must be taken by collaborating scientists who have not traditionally worked closely together. The work will be challenging, at the conceptual limits of statistical science. The results can bring major refinements to the technology of assessment and predictability. The work will require statisticians who know both the language and the conceptual boundaries of the science.
Thus I conclude that students who wish to work on the filter in the context of 21st-century science need room in their academic programs
to fully explore science,
to achieve facility with applicable mathematics, and
to contemplate the relationship between what we think we know and what we can measure.
References
Bell, E. T. 1937. Men of Mathematics. New York: Simon and Schuster.
Box, J. F. 1987. Guinness, Gosset, Fisher, and small samples. Stat. Sci. 2:45-52.
Galton, F. 1890. Kinship and correlation. N. Am. Rev. 150:419-431.
Stigler, S. M. 1989. Francis Galton's account of the invention of correlation. Stat. Sci. 4:73-86.