What Academia Needs
Peter J. Bickel
University of California at Berkeley
I have been asked to speak on the topic "Modern Interdisciplinary University Statistics Education: What the Customer (Academia) Needs." My translation of this is: What kind of education would we like our future faculty to have so that they can best train the next generation of statisticians in industry, government, and academia? Or, equivalently, to what training and opportunities should we expose our own PhD students? I arrive at this by considering statistics departments as the customer for new statistics faculty.
As for "interdisciplinary," I think the adjective is redundant. Leslie Kish in his 1977 American Statistical Association presidential address (Kish, 1978) gives a description of our field that makes this point well:
Statistics is a peculiar kind of enterprise of contradictory character because it is at the same time so special and so general. Statistics exists only at the interfaces of chance and empirical data. But it exists at every such interface, which I propose to be both necessary and sufficient for an activity to be properly called statistics. It has a special and prescribed function whenever and wherever empirical data are treated; in scientific research of any kind; in government, commerce, industry, and agriculture; in medicine, education, sports, and insurance; and so on for every human activity and every discipline.
Of course, this interpretation is probably not what the organizers had in mind. They more likely intended the question: What do our colleagues in other departments want of us in terms of courses, consulting, and collaborative research? But my version is more fun, for me at least, and I will necessarily touch on the other version as well.
So what is the platonic ideal statistician? George Box's rhetorical description of Fisher (Box, 1976) serves well:
"We may ask of Fisher
Was he an applied statistician?
Was he a mathematical statistician?
Was he a data analyst?
Was he a designer of investigations?
It is surely because he was all of these that he was much more than the sum of the parts. He provides an example we can seek to follow."
The ideal is, as it has to be, far beyond our reach, but we can and should reasonably expect our PhDs to emerge with great strength in at least one of these categories and a serious acquaintance with all.
To achieve this, here is an unrealized and probably unrealizable graduate curriculum for the ideal statistics department. I list topics in alphabetical order rather than by priority because I believe they all should play some part in the education of a statistician. Which ones are emphasized necessarily depends on the inclination and talents of the individual, the range of talents and interests of the faculty, and the ultimate limitation: time.
Computing, including use of standard packages, development of skills in higher-level languages, use of simulation, statistical graphics, and writing of usable software.
Data analysis, descriptive and exploratory, in the context of examples of substantive interest.
Data collection, handling, and quality.
Design of experiments.
History of statistics.
Inference, both frequentist and Bayesian; model construction, testing, and validation.
Probability theory and probabilistic modeling in areas of substantive interest.
Statistical consulting and actual participation in interdisciplinary groups; breaching the language barrier.
Statistics in the law and public policy.
You may be surprised by my failure to include advanced mathematics courses in, say, functional analysis, numerical analysis, discrete mathematics, and so forth. This is not to say that I question the truism that the more mathematics one knows, the greater one's potential as a contributor to both theory and applications. Rather, I have tried to list topics with which I believe every PhD in statistics should have a serious acquaintance, wherever she is headed: academia, government, or industry.
Some of you may question the inclusion of statistics in the law and public policy. I can cite higher authority: Mosteller (1988, p. 94) argues for "… a policy course taught in the regular curriculum for graduate statistics students." I can also cite personal experience at a recent planning meeting of the National Research Council for a further study on forensic use of DNA typing. The issues that arose involved questions of molecular biology, forensic science, population genetics, statistics, and the law, which is as interdisciplinary as you can get. It seemed to me critically important for the ultimate goal, good public policy, that the participants keep in mind the relative importance of the concerns of their own fields and how these interact with the concerns of the other fields in determining the final outcome. I refer you to Mosteller (1988) for a subtle and extensive discussion of the need for policy study by statisticians. I would even argue, and I am glad to see that Jon Kettenring concurs, that such holistic thinking is valuable in interdisciplinary studies generally, not just for questions of public policy. Good scientific and technological enterprises have a goal, be it answering an important substantive question or developing a product, and it is this ultimate goal that statistician participants (and others) need to keep in mind.
Studying the history of statistics has several functions:
To be exhilarated to see that not only are things muddy now, they were even muddier then.
To recognize the highly multidisciplinary origins of our field in astronomy, genetics, agriculture, economics, engineering, and on and on. That is why I believe "interdisciplinary" is redundant when coupled to "statistics."
To see the emergence, submergence, and re-emergence of ideas as new types of data appear, as technology transfer occurs from one field to another, as old concerns are raised anew.
Stigler (1987) is a wonderful starting point, but we need more histories of the Fisherian era and the impact of the computer on our field.
You may have noted that in my references to data analysis and probabilistic modeling I referred to areas of substantive interest. By "substantive" I do not mean simply "of interest to practitioners in some field." It is important to fire the imagination of our students by presenting methodology and analyses, in the context of data whose background is clear, that address well-defined questions broadly perceived as important. At a supposedly elementary level, Freedman et al. (1992) is full of such examples. At a more advanced but specialized level, Fitting Equations to Data by Daniel and Wood (1971) has such examples. The space shuttle failure analysis by Dalal et al. (1989) is another fine instance. The Boston housing data (Harrison and Rubinfeld, 1978), to me at least, does not fill the bill.
To what extent do we have such a curriculum at Berkeley? Bits and pieces at best. We have, I believe, excellent courses in statistical computing, general inference, and high-level probability theory, and a number of excellent "tool box" courses in applied statistics. My colleague David Freedman runs a course on critical analysis of historical papers of great substantive importance, such as Snow on cholera (Snow, 1936) and Berkson on smoking and lung cancer (Berkson, 1955). We have a long-standing student consulting service that attracts graduate students and some faculty from many fields, particularly the biological and social sciences. These consultations sometimes result in longer-range interdisciplinary collaborations. A similar pattern holds for our summer placement program in industry. Finally, there are a number of ongoing placements of students in interdisciplinary collaborations in biology, engineering, astronomy, and so forth. These tend to come about through haphazard contacts among faculty and students.
Somehow it all does not hang together as well as it should. Certainly we have not built in requirements that ensure that all our students have some exposure to all nine aspects listed above. A major problem is time. All our courses are demanding; it is unreasonable to expect a student to take more than two or three a semester, and then the thesis looms. However, I believe there are solutions, and for some of them we already have examples. For instance, I would like to see courses divided into a number of shorter units, possibly taught by different instructors, perhaps including some from industry and government, with separate units emphasizing context presentation, modeling, data analysis, inference in this and similar contexts, and so on. Some theses already combine general theory with an application to substantive data, each part informing the other, and many more could. More mechanisms for facilitating interdisciplinary contacts among students and faculty can be constructed, and more are developing; the National Institute of Statistical Sciences is one example.
To some extent, I find myself in the position of the comical Polish colonel in a movie some of you may remember, "Me and the Colonel." In this film, the colonel and Jacobowsky (Danny Kaye) find themselves in a café in a French village during World War II, carefully watched by the Gestapo, who are only waiting to pick them up after they make contact with the Resistance. The colonel outlines their goals: eluding the Gestapo and getting to a nearby beach where an English submarine is to pick them up. Jacobowsky replies, "My dear colonel, this is all very fine, but how do you propose we do this?" The colonel turns to him in surprise and says, "But my dear Jacobowsky, I have outlined the strategy … the tactics I leave to you." I hope the rest of this meeting will provide the tactics.
Berkson, J. 1955. The statistical study of association between smoking and lung cancer. Mayo Clin. Proc. 30:319-348.
Box, G. E. P. 1976. Science and statistics. J. Am. Stat. Assoc. 71:798.
Dalal, S. R., E. B. Fowlkes, and B. Hoadley. 1989. Risk analysis of the space shuttle. J. Am. Stat. Assoc. 84:945-957.
Daniel, C., and F. S. Wood. 1971. Fitting Equations to Data. New York: Wiley.
Freedman, D., R. Pisani, R. Purves, and A. Adhikari. 1992. Statistics. 2nd ed. San Francisco: W. W. Norton. 550 pp.
Harrison, D., and D. L. Rubinfeld. 1978. Hedonic prices and the demand for clean air. J. Environ. Econ. Manage. 5:81-102.
Kish, L. 1978. Chance, statistics, and statisticians. J. Am. Stat. Assoc. 73:1.
Mosteller, F. 1988. Broadening the scope of statistics and statistical education. Am. Stat. 42:93-99.
Snow, J. 1936. Snow on Cholera. New York: Oxford University Press. (Reprinted, 1965, New York: Hafner Press.)
Stigler, S. M. 1987. The History of Statistics (The Measurement of Uncertainty Before 1900). Cambridge, Mass: Belknap Press of Harvard University Press.