Modernizing Graduate Programs in Statistics—Case Study
Prem K. Goel
Ohio State University
First, let me thank CATS for inviting me. About two years ago this topic arose at the summer CATS meeting, and there was a lot of enthusiasm about doing something in this direction. Recognizing that change happens very slowly, if we start working toward it now, maybe the curriculum changes will be in place in most programs by the year 2000.
I will try to avoid repetition of ideas, although many of my perspectives have much in common with those of previous speakers. I will concentrate on the graduate curriculum, not because undergraduate programs are unimportant, nor because service courses are unimportant to the future of statistics, but because when I thought about this symposium during those initial CATS deliberations, I was mainly thinking about future graduate curricula in statistics. Later, I will say something about what we have done in terms of service courses at Ohio State University (OSU). I will not give details, but rather will describe in general what we are doing.
Earlier this year while visiting Cleveland, I read an interesting story in USA Today about total quality management (TQM) awards given by Rochester Institute of Technology for the best TQM project in industry. The story described a paper mill company in southern Louisiana that won the award. Their motto is "In God We Trust; All Others Must Have Data." They developed this motto because they had previously had many problems in their plant, and realized that the only way they could learn about what was happening in the plant was by going out in the woods and collecting data. The perceptions of the managers were nowhere close to what was actually happening in the plant and the production process. That is where the motto came from. But as far as statisticians are concerned, the problem is, What if you do not know how to use the data?
The reason we ought to be doing ever more interdisciplinary education in our programs is, as many people here have already noted, that our science needs the nourishment provided by interactions with other sciences. The development of statistical science is driven in large part by problems arising in other substantive disciplines. Today's customers, industry and the government agencies, demand that our master's and PhD degree holders be able to work well in research teams. That means we have to design our curricula to meet such demands and instill such qualities.
The three main users of our product, academia, the government, and industry, are all demanding that our students become super-statisticians. All want individuals who have strong theoretical knowledge in statistics and probability; who have strong statistical computing and simulation skills; who know how to model complex substantive problems; who are adept at using a wide spectrum of methods in the ever-growing data-analytic tool box of statistics; who have strong communication skills; who are nimble problem solvers; who are able to formulate real problems in interdisciplinary projects and not lose sight of those problems; who are well trained in the art and science of consulting; and many more things besides. We are talking about producing a student today who knows everything about everything, and that is not easy to do.
It is clear to me that what we have been doing in statistics education up to today has to change because the world has changed around us. It is no longer the environment it was 25 years ago when the last major curricular changes took place in statistics. I am not as sure as to how we should change. We at Ohio State are still experimenting and learning what works and what does not, as are many other institutions doing similar things.
The need for change is equally true for undergraduate education, not only for graduate education. Changes are also needed in undergraduate service education, but it is much easier to reinvent or redesign service courses, partly because there is little vested interest in the existing service courses. That is not true, however, at the graduate level, where some people have been teaching courses for the last 20 years. If you now tell them, "You have to change your course because it is not relevant anymore," many of your colleagues may balk, because it is in their nature to be conservative. This imbues the system with an inertia that cannot simply be changed overnight. Change does not come easily and does not happen by itself. As Ed Rothman said, we must take the attitude that if we can change even ourselves, perhaps that would be a contribution to the profession. Hopefully, everybody will over time re-examine and implement the changes that ought to be made in statistics teaching and research.
Most statistics departments are not as fortunate as Carnegie Mellon's. At CMU there is a fairly homogeneous group of people, and they have found a common purpose. Many older departments have people with very established ideas. For a department chair or a curriculum committee chair to work with those people and gain their support to look at the curriculum very carefully, redesign some courses, drop some courses, and create new ones is a fairly tough job. It is especially so when the reward system does not reinforce doing so.
Basically, the reward system is predicated on how many papers one published last year. But if that is all that is rewarded, faculty members do not want to put much time or effort into teaching, partially because there is not much benefit in it for them. As departments, we have to think about how we can change our reward structure so that teaching becomes an important activity on a day-to-day basis. Such changes will happen because there is demand by the customers for us to change.
The ideas presented by the other symposium speakers merit serious discussion. What is important is for us to go back to our own departments and start talking about these things with other colleagues, and find ways to assimilate some of these ideas into our own programs. Again it will be a very slow process, and will not be easy. However, if we can make one convert at a time, perhaps we can succeed.
For the last four to six years, the OSU statistics faculty have been discussing what is wrong with what we do in the classroom, how our students learn or do not learn, and whether or not they are able to put their knowledge together in solving problems. The latter is one area in which we notice particular difficulty. Sometimes after a student has taken the mathematical statistics sequence, the probability sequence, and all the applied courses, and is then given a problem to solve in a consulting setting, he or she does not know what to do with it. Some of our faculty believe that may be caused by the way we teach the courses. We teach all these methods courses in isolation and do not build one course on another. That may be why the students are not learning how to synthesize.
Many trained statisticians act more like technicians than scientists, and that may be in part because the various methods courses are taught in isolation from each other. Phillip Ross remarked that if you have a hammer, you view everything as a nail; we may have to change some of our teaching to address and counter this tendency. Edward Rothman described statistics as a technology and not as a science or an art. I certainly do not agree with that, even though Fisher may have said it, because those may have been different times. If you want to be a full partner in the future development of science, you have to be a scientist today, and not just a technologist, because scientists will not accept you as a full partner if you are merely concerned with a little piece of the problem. To work as full partners with scientists, we have to act like scientists and learn like scientists, too.
I believe that our students do not learn how to solve substantive problems, but essentially look for a tool already in their tool box to fit to a problem. That may be why there is so much emphasis on linear models when many real-world problems are not necessarily close to linear. Even when dealing with nonlinear models we seem to have missed the boat completely, because we have simply taken the series expansion of the nonlinear expression and tried to fit a linear model; basically, we do not focus on what is actually taking place. The context of solving substantive problems is one in which our teaching needs to change.
To address this problem, the OSU faculty agreed that something needed to be done, as a start, in at least the PhD curriculum. The master's program got second priority because its functioning was viewed as acceptable. It was decided that a new course would be developed. In so doing, we have made substantive changes over the last four years in the Ohio State statistics PhD program. First, many graduate students are needed to teach in their very first quarter at OSU, because the new resources recently obtained from the university for teaching the general-curriculum statistics courses require a large number of teaching assistants; consequently, we support about 75 teaching assistants in our program, and these people are needed in the classes in the first quarter. With the help of the administration, we arranged for incoming students to come to the university in the summer quarter of their first year for training, supported with funds provided by the university. In that summer quarter, they learn how to teach laboratory courses and how to use Data Desk, Minitab, or whatever other tools or concepts will be discussed in the service courses. They also take what is basically a refresher course in mathematics, because many of them have forgotten calculus. As undergraduates, they take calculus in their first or second year and thereafter do not use it (in contrast to engineering students). So by the time they arrive in the first year of the graduate program, they have forgotten their calculus and do not know such things as how to transform variables, even with only two variables.
When they come, we ask them what courses they have taken, and what concepts they remember, and then we try to fill the gaps in that summer quarter so that, on the one hand, they are ready to take on the teaching assignments and, on the other hand, they are also ready for the first mathematical statistics course that is given in the fall quarter.
The change in our program is not in the requirements, but rather in how the methods courses are presented in the first year. Students are still required to take a three-quarter sequence in probability and mathematical statistics and a first-year three-quarter sequence in real analysis. The latter is sometimes deadly: by the time students have taken the third course in real analysis, they may decide not to continue the program, feeling that they did not come to learn real analysis. Yet they have to go through it, sometimes taking the course alongside all the math majors. Sometimes they cannot compete as well, or, not knowing that they will need the material in the second or third year, they lose sight of why they are going through the real analysis sequence; they may decide to convert to a master's program and go out to work. Sometimes it is easier to get a master's degree and a good job than to put in six years for a PhD and not know what will happen afterward; with today's job market in academia, we lose some students to the master's program, and, at least in the short term, they think they are better off.
The biggest change in our program was the following. In the PhD program, students used to take all the methods courses in the first year: regression, analysis of variance, design of experiments, time-series analysis; you can complete the list yourself. Second-level courses were for master's and PhD statistics students, and also for some PhD students from engineering and other departments.
We decided to introduce a first-year sequence called Introduction to Statistical Practice, a three-quarter sequence of courses in which students learn about these methods through real problems. Two years before the course was introduced, we asked the faculty which of them would like to help team teach such a course. Three or four people volunteered. We then gave them some release time from their teaching duties in those two years to develop materials, look for data sources and find real data, think about how to structure the course, and things of that sort. This was to avoid having all the students in year one serve as guinea pigs.
The approach is to teach substantive problem solving through various large and small data analyses. The class format is one of less lecture and more open discussion. Students are formed into groups of five that work together during the whole quarter. More attention is paid to asking questions, and to identifying what the real questions are, than to just solving the problems.
All the things we have been hearing the last two days — formulating the problems, identifying what the questions are, raising questions about the data, finding out what the data are about, determining how the measurements came about — are done in this course. This permits the first discussions in this sequence to address the scientific method, team work on formulation of problems, the art of raising questions, and the use of computing tools. Once they have looked at a problem and the questions have been formulated, then the instructors say, "If you want to solve this problem, how do you answer the questions raised in this discussion using appropriate tools?"
That is how methods are introduced in this statistical practice course. Methods are not covered in great detail because there are too many things to discuss: about 12 real problems were picked through which to introduce the methods. Consequently, these students do not necessarily learn all about regression modeling, or all about detecting outliers, because they cannot go into that kind of detail in a course that is meant to give them an overview of the discipline and of how to solve problems.
The idea is that once they have taken this course in year one, and have also gone through the mathematical statistics sequence in year one and the probability sequence in year two, they will start taking some of the advanced topics courses in years two and three, in which they can choose, if they want, to learn multivariate analysis, linear models, design of experiments, or nonparametrics. They can choose which topics they want to learn more about. Hopefully, in this first-year Introduction to Statistical Practice course, they will develop enough curiosity to learn some of these things on their own, or will learn them when taking the consulting service course, which is also part of the program.
Every student must take two quarters of a consulting course. The first quarter, presented in the classroom, includes discussion of the art of consulting, as well as how to deal with clients, report writing, and communication. In the second quarter each student gets involved in a real consulting project that is substantive in nature; students work for almost two quarters with one client on a project.
When you run a consulting service, you cannot tell clients their problems are not important. Each person's problem is important to him or her and therefore should be to you, too. Some of the problems that come to the consulting service are of the kind that can be handled in a couple of meetings; in such cases, we hope that the next time around, if clients have had a good experience from getting this help from us, they will return with a much more substantive project. That is how students also learn to get involved in both short-term and long-term projects through the consulting service.
That is the small innovation we have tried to implement in our graduate statistics program at OSU. The statistical practice course first covers problem formulation, and techniques to solve problems are then introduced through that process. In this give-and-take, as the students tackle these problems every day, the five group members work with one another. They return to the same techniques several times, because some techniques are used in more than one problem. The focus is on taking a synthetic approach to scientific problem solving, not merely on introducing a list of methods in the course. Sometimes we have to team teach these courses because one or two individuals may not be able to cover the spectrum of statistics in an advanced applied course. I like this idea of team teaching. I think that bringing people together to team teach these new courses is one way to induce some of the desired changes.
Currently, two people who are experienced in real-world data analysis are teaching the course for the first time. They learned a few things this first year. For one, first-year graduate students often want everything to be very structured, perhaps partly because it is such a big transition from an undergraduate program. They want to know what the course is going to cover and how grades will be determined. Grading is very important to them, and if you tell them, "A grade is not important; what is important is the material you should learn from the course," that does not resonate very well. That is especially so if they think you are trying to be vague about the grading system. This was in part the reason that in the first quarter of the course, many of the students were not happy with it. The course material focused on four problems; at first they merely discussed the problems, and students would say, "I have not learned any method yet; what is going on?" It did not entirely satisfy them to be told, "You will learn methods later; we first have to discuss how to ask the right questions before we address how to solve those problems."
This made the first quarter very difficult. Students were complaining that they were not learning anything that they had expected in terms of, say, regression modeling. In reply, they were told that the first quarter introduces computing and discusses how to raise questions, and the second quarter presents how to solve those problems. We have made a commitment to the two faculty members that they will be teaching the course for the next two years and thus
learning from their own experience. They will be talking to other faculty members about it. At the end of two years, they will hopefully have gained experience and understanding that can be further discussed as a case study.
We at OSU consider the consulting service not just a source of revenue for the department but also a source of problems. Students who are assigned to substantive projects sometimes also end up getting a thesis problem out of the experience. About seven or eight theses in the last four years came out of projects in the field in which faculty members were involved on a long-term basis. These are instances of students getting involved enough to extract a problem that was good enough for a PhD thesis, and thus to contribute to methodology development. I think it is very important for any statistics department's program to have a consulting service.
There are often difficulties in running a statistical consulting service. Not every faculty member is supportive of the effort because, for instance, some may view consulting with clients as beneath their dignity. That happens in many of the bigger departments. The department's purpose may not be agreed on by all the faculty, because that depends on what the faculty think is important and on what they perceive as the basis on which they will be reviewed for promotion and tenure. I believe that the six-year time line for an assistant professor's promotion and tenure is sometimes detrimental to the interdisciplinary nature of statistical work. It can take several years for a person to become productively involved in some of the big problems; if other department members do not see any publications from such a person for two years, it can raise concerns over what is happening with that person. I think our job is not only to convince the deans and the provost, but also to convince ourselves and our colleagues that patience is needed. If faculty do not publish something in year one or year two, but we know they are deeply involved in something that is going to produce results three years down the line, we have to keep that in mind and take it into account, or else we will discourage that kind of involvement by our junior faculty. Junior faculty sometimes need to be protected, because if they invest their time doing interdisciplinary work and get "passed over" for not publishing enough papers, it will be a major problem.
We also try to place many students in summer internships in industry and government, and have had students go to several companies in Cleveland, Cincinnati, and Columbus. Students have been placed in the Bureau of the Census, the Bureau of Labor Statistics, and other government agencies. When they return, they are very pleased because they have learned many things they would not have learned at the university. Internships are very important.
I want to appeal to our industrial colleagues to offer more opportunities for graduate students to be interns, and perhaps also offer faculty exchange programs. If those opportunities are not provided, despite all the talk we will not achieve what we want to achieve in interdisciplinary education. Industry must also be a full partner in the process, because otherwise it will not be possible to implement these interdisciplinary facets.
Before I close, I must acknowledge that some problems exist in the basic statistics educational curriculum at Ohio State. One is that not all faculty members are convinced of the effectiveness of this modern interdisciplinary approach. Some people do not ever want to change; they have simply developed their habits, and that is that. Consequently, any attempt at change faces opposition. Such people do not believe in this interdisciplinary approach because they never learned it that way yet can solve problems now, and so they wonder why today's students cannot do the same thing they did as graduate students and then develop over time. However, as Jon Kettenring said yesterday, industry does not have the time or resources to provide on-the-job training, and in earlier times there was not this kind of pressure on graduates to know everything when they left. Things have changed. With so many more people in the statistics profession, supply is perhaps overtaking demand, and we will therefore have to be more cognizant of what our students should know before they graduate.
Another problem is the vested interest individuals have in the courses they have been teaching for the last 20 years. It is sometimes a big problem because if they are suddenly told, "By the way, we will not offer this course any more because it is not important," their reply might be in effect, "You cannot do that to me; I am a senior faculty member in the department, and you had better take care of me or I will create trouble for you." That is a somewhat facetious possible reaction, but there will be problems of that kind arising when you want to change curriculum. One has to work with everybody, decide on what is important, perhaps put some priorities on certain aspects, and try to change the course so that it incorporates the new higher-priority things, rather than simply cutting the course. It is important to avoid hurting people's egos.
One problem with the Introduction to Statistical Practice course is that no book is available yet. We intended to use a collection of two or three books to be read by students, but having students buy three books for a course is expensive these days. So we have put a lot of data sets on our own departmental machine, to which students have access, and for now we provide a lot of handouts for this course. The hope is that after three years the professors who developed the course will have a course format that can be made available, at least in manuscript form, for other people to use.
There is also a problem in that some of the students do not seem comfortable with a very unstructured kind of course, because they are accustomed to a more traditional, serial mode of teaching that presents a problem, its solution, the next problem, and the next solution, rather than an appeal to learn how to formulate problems first. Making that change is not easy for some students, and they do complain about it.
Still, the biggest obstacle is that a substantial number of faculty members do not want to work on particular private projects. They feel that theory is more important than applications, and question why they should "waste their time" working with data. That, I would venture to guess, is a problem in many big, established statistics departments. To reiterate, to solve that problem we will have to think about the reward structures. At Ohio State, faculty are informed that those who help with the consulting projects have something in it for them. Those projects are usually funded, and when the money comes to the consulting service it is basically money that can be used for whatever educational purposes we want. It is not taken back by the administration, and so it is a department resource that can be used to buy some books or provide an extra trip to a meeting for those who help out in the consulting. We hope that through these kinds of little inducements we can attract more faculty. Some of them are very willing to help on any problem; others simply do not care about consulting. We believe that, over time, through the recruitment of more and younger faculty, this problem of not wanting to participate will diminish on its own.