Modernizing Statistics PhD Programs

John Lehoczky

Carnegie Mellon University

These are some thoughts on modernizing PhD programs in statistics to include more training in cross-disciplinary research. My intention is to provoke discussion by presenting some specific suggestions and raising issues that must be considered in any substantial overhaul of statistics curricula. I am focusing my remarks on the MS and PhD curriculum. Although the undergraduate curriculum is very important, there are so many differences between universities in terms of the student bodies they serve and the contexts of their programs that it is difficult to discuss undergraduate statistics education within a single framework. Nevertheless, if substantial cross-disciplinary training is introduced into the graduate curriculum, this will naturally offer opportunities for advanced undergraduate student involvement, and this will certainly enrich any undergraduate statistics program.

The Future University Environment

The question of modernizing statistics education cannot be addressed without first recognizing the significant forces that have taken shape in the last several years. These forces will have a profound effect on universities and their statistics departments. Some of these, such as the very difficult economic climate within which all institutions are forced to operate, are (it is hoped) only temporary. Others may be more long lived and can have important impacts, both positive and negative, on the field of statistics. They include:

  1. A renewed emphasis on undergraduate education. New assistant professors should have some training in teaching, and current faculty must pay far greater attention to this aspect of their duties.

  2. An increasingly negative connotation in the public's mind to "university research." This is seen as part of the decline in the education of students, and universities are seen as irresponsible institutions because of the very large increases in tuition costs and scandals associated with overhead recovery.1

  3. Defense conversion and a strong national emphasis on economic competitiveness, health care cost reduction, the environment, efficient manufacturing, and so on. All of these topics, and many others, are very large, cross-disciplinary problem domains that present tremendous opportunities for the statistics profession.

  4. Industry's increasing attention to quality. Statistics is one of the important disciplines underlying quality control and quality improvement, and so statistics should be of increasing importance in engineering and business curricula. This represents a significant opportunity and responsibility for the statistics profession.

  5. Industry's increasing need for technically trained employees who can be effective contributors to cross-disciplinary teams, and who also have strong oral and written communication skills.

1  The recent article of Peter J. Denning, "Designing new principles to sustain research in our universities" (P. J. Denning, 1993, Commun. ACM 36(7):99-104), forcefully argues these points. Indeed, he addresses the modernization of the computer science curriculum, and his article offers useful parallels to those thinking about modernizing statistics curricula.

Modern Interdisciplinary University Statistics Education: Proceedings of a Symposium

A Major Dilemma

The focus of this conference is modernizing statistics education in the sense of fostering the inclusion of effective cross-disciplinary training. It is also clear that an increase in cross-disciplinary training can be of great benefit to the field of statistics. Through cross-disciplinary teamwork, statisticians can make important contributions to national problems, can identify problems that will expand statistics' core research agenda, and can increase statistics' audience and funding base. Thus it would seem that the only issue is to find and develop ways to expand cross-disciplinary training in the statistics PhD curricula.

However, in thinking about how to foster cross-disciplinary training in statistics PhD programs, I was forced to think about all the material and tools our students must master to be effective statisticians, whether they participate in cross-disciplinary teams or not. As a graduate student 25 years ago, I always felt frustrated that there was an enormously large body of material that I had to learn, far more than could be learned over the course of a PhD program.
The last 25 years have brought stunning developments that have greatly increased the amount of material our students must master in statistics and probability to be considered literate, not to mention the computer skills that must be mastered. Pull the latest issue of JASA or The Annals of Statistics from the bookshelf and scan the table of contents. It will show a whole new world of material unknown 25 years ago, using words such as "bootstrap," "Bayes-empirical Bayes," "CART," "projection pursuit," "ACE," "MARS," "Gibbs sampling," and so on, not to mention the renaissance of experimental design work spawned by new industrial statistics problems, or the tremendous growth of biostatistics both in applications and methodological advances. Statisticians are attacking problems an order of magnitude harder than those discussed 25 years ago; the wide interest in image restoration is an example.

The developments are just as exciting (and equally daunting for graduate students) in probability theory. A student wishing to pursue probability theory must master texts on Brownian motion and martingales, and there is now a seemingly limitless amount of material to master on random fields, point processes, and so forth. The interesting aspect is that many of these advances have been spurred by applications. For example, the problems of image restoration, option pricing in finance, survival analysis, statistical mechanics, modeling large-scale distributed computer and communication networks, flexible manufacturing systems, and material fracture modeling have helped to bring about major advances in the many areas associated with probability theory.

There are two points: (1) cross-disciplinary studies have consistently led to a broadening and deepening of probability and statistics by posing new questions and bringing about new insights, and (2) the volume of material that a graduate student must master is ever increasing. While the first observation should harden the resolve to find formal mechanisms to help train statistics PhD students to effectively contribute to cross-disciplinary studies, never forget that there is an ever increasing body of material that students are expected to master to be literate statisticians and effective contributors to cross-disciplinary teams. The challenge is not only how to increase cross-disciplinary training, but also to find a way to do it that does not compromise the knowledge and skills a statistician must have to effectively contribute.

In addition to the strong computing skills that are essential to being an effective statistician, students are now also expected to have the ability to make confident and articulate presentations, not to mention having good written communication skills. No one would doubt the advisability of possessing these skills, but the question is how to produce such super-statisticians in a finite (even relatively short) amount of time. I do not want to see statistics PhD training moving in the direction of complete specialization. It is common in other sciences for training to be very narrow, but this seems to me to be contrary to the spirit of cross-disciplinary collaborations.

CMU's PhD Program

I want to describe the Carnegie Mellon University PhD program in order to present some ideas of how we include cross-disciplinary training into our program, to point out strengths and weaknesses of our approach, and to highlight areas where difficulties might be encountered by departments wishing to develop this aspect of their PhD program. Before giving such a description, two points should be made.

In my view, true cross-disciplinary activities require the statistician to learn the language of the other discipline and understand the fundamental problems of that discipline. This is in contrast to working with a subject matter expert who translates the problem into the language of the statistician, often into a fairly precisely defined problem that is recognizable to (and perhaps solvable by) a typical well-trained statistician. I believe that the most important statistical skills in cross-disciplinary investigations involve structuring the questions to be asked and developing the methods of inquiry, as opposed to being able to pull an especially appropriate statistical procedure off the shelf. The important activities must be carried out with an understanding of the other discipline, its issues, and its methods, rather than in the language of statistics. For this reason, having each student undertake a consulting lab experience may or may not suffice, depending in part on whether such an experience results in a strong interaction with a subject matter expert over a sustained time period and leads to a good understanding of that field, its language, and its problems. Problems that are fairly precisely defined in the language of statistics do not lead to effective training for cross-disciplinary teams. It is also vital that there be incentives for the student to invest the time and effort that will be required for a strong cross-disciplinary training experience.

At CMU, the faculty agree strongly about certain important aspects of PhD training. It is recognized that each student has individual strengths and weaknesses; however, every student who receives a PhD degree must have a solid mastery of basic probability and statistical theory, be knowledgeable about a wide range of statistical methods, be experienced at handling statistical applications, and be capable of computing effectively. The faculty is unified in its view that ALL students must meet the spirit of these requirements. There are no faculty arguments on whether to relax or even waive one of the standards because a student is so gifted in another area. All these abilities are necessary ingredients to functioning as a PhD-level statistician. Students with a solid undergraduate preparation usually receive a PhD within 4 calendar years, including summers. The broad outline of the program follows.

Year 1 (MS Program)

Fall Semester
  Perspectives in Statistics (0.5 semester)
  Statistical Computing (0.5 semester)
  Intermediate Statistical Inference (1 semester)
  Intermediate Probability (1 semester)
  Regression (1 semester)

Spring Semester
  Design of Experiments (1 semester)
  Discrete Multivariate Analysis (0.5 semester)
  Continuous Multivariate Analysis (0.5 semester)
  Time Series I, II (1 semester)
  Applied Bayesian Methods (0.5 semester)
  Statistical Practice (0.5 semester)

Approximately 50 percent of the curriculum is theoretical and 50 percent focuses on the practice of statistics. The students must pass two exams, one that involves doing problems on the more theoretical material and one in which the students must demonstrate competency in data analysis. Some of the courses on applied subject matter involve applied problems that arise from faculty research.
Also, students are expected to write reports. Among the courses listed above, two could be considered to offer some training in cross-disciplinary activity: Perspectives in Statistics, and Statistical Practice.

Perspectives in Statistics

The Perspectives in Statistics course lasts for seven weeks and is intended to serve as a relaxing, fun introduction to graduate student life in the CMU department. The initial sessions serve as orientations that introduce the students to each other, to the department, and to the facilities on campus. One session is spent on the "grading exercise," discussed in greater detail in the Training in Teaching section, below. The course also offers parallel sessions on computing, an introduction to CMU facilities, and an introduction to a series of standard statistical packages (Minitab, S, SAS, BMDP, and so on).

The heart of the course is a series of one-hour lectures presented by each faculty member. The lectures allow the students to have contact with the entire faculty. More importantly, they allow each faculty member a chance to present a personal view of what is interesting in statistics. The only rule is that each lecture must be at a low enough technical level that it can be understood by this group of graduate students. About half of the lectures are on some identifiable area of statistics (for example, design of experiments, statistical graphics, or decision making), while the other half are on applications (for example, clinical trials, statistics and the law, the census, stock and option pricing, and Bayesian paleoethnobotany). This introduction provides important contexts that exemplify the department's broad view of the field of statistics and the role of statistics in many different kinds of problems.

Statistical Practice

The course on statistical practice introduces the students to consulting and interacting with clients. In the major activity, pairs of students work with a client on a real problem.
For example, one pair might work with an analyst in the CMU planning office to develop insights into the undergraduate dropout rate, or why applicants decide to attend schools other than CMU.2 There is a wealth of potential projects and willing subject matter participants, because the statistics faculty have many ongoing collaborations with other researchers at CMU or at the University of Pittsburgh. This course also covers material on report writing and presentation skills, and the students must make formal presentations. Thus, this course begins the teaching of cross-disciplinary research skills, but the course is far too short, and the project involvements are too brief and therefore too shallow.

2  Statistics departments would do well to recognize that they can help to solve the problems that their own universities face. Many problems such as forecasting research revenues, student enrollments, and dropout rates are wonderful examples for projects. Statistics departments are uniquely positioned to offer such assistance, and successful contributions to important university problems will cause administrators to gain first-hand experience about the importance of statistics.

Year 2

  Advanced Statistical Inference (1 year)
  Advanced Probability Theory (1 year)
  Advanced Data Analysis (1 year)
  Electives (1 year)

The first two courses in advanced statistics and advanced probability are common to most PhD programs in statistics, with perhaps the only distinction being that the inference course usually contains material on the foundations of statistics and Bayesian theory. Students must pass qualifying examinations in both subjects.

The advanced data analysis course is one in which the students tackle a true cross-disciplinary experience. The course has two major components that run throughout the year. One is a discussion of different types of data analysis problems, tools, and techniques. The students are assigned various topics (for example, ACE or MARS), read some relevant literature, especially examples of the use of the technique, and present the material to the class. The second, and perhaps most important, component consists of each student engaging in a major project involving an applied problem. The project is supervised by a three-person committee, one member of which is often a faculty member from some department other than statistics at CMU or the University of Pittsburgh. The faculty member in charge of the course also serves on the committee. The students write a major report, and all students make 30-minute presentations to the entire department on their projects. These presentations consume approximately two afternoons in May, and the faculty has been very pleased by the high level of achievement and the polished performances given by the students. Participation in a project over a six- to eight-month period provides a strong introduction to true cross-disciplinary activity, and discussions with other students who are doing different projects and having different experiences broaden the training.
The committee arrangement facilitates the training in several ways. First, it widens the number of projects available to the students. If a single faculty member were to be in charge of all projects, they would likely all come from a single relatively narrow discipline. This way, the students get to indirectly experience other students' activities in unrelated fields. Second, it also obviously spreads the burden of this training across the department. Third, the wide faculty involvement in the advising and in attending the presentations reinforces to the students that faculty members are serious about this aspect of their training.

Years 3 and 4

The third and fourth years of the PhD program focus on dissertation research. Students select an advisor and begin reading and doing research in the summer after their second year. This results in a thesis proposal that is presented during the summer or early fall after their third year. Finally, the hope is that students will defend their dissertations in the summer of their fourth year. Students also have involvements in department research projects. Because our faculty have many cross-disciplinary research projects, this inevitably provides a variety of additional cross-disciplinary research experiences for the students.

During these last two years, students also take advanced courses. While a wide array of choices is offered, it is clear that these cannot be sufficient to ensure that the student is literate in all major aspects of statistics and probability. There is far too much material to cram into the remaining two years and still have time for the student to develop the research skills and research products associated with a good dissertation. There is a weakness in the lack of assurance that all students take advanced courses of a traditional sort in standard topics such as non-parametric statistics, multivariate analysis, sequential analysis, and so on. The tendency is to instead have students take courses that require them to read the literature, possibly in some focused area.

The department does a fairly good job of discussing the foundations of statistics, and Teddy Seidenfeld offers a course on reading Fisher every other year that is greatly appreciated by all the students. In one very exciting advanced topics course that will be offered next fall, titled "Oldies but Goodies," two faculty members and the graduate students will be reading classic, landmark papers in statistics. There also are workshops in special areas (Bayesian statistics, biostatistics, and industrial statistics, for example) at which research problems are discussed in an informal atmosphere, and outside speakers are invited who present problems. In biostatistics, for example, researchers from the University of Pittsburgh Medical School frequently come to present problems, always preceded by a lecture on the relevant medical background. More recently, working groups consisting of faculty and students have sprung up for short periods of time to study some specific topic such as spatial statistics, Gibbs sampling, or graphics.
As one who took a large number of advanced topics courses as a graduate student, I have always felt uncomfortable with the lack of required traditional advanced topics courses such as non-parametric or sequential analysis. Still, I believe that CMU students become very capable of learning on their own, which, after all, may be one of the most important skills a PhD statistician can possess.

Training in Teaching

Training in teaching and in written and oral communication is one final aspect of graduate training that is worth mentioning. This begins on the second day of Perspectives in Statistics with the grading exercise, and continues with more formal training and monitoring at the end of the first year. A few of our graduate students serve as recitation instructors for one relatively large undergraduate core course given during the year. True teaching experiences come during the summer in which all summer school offerings are taught by our graduate students. CMU has a very active teaching center, and the use of graduate students as instructors is very tightly controlled. All students whose first language is not English must be certified by our English as a Second Language (ESL) Center. Students must have received and passed teacher training. This entails mastering materials on course planning and syllabus construction, lecturing skills (including watching videotapes of good statistics graduate student and faculty lecturers), motivating students, grading and student evaluation, and so forth. The instructors are observed and videotaped, and they are critiqued by teaching center personnel on a confidential basis.

The grading exercise mentioned earlier is a device we use to try to achieve more uniform grading practices in our undergraduate courses and to encourage our graduate students to start thinking about teaching. The graduate students are given a copy of a fictitious exam, taken by a fictitious student in a standard undergraduate statistics course, and a solution key. Each student individually grades the entire exam according to an established point scale. In the next class, the results are compared. It is amazing to see the huge variability in the points assigned on each question and the overall scores. Overall grades range from B to D, the differences being due solely to grading practice. This variability is caused by different students having different models for assessing answers, including assigning partial credit for partially correct answers and dealing with irregularities. The variability is exacerbated by cultural differences and by the widely varying undergraduate educational experiences of these new graduate students. It was recognized that this variability had to be dramatically reduced, and it had to be done immediately, because nearly all of these students would be grading papers very shortly.

In addition to having completed teacher training, by the time each of our PhD students graduates she or he will have made many in-class presentations and at least three major departmental presentations, including the advanced data analysis project, the thesis proposal, and the thesis defense, each of which is attended by the entire faculty and student body. Even though the atmosphere is non-confrontational, the students generally find their first few presentations to be very stressful. It is felt, however, that after graduation, every student must be prepared to make presentations at conferences, to other professionals, or to management, and so these experiences are very important. Students are also encouraged to attend statistics conferences and present their research.
All of this has been quite successful, and the department feels pride in the students' capabilities in this area.

Suggestions and Cautions

There are a number of suggestions, issues, and problems that must be addressed before PhD programs in statistics can include significant training in cross-disciplinary activity while simultaneously not shortchanging broad and deep statistical training. For example:

One must first recognize that graduate students face a sometimes overwhelming challenge to attain a PhD and to feel that they have a mastery of their field. Because this challenge is so great, the students often look to the faculty members for signs as to what is really important, what really counts, and what is really required. If a department is serious about ensuring that its graduates receive training in cross-disciplinary research, this message must be conveyed clearly and consistently to each student, by word and decision. Students will quickly see whether or not something is really important if they see full or sparse faculty participation, or see students who do poorly being required to undertake remediation or being passed in spite of apparent weakness.

It will be important to reach a strong consensus among faculty members on the importance of cross-disciplinary training, so that students receive a consistent story from the faculty. This will not be easy to achieve. The CMU department has evolved over time to the point that cross-disciplinary training is a significant component of our overall training. However, departments for which this is not the case may have difficulty in establishing mechanisms for such training, providing incentives to faculty to lead its development, and reaching compromises about what current course work must be removed to make way for the new training. A faculty member will feel threatened if his or her special topic course moves from a required status to an optional status.

I believe that the responsibility for cross-disciplinary training should be widely shared among the faculty, not just placed in the hands of a small group interested in applications. This will be the hardest goal to achieve in most departments and will require senior faculty leadership. Moreover, if the faculty agree that this activity should be emphasized, it must be realized that some of them may not have any experience in cross-disciplinary work. Thus, these skills will have to be learned by some faculty as well as all graduate students.

The reward system in most departments must change so as to reflect the value of cross-disciplinary activities, both in promotion and tenure proceedings, and in year-to-year performance appraisals. Traditionally, this step is the most difficult for departments that have close associations with (or are administered by) mathematics departments. Even in autonomous departments of statistics, it will take time and leadership for the departmental culture to change so that cross-disciplinary contributions are thought of as being as important as papers published in our core theoretical journals. Departments will need to develop standards for evaluating coauthored papers and papers on substantive topics published in non-statistical journals.

The availability of computer facilities and statistical software and the ability of the faculty to integrate computing effectively into applications courses will be vital. This may pose difficulties for some departments that lack the necessary facilities or computer expertise.

Effective cross-disciplinary training requires access to interesting problems and subject matter experts who are willing to invest their time. For departments that have not established a large number of such collaborations, this will take a great deal of time and effort. Furthermore, an extra investment of faculty time will be needed to ensure that these collaborators are satisfied and willing to participate again.

Statistics departments might rethink the organization of their seminar series, to include outside speakers who will give broad overviews of topics rather than in-depth talks and to include more speakers on cross-disciplinary topics, especially in a workshop atmosphere.

The statistics community needs to continue efforts to attract survey articles on both theoretical and applied topics in statistics journals.

A centralized mechanism needs to be created for graduate student summer internships in industry, government, and medicine that will deepen students' training in cross-disciplinary activities.

Appendix

Examples of Statistical Practice Projects

  National Institute of Mental Health treatment of depression collaborative trial
  Longitudinal patterns of psychological distress following the Three Mile Island accident
  Factors related to early mortality among U.S. servicemen following deployment in Vietnam
  Discounts and quality premiums for illicit drugs
  Suicidal behavior in schizophrenics
  Analysis of clinical trial data on the effects of behavioral and pharmacological interventions in children with attention deficit disorder

Examples of Advanced Data Analysis Projects

  Analysis of data from a fiberglass production facility
  Determining the sources of lead contamination in soil
  Predicting enrollments at Carnegie Mellon University
  Survival times in patients with recurrent depression
  Analysis of oceanographic data
  A Bayesian analysis of bivariate survival data from a multicenter cooperative cancer clinical trial