Directions for the Future
LEE S. SHULMAN
It is now time for all of us to think about suggestions for next steps the National Center for Education Statistics (NCES) might appropriately take with respect to teacher supply, demand, and quality data and, to the extent that it also is germane, to comment on some of the individual projects that have been described. I will leave this meeting remembering forever what I am now calling Stoltz's corollary to Gresham's Law, namely, that bad data actually encourage good data. This may be the longest enduring generalization from the meeting.
One of the questions that I did not hear discussed much is the purposes to which teacher supply, demand, and quality data are put, can be put, and ought to be put. If these data are the answers, as it were, what are the questions? It is clear that when you get chief state school officers together, they are enormously curious about data on teachers. I have been at some meetings with the chiefs, and they find those transparencies on supply and demand absolutely riveting. The chief from North Carolina immediately wants to compare his or her data to South Carolina and California and wherever.
But I am more concerned about how policy makers at different levels of the system actually use these kinds of data. Who are the relevant policy makers, first of all, and to what extent do they use different parts of the data? I think, for example, of the differences among chiefs, district superintendents, and state boards of education trying to decide whether or not to close a university school of education. The state of Oregon in 1991 substantially reduced the sizes of schools of education at Oregon State University and the University of Oregon, terminating all teacher training at the
latter institution. That was ostensibly predicated on knowing something about supply and demand trends in the state of Oregon. I would hope quality of teaching also was a factor, but I frankly doubt that they really had access to data regarding quality.
So who makes the decisions? I think we would have to identify stakeholders and policy makers, examine the kinds of choices and decisions they are making or intending to make, and then begin to map backward and ask, "What kinds of data would make it possible to render informed decisions instead of simply sticking a damp finger in the air?"
Dick Murnane commented on the misconceptions or misinterpretations that can readily be made by policy makers from looking at individual figures. He showed us an example of the hazard analysis that related number of years in teaching to the incentive value of salary, or at least, that is the way it would be interpreted. He smiled and said that you should not interpret this graph as a demonstration that after seven years one need not worry about teacher salaries.
Yet, that is exactly the misinterpretation that most policy makers will probably make. Therefore, an essential part of the question must be, "What are the most likely misinterpretations of these kinds of data? How can they be so collected and presented as to make it unlikely that they would have unintended and quite harmful consequences for those who are depending on them?" I am not sure how to conduct such research. I can imagine simulation studies in which we bring together policy makers at different levels, give them data, give them some problems to work on, and have them work together so we can study how they make use of the information. You can probably come up with other examples. But it is one thing to call certain data indicators; it is another to ask what they indicate to whom. How do they function for those who are using them as guideposts? We know very little about that.
A second general question is cross-occupational. Every time we saw an interesting data set about teaching, I found myself asking, "I wonder how that compares to nursing, social work, or other occupations?" I was not even sure what the comparable group ought to be. Is it public service occupations? Is it predominantly women's occupations? Are we observing in our data some universal characteristics of certain sorts of occupations? Is teaching a typical instance of a larger set of occupations, or is there something unique about teaching?
I shall now address the question of quality, which is the area of my own research. One question participants asked was how we could build indices of quality into surveys like the Schools and Staffing Survey (SASS). My first impulse was to say that we could not. Then I asked, "If we attempted something like this in medicine, could a small number of survey items tell us a lot about quality?" My answer, for medicine, was "yes."
If I were to ask about the quality of practice in internal medicine in a variety of settings, I could ask if respondents were board certified in internal medicine. To be board certified in internal medicine means an individual is already board certified nationally in general, has completed at least a three-year supervised residency in an approved training program in internal medicine, and has passed a set of examinations in that field.
There is no comparable question I can ask about teachers. There are no proxies for a comparable set of supervised and evaluated residency experiences in teaching. This observation leads to the desirability of NCES connecting their data-gathering initiatives with a number of emerging new programs that are going to provide those kinds of data over the next generation on teachers and their effects on kids.
I shall offer a few examples. On an experimental basis, the National Assessment of Educational Progress (NAEP) is now moving to a state-by-state examining system in reading and mathematics. This will probably not be a one-shot experiment. It will likely continue, and, in fact, all indications are that it will probably move to district-by-district, and maybe even school-by-school comparisons, although the matrix sampling scheme will have to change. If that happens, means should be sought to tie NAEP data to other kinds of data on teachers or on schools. This could become terribly complex, but it must be considered.
The National Board for Professional Teaching Standards will start slowly. It will be a voluntary teacher certification effort. The board will gather rich data on teachers' practices, including data on samples of their students' work. The virtue of starting slowly is that there will be time to determine how to link those data to SASS and other sources of school and teacher data that also will be collected.
The National Teachers Examination (NTE), published by the Educational Testing Service (ETS), will be much more widely used. The successor NTE, tentatively dubbed "Praxis," is advertised as a significant improvement over the current NTE, which is almost entirely composed of multiple-choice items. ETS plans three components, much like the national board in medicine, to be given at three different times in a teacher's career. The current plan is for phase three, which unfortunately will be optional, to be based on observations or other documentation of teachers' classroom practices.
The NTE will report teachers' performance as scores. Hundreds of thousands of people will take that new NTE (though far fewer are likely to be observed in phase three). Discussions ought to begin to see how those data might be integrated into the larger data base. Unless a credible phase three attracts widespread use, however, NTE scores should be classified as predictors of quality rather than proxies for quality.
The current initiative in Vermont must be examined very carefully as a
source of teacher quality data. They will use student portfolios in writing and math as an outcome measure at the fourth and eighth grades. How are those going to be scored and interpreted? How can those ratings be related back to teachers? Two things are important here. By their very nature, portfolios are jointly produced by pupils and their teachers. Hence, teachers are linked to student portfolios in very direct ways. This differs greatly from correlating a teacher's practice with a child's test scores nine months to three years later.
Second, Vermont's education superintendent, Rick Mills, intends that the next step is to initiate teacher assessment in Vermont using teaching portfolios that connect to those student portfolios. Here you directly tie together teacher assessment and student assessment. If I read the tea leaves correctly, California is going down this route as well, and California accounts for 10 percent of the nation's students and teachers. Connecticut is another state that has been moving in a similar direction.
It would seem useful for NCES to target a few states in which such experiments are under way to create new kinds of teacher quality data and to plan some collaborative efforts with them to discover how to include those data in a broader set of data bases.
Finally, we should consider the current discussions of a national examination system. I know it is just talk now, but, if that moves ahead, it will change the ball game entirely. If it goes in the direction that Lauren Resnick, Marc Tucker, and their New Standards Project are advocating, the major features of the new examination system will not be multiple-choice tests, but rather forms of portfolios and projects. They will have to be scorable, and they will be linked to individual teachers, much like Advanced Placement tests. They will be external examinations that students are taught and coached to pass by teachers. The teacher indicators and the student indicators can always be in the same data file. It will be a much tighter coupling than anything we have now.
I urge the NCES staff to plan ahead for these new forms of data that will become available over the next 10 years and to determine ways of accommodating them.