of the three cultures did. Lois Peak of NCES later made the point that the written questionnaires used carefully chosen language to ask about homework because staff were aware of the issue. Nevertheless, Stevenson maintained that far more sense could be made of such an issue through observation and interview than through a questionnaire.

Another example that Stevenson addressed, which had already been raised several times during the day, was that of "juku," the after-school classes attended by many Japanese students. Stevenson's point was that "juku" is a very vague term that refers not only to intense academic classes, but also to craft classes, sports, and other structured social activities. Many U.S. observers have claimed that Japanese students' superior performance can be explained by their attendance at juku, on the assumption that juku provides students with rigorous training for college entrance exams and compensates for any weaknesses in the schools' academic programs. Stevenson claimed that a deeper understanding of the cultural context reveals that this is not true, or at least that it is a seriously oversimplified portrayal.

Stevenson described a few other findings from the study:

· The role of the school principal in Japan is very different from that of one in the United States. In Japan, committees of teachers have primary responsibility for running the school; the principal serves primarily to "execute" the committee's decisions.
· Classifications of student ability come at different times in the three countries. In the United States, the urge to assist children who need it often leads to tracking decisions as early as kindergarten. In Germany, a formal decision is made at the end of fourth grade. In Japan such evaluations are made much later.
· The Japanese curriculum is "a set of broad guidelines of the kinds of things that should be accomplished at each grade level."
Teachers are then given considerable latitude to develop specific expectations for different children. In Germany, Stevenson found, the situation is more similar to that of the United States in that each state is empowered to adopt its own guidelines. The German states are, however, required to meet broad national guidelines.

To convey the flavor of some of the material the study produced, Stevenson read extended quotations from several teachers. He closed by remarking that "it is these kinds of . . . vivid, vital responses that we think give a meaning to a case study . . . that is very difficult to come up with in any other way."

CRITIQUES AND METHODOLOGICAL ISSUES

Lynn Paine, one of the session moderators, expressed a key issue facing the participants when she pointed out that they had been shown
RESULTS OF THE THIRD INTERNATIONAL MATHEMATICS AND SCIENCE STUDY 15
graphs of international achievement scores for thousands of students in the morning and a videotape of "one classroom, one teacher, a small number of students" in the afternoon. "How," she asked, "do we somehow bring those together?" Her question reflected not despair but a sense that the challenge presented by TIMSS is a new one. As was repeatedly pointed out, TIMSS includes data drawn from different samples and by means of different methods; moreover, the two three-country studies were added to the original TIMSS design (at the urging of the United States), and there is no detailed blueprint for fitting these elements together.

Clearly TIMSS offers risks as well as possibilities. As one of the symposium paper authors, Michael Huberman (1997:1), wrote: "Such a study could run the risk of the centipede, marching off in several directions at once." The results available so far suggest that different, and possibly conflicting, conclusions might be supported by different parts of the study. Moreover, because the qualitative studies are innovations, neither means of verifying their results nor standards for evaluating their methods are readily available. This section explores questions raised about aspects of the study and the larger issue of linking its components.

Linking the Components of TIMSS

A certain amount of ambiguity may be an inevitable outcome of a study so large and complex. Because the study itself was not designed to indicate which finding deserves more weight, theoretical or political concerns may drive observers to focus more closely on, for example, either the implications for curriculum raised by Schmidt's work or the concerns about teacher preparation raised by Stevenson. For purely practical reasons, few observers may have both the time and skill to truly digest all that TIMSS has to offer. This point need not diminish the usefulness of the study's component parts, but it will surely affect attempts to integrate them.
Nevertheless, the study components each make a contribution to answering core questions about teaching and learning in mathematics or science, and they should be considered as a package. At the time of the symposium, the first TIMSS reports had just been released, and the last of the reports documenting the primary analysis for each of the study components will not be released until some time in 1998. Links among the components of the study were not really forged during this first stage. However, the ways in which these links are forged once the primary analyses are completed will be crucial, and symposium participants stressed the importance of establishing a clear linking framework. Several key points about the links were made at the symposium:

· For the components of this study to be effectively linked, relationships among different research disciplines will need to be established. Scholarly communities that are not accustomed to working with one another's data will need to collaborate in innovative ways to make the best use of the findings from TIMSS.
16 LEARNING FROM TIMSS:
· What happens with TIMSS will be a model for the future. Lois Peak reported that NCES is considering using videotapes in future studies, but she noted that using this powerful tool in valid ways is not a straightforward task. Given the initial reaction to what is known about the qualitative studies and the publicity they have received, it is likely that other researchers are already considering applying these methods in other contexts. The education community has a considerable appetite for rich data about teaching and learning, but, as many at the symposium pointed out, these new kinds of data can easily be misused.
· Simplistic understandings of TIMSS may be misleading. Until the links are forged and subjected to rigorous scholarly scrutiny, there is a danger that observers will use "common sense" to link the data from the various components of TIMSS, perhaps yielding misleading results. Observers who do not pay close attention might easily miss the fine points in this complex study (the fact that some data comes from only 3 nations and some comes from 41 or 26, for example) and draw erroneous conclusions about explanations for achievement results.5 There are obviously many other differences among the study's components that are salient to any analysis that draws on more than one.

The Achievement Study

As has been noted, many presenters marveled at the magnitude of what TIMSS accomplished. One described it as "a researcher's treasure trove," and many noted that analyses using the data could easily occupy the research community for many years. However, since the achievement component of TIMSS is the base on which the study rests, it is worth noting that several presenters expressed caveats about it.
Jan de Lange, noting that multiple-choice items have been outlawed in his country, the Netherlands, argued that the TIMSS items are primarily useful for testing low-level knowledge and do not necessarily represent anyone's idea of a desirable curriculum. In their paper, Atkin and Black (1997) expressed a similar concern, noting, for example, that a total of 11 multiple-choice and 3 free-response items were used to test the middle school population's knowledge of

5 Population 2 students in six nations were surveyed in the Survey of Mathematics and Science Opportunities. The topic trace mapping components of the curriculum study covered 46 nations, and that study's survey of teachers covered Population 2 students in three nations. The videotape study and case studies each involved only Population 2 students in Germany, Japan, and the United States. Finally, as noted, the achievement results were reported for Population 1 students in 26 countries, Population 2 students in 41 countries, and Population 3 students in 21 countries.
the portion of the test domain identified as "Environmental Issues and the Nature of Science." First, they argued, from this "small number of questions the results can hardly be a substantial basis for firm conclusions." They also noted that these 11 questions cover two distinct content areas, whose relationship to one another is not explained in the framework (Atkin and Black, 1997:12-13).

Others made similar comments, but most, including de Lange as well as Atkin and Black, acknowledged that it would likely not have been possible to conduct the assessment at all without using methods that are both efficient and well established. Nevertheless, participants noted how easy it is for observers to lose sight of exactly what was assessed as the results are disseminated and applied in various contexts.

The Curriculum Study

A number of participants raised questions about the curriculum study, primarily focusing on the conclusions Schmidt drew from his findings. For example, several questions focused on what TIMSS suggests about the ways that control over education systems might interact with achievement. In response to Schmidt's argument that U.S. students' relatively low performance is the result of an incoherent curriculum, Atkin and Black pointed to results indicating that TIMSS does not reveal a clear correspondence between centrally controlled (and, by implication, coherent) education systems and achievement. Schmidt responded by noting that even a very focused curriculum may not be implemented in the classroom in a coherent manner.

Others raised questions about whether the available means of measuring and comparing curricula were truly sophisticated enough to support the detailed comparisons that have been made. Still others pursued this point from a different angle, questioning whether the impact of the structure of curricula and textbooks can be isolated as a factor, separate from the ways they are translated into classroom instruction. Schmidt argued that it can, though he noted that U.S. curricula and textbooks may not be functioning as they are intended to. For example, he explained, textbook publishers have made rational marketing decisions in choosing to reflect a variety of curricula in their books. Their intention has been that teachers will use only the material that is relevant to the curricula they are following. Schmidt's point was that if the system is not working, only systemic changes can effectively improve student performance. "The problem," he maintained, "is in the curriculum policy area, and the only way it can be addressed is . . . as a nation."

The Qualitative Studies

Another issue presented by TIMSS is that both of the qualitative studies took existing methods and "ratcheted them up," in the words
of one participant, to new levels of both scale and sophistication. Before even addressing the links among them and the achievement and curriculum data, observers have begun to assess these studies themselves. Not surprisingly, because of its novelty, the videotape study dominated the discussion.

Michael Huberman raised several important issues. He offered a general critique of the study's theoretical underpinning (see below, "Policy Issues"), but he also raised some specific questions about the methods of the videotape study. First, he pointed out, although the videotape certainly provides a far more detailed picture of the classroom than questionnaire data could possibly have done, the picture is still far from complete. Students and school culture, for example, contribute a great deal to the nature of a classroom lesson and have considerable influence on teachers' decisions, both large and small. A videotaped lesson, Huberman argued, is not easy to interpret in the absence of knowledge of its context. An understanding of what occurred during the days preceding and following the lesson that was videotaped might significantly alter an observer's interpretation of the lesson.

A related issue for Huberman was that the videotapes provide a very "teacher-centered" vision of the lesson. They cannot reveal how students have perceived the lesson. Researchers coded teacher responses for "helpfulness" as part of their analysis, for example, although they had no means of knowing whether students had perceived that they had been helped by the interaction in question.

The coding was also an issue for Huberman for another reason. What, he wondered, is the value of collecting data as rich as these videotapes, and then immediately coding it and reducing it to statistics that can be put into tables? Moreover, he asked, is there not a danger in the "irresistible analytic convenience" of the software?
Might not the software's power in counting the frequency with which certain behaviors occurred have "tricked" researchers into "unearthing 'themes' or 'patterns'" that were not actually there (Huberman, 1997:14)?

Huberman also raised questions about the sampling for the study. Pointing out that the sampling was not random, Huberman noted in particular that the three types of schools in Germany, the Hauptschule, the Realschule, and the Gymnasium, which differ in significant ways, were not represented proportionally. He also raised a question about how the high refusal rate (almost 50 percent) among schools that were asked to participate might have affected the outcome. Although the study included a record number of classrooms, it nevertheless runs the risk of seeming to be no more than an unusually rich collection of persuasive anecdotes.

Huberman also noted that the effect of the cameras on the teachers and students who were filmed could not be known. Stigler had addressed that issue in his presentation because it had been an important concern for his team. Their conclusion was that while teachers' and students' awareness of the camera may have affected their behavior in a variety of ways, it is not likely that teachers could actually change their teaching in fundamental ways likely to alter the study's results. If they could, Stigler joked, the installation of cameras in classrooms would be a simple means of improving teaching.

A final set of questions Huberman raised concerned the fact that the study filmed a single 50-minute lesson in each of 231 classrooms. Huberman wondered whether filming a series of lessons in a smaller number of classrooms might have yielded more useful results. Many of the coding categories, he noted, were efforts to capture "activities or processes that play out over time," such as building on complex concepts or establishing links with content covered previously, that cannot easily be evaluated in the context of a single lesson (Huberman, 1997:12).

Although Huberman's primary contribution was to raise questions about the study, he nevertheless described it as an extremely impressive effort. Symposium participants did not have sufficient time to wrestle with all of the questions, or to resolve any of them, but they did refer to many of them in various contexts.

After watching two excerpts from the videotapes, participants also raised another concern. As was discussed above, many at the symposium had enthusiastic reactions to the videotapes and launched eagerly into discussions of what the lessons shown demonstrated. But as Lois Peak pointed out, the powerful reactions people had illustrate the risk that the videotapes could be misused: because they are so much richer and more compelling than written descriptions, viewers may feel a sense of certainty about impressions based on them that is unwarranted.

This richness is, of course, their virtue as well. As an example, Lynn Paine cited something she observed in the two lessons that were shown. Both could be described as decidedly teacher directed, but their ways of being so were dramatically different. In the U.S.
lesson, she pointed out, the teacher was evidently perceived as the sole source of both information and ideas; students in the class did not look at others who were speaking, or seem to engage as a team. In contrast, the Japanese teacher had clearly planned the lesson around the idea that different students would come up with different valid means of solving problems. He showed that he intended the students to learn from one another as well as from him, even though he retained control of the discussion.

Part of Paine's point was that this sort of insight is valuable regardless of how representative a particular lesson or behavior might be. In a larger sense, this point applies to many aspects of TIMSS. While forging links among the components will be extremely important, the separate sets of data can be of significant value on their own to both policy makers and others who are seeking to evaluate policies and strategies, and to practitioners who are seeking insights or inspirations. TIMSS is not a research project designed to test pre-existing hypotheses, as Edward Haertel pointed out; its results cannot be used to conclusively prove or disprove assertions. It provides no control