Next Steps for TIMSS: Directions for Secondary Analysis
Copyright © National Academy of Sciences. All rights reserved.

Recommendations for Making the Most of TIMSS

The major conclusion that BICSE has drawn from observing the progress of TIMSS thus far is that the study will be incomplete unless a variety of follow-up analyses of the data are undertaken.

Despite the undisputed value of the reports already released, the total cost of TIMSS will be hard to justify if nothing more comes out of the study than those reports and the others already scheduled for release. Moreover, although the reports that are part of the primary work of TIMSS were all reviewed, they are official reports that have not received the kind of open peer review to which independent scholarship is generally subject. The board urges each of the scholarly communities with an interest in TIMSS to explore the hypotheses the study suggests, the data that have been collected, and the methodological issues the study has raised. This work will be important not only to scholars, but also to the teachers, administrators, and policy makers who need to draw inferences from TIMSS. Although much, but not all, of the existing data have been made available through a variety of channels, the board believes that the needed scholarship should be encouraged and facilitated in several ways.

This report identifies and describes the research approaches that BICSE considers the most promising ways of following up on the TIMSS data. Its recommendations address both practical issues surrounding further analyses and intellectual goals for building on the work already done. The workshop yielded a variety of specific suggestions for secondary analyses, as well as a predictably wide divergence of opinion about the relative merit of particular ones. Although the board did not single out individual research topics for top priority, it synthesized this discussion and weighed it together with the information and reflections on TIMSS that it has accumulated over a number of years. From this, it developed specific criteria for identifying the approaches that it believes hold the most promise. The board believes, and this position was strongly endorsed at the workshop, that independent scholars should be encouraged to develop their own ideas and should not be confined by rigid parameters of acceptable approaches to the data. The board's recommendations are not meant to impede the creative process by which scholars explore possibilities and subject their ideas to the scrutiny of their peers. Rather, the board intends to speak out about the importance of secondary analysis (additional research designed to follow up on the study's initial findings). It also intends to offer some considered judgments about the possibilities the data hold. The recommendations are followed by some explanation and by descriptions of a few specific examples of approaches that illustrate how they might be applied. The board has not taken a position on the priority that should be assigned to any one of these research approaches.

RECOMMENDATION: FOCUS ON ANALYSES THAT TAKE ADVANTAGE OF WHAT IS UNIQUE ABOUT TIMSS

TIMSS differs from other studies in several specific ways, and for well-thought-out reasons. Most obviously, TIMSS is big. It assessed the knowledge of students in nearly fifty countries. To do so, it made significant compromises in the development and administration schedule, in the selection of the content to be studied, and in other areas. It addressed both mathematics and science learning, which entailed compromises in sampling and other difficulties.5 It assessed intact classes of students. TIMSS was also, of course, complicated by the inclusion of the curriculum study, the videotape study, and the case studies. Each of these decisions multiplied the difficulty of interpreting results and linking findings, but together they also opened up a number of possible approaches to the data. Thus far, little of the work that has been done exploits the complex design of the study, but there are many ways to do so. The board strongly urges that such work be undertaken.

Workshop participants spent a good deal of time discussing specific approaches to the data that would exploit the multiple-country, multiple-age, two-subject study design. The general approach that received the most attention was probing much more deeply into the cross-national comparisons, which can be done in a variety of ways.

As David Baker noted, it is the cross-national design that will “enable TIMSS to have its biggest impact on policy formation in the United States” (Baker, 1998:5). This design makes it possible to identify system-level factors that vary across countries (such as school governance, standards enforcement, textbook dominance, and many other features) and to examine how they relate to differences in national performance. One approach to this line of inquiry is to compare patterns of within-country variance. For example, Douglas Grouws noted that TIMSS has shown that U.S. 8th graders spend more hours in mathematics classes than do students in Germany and Japan (Grouws, 1998:1–2). He pointed out that this information, though presumably accurate, is not sufficient reason to conclude that the number of hours spent in class did not contribute to differences in achievement. (Japanese Population 2 students performed better than both U.S. and German Population 2 students.) It may be, he explained, that in one or more of the countries there is extreme variance from school to school in the hours spent in class. Such a finding could open up a new avenue for investigation and would also, of course, militate against policy recommendations based on the mean alone.

Richard Murnane made a similar point when he stressed the importance of comparing across nations the degree of variation in student achievement within nations, and of tying this variation to measures of socioeconomic inequality (Murnane, 1998). Elizabeth King likewise advocated such comparisons of within-country variation because of what they can reveal about “the importance each country accords equality across groups or its success in achieving this equality” (King, 1998:3).6 By going deeper than the achievement rankings in this way, they and others argued, researchers can make much more useful connections between TIMSS and the kinds of policy questions that are of most interest in the United States.

Mary Metz and numerous others remarked on the value of cutting the data in ways other than those used so far. This includes comparing, within countries, subgroups of schools, classes, and students; subsets of the domain; performance on particular item types; and so on (Metz, 1998:7). Other areas of interest would include variations in opportunities to learn science at the three grade levels; particular characteristics (such as structure, format, and content) of the mathematics and science items that were scaled together; and the relationship of the items to school curricula as perceived by teachers. Fleshing out the details of student performance within a particular country can help to answer a variety of questions about that country's system and about what its students are taught.

Participants also discussed other ways of pushing beyond the achievement rankings. Gender differences, for example, were observed across nations in the performance of Population 2 students but not in that of Population 1 students. This finding could make an important contribution to the debate about the underlying causes of gender differences in science and mathematics achievement, but it needs to be examined further. Useful lines of exploration include the specific topics that do and do not yield differences; changes in curriculum, teaching strategies, and other factors between the elementary and middle-school years; and patterns of gender difference across countries.

5   For example, for Population 2, researchers identified intact mathematics classes and administered assessments in both subjects to those students. Application of this method was complicated by the fact that students in a mathematics class do not always stay together for their science instruction, so science teachers were linked by other means. This method did make possible the linking of student, school, and system characteristics, and of mathematics and science results (Robitaille and Garden, 1996:47–48).
6   Elizabeth King prepared a written summary of her ideas in advance but was unable to attend the workshop.

Many participants also stressed the importance of comparing performance across mathematics and science. As Aaron Pallas noted, “Since within a country the same group of students are being tested in math and science, different profiles of math and science achievement cannot be accounted for by differences in students’ social backgrounds and educational resources, … there may be some merit to looking carefully at what is going on in countries that appear to be doing appreciably better in one subject area than the other” (Pallas, 1998:4–5). TIMSS was designed to provide opportunities for many different comparisons, using many different criteria but the same set of students. The hope was that the contextual factors with the greatest effects on student learning could be identified. More of these possible comparisons need to be made if TIMSS is to be more than a collection of separate studies.

RECOMMENDATION: REVISIT THE CONCEPTUAL FRAMEWORK ON WHICH TIMSS WAS BASED

As discussed earlier, the design of TIMSS grew out of a conceptual framework that embodies some propositions about the ways students learn. The framework lays out four basic research questions (cited earlier) about the role of the intended and implemented curricula in influencing learning and about the attained curriculum. This framework reflects specific ideas about which factors affect learning and how they interact with one another (Schmidt, Jorde et al., 1996). Now that the data have been collected, it is important that researchers take time to reassess that framework. They should also reassess the assumptions to which it led in the design and development of TIMSS.

For example, the selection of questionnaire topics reflects a variety of assumptions about the role of school structure and many other “inputs” in student learning. The design of the curriculum study reflects assumptions about the relationship between teaching and curricular exposure. Decisions made in the course of the video and case studies reflect implicit and explicit ideas about which factors are most likely to influence learning and how those factors might be explored. It is important now to look back and assess the extent to which the results fit the study's model. A second step in such an evaluation would be to consider alternative models to determine whether any of them fits the data better.

RECOMMENDATION: DRAW LESSONS FROM THE CONDUCT OF TIMSS

TIMSS was, in a sense, experimental both in its scale and in its goal of integrating essentially disparate kinds of data. For that reason, the board believes it is extremely important that the research and policy communities look closely for the lessons TIMSS can teach about practical, methodological, and conceptual issues. The board identified three particular areas that offer lessons for the development and design of future studies.

Recommendation: Explore Methodological Issues

There is clearly a need for research that can yield not only challenges to or support for particular ideas about teaching and learning, but also useful insights into how such ideas can be tested and explored. Because TIMSS was methodologically experimental, it is important that the scholarly community learn all that it can from the experience. Numerous questions suggest themselves. How well did the cognitive items in the various subdomains work? What has been learned about the design and conduct of a classroom video study? How might one go about explicitly linking very different kinds of data? What are useful, creative alternatives when explicit links are not possible?

Ways of bridging the gaps between the different components of the study generated considerable discussion; this is another obvious way of exploiting the study's unique design. One important message that emerged was that moving across the datasets will be tricky. The workshop participants clearly held a wide range of opinions about the value of doing so and, indeed, about the relative usefulness of the different components of the study. Part of what these discussions revealed was how little opportunity researchers in different fields have to interact with one another in substantive ways and to engage with the issues pertinent to other fields.

For example, participants with qualitative backgrounds doubted that the questionnaire data on teachers' professional development were of much use. More quantitatively inclined participants were equally cautious about how the videotapes and the case studies could best be used. The conversations that pushed participants with different views to consider new perspectives made plain that simple resolutions to many of the issues are unlikely. There was, as Lynn Paine put it, “a great deal of discomfort” about the feasibility of many of the possible links among the substudies. Several participants for whom TIMSS was new remarked that the conceptual underpinnings of the study were not readily apparent. They found them missing both in the discussions they had witnessed and in the results they had heard and read about. In attempting to describe the essential problem, the group noted that the components of TIMSS seem to stand alone and that the means of establishing the desired links among them are not easy to see. Moreover, it was clear that not all of the hoped-for links may be possible. The board concludes that such links could be established only through careful further research conducted by interdisciplinary teams. Some might regard such work as a purely intellectual exercise. The board believes, however, that it will be integral not only to making the most of TIMSS, but also to moving comparative education research forward and enabling it to provide information of real value to decision makers.

Recommendation: Use the Problems and Successes of TIMSS to Inform the Planning of Future Studies

Testing and exploring the conceptual framework of TIMSS is an important step in the long-term intellectual inquiry of which TIMSS is a part. In more immediate terms, however, it is also very important that practical lessons with implications for future studies be distilled from TIMSS now. That way, the problems that arose with TIMSS will not be replicated and its successes can be. One problem mentioned more than once at the workshop concerned the sampling issues that entailed important trade-offs (discussed below). Follow-up analyses that explore the costs and benefits of these choices would be of great use to planners of future studies. Many other aspects of TIMSS posed significant challenges and could yield practical lessons. A few examples: the instrument was translated into more than 30 languages; open-ended responses from students in more than 40 countries were scored; and questionnaires were designed to obtain useful information from students, teachers, and administrators in vastly different contexts. The technical manual for TIMSS (Martin and Mullis, 1996) provides considerable detail about many of the steps that were taken, but further objective exploration is needed.

There may well be increased demand for large-scale studies such as TIMSS in the future, but practical lessons learned from the present study can benefit more modest comparative studies as well.

Recommendation: Lay the Groundwork for Making the Most of Future Results

The value of TIMSS will clearly be enhanced if future data collection efforts build on the foundation TIMSS has laid. For example, TIMSS results indicate that, at least in the United States, achievement may decline relative to other countries between the 4th and 8th grades. This inference, however, is tentative. TIMSS compares two different populations of students (4th and 8th graders in 1995), not the same population at two different points in time. Researchers can focus now on questions that could be followed up as cohorts mature, or on questions that could be pursued in more depth in future studies. If they do, those studies will be better able to take advantage of such opportunities. Researchers can also prepare for other important analyses by obtaining a more detailed picture of the curriculum over those years and of the case study findings about them, and by following up on other relevant clues in the data. Future studies may also include countries that did not participate in TIMSS, and this is another area in which advance planning could be beneficial.

Plans are already in place for a replication of TIMSS for Population 2, known as TIMSS-R. TIMSS-R will be administered in 1999, when the students who were in Population 1 for the original TIMSS will have moved into Population 2. It will include a refurbished set of TIMSS cognitive items,7 slightly modified questionnaires, and a video study of mathematics and science classrooms. Approximately 40 countries will participate. The National Center for Education Statistics (NCES) also plans to conduct an additional follow-up study in 2003 to collect data about the original Population 1 students as they prepare to leave secondary school. Work should be done now to prepare to make the most of these efforts, and of others as well.

RECOMMENDATION: EVALUATE CLAIMS THAT HAVE ALREADY BEEN MADE BASED ON THE DATA

Role of Curriculum

One particular set of claims based on TIMSS data was the subject of special attention at the workshop. The first claim is that cross-national variation in academic achievement is influenced and accounted for by cross-national variation in the content of curricular frameworks for science and mathematics. The second is that U.S. students’ weak performance can be attributed to a curriculum that is “a mile wide and an inch deep.”8 BICSE members chose to address this claim in part because it is the most specific causal claim put forth in TIMSS reports, and in part because it has received a significant amount of public attention. The publications describing and supporting this claim have not been reviewed by their authors’ academic peers, because of their status as official project reports. The board therefore believes strongly that follow-up studies are needed to explore and test the claim and to confirm or contradict it. This belief was reinforced by the marked differences of opinion evident at the workshop, both about the validity of the claims and about the standards by which they ought to be evaluated. Similar issues would be likely to arise with any TIMSS-based claims, and BICSE's view is that other serious claims ought to be explored and tested as they emerge.

7   Some of the original TIMSS items have been publicly released; these have been replaced for TIMSS-R with new items designed to be similar to the old ones.
8   This claim is laid out in A Splintered Vision (Schmidt, McKnight, and Raizen, 1996).

A number of participants agreed that the claim of a connection between what is taught and what is learned is reasonable, as far as it goes. The finding that intended curricula in the United States generally include a greater number of topics, more repetition and review, and less intellectual focus than do those in other countries was intriguing. It was not, however, entirely persuasive to the group; not all were convinced that this description of the U.S. curriculum tells the whole story. A further claim received even greater challenges. This claim holds that the lack of coherence in U.S. curricula, and within the network of assessments, standards, teachers' professional development, and other factors that affect learning, is what sets the United States apart from other countries. It further holds that this lack of coherence actually accounts for the weaknesses U.S. students demonstrated on TIMSS. Many in the group viewed this claim as a hypothesis not adequately supported by the evidence put forward thus far.

Participants suggested a number of counterhypotheses that deserve to be examined. Among these was the possibility that rapid changes in curriculum in some countries may have, in effect, distorted the comparative picture that seemed evident to the authors of A Splintered Vision. TIMSS, by necessity, provides a snapshot of conditions at a particular moment; conclusions about differences in complex educational structures may require a longer view.

Others suggested that characteristics of the countries and education systems studied that did not show up in the curriculum study may play a significant role in achievement differences. Heinrich Mintrop pointed out, for example, that East European countries are currently making their systems far less coherent than they had been before the fall of the Berlin Wall. They are doing so because they perceive that a too-rigid coherence had been detrimental to learning (Mintrop, 1998). This point led to the suggestion that a focused and coherent curriculum is not necessarily also of high quality. Aaron Pallas noted, for example, that the study does not address the question of resources. Ambitious standards and effective alignment among texts, tests, and other elements might exist in a given system without the resources necessary to meet the standards. Pallas also pointed out that “even in a country with a centralized education system, no student experiences the national implemented curriculum; rather, students are exposed to a particular implemented curriculum” (Pallas, 1998:3).

The causal link between the configuration of curriculum and student learning may not have been forged in the extant reports. Still, many participants saw the data from the curriculum study as a source of intriguing directions for future study. Most participants agreed that although the curriculum study has provided a valuable macro-level view, the data need to be probed at a much more detailed level. James Shymansky noted that counting the number of topics covered per country did little to illuminate the quality of teaching and learning (Shymansky, 1998:2). This was the method used in the curriculum study on which the claim about coherence was based. Jeremy Kilpatrick explained that, for the curriculum study to address so many countries and grade levels, topic coverage had to be conceived in a fairly crude way. He argued strongly for further research that would look much more closely at the U.S. curriculum, specifically to see what it “might be attempting that is not captured by the TIMSS framework” (Kilpatrick, 1998:1). It would also be useful, he explained, to develop more descriptive pictures of the treatment of particular topics in the curricula studied. Researchers could also look more closely at the relationship between the coverage of a specific topic and the details of student performance on it that item analysis could reveal. Such fine-grained analyses would be valuable in themselves. They could also yield further evidence with which to evaluate the claim that the shallowness of the U.S. curriculum has caused the relatively poor performance of U.S. students.

More detailed pictures of the education systems in a number of countries would also be useful in this context. As several participants pointed out, the degree to which a coherent curriculum is in place in a given nation is only a small part of the story of how learning occurs there. Investigating specific factors that affect curricula, such as standards, teacher qualifications, and general population characteristics, could significantly enrich understanding of both intended and implemented curricula.

More detailed explorations of how curricula in a few nations are constructed, and of the roles of different actors in their respective systems, could be of significant value. Augmenting the existing data with information of this kind could help to support the claims that have been made or offer new ways of looking at the curriculum data. A number of participants noted that a variety of cultural factors must surely play a role in shaping student learning. A better understanding of those factors would illuminate the apparent curricular differences that have been identified. Moreover, Gary Natriello suggested analyses that leave citizenship aside and instead group the sampled students according to characteristics of their educational experience. Such analyses might be very useful in exploring how the intended and implemented curricula interact and affect learning (Natriello, 1998:5).

A number of participants shared the view that the reports of the curriculum study may not have distinguished adequately between exposure to curriculum and the instructional modes through which the curriculum was delivered. As Aaron Pallas noted, “It may well be that the effect of exposure to a particular topic on achievement is contingent on the nature of that exposure—that is, the context in which the exposure occurs, the specific classroom (and extramural) activities that are intended to enable students to learn the topic, even perhaps the sequencing of topics that both precede and follow a given topic” (Pallas, 1998:5). The workshop discussion only reinforced the board's concern about how the curriculum study is perceived. Policy makers and the public could easily conclude from the public discussion that the study has proved a set of assertions about the effects of curriculum that have not actually been proved. Follow-up research is needed to pursue the most intriguing implications of, and questions about, these data. It is also needed to provide policy makers and practitioners with a more nuanced picture of the role of curriculum in student learning.

Other claims that have been, or may be, made based on the TIMSS data require follow-up as well. For example, a 1998 U.S. Department of Education report contains the claim that “the TIMSS results indicate a pervasive and intolerable mediocrity in mathematics teaching and learning in the middle grades and beyond” (Silver, 1998:1). As Gerald LeTendre pointed out, claims such as these “are then countered by rhetorical invectives that move the dialogue…away from sustained academic debate” (LeTendre, 1998:1). In the board's view, academic debate about what TIMSS shows is precisely what is needed.

A Word About Some of the Criticisms of TIMSS

In the context of the need to evaluate claims about TIMSS, it is important to address some criticisms of the TIMSS data that have received public attention. These criticisms concern relatively complex technical issues, and the details of the debate need to be aired explicitly. Indeed, in the board's judgment, the criticisms embody hypotheses about data-based comparisons that ought to be evaluated along with other claims.

The primary criticism is that the Population 3 (students completing secondary schooling) results are misleading because the ages and numbers of years of schooling of the students in the sample vary so widely around the world. Certainly the Population 3 data are complicated. The sampling designs for Populations 1 and 2 were relatively straightforward by comparison: the goal was to discover how much students had learned by two ages, so in each country the two adjacent grades serving the largest numbers of 9-year-olds (Population 1) and of 13-year-olds (Population 2) were sampled. For Population 3, however, the issues were more complex for a number of reasons. First, different systems structure their secondary schooling in different ways and conclude it at different ages. Second, students generally have far more choice about their studies in the late secondary years than they do in the elementary or middle-school years, so defining the content to be tested for this population was more difficult than for the others. Moreover, students in many countries, including the United States, are placed in academic tracks that dictate the curriculum to which they will be exposed. These and a host of other factors had to be considered when the designers of TIMSS determined which accessible data about secondary students would be of most value.

The designers of TIMSS determined, collaboratively, that for Population 3, the most useful results would be for students “who are at the point of leaving school and entering the workforce or postsecondary education” (Mullis et al., 1998:13). Further decisions were made about the means of identifying subsets of this population who had done advanced work in physics and calculus. Critics have claimed that it is unfair, and the results misleading, to compare the performance of 21-year-old students, the age of students sampled in Iceland, with that of 17- or 18-year-olds, the age of students sampled in the United States (Bracey, 1998; Rotberg, 1998).

The board has three comments about this discussion. First, it is clear that the complexity of the issues surrounding the Population 3 data was well understood by the TIMSS researchers, as these issues are thoroughly documented in the report of the results (Mullis et al., 1998:11–28). That report discusses the problems the researchers faced and the decisions they made. Tables in the report present details about the cohort sampled, the domain covered, characteristics of each participating country, and many other issues. An appendix to the report supplies further details about the upper secondary system in each country, particularly its academic tracks. Although some critics of TIMSS have seemed to charge that the TIMSS researchers ignored or overlooked the issues with the Population 3 data, this is clearly not the case.9 Second, the criticisms of TIMSS may have (or already have had) the effect of undermining TIMSS as a whole, yet they relate almost exclusively to the Population 3 sampling. It is important to note that the sampling for Populations 1 and 2 was, as has been mentioned, more straightforward and has not been criticized on the same grounds.
Finally, the criticisms themselves need to be evaluated carefully. As David Baker pointed out (1998:7), “Most of the arguments that the study's critics make about the ‘fairness’ of the simple cross-national comparisons of achievement are really cross-national hypotheses about which national factors shape the process by which learning takes place in school.” In other words, the decision to test students just completing whatever secondary schooling is offered in their country, regardless of their age, was made deliberately; therefore, what is needed is a debate that weighs the pros and cons of that decision rather than a dismissal of everything that resulted from it. The challenge is actually not to the accuracy of the numbers, but rather to the premises underlying their collection and the inferences drawn from their analysis. These criticisms are really assertions that a different set of data ought to have been collected. The basis for these assertions is, presumably, a counter-hypothesis about which factors affecting learning are most important. Such a hypothesis needs to be evaluated by the scholarly community just as other TIMSS-based claims do.

9 This issue is further complicated by criticisms of the study's predecessors, particularly the Second International Mathematics Study (SIMS), with regard to sampling. Critics of SIMS charged that, in the earlier study, some countries sampled only their ablest students while others assessed a broadly representative sample. The result, these critics argued, was rankings that were both unfair and inaccurate. Cognizant of these criticisms, the TIMSS designers addressed the issue early in the process. Having learned from SIMS that it would not be possible to ensure complete compliance with all sampling procedures in all countries, they made the existence of sampling compromises in particular countries very clear in the published reports and in the released data. The question of how rigorously the sampling rules were followed and documented is, of course, separate from the question of determining which secondary students should be sampled.

RECOMMENDATION: EXPLORE WAYS TO RELATE THE TIMSS DATA TO OTHER SOURCES OF DATA

Each country that participated in TIMSS has now received its own national data, and all have access to the international data. Anecdotal reports indicate that secondary research is already well under way in a number of countries, and it is likely that both direct collaboration with non-U.S. scholars and research that builds on their existing work could considerably enhance the value to be gained from TIMSS. Data from other sources within the United States, such as the National Education Longitudinal Study and the National Assessment of Educational Progress (NAEP), as well as other international data, might also provide possibilities for interesting follow-up analyses. As is discussed elsewhere in this report, more detailed portraits of the education systems and contexts for learning in other countries could be of great value as researchers attempt to explore puzzles in the data and pursue specific comparisons between individual countries.
Germany and Japan are obvious targets for future research because of the three-country studies that were part of TIMSS, but high-achieving countries such as Singapore, and others that stand out for various reasons, may be producing research that follows up on their own national data. As secondary analysis moves forward, it will also be important for the TIMSS research community to find ways to take note of findings that confirm, complement, or contradict one another. Keeping track of such findings can prevent unnecessary duplication of effort; it can also strengthen policy conclusions and generate promising leads. Finally, it will be important for researchers to determine the extent to which the TIMSS data confirm or contradict existing sources of data that bear on mathematics and science learning at the three target age levels. Work has been done to establish statistical links between TIMSS and NAEP (U.S. Department of Education, National Center for Education Statistics, 1998). Because there are many differences between the two assessment systems, only limited formal links have been established, but the possibilities inherent in such links, and in looser connections with other sources of data, need to be explored. On a broader scale, the TIMSS results should be evaluated in light of findings from the considerable existing literature on mathematics and science learning.

RECOMMENDATION: CONSIDER OTHER APPROACHES TO THE ANALYSIS OF THE ACHIEVEMENT DATA

Researchers who did not participate in the creation of the achievement scores have suggested that alternative models might yield different results, and possibly more profound findings that bear on important questions about cognition and curriculum effects. Current total (and subtest) scores capitalize on what is common among all test items and ignore two important features of the test design. The first is its three dimensions: (1) content areas, such as mechanics and magnetism; (2) performance expectations, such as recall, problem solving, and practical investigations; and (3) affective measures, such as attitudes and teacher and student beliefs. The second is its over-sampling of items in curricular areas commonly found across countries. Moreover, Kupermintz and Snow (1997) and Hamilton et al. (1997) have shown that mathematics and science achievement tests similar to the TIMSS tests are more complex than the current TIMSS scores acknowledge10 and that once the complexity is taken into account, these scores are sensitive to curriculum effects. Without in any way disparaging the results already produced by the TIMSS Study Center, the board strongly recommends that independent scholars explore alternative models for creating achievement scores. Given that the achievement results, and the rankings in particular, have been accorded a significant measure of political importance within the United States, it is vital that any additional insights these data might yield be mined and made public.
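The point about total scores can be made concrete with a toy example: a single total can hide differences that the test's content dimension would reveal. The sketch below is an illustration only, not the TIMSS scaling method, which is far more elaborate. The item identifiers, the item-to-content-area mapping, and both students' responses are invented for the purpose.

```python
from collections import defaultdict

# Hypothetical item-to-content-area mapping (invented for illustration).
ITEM_AREA = {
    "q1": "mechanics", "q2": "mechanics", "q3": "mechanics",
    "q4": "magnetism", "q5": "magnetism", "q6": "magnetism",
}


def subscores(responses: dict[str, int]) -> dict[str, float]:
    """Proportion correct in each content area (1 = correct, 0 = incorrect)."""
    correct: dict[str, int] = defaultdict(int)
    attempted: dict[str, int] = defaultdict(int)
    for item, score in responses.items():
        area = ITEM_AREA[item]
        correct[area] += score
        attempted[area] += 1
    return {area: correct[area] / attempted[area] for area in attempted}


# Two hypothetical students with the same total but opposite profiles.
student_a = {"q1": 1, "q2": 1, "q3": 1, "q4": 0, "q5": 0, "q6": 1}
student_b = {"q1": 0, "q2": 1, "q3": 0, "q4": 1, "q5": 1, "q6": 1}

for name, responses in [("A", student_a), ("B", student_b)]:
    print(name, "total:", sum(responses.values()), "by area:", subscores(responses))
```

Both students answer four of six items correctly. Student A, however, is strong in mechanics and weak in magnetism, and student B shows the reverse pattern. A model that scores each design dimension separately keeps this distinction, whereas a single total discards it.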
10 For example, the science achievement test included on the NELS:88 longitudinal survey can be empirically decomposed into components that measure (1) basic knowledge and reasoning, (2) quantitative science, and (3) spatial and mechanical ability. These components can be predicted, differentially, by student course-taking patterns.

RECOMMENDATION: RECOGNIZE THAT TIMSS DATA CANNOT ANSWER EVERY IMPORTANT QUESTION

Although TIMSS is obviously a very rich dataset, it is, of course, not comprehensive. It is important that both the research and policy communities recognize that TIMSS cannot answer every question. This matters not only because limited research resources should not be wasted on unpromising lines of inquiry, but also because recognizing the gaps in the TIMSS findings can spur valuable research targeted at questions on which information is needed.

An important example of this point is the relationship between the achievement scores and particular modes of teaching. Observers of TIMSS have noted that relatively little of the resulting discussion has seemed to focus on specific implications of the results for teaching; some discussion at the workshop specifically probed this point. Each component of the study attempted in its way to elicit information about the ways different aspects of teaching affect student learning. The questionnaires included a number of questions about teaching strategies, pedagogical beliefs, preparation and support for teachers, and other issues. The case studies explicitly probed aspects of the professional development, conditions of employment, and other factors in teachers’ lives in three countries. The video study provides not only footage of lessons, but also coded analysis of different teacher actions and a questionnaire about the filmed teachers’ views of the taped lessons. The curriculum study provides information about the official documents and textbooks that teachers use. Yet many who are professionally interested in teaching have concluded that, although many of these results are of great interest and provide a wealth of examples and ideas, they have not coalesced into clear, supportable conclusions about the effects different modes of teaching have on learning. The questionnaires, for example, struck many at the workshop as providing only fairly limited information about teachers’ preparation and practice.
Some noted that because the questionnaire data relied on self-reports, their value was limited. Others noted that the questionnaires provide very little detail about important questions; for example, a question about the highest level of education teachers have received addresses neither the nature of the degree nor whether it is in a field other than the one in which the teacher is teaching. Another question, about teachers’ sources of information when creating lessons, provides a sense of how different kinds of resources are used, but, as Gerald LeTendre pointed out, one needs to understand the nature and role of these resources in particular settings to make much use of these data. A number of specific questions one might hope to answer through TIMSS were posed for discussion at the workshop, and many in the group judged many of them difficult to address with the existing data. For example, Senta Raizen noted that “there simply isn't enough information available in any of the TIMSS datasets to differentiate sufficiently among countries either as to training and support received by teachers in each country or regarding teachers’ willingness to teach in a particular way, let alone link these variables” (Raizen, 1998:7).

Other questions are more approachable through the data, many in the group noted. Many seemed to adopt Mavis Sanders's formulation that, with regard to links among professional development, teaching, and student achievement, TIMSS data serve to “highlight topics… that require further exploration” and to “caution [us] against divorcing professional development, teaching, and student achievement from cultural context” (Sanders, 1998:1). The videotapes provide compelling images of teachers at work in their classrooms, and a variety of uses have already been found for them. According to some, they even provide persuasive portraits of distinctly “Japanese,” “German,” and “American” teaching styles. (Further analysis is, of course, needed to ascertain whether the degree of variation in achievement between countries is actually higher than that within countries.) The tapes in particular, and presumably other TIMSS material as well, have given practitioners and others extremely valuable views of possibilities they may not have considered and new ways of addressing common challenges. Such information is an important benefit of TIMSS, but it does not answer broad questions about the relationship between teaching practice and student achievement. Similar limitations in what the case study data reveal about teacher practice were also discussed, but the point of the discussion was decidedly not to criticize work that has been done. Rather, the discussion, and the board's concern, focused on the risk of attempting to derive from TIMSS insights it cannot empirically support. Questions that are of particular concern to many (about successful practice, the impact of the National Council of Teachers of Mathematics standards and other reforms, and the efficacy of different models of teacher induction and support) may need to await further research.
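The parenthetical question above asks whether achievement varies more between countries than within them, which is essentially a variance decomposition. The following minimal sketch assumes hypothetical student scores grouped by country. It is unweighted and ignores the sampling design, so it shows the arithmetic only and does not represent how a TIMSS secondary analysis would actually be done.

```python
from statistics import fmean


def variance_decomposition(scores_by_country: dict[str, list[float]]) -> tuple[float, float]:
    """Split the total sum of squares of student scores into
    between-country and within-country parts (one-way ANOVA).

    Unweighted: a real analysis of TIMSS data would need sampling weights.
    """
    all_scores = [s for scores in scores_by_country.values() for s in scores]
    grand_mean = fmean(all_scores)
    between = 0.0
    within = 0.0
    for scores in scores_by_country.values():
        country_mean = fmean(scores)
        # Every student carries their country's deviation from the grand mean...
        between += len(scores) * (country_mean - grand_mean) ** 2
        # ...plus their own deviation from the country mean.
        within += sum((s - country_mean) ** 2 for s in scores)
    return between, within


# Invented scores for three hypothetical countries.
scores = {
    "A": [480.0, 520.0, 500.0],
    "B": [530.0, 570.0, 550.0],
    "C": [430.0, 470.0, 450.0],
}
between, within = variance_decomposition(scores)
share_between = between / (between + within)
print(between, within, round(share_between, 3))  # 15000.0 2400.0 0.862
```

Because the two parts add up to the total sum of squares, their ratio shows directly how much of the observed spread lies between countries rather than within them. Comparing country averages alone cannot answer that question.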
RECOMMENDATION: A CLEARINGHOUSE TO PROVIDE RESOURCES AND SUPPORT FOR SCHOLARS IS NEEDED

The board has been following the release of the data very closely. Many of the TIMSS researchers have assisted the board in its efforts to keep track of the data that have been collected and to understand the structures of the various datasets. The board greatly appreciates this assistance and has taken note of the efforts made by the TIMSS Study Center staff, the National Center for Education Statistics, and others to release the data as quickly as possible; to make datatapes, users’ guides, and the like available on the web and elsewhere; to hold training sessions; and to field questions. The efforts of all involved to disseminate the data quickly are also appreciated. Nevertheless, because TIMSS is so large and complex, the board believes there is a need for an infrastructure that will facilitate access to the data. Although a variety of steps could help scholars, BICSE believes that funding a clearinghouse for TIMSS information would be the simplest and most effective one.

A number of workshop participants described specific reasons why it may be difficult for independent scholars to use the TIMSS data. First, it is clear that even for those who have paid close attention to TIMSS, it has been difficult to follow the relationships among the various portions of data that have been collected and made available; currently there is no single source of comprehensive, detailed information about this. The curriculum study, for example, includes datasets that cover varying numbers of countries, some that span kindergarten through the end of secondary school, some that are specific to the age populations targeted by TIMSS, and so forth. Workshop discussions about possibilities for linking one part of the study to another made clear that it is not efficient for independent scholars to, for example, spend time tracking down correspondences among different sets of data; this is work that, once done, should be made easily available to anyone considering possibilities for secondary analysis. It is also clear that both relatively straightforward issues of translation and potentially more significant questions about cultural context (the precise meaning of the words for “homework” in different languages, for example) could be a hindrance to many scholars. A clearinghouse that accumulated information about such issues as the interpretation of particular vocabulary used in translating a questionnaire item, along with detailed information about educational structures in TIMSS countries, would be of great value. The opportunities for networking and taking advantage of what is already known would be not only a convenience for many researchers, but also a means of amplifying the value of each contributing piece of research.
A further obstacle for many researchers is the extremely complex design, particularly of the achievement study. Jane Hannaway, a workshop participant, voiced a concern echoed by many at the workshop when she said that “the set-up costs of getting up and running with TIMSS are exceedingly high,” and recommended that ways be found to “lower the entry barriers.” Many capable researchers would need guidance in traversing a dataset of this kind: a source for answers to specific questions about the BIB (balanced incomplete block)-spiraled design, conditioned and unconditioned variables, and the like. A variety of other questions that were aired further illustrate the need for a centralized source of information. Each nation, for example, had the option of excluding particular items from its questionnaires or including additional ones for its own purposes. Moreover, as has been noted, secondary research is already under way in a number of TIMSS countries. Clearly, independent scholars would benefit a great deal from access to up-to-date information on TIMSS-related data as it becomes available; possibilities for collaboration would increase, and many other kinds of information could be shared.
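For readers new to the term, a balanced incomplete block design distributes blocks of test items across booklets. No student sees every item, yet every pair of item blocks appears together in the same number of booklets, which is what allows relationships among all items to be estimated. "Spiraling" refers to handing the booklets out to students in rotation. The sketch below builds the textbook cyclic design of seven item blocks in seven booklets of three blocks each and checks the balance property. It illustrates the general idea only and is not the actual TIMSS booklet layout.

```python
from collections import Counter
from itertools import combinations

NUM_BLOCKS = 7
# {0, 1, 3} is a perfect difference set mod 7, so its cyclic shifts
# form a (7, 3, 1) design: every pair of blocks shares exactly one booklet.
BASE = (0, 1, 3)


def cyclic_bib(num_blocks: int = NUM_BLOCKS, base: tuple[int, ...] = BASE) -> list[tuple[int, ...]]:
    """Booklet b contains item blocks (x + b) mod num_blocks for each x in base."""
    return [tuple(sorted((x + b) % num_blocks for x in base)) for b in range(num_blocks)]


def pair_counts(booklets: list[tuple[int, ...]]) -> Counter:
    """Count how many booklets contain each unordered pair of item blocks."""
    counts: Counter = Counter()
    for booklet in booklets:
        counts.update(combinations(booklet, 2))
    return counts


def spiral(num_students: int, booklets: list[tuple[int, ...]]) -> list[tuple[int, ...]]:
    """Hand out booklets in rotation so each booklet reaches a similar share of students."""
    return [booklets[i % len(booklets)] for i in range(num_students)]


booklets = cyclic_bib()
counts = pair_counts(booklets)
# Balance check: all 21 possible block pairs occur, each in exactly one booklet.
assert len(counts) == 21 and set(counts.values()) == {1}
for number, booklet in enumerate(booklets, start=1):
    print(f"Booklet {number}: item blocks {booklet}")
print(spiral(10, booklets)[7])  # the eighth student receives booklet 1 again
```

Each student answers only three of the seven blocks. Because every pair of blocks is observed together somewhere in the sample, the full set of items can still be analyzed jointly. This is part of why the data require specialized handling.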

The provision of technical support for individual scholars, established procedures for dealing with particular complexities in the data, and facilitated coordination and communication among individual scholars and scholarly communities about methodological as well as conceptual issues (by means of a website, for example) would be of significant value to the research community.

RECOMMENDATION: INDEPENDENT SCHOLARS OR TEAMS OF SCHOLARS SHOULD RECEIVE FUNDING TO CONDUCT SECONDARY ANALYSIS

As this report has attempted to make clear, the board would like to see a broad array of scholars from many disciplines use the TIMSS data in a variety of ways. Much of the funding for such work will need to come from the institutions that have already funded much of TIMSS and from others with significant resources for education research. Such funding might go to teams of investigators, individuals, or both, but it is important to note that the establishment of a few dedicated teams of investigators would by no means preclude the conduct of a variety of field-initiated studies or the ordinary process of peer review. In recommending that funders consider their responsibility to ensure that the work of TIMSS is completed, the board does not mean to suggest that other work, funded through other means, is any less important. The board's primary concern is that without funding, much of the secondary analysis that is a crucial part of a large-scale study such as TIMSS will not be done.