Introduction
OVERVIEW
International comparisons have become a key element in the public discussion of how the U.S. educational system is functioning and what needs to be done to reform it. Curiosity about such comparisons may have begun with the urge to match or exceed the performance of our economic rivals, but it has extended to a much broader range of issues. The importance of cross-national comparisons of virtually all aspects of the delivery of education—curriculum, resources, governance and structure, and teacher development, to name a few—is now widely recognized. The Third International Mathematics and Science Study (TIMSS), one in a series of large-scale international assessments that have been conducted by the International Association for the Evaluation of Educational Achievement (IEA), has played an important role in illustrating the potential value of such comparisons.
TIMSS was a bold step forward not only for the relatively small community of scholars who have been particularly interested in international comparisons of education, but also for the much broader community of scholars, policy makers, teachers, and others concerned with education generally. The study was large—nearly 50 countries and more than half a million students participated—and complex. The list of U.S. publications devoted just to reporting the initial results stands at fourteen and counting.1 But the study's boldness lay not just in its scale—its mission was extremely ambitious as well. Those who planned and executed the study intended not only to overcome the problems with sampling and other technical issues that plagued previous international studies, but also to collect a range of data sufficient to yield powerful insights into the contexts in which learning takes place around the world, as well as to identify compelling links between specific factors and high achievement (see Bracey, 1996; Rotberg, 1990).
The combination of generally recognized high technical quality and ambitious goals has brought the TIMSS results considerable attention in the United States at a time when education issues have been a major political focal point.2 TIMSS findings have been made widely available through the production of a “Resource Kit” for districts and administrators, Internet discussion links and websites, and a variety of other means.3 TIMSS findings have been cited in political discourse from the presidential level to the school-board level. The desire for data that could be used to influence policy decisions was clearly a prime reason the United States chose to participate in the study and to commit funds to cover not only the costs of collecting U.S. data, but also a significant portion of the costs of collecting much of the international data, and the costs of the three-country (United States, Japan, and Germany) qualitative studies that helped put the achievement findings in context. However, the very richness and complexity of the study has been a source of dissonance between the research and policy communities. Although policy makers are eager for information they can readily use and apply in planning and decision making, many of those who know TIMSS best are leery of the oversimplification and misinterpretation that can creep in when quick answers are sought.
1. Appendix A is a “road map” to TIMSS that provides a brief description of each of its components, the data available, and sources for further information, as well as titles of the TIMSS reports published to date.
2. Questions that have been raised about the technical quality of TIMSS are addressed later in this document.
As with any study of its kind, the value of TIMSS will lie not just in the official reports of the data collected, but also in the efforts of the many researchers—not part of the original TIMSS team—who are expected to conduct secondary analyses, using the initial data as the basis for further explorations of specific questions. TIMSS was deliberately designed to make use of a variety of methodologies so that questions about the factors that influence young people's learning could be examined using a variety of types of data. Figure 1 illustrates the basic structure of TIMSS. Although the conceptual framework on which the study was based called for the various components of the study—the studies of achievement and curriculum and the video and case studies—to be linked together, there is little direct precedent for doing so on the scale of TIMSS. Moreover, exploring such links was not the principal focus of the researchers who produced the first wave of reports from TIMSS, since it was necessary to complete each of the parts before thoughtful links among them could be attempted. However, the members of the Board on International Comparative Studies in Education (BICSE) have perceived, in public discussion of the published reports from TIMSS, a growing sense that the study is complete and that the only work left to be done is translation of the results already released into guidelines for practice and policy. What has been largely ignored in this discussion is the important challenge of establishing links among the study's components. The board believes strongly that a great deal more work needs to be done with TIMSS, and that without further effort, extremely valuable opportunities to learn from an intensive, and expensive, data collection effort are in danger of being lost. 
The board is not concerned only about the scholarly community's pursuit of intellectual issues arising from TIMSS, though those are important. Of more immediate concern for the board are two issues: first, that much-needed analyses have not been done, and second, that policy makers, practitioners, and others are drawing conclusions from analyses that may not be supported by further exploration of the data. Such conclusions, the board fears, are influencing both practical and policy decisions being made every day.
3. See Appendix A for information about the Resource Kit, websites, and other TIMSS resources.
To stimulate discussion about the questions and issues arising from TIMSS that merit further exploration in the dataset, the board convened a workshop. Thirteen scholars, representing a range of interests, expertise, and seniority, as well as degrees of direct experience with the TIMSS data ranging from none to extensive, were asked to respond in written memoranda to a set of specific questions. These questions, which are discussed in detail below, were designed to assist the board in addressing specific concerns it had identified about establishing priorities for future research, identifying the kinds of knowledge claims best supported by TIMSS data, standards for the kinds of support such claims should have, and ways of combining different types of data. A number of other interested observers participated as well; all were given an opportunity in advance to review TIMSS documents as well as the memoranda from the scholars asked to respond to the questions BICSE had formulated. The workshop was then structured around a series of focused discussions, some with subsets of the group and some that included the entire group. These discussions were specifically designed to promote interdisciplinary thinking because of the unique breadth of TIMSS's design.
The board then deliberated on the results of the workshop discussions, tying them to the extensive discussions of TIMSS that the board itself has had during the past decade, and agreed on a set of recommendations that it intends as a useful guide and a spur to action for those eager to mine the dataset. These recommendations are the primary focus of this document, but they are not intended only to provide guidance to potential researchers and funders. Many groups—politicians, parents, math and science educators, and others—have paid attention to TIMSS and hope to use it to help students learn. BICSE is convinced that for TIMSS data to benefit students in this way, those who are using it need to understand both its complexity and its limitations.
ORIGINS: THE CONCEPTUAL BASIS FOR TIMSS AND THE BOARD'S ROLE
BICSE's particular interest in TIMSS grew out of an overall concern with the contribution that international comparative studies can make both to the intellectual study of education and to the discussions of policy choices that occur at all levels of the U.S. education system. The board framed its views of the value of international comparative studies, and of ways in which they could be improved, in a report published in 1993, A Collaborative Agenda for Improving International Comparative Studies in Education (National Research Council, 1993). The board returned to some of the issues identified in that report as it began to reflect on the contribution TIMSS has made and can make to education research and policy. One issue highlighted in the 1993 report has played a particularly significant role in TIMSS: the importance of considering the contexts in which learning takes place whenever comparisons are made. The board wrote:
… comparisons of achievement levels are not meaningful unless one can, first, identify the educational inputs and processes that contribute to observed outcome differences between countries; second, make some estimate of the contribution of each educational input to realized outcome levels, and, third, consider how these effects vary by context. (National Research Council, 1993:12)
The board also outlined reasons why both qualitative studies and large-scale surveys are needed to provide complete and useful comparisons among countries. Qualitative studies such as ethnographies, case studies, and others, the board noted, “allow us to understand what it means to be educated in diverse settings around the world.” Such studies can provide the deep understanding of societies that makes it possible to interpret and explain the results obtained from large-scale studies. Large-scale achievement studies, in turn, provide “the only way to obtain a simple numerical comparison of a large number of countries on a common set of measures.” In the case of TIMSS, the size of the study also provides the possibility of a variety of comparisons on a scale far beyond what an independent researcher could hope to address (National Research Council, 1993:22–23).
TIMSS was, of course, designed to produce, on a previously untried scale, the rich comparisons that a combination of qualitative and quantitative studies could generate. The study design was based on a conceptual model that was also used in the Second International Mathematics Study (SIMS), the first IEA study to explicitly address the contexts in which learning takes place. That model was described this way:
The study was…conceptualized as an examination of mathematics curricula at three levels: the intended curriculum as transmitted by national or system level authorities; the implemented curriculum as interpreted and translated by teachers according to their experience and beliefs for particular classes; and the attained curriculum, that part of the intended curriculum learned by students which is manifested in their achievement and attitudes…. The curriculum at each of these levels is influenced by the context in which it occurs and the contexts themselves are determined by a number of antecedent conditions and factors.
Four general research questions, grounded in this model, were developed for TIMSS, and each component of the study was devised in response to one or more of these questions (see Robitaille and Garden, 1996:37–43):
- How do countries vary in the intended learning goals for mathematics and science; and what characteristics of educational systems, schools, and students influence the development of these goals?
- What opportunities are provided for students to learn mathematics and science; how do instructional practices in mathematics and science vary among nations; and what factors influence these variations?
- What mathematics and science concepts, processes, and attitudes have students learned; and what factors are linked to students’ opportunity to learn?
- How are the intended, the implemented, and the attained curriculum related with respect to the contexts of education, the arrangements for teaching and learning, and the outcomes of the educational process? (Robitaille and Garden, 1996:37–43)
The study addressed three separate populations: students at age 9 (Population 1); students at age 13 (Population 2); and students, regardless of age, in their last year of secondary school (Population 3). TIMSS was designed to gather data about what mathematics and science the students had learned and to augment it with information about their schools, teachers, textbooks, and curricula. It included a test of student achievement containing both multiple-choice and open-ended items, as well as performance items for some students; a set of questionnaires directed at students, teachers, and school administrators; a curriculum study that provides a comparative picture of standards and curricula in participating countries; a set of case studies of schooling in Germany, Japan, and the United States; and a videotape study of mathematics instruction for middle-school students in the same three countries.4 It is important to note that different countries participated in different subsets of the study components—a point that is easily lost in discussions of the relationships among the components. Even this brief description of the study illustrates the complexity that resulted from the attempt to produce the context-embedded comparisons called for by the framework.
It was not the study designers’ intention that these components of TIMSS stand alone, but rather that their results be used, and combined, to produce deep understanding of the factors that most influence learning. Although the intention that the contextual data should be used in this way is clear in the conceptual framework, the extent to which precise empirical links among particular components of the study were expected, and the means by which they could be established, were not articulated. The process by which questions about the relationships among the components can be pursued has not been completed—and may never be completed without conscious attention and support from both the research and funding communities.
At this phase of the study's progress, the board wanted to focus its workshop on both learning from what has been accomplished and understanding what is involved in completing the comparisons the study makes possible. With these goals in mind, board members, led by Lynn Paine and Francisco Ramirez, planned a workshop to explore ways in which further analyses could best build on the work that has already been done. Their overarching goal was to find ways to identify research directions that would meet the dual criteria of being both approachable through the TIMSS data and of real importance to the education policy and practice communities.
4. See Appendix A for a more detailed description of the study components.
To stimulate the desired conversations, it was necessary to include a wide variety of perspectives not only on TIMSS itself, but also on the standards by which evidence ought to be judged. Three sets of questions—see boxes on pages 8, 9, and 10—were used to focus a diverse group of scholars’ attention on three fundamental elements of the context for learning addressed by the TIMSS conceptual framework: the effects of curriculum on achievement; the links among professional development, teaching, and achievement; and the factors that influence individual achievement. Brief memoranda prepared in advance about these sets of questions served as the starting point for the workshop. The goal for both the memoranda and the workshop itself was to stimulate ideas rather than to present carefully crafted arguments. The discussion ranged widely—from a number of focused discussions of possible future analyses to broad consideration of goals for TIMSS. Questions arose about the infrastructure that will be required to meet those goals, about some of the study's contributions and disappointments, and about the perspectives different disciplines bring to bear on the study as a whole and on its components.
What Does TIMSS Tell Us About Curriculum and Its Effects on Achievement?
Conclusions drawn from the curriculum component of TIMSS have perhaps received more sustained public attention than any other aspect of the study. The particular claim that has received the most attention—that cross-national variation in academic achievement is influenced by and accounted for by cross-national variation in the content of science and mathematics curricular frameworks—is an important one. TIMSS researchers have highlighted several different dimensions along which they found cross-national variation:
How might the claim that these factors account for variation in performance be assessed using the full range of data TIMSS has made available?
What Does TIMSS Tell Us About the Links Among Professional Development, Teaching, and Student Achievement?
No single substudy answers this question directly, yet virtually every component of TIMSS provides data that can help us explore this line of inquiry. In the teacher questionnaires, case studies, and videotapes of teaching, we get glimpses of teachers' preservice preparation, their professional development opportunities, instructional choices they make, and features of their classroom practice. How do we move across these various types of data? Which aspects of these questions is TIMSS well equipped to answer, and which are basically beyond the reach of TIMSS? More specifically,
What Does TIMSS Tell Us About the Factors That Relate to or Influence Individual Achievement?
Obviously TIMSS was designed with the explicit goal of identifying factors that relate to achievement, and virtually every component of TIMSS provides data that can help us explore this line of inquiry. What is less clear is how to move across the various types of data. Decisions about which research questions to pursue should be based on a thoughtful balance between the importance of the possible answers and the relative power of the TIMSS data to shed light on the questions. Which aspects of these questions is TIMSS well equipped to answer, and which are basically beyond the reach of TIMSS? More specifically, how can the study help with these questions: