Conclusion



The National Academies | 500 Fifth St. N.W. | Washington, D.C. 20001
Copyright © National Academy of Sciences. All rights reserved.




OCR for page 319
Methodological Advances in Cross-National Surveys of Educational Achievement Conclusion


11 Large-Scale, Cross-National Surveys of Educational Achievement: Promises, Pitfalls, and Possibilities
Brian Rowan*
Large-scale, cross-national surveys of schooling and student achievement have been part of the education landscape in the United States for nearly 30 years. The origins of such work can be traced to the twelve-country First International Mathematics Study conducted by the International Association for the Evaluation of Educational Achievement (IEA) in 1964. Since then, at least one, and sometimes several, such surveys of student achievement have been conducted each decade. The United States already has participated in six cross-national surveys of mathematics and science achievement, four surveys of reading/literacy/language achievement, two surveys of civics education, and several surveys of student achievement in other domains (for a list of studies, see Table 4-1 of Chromy [this volume]). What is striking about this corpus of work, besides its growing size, is that cross-national surveys of achievement have been fielded at a more rapid pace each decade since the 1960s. Only one such survey was fielded in the 1960s, and it covered only mathematics achievement. In the 1970s, there was again just a single study, but this one covered six academic subjects. Beginning in the 1980s, however, the pace accelerated. There were two surveys of math and science achievement in the 1980s, and another two in the 1990s. Moreover, the 1990s saw a reading survey, an adult literacy survey, an early childhood survey, a language education survey, and a civics education study.
* Brian Rowan is a professor of education and study director for the Study of Instructional Improvement in the School of Education at the University of Michigan.
In this chapter, I discuss these large-scale, cross-national surveys of student achievement. I confine my discussion of this work to a small set of questions about the methodology used in the studies, questions raised by the Board on International Comparative Studies in Education (BICSE) and addressed by the chapters in this volume. In particular, this chapter considers the following three questions:
• Looking at the history of large-scale, cross-national surveys of student achievement, what progress has been made in conducting studies that are increasingly valid and increasingly informative?
• What opportunities lie ahead for improving the quality of such studies, both methodologically and in terms of information yield?
• How important is it to have international surveys of student achievement on a regular basis and with participation of a constant set of countries?
In the following pages, I do not address these questions directly until the end of the chapter. Rather, my approach is to consider them in the context of a larger discussion about the purposes of cross-national studies of education, particularly studies focused on issues of student achievement. Clearly, one cannot think wisely about the validity of large-scale, cross-national education surveys, or about the methods they use, the information they yield, or how regularly they should be conducted, without also thinking about the goals such studies are intended to achieve. The problem, of course, is that the large-scale, cross-national surveys discussed in this volume have been complex studies, designed to achieve multiple purposes and to inform multiple audiences of researchers, policy makers, and citizens from participating countries around the world.
In this light, an evaluation of such studies, and a discussion about how they might be improved, requires us to think carefully about the goals we want such studies to achieve. An excellent discussion of these goals can be found in a brief monograph published by the National Research Council (1993). This monograph advances the view that the world’s varied education systems provide a kind of “natural laboratory” that allows interested parties in the United States to look at variations in schooling cross-nationally, to connect variations in educational organization and practice to variations in student achievement, and to use these analyses to think about how to improve the U.S. education system. In this view, data from cross-national surveys of student achievement can be analyzed in two important ways to inform instructional improvement in the United States. First, data on student achievement in the United States can be compared to data on student achievement in other countries, and such comparisons can be used in a “benchmarking” process that sets standards for student achievement in the United States. Second, data from cross-national surveys can be used to investigate how cross-national variations in school and classroom characteristics affect variations in student achievement, in the hope that this sort of analysis will tell us something about how to alter patterns of schooling and improve student achievement in this country. In this chapter, I treat these two fundamental goals as the context for a discussion of the promises, pitfalls, and possibilities of cross-national surveys of student achievement. I turn first to the problem of cross-national comparisons of student achievement and to the use of such comparisons as benchmarks for student achievement in the United States. In discussing these issues, I pay special attention to the chapters by Linn, Chromy, Hambleton, Raudenbush and Kim, and Smith in this volume, each of which takes up methodological problems relevant to the benchmarking issue. I then turn to the use of data from cross-national surveys to estimate how changes in schooling practices can improve student achievement in the United States. Here I pay special attention to the chapters prepared for this volume by Bempechat, Jimenez, and Boulay; Buchmann; Floden; LeTendre; Raudenbush and Kim; and Smith. Only after having looked at these issues do I directly address the questions posed for me by BICSE, and even then only in the limited way permitted by page constraints.
INTERNATIONAL COMPARISONS AS BENCHMARKS
As Smith (this volume) shows, large-scale, cross-national surveys of student achievement have figured centrally in debates about educational standards in the United States since the 1980s, when findings about the performance of U.S. students on the first international surveys of student achievement were called to the attention of the American public in A Nation at Risk (National Commission on Excellence in Education, 1983). Since that time, the cross-national studies have evolved into a kind of decennial “cognitive Olympics” in the United States, as Husen (1987, p. 131), one of the most thoughtful advocates of cross-national research in education, feared they might. In this environment, each new release of international data is given widespread attention, not only by researchers and policy makers but also, as a result of extensive press coverage, by the public at large. Indeed, the international comparisons have become one of the few grand spectacles in American education, surpassing even the release of data from the National Assessment of Educational Progress in the sheer drama of their coverage.

All of this has been controversial, especially in the eyes of those who think reports based on international comparisons have unfairly portrayed the performance of U.S. schools (Berliner & Biddle, 1995; Bracey, 1997). As we shall see, there is room for improvement in the ways that international comparisons of student achievement are reported to audiences in the United States. But a case can be made that even the poorly crafted early reports on international studies of educational achievement performed an important service in American education. Since A Nation at Risk, the cross-national surveys of student achievement have served to dramatize issues of educational performance in the United States, mobilizing friends and foes of the system alike to articulate their aspirations for our education system and helping to launch what has become an important and continuing public debate about education standards in this country. A discussion of the consequences of this debate, and the ensuing focus on standards in American education, is beyond the scope of this chapter. Suffice it to say, however, that there are two views on this issue. On the positive side, many observers believe the international comparisons (and other attempts to dramatize student achievement in the United States) are leading to the development of much more ambitious and appropriate standards for student learning in American schools. But other observers see a dark side to this development, especially the increased use of standardized test results to dramatize problems of student achievement in the United States.
Increasingly, critics are arguing that standardized tests have become the nearly exclusive “coin of the realm” in judging the adequacy of America’s schools and that the heavy reliance on such tests as tools of education accountability is leading to an unnecessary and harmful narrowing of instructional goals and processes in American schools. It is for this reason, then, that the role of the international comparisons in setting “benchmarks” for student achievement requires special scrutiny, for good benchmarks must not only portray American students’ achievement fairly in comparison to students in other nations, but also must assess academic goals that we truly want to hold for our students. I will raise two interrelated sets of questions about these issues. One set of questions concerns whether the international studies of educational achievement have been designed and managed to produce fair comparisons of student achievement across nations. In examining this problem, I will discuss the extent to which the various tests used in cross-national comparisons are aligned with curricular content emphasized in the nations participating in the comparisons and the extent to which the samples used in making comparisons provide the kind of level playing field required for a fair benchmarking process. Here, I will argue that steady progress is being made despite difficult challenges.

A second set of questions concerns the validity of international comparisons as benchmarks of student achievement in the United States. Here I will inquire more deeply into the curricular content of the achievement tests used in the international studies, discuss the wisdom of comparing achievement at the various age levels sampled in international comparisons, and quibble with the ways the results of these comparisons have been reported and interpreted, not only in the popular press but also among responsible researchers. My point in this section will be that a variety of issues need attention before international comparisons can be used as clear and unambiguous benchmarks for educational achievement in the United States.
Fielding the Cross-National Studies
Three chapters in this volume discuss the difficulties associated with fielding “fair” cross-national surveys of student achievement and the progress that has been made in this area since the earliest surveys were mounted. Linn’s chapter discusses the problems associated with developing achievement tests for the surveys. Chromy discusses problems associated with selecting and realizing samples of students. Hambleton discusses the translation or adaptation of research instruments for use in multiple countries. Each of these chapters notes particular methodological problems faced by the researchers conducting cross-national surveys, but each also communicates a sense that significant progress has been made in addressing these problems. Consider Linn’s chapter on the achievement tests used in cross-national surveys. It is apparent from his discussion that there are difficult problems related to constructing achievement tests for cross-national comparisons, in large part because national curricula differ across participating nations.
But Linn’s chapter also shows the increasing care that researchers have given to the task of aligning tests to national curricula in successive cross-national studies. In the latest studies, for example, achievement tests were constructed only after extensive examination of national curricula and detailed consultation with curriculum experts from participating countries. From this perspective, it appears that sound efforts have been made to ensure “fairness” in testing by allowing analysts from particular nations not only to know in detail how each test item aligns with their national curricular goals, but also to analyze achievement results using only the items that meet an alignment standard for their own country. Linn also details important progress in constructing and scaling achievement tests over time. The latest studies use elaborate content matrices to choose test content, and they use complex matrix sampling designs and item response theory to place all respondents on the same achievement scale (and subscales), even though a given respondent has answered only a subset of test items. Developments in scaling, in particular, allow researchers to conduct much more sophisticated analyses of the achievement test data from cross-national surveys. In the latest studies, for example, researchers can examine variations in student achievement within nations more completely and in a much more fine-grained way than was previously possible. All of this further enhances the fairness of cross-national comparisons by allowing researchers to examine differences in student achievement among groups of students and/or across curricular domains that are more or less aligned to national standards. In combination with Linn’s chapter, Hambleton’s chapter shows that progress also has been made in adapting achievement tests and other data collection protocols for use in the many different nations involved in the cross-national studies. Hambleton, for example, lists the various steps now being used by responsible testing agencies to develop tests for use in cross-national settings, where language and culture are important considerations in test construction. Linn describes the care taken in the most recent cross-national surveys to pretest items and examine item parameters in different national populations. All of this should lead the consumer and user of cross-national data to the conclusion that, despite enormous difficulties, careful instrument development procedures can be (and are being) used to improve the validity and appropriateness of cross-national survey data.
Finally, Chromy’s chapter discusses issues of sampling in the cross-national surveys of student achievement, tracing the various sampling designs used in different studies and the strategies used to ensure that these sampling designs are realized in different nations. His chapter describes a process in which sampling procedures of increasing rigor were developed over time, not only through more careful delineation of sampling plans but also through more careful development of procedural manuals, reporting forms, and other approaches that enhance the comparability of data across participating nations and that allow analysts to take into account deviations from the uniform sampling procedures when these occurred. In fact, the development of these procedures, and careful monitoring of sample realization, is what now allows analysts (like Smith, this volume) to observe (and take into account) the ways in which deviations from the standard sampling plan affect inferences about national differences in student achievement. The discussion to this point, then, suggests that much progress has been made in developing sound procedures for fielding cross-national surveys of student achievement. This is no mean feat, because the problems here are formidable. The most difficult part of any research effort is the sheer mounting of the data collection effort, all the more so when that effort is large and complex. Such problems have been especially pressing in large-scale, cross-national surveys because many of the participating countries lacked the “technical” infrastructure required to mount complex survey research efforts before they participated in the surveys. In fact, maintaining the research infrastructure required to mount complex surveys in nations that historically have lacked such capacity is one reason to consider mounting large-scale, cross-national surveys on a frequent basis, for delaying successive waves of research in some nations runs the risk of allowing investments in the research infrastructure to erode.1
Reporting and Interpreting the Results
The progress just reported addresses some of the criticisms made about early cross-national comparisons of student achievement, especially complaints that the United States didn’t face a “level” playing field in such comparisons. But a number of problems remain to be addressed before we can conclude that international comparisons of student achievement provide us with truly useful benchmarks for student achievement in the United States. In this section, I discuss how the results of cross-national studies can be analyzed to better inform the overall debate about educational standards in this country, and I point to possibilities for future studies that might provide even more useful information than we now gain from the cross-national surveys. Overall, my message is that deeper and more probing analyses of the cross-national data are needed if they are to be used in a truly informative debate about education standards in this country.
Issues Related to Achievement Tests
One problem I want to address is the extent to which the achievement tests used in cross-national surveys supply the kinds of “benchmarks” for student achievement that we want in the United States. It is well known that the average test scores for American students in cross-national surveys rarely lie at the top of the cross-national performance distribution and that our students frequently perform in the middle of the pack (or below), depending on the test. What most published reports of these comparisons do not tell us, however, is what kinds of academic performance are being measured on cross-national achievement tests or the extent to which these tests reflect our desired standards for student learning. In fact, Linn’s chapter (this volume) presents some fascinating insights into test content and format that call into question the extent to which the tests used in cross-national surveys provide the most useful benchmarks for student learning in American schools. Even the most casual observer probably knows that current discussions of academic standards in American education increasingly present an ambitious set of goals for what we want students to know and be able to do at different points in their education careers. The emergence of these ambitious standards, however, has been only partly driven by the results of international surveys of student achievement. Equally important to this development has been a sea change in how instructional psychologists and psychometricians in the United States think about school learning. Increasingly, American educators are concerned not simply with the extent to which items on achievement tests adequately sample various content domains in the school curriculum, but also with the level of “cognitive demand” of test items and the types of performance these items are designed to elicit from students. Linn’s discussion (this volume) reflects this interest: he spends considerable time discussing not only how cross-national surveys arrive at their tables of curricular content, but also how items are constructed to reflect more ambitious levels of “cognitive demand” and more authentic forms of academic performance. In this regard, Linn warns us that despite much progress, the achievement tests used in the most recent cross-national studies continue to include a preponderance of multiple-choice items that have a fairly low level of cognitive demand (e.g., knowledge of simple facts and procedures as opposed to the application of knowledge in nonroutine problem-solving situations).
Still, Linn does note that newer items increasingly are being included in cross-national achievement tests, particularly “constructed response” items that present a higher level of cognitive demand and a more “authentic” demonstration of what students know and are able to do. Nevertheless, as Linn points out, the use of newer item formats is inherently limited by restrictions on testing time in cross-national surveys and by the need to include a large number of test items in particular content domains to enhance test reliability. This inherent tradeoff, Linn argues, explains why cross-national achievement tests still have a preponderance of conventional, multiple-choice test items pitched at lower levels of cognitive demand. If Linn’s comments about item formats and cognitive demand suggest that the achievement tests currently used in cross-national surveys don’t fully reflect more ambitious views of academic standards for students, his comments about the curricular content included in such tests are even more eye-opening. Earlier, I discussed the problems faced by test developers seeking to match the content of cross-national achievement tests to varying national curricula around the world. In discussing this problem, Linn notes two potential test construction strategies that can be used to build “fair” achievement tests in cross-national settings. One strategy is to include test items representing the union of curriculum objectives in all national curricula; an alternative is to include only items occurring at the intersection of national curricula. In point of fact, however, even the most current cross-national achievement tests are not based on either approach. Instead, Linn reports, because of the greater capacity of U.S. agencies to provide test items and because of restrictions on test length, most of the achievement tests used in cross-national surveys have a distinctly American content and item-format bias. In light of these arguments, it is worth revisiting the problem of how to use the cross-national surveys to set standards for American education. One thing should be clear from the discussion thus far. If we seek information about how American students are performing in relation to ambitious standards for academic content and cognitive demand, the achievement tests used in most cross-national surveys don’t provide the appropriate information. Instead, such tests continue to reflect American curriculum content as it now stands, and they continue to contain items that reflect a lower level of cognitive demand. From this perspective, comparisons of average student performance in the United States to average student performance in other nations are not, in and of themselves, an especially good yardstick for judging progress toward our most ambitious vision of educational standards.
Instead, an appropriate international benchmarking process would more thoroughly investigate national curricula outside the United States and develop more challenging achievement tests.2 From this perspective, the goal of bringing the average performance of American students on current tests up to the national averages found in “higher performing” countries serves only as a useful starting point in achieving higher educational standards in American schools, for even if we achieved this goal, we would still not know whether we had met our most ambitious goals for student learning.
Issues Related to Reporting Test Scores
Despite these caveats, many analysts and observers continue to treat cross-national comparisons of student performance as a reasonable standard for judging the performance of our education system. Moreover, many scholars (including myself) would argue that such comparisons, although limited, provide useful insights into how our education system functions, especially in comparison to others. But as the chapters in this volume suggest, there are ways in which these comparisons can be made even more informative.

[...]
that education in all societies works in roughly equivalent ways, leading to a second assumption: that practices imported from other countries will work in the United States in ways that are equivalent to how they worked in other countries. The evidence from cross-national surveys, however, suggests that this will not necessarily be the case. Consider, once again, the findings of Schmidt and colleagues regarding curriculum effects on student achievement. The data presented in Schmidt et al. (1999) strongly suggest the presence of interactions, demonstrating the need for caution about these simplistic assumptions. In light of this, there is a real need for policy analysts and researchers to think more explicitly about the assumptions they are making regarding comparative research. As Tilly (1984) shows in his short and insightful monograph on comparative cross-national research, we can make several assumptions when developing societal comparisons. One might be that all societies are unique and cannot easily be compared, implying that processes occurring in one society might not easily (or ever) be duplicated in others (this view is close to the one developed by Bempechat et al., this volume). Another assumption might be that there are certain “types” of societies, and that processes occurring in one society can be duplicated in other societies of the same type, but relationships observed in societies of one type cannot be duplicated in societies of other types. This approach places a premium on measuring societal characteristics and on investigating how societal characteristics condition relationships among variables at constituent system levels.
Yet another assumption would hold that national societies are embedded within a larger “world system” of societies (a system in which national societies increasingly are engaged in social relationships with, and influenced by, one another). In this view, processes occurring within societies often depend less on unique circumstances within societies than on a given society’s location in a worldwide system of international relationships, where national societies hold unequal statuses in a dense network of international relationships and participate in an increasingly uniform, worldwide culture. All three of these perspectives have figured centrally in cross-national research on education. The work of Heyneman and Loxley cited earlier, for example, is an instance of research that examines “types” of societies and that cautions against generalizing about educational processes across nations at different levels of economic development. Another example is the interesting work of Stevenson and Baker (1991) on the effects of educational governance regimes on consistency of content coverage in schools. In contrast, the TIMSS video studies, and the qualitative case studies recently included as companions to cross-national surveys (as discussed by LeTendre, this volume), are consistent with a more holistic form of cross-national analysis, in which societies are seen as relatively unique, and educational practices are seen as deeply embedded in national culture and therefore not easily transported across national boundaries. The work of Meyer, Ramirez, and colleagues exemplifies a third approach to comparison, one derived from a “world-systems” viewpoint on education, where educational developments within countries are seen as resulting not so much from internal social and cultural circumstances as from a given society’s position in a global cultural and social system (Meyer, Kamens, & Benavot, 1992; Ramirez & Boli, 1987).
The larger point is that judgments about the “validity” of data and findings from cross-national surveys depend to some extent on the assumptions one makes about appropriate forms of cross-societal comparisons. For example, to the extent that we believe there are “types” of societies, a key concern becomes which types of societies to include in the research and how these societies differ on system-level properties such as governance regimes, economic development, ethnic homogeneity, school system types, and so on. In this view, the validity of cross-national studies, and the degree to which the results are informative, depends crucially on whether the types of societies one needs to compare in testing one’s theory of societal processes are present in sufficient numbers in the sample to perform such a test and whether adequate measures of societal properties have been developed for use in comparing system-level properties. In fact, attention to theory-driven thinking at this level of analysis, as well as discussion of how to measure the societal-level properties critical to this research agenda, seems oddly lacking in this volume. As a result, readers of this volume would do well to revisit the arguments presented by BICSE (National Research Council, 1993, pp. 20-21), which explicitly attended to this issue.
More prevalent in this volume, but only barely so, is the attention paid to issues of research design and reporting arising from an assumption that national societies are unique and need to be understood on their own terms. This assumption has fostered the demand for qualitative case study research in cross-national comparisons of educational systems. As LeTendre discusses in this volume, well-conducted case studies can contribute in important ways to cross-national surveys by capturing the unique, culturally embedded nature of educational practices in nations. But LeTendre’s discussion also shows that a great deal of ambiguity remains within the research community about how to use the information derived from case studies in relation to surveys, the extent to which insights from case studies should drive issues of survey design, and how conclusions from case studies can be reported so that various members of the research and policy communities find them “valid.” In fact, the contrast between LeTendre’s discussion of the uncertainties and misunderstandings surrounding the use of qualitative data in the latest TIMSS work and BICSE’s elegant statement of the role such work can play in cross-national survey research, found in the National Research Council report (1993, pp. 22-23), is striking and shows that we have a long way to go before qualitative research is used to best effect in cross-national surveys of student achievement.
Thus, the papers in this volume suggest that we still need to make progress in articulating the theories of comparison we think should guide cross-national surveys of achievement. We might, for example, need to go beyond the simple assumption that all societies work in the same way and, in doing so, develop a more realistic set of assumptions about how the findings from cross-national research can be applied to problems of school improvement. In this regard, it is interesting to note that the practitioner community in American education seems to be doing just this, carefully recreating practices imported from other nations and testing them in their own educational settings.9 But this real progress in applying cross-national findings to problems of educational improvement is not much reflected in the current volume, except perhaps in Raudenbush and Kim’s advice that hypotheses derived from cross-national comparisons should be carefully tested within the United States and in Smith’s cautions about making inferences from cross-national studies to guide education policy. One hopes, therefore, that BICSE will pay more attention to this problem in its future discussions of the validity of large-scale cross-national research and articulate more clearly how cross-national findings can be used to stimulate school improvement in the United States.
CONCLUSION
Having considered the purposes that various constituencies hold for cross-national surveys of educational achievement, and some of the problems associated with achieving these purposes, I now turn directly to the questions posed at the outset of this chapter. The first question is:
• Looking at the history of large-scale, cross-national surveys of student achievement, what progress has been made in conducting studies that are increasingly valid and increasingly informative?
The answer to this question, as I suggested at the outset, depends on the purposes one hopes to achieve through such studies. If the purpose is to use cross-national surveys to set standards for achievement in U.S. education, I would argue that a great deal of progress has been made in designing studies that are increasingly informative. Advances in test construction and scaling, coupled with better standardization of sampling procedures, have given us achievement tests with better content validity with respect to national curricula, and samples that are more standardized across nations. All of this helps produce fairer comparisons. Moreover, because of these developments, we can now take better account of within-country variation in achievement, place confidence intervals around measures of central tendency, and conduct better analyses of subgroup performance, all at a more fine-grained level of curricular detail than in previous studies.
These technical advances are welcome and enhance the utility of cross-national comparisons. However, they do not guarantee that the achievement tests used in cross-national surveys are "valid" indicators of the educational standards to which we aspire, or that comparisons based on these studies give us a valid picture of where the United States stands in meeting these standards. As I argued earlier in this chapter, despite these advances, the achievement tests used in the most recent cross-national surveys still do not reflect our most ambitious learning goals for students, and the ages at which students are tested might not reflect the goals we actually hold for our education system.
Concerning the goal of using cross-national surveys to inform the process of school improvement in the United States, the picture is less clear. Cross-national studies certainly continue to make important contributions to our understanding of educational issues. The recent contributions of Schmidt and colleagues (1997, 1999) on the nature of curriculum organization in different countries, as well as the insights from the TIMSS video studies of teaching practice, represent two particularly stellar accomplishments in this area.
Moreover, various segments of the practitioner community in American education seem to be developing interesting and sophisticated strategies for applying the findings of cross-national surveys to the school improvement process, as the efforts of the New Standards Project, the First in the World Consortium, and other groups suggest. However, in my view the scientific community has yet to articulate a sound logic for linking the findings from cross-national surveys to issues of school improvement. There are too few within-nation tests of hypotheses developed from cross-national comparisons; there are no clearly articulated perspectives on how to measure the features of national education systems in ways that guide cross-national comparative work or inform the sampling of societies for cross-national comparison; and there is too little clarity about what constitutes valid use of qualitative case study data and how such data can (or cannot) be used alongside survey data to improve our understanding of educational processes in different societies. In this sense, much more thinking is needed if we are to clearly articulate the role of cross-national studies in improving schools in the United States.
One thing is clear, however. Sound information about how to change American schooling in ways that improve student learning cannot be based on cross-national surveys alone, or even on within-nation analyses of survey data that test hypotheses drawn from cross-national comparisons. Instead, to truly understand how practices imported from other countries might affect student achievement in the United States, we will likely have to recreate these practices in American settings through careful intervention and then investigate their effects in carefully controlled experimental work. That is a logical, and needed, addition to the cross-national research agenda, and one that I believe BICSE should support.
The second question is:
• What opportunities lie ahead for improving the quality of such studies, both methodologically and in terms of information yield?
This is a difficult question to answer in the absence of information about levels of funding for future cross-national research. Certainly, advances in computerized adaptive testing are worth exploring as a means of improving cross-national achievement tests, especially if this approach can increase the information yielded per item and thereby reduce the number of items required. If so, the testing time saved might allow for more items that assess "authentic" forms of academic performance at higher levels of cognitive demand, bringing the tests used in cross-national surveys more in line with our most ambitious standards for student learning.
The challenges here are enormous, however, and the resources required to make such advances could be beyond budgetary reach.
I would also like to see an expansion of the age groups included in the cross-national surveys, not only to include an older population of school leavers, but also to include a group of preschool children. The inclusion of such populations could allow for the kinds of investigations into achievement across the life course that I believe are truly needed to understand the role of schooling in the distribution of human capital in societies, and to better understand how this distribution varies across nations with different educational ideals and/or systems of education. The absence of data on what students know before they enter school makes it particularly difficult to assess the true contribution of schooling to learning, as does the lack of pretest and posttest measures of achievement and of adequate data on home background. At a minimum, one easy recommendation for improving cross-national surveys is to ensure that achievement is measured at two points in time in each age group under study and that state-of-the-art measures of home background are included. The inclusion of younger and older age groups, while desirable for a full analysis of the role of schooling in the distribution of cognitive development in societies, might confront too many budgetary and technical problems to prove feasible, although the use of household samples and an upward redefinition of the age at which individuals are considered "school leavers" would be a welcome addition to cross-national comparisons of achievement.
Within the realm of achievable improvements, I would also encourage the continued use of qualitative case study research as a companion to survey work. I would, however, recommend that work on this front proceed slowly, starting with a clarification (or at least a sustained discussion) of the approach to comparison that underlies the use of such research methods, and of how data from these efforts will be used to inform survey design, to interpret survey results, and to report findings to the public.
I would also encourage more thought about ways to characterize societies as entities worthy of study in their own right. Very little attention was given in this volume to how to improve the ways in which we conceptualize and measure properties of different education systems or of the societies in which they are embedded, yet a clear understanding of these issues is the key to any good comparative, cross-national theory of educational processes.
Without good theories and explicit attention to the development of measures at the societal level of analysis, I fear that much of what passes for cross-national comparison will be based on hunch, myth, and uninformed secondary data analysis, rather than on carefully crafted cross-national theories of education. All of this leads to the third question:
• How important is it to have international surveys of student achievement on a regular basis and with participation of a constant set of countries?
The evidence on this point seems fairly clear. It is precisely because the cross-national surveys have been conducted continuously over a span of 30 years that this body of work has made the progress it has. Consider, as examples, our changing understanding of the relationship between socioeconomic status and achievement, or the increasingly sophisticated conceptualization and measurement of opportunity to learn. These developments suggest the importance of continuing the cross-national surveys using samples of nations that at least approximate the kinds of nations included in prior studies. As to whether cross-national studies strictly require a panel (that is, a constant set) of nations, I am uncertain, although for all intents and purposes, voluntary participation by societies around the world has been constant enough to produce a sample that comes close to being a panel of nations.[10] To argue for a panel of nations is to give weight to a goal of cross-national surveys that was not discussed much in this volume: looking at changes in educational processes over time and thinking about how and why educational processes change in different kinds of societies. To the extent that there are strong theories about this, I would urge that a panel be formed that includes the kinds of societies needed to test those theories; absent such theories, voluntary participation in the cross-national surveys seems sufficient.
Finally, I am uncertain about how frequently cross-national surveys of student achievement should be conducted. On the one hand, I think such studies should be mounted no more than once a decade, especially because this seems to be the amount of time the research community takes to fully digest the last round of studies and formulate new and better theories to test in the next wave. Too frequent a cycle of surveys, I fear, will simply routinize the work, leading to studies that gather the same data over and over again, and to data analyses that only partially digest any one set of findings. On the other hand, I am mindful of the need to maintain a research infrastructure in nations that might otherwise lack it should funding for large-scale, cross-national surveys disappear. As a result, I can propose one alternative to a once-a-decade approach to cross-national surveys.
That alternative would be to conduct studies focused on one or a few academic subjects at intervals shorter than a decade (say, every three years or so), rather than mounting one large, multisubject study each decade. In this design, there would be time for conceptual work between cycles of subject-matter testing, but there would also be a constant stream of work and data for the cross-national research community.
In closing, let me reiterate a point I hope I have made clear throughout this essay. Despite the caveats I have raised about cross-national surveys of student achievement, and despite my concerns about their validity for setting educational standards and informing the process of school improvement in the United States, I firmly believe that such studies meet many of the goals their advocates hold for them. The cross-national surveys have helped stimulate a national and public debate about educational standards in the United States, and have done so on a regular basis. They have also pointed the way to promising designs for educational improvement in the United States, designs that have given rise to some very interesting school improvement efforts. What this line of work needs to improve, in my view, is no more or less than what any other well-conceived and well-conducted research program needs: more time to develop methods and theories that can be tested and revised on a regular basis. The papers in this volume contribute some very good insights into the theories and methods that can guide this development.
NOTES
1.   Thanks to Larry Suter of the National Science Foundation for this insight.
2.   In fact, this was precisely the approach used by the New Standards project described in National Research Council (1995).
3.   The difficult problems of selection bias present in this age cohort in TIMSS are discussed in more detail in both Raudenbush and Kim (this volume) and Smith (this volume).
4.   What would we conclude, for example, if it were found that the education system in the United States worked to reduce initial (preschool) dispersions in achievement among different social groups and led to "average" or above-average levels of achievement in international comparisons at age twenty-two? Would we not argue that our system was in fact achieving its main purposes, despite room for improvement? And would it not be interesting to compare dispersions in achievement as they unfold over the life course, as well as average levels of achievement at different time points, across a well-chosen sample of nations?
5.   I am thinking here of the line of studies that originated in the 1970s with the National Longitudinal Study of the High School Class of 1972, progressed through High School and Beyond, and has continued with the National Education Longitudinal Study and the Early Childhood Longitudinal Study.
6.   In fact, carefully constructed measures of OTL were pioneered in the cross-national surveys, where elaborate lists of curricular content were prepared and teachers were asked to report, on self-administered questionnaires, the extent to which they "covered" such content. Such elaborate measures of OTL (or content coverage) have begun to appear in large-scale surveys of student achievement in the United States, especially studies sponsored by the U.S. Department of Education's Planning and Evaluation Services (see, for example, the Instructional Dimensions Study, the Sustaining Effects Study, and Prospects), where the measures show statistically significant relationships with student achievement. Good measures of curriculum coverage in schools have not, however, been a hallmark of NCES-sponsored longitudinal studies.
7.   Raudenbush and Kim (this volume) raise an additional set of concerns about the various strategies used by comparative educationists to make causal inferences about system-level change using cross-national surveys, including cautions about missing data problems and how these problems affect cross-national analyses of cohort differences in achievement and system-level analyses of changes in achievement over time. I do not discuss these forms of analysis or Raudenbush and Kim's critique of them in this chapter, except to note that these kinds of analyses, and the problems discussed by Raudenbush and Kim, are central to an important goal of cross-national work discussed by BICSE (National Research Council, 1993), namely, using repeated surveys to examine trends in schooling within and between nations of the world.
8.   I thank Bill Schmidt for suggesting this as a real possibility.
9.   I am thinking here again of the New Standards project and the work of the First in the World Consortium.
10.   I thank David Baker for this insight.

REFERENCES
Baker, D. P. (1993). Compared to Japan, the U.S. is a low achiever . . . really: New evidence and comment on Westbury. Educational Researcher, 22(3), 18-20.
Baker, D. P., Goesling, B., & LeTendre, G. K. (2000). Social class, school quality, and national development: A cross-national analysis of the "Heyneman-Loxley" effect. Unpublished manuscript, Pennsylvania State University, College of Education.
Berliner, D. C., & Biddle, B. J. (1995). The manufactured crisis: Myths, fraud, and the attack on America's public schools. Reading, MA: Addison-Wesley.
Bracey, G. W. (1997). Setting the record straight: Responses to misconceptions about public education in the United States. Alexandria, VA: Association for Supervision and Curriculum Development.
Carroll, J. (1963). A model of school learning. Teachers College Record, 64, 723-733.
Central Advisory Council for Education. (1967). Children and their primary schools. Vols. 1-2: The Plowden Report. London: Her Majesty's Stationery Office.
Coleman, J. S., Campbell, E. Q., Hobson, C. J., McPartland, J., Mood, A. M., Weinfeld, F. D., & York, R. L. (1966). Equality of educational opportunity. Washington, DC: U.S. Department of Health, Education, and Welfare.
Heyneman, S. P., & Loxley, W. A. (1983). The effect of primary school quality on academic achievement across twenty-nine high- and low-income countries. American Journal of Sociology, 88, 1162-1194.
Husén, T. (1987). Policy impact of IEA research. Comparative Education Review, 31(1), 129-136.
Meyer, J. W., Kamens, D., & Benavot, A. (1992). School knowledge for the masses: World models and national primary curriculum categories in the twentieth century. London: Falmer Press.
National Commission on Excellence in Education. (1983). A nation at risk: The imperative for educational reform. Washington, DC: U.S. Government Printing Office.
National Research Council. (1992). Assessing evaluation studies: The case of bilingual education strategies. Panel to Review Evaluation Studies of Bilingual Education, M. M. Meyer & S. E. Fienberg, eds. Committee on National Statistics, Commission on Behavioral and Social Sciences and Education. Washington, DC: National Academy Press.
National Research Council. (1993). A collaborative agenda for improving international comparative studies in education. Board on International Comparative Studies in Education, D. M. Gilford, ed. Commission on Behavioral and Social Sciences and Education. Washington, DC: National Academy Press.
National Research Council. (1995). International comparative studies in education: Descriptions of selected large-scale assessments and case studies. Board on International Comparative Studies in Education, Commission on Behavioral and Social Sciences and Education. Washington, DC: National Academy Press.
Ramirez, F. O., & Boli, J. (1987). The political construction of mass education: European origins and worldwide institutionalization. Sociology of Education, 60, 2-17.
Schmidt, W. H., McKnight, C. C., Cogan, L. S., Jakwerth, P. M., & Houang, R. T. (1999). Facing the consequences: Using TIMSS for a closer look at U.S. mathematics and science education. Boston: Kluwer Academic.
Schmidt, W. H., McKnight, C. C., & Raizen, S. A. (1997). A splintered vision: An investigation of U.S. science and mathematics education. Boston: Kluwer Academic.
Stevenson, D., & Baker, D. (1991). State control of the curriculum and classroom instruction. Sociology of Education, 64, 1-10.
Stigler, J., & Hiebert, J. (1999). Teaching is a cultural activity. American Educator, Winter, 4-11.
Tilly, C. (1984). Big structures, large processes, huge comparisons. New York: Russell Sage Foundation.
Westbury, I. (1992). Comparing American and Japanese achievement: Is the United States really a low achiever? Educational Researcher, 22(3), 18-24.
