
Suggested Citation: "11. Large-Scale, Cross-National Surveys of Educational Achievement: Promises, Pitfalls, and Possibilities." National Research Council. 2002. Methodological Advances in Cross-National Surveys of Educational Achievement. Washington, DC: The National Academies Press. doi: 10.17226/10322.


Conclusion


11
Large-Scale, Cross-National Surveys of Educational Achievement: Promises, Pitfalls, and Possibilities

Brian Rowan*

Large-scale, cross-national surveys of schooling and student achievement have been part of the education landscape in the United States for nearly 30 years. The origins of such work can be traced to the twelve-country First International Mathematics Study conducted by the International Association for the Evaluation of Educational Achievement (IEA) in 1964. Since then, at least one, and sometimes several, such surveys of student achievement have been conducted each decade. The United States already has participated in six cross-national surveys of mathematics and science achievement, four surveys of reading/literacy/language achievement, two surveys of civics education, and several surveys of student achievement in other domains (for a list of studies, see Table 4-1 of Chromy [this volume]).

What is striking about this corpus of work, besides its growing size, is that cross-national surveys of achievement have been fielded at a more rapid pace each decade since the 1960s. Only one such survey was fielded in the 1960s, and it covered only mathematics achievement. In the 1970s, there was again just a single study, but this one covered six academic subjects. Beginning in the 1980s, however, the pace accelerated. There were two surveys of math and science achievement in the 1980s, and another two in the 1990s. Moreover, the 1990s saw a reading survey, an adult literacy survey, an early childhood survey, a language education survey, and a civics education study.

* Brian Rowan is a professor of education and study director for the Study of Instructional Improvement in the School of Education at the University of Michigan.

In this chapter, I discuss these large-scale, cross-national surveys of student achievement. I confine my discussion of this work to a small set of questions about the methodology used in the studies, questions raised by the Board on International Comparative Studies in Education (BICSE) and addressed by the chapters in this volume. In particular, this chapter considers the following three questions:

Looking at the history of large-scale, cross-national surveys of student achievement, what progress has been made in conducting studies that are increasingly valid and increasingly informative?
What opportunities lie ahead for improving the quality of such studies, both methodologically and in terms of information yield?
How important is it to have international surveys of student achievement on a regular basis and with participation of a constant set of countries?

In the following pages, I do not address these questions directly. Rather, my approach is to answer these questions in the context of a larger discussion about the purposes of cross-national studies of education, particularly studies focused on issues of student achievement. Clearly, one cannot think wisely about the validity of large-scale, cross-national education surveys, or about the methods they use, the information they yield, or how regularly they should be conducted, without also thinking about the goals such studies are intended to achieve. The problem, of course, is that the large-scale, cross-national surveys discussed in this volume have been complex studies, designed to achieve multiple purposes and to inform multiple audiences of researchers, policy makers, and citizens from participating countries around the world. In this light, an evaluation of such studies, and a discussion about how they might be improved, requires us to think carefully about the goals we want such studies to achieve.

An excellent discussion of these goals can be found in a brief monograph published by the National Research Council (1993). This monograph advances the view that the world’s varied education systems provide a kind of “natural laboratory” that allows interested parties in the United States to look at variations in schooling cross-nationally, to connect variations in educational organization and practice to variations in student achievement, and to use these analyses to think about how to improve the U.S. education system. In this view, data from cross-national surveys of student achievement can be analyzed in two important ways to inform instructional improvement in the United States. First, data on student achievement in the United States can be compared to data on student achievement in other countries, and such comparisons can be used in a “benchmarking” process that sets standards for student achievement in the United States. Second, data from cross-national surveys can be used to investigate how cross-national variations in school and classroom characteristics affect variations in student achievement, in the hope that this sort of analysis will tell us something about how to alter patterns of schooling and improve student achievement in this country.

In this chapter, I treat these two fundamental goals as the context for a discussion of the promises, pitfalls, and possibilities of cross-national surveys of student achievement. I turn first to the problem of cross-national comparisons of student achievement and to the use of such comparisons as benchmarks for student achievement in the United States. In discussing these issues, I pay special attention to the chapters by Linn, Chromy, Hambleton, Raudenbush and Kim, and Smith in this volume, each of which takes up methodological problems relevant to the benchmarking issue. I then turn to the use of data from cross-national surveys to estimate how changes in schooling practices can improve student achievement in the United States. Here I pay special attention to the chapters prepared for this volume by Bempechat, Jimenez, and Boulay; Buchmann; Floden; LeTendre; Raudenbush and Kim; and Smith. Only after having looked at these issues do I directly address the questions posed for me by BICSE, and even then only in the limited way that page constraints permit.

INTERNATIONAL COMPARISONS AS BENCHMARKS

As Smith (this volume) shows, large-scale, cross-national surveys of student achievement have figured centrally in debates about educational standards in the United States since the 1980s, when findings about the performance of U.S. students on the first international surveys of student achievement were called to the attention of the American public in A Nation at Risk (National Commission on Excellence in Education, 1983). Since that time, the cross-national studies have evolved into a kind of decennial “cognitive Olympics” in the United States, as Husen (1987, p. 131), one of the most thoughtful advocates of cross-national research in education, feared they might. In this environment, each new release of international data is given widespread attention, not only by researchers and policy makers, but also, thanks to widespread press coverage, by the public at large. Indeed, the international comparisons have become one of the few grand spectacles in American education, surpassing even the release of data from the National Assessment of Educational Progress in sheer drama of coverage.


All of this has been controversial, especially in the eyes of those who think reports based on international comparisons have unfairly portrayed the performance of U.S. schools (Berliner & Biddle, 1995; Bracey, 1997). As we shall see, there is room for improvement in the ways that international comparisons of student achievement are reported to audiences in the United States. But a case can be made that even the poorly crafted, early reports on international studies of educational achievement performed an important service in American education. Since A Nation at Risk, the cross-national surveys of student achievement have served to dramatize issues of educational performance in the United States, mobilizing friend and foe of the system alike to articulate their aspirations for our education system and helping to launch what has become an important, and continuing, public debate about education standards in this country.

A discussion of the consequences of this debate, and the ensuing focus on standards in American education, is beyond the scope of this chapter. Suffice it to say for now, however, that there are two views on this issue. On the positive side, many observers believe the international comparisons (and other attempts to dramatize student achievement in the United States) are leading to the development of much more ambitious and appropriate standards for student learning in American schools. But other observers see a dark side to this development, especially the increased use of standardized test results to dramatize problems of student achievement in the United States. Increasingly, critics are arguing that standardized tests have become the nearly exclusive “coin of the realm” in judging the adequacy of America’s schools and that the heavy reliance on such tests as tools of education accountability is leading to an unnecessary and harmful narrowing of instructional goals and processes in American schools. It is for this reason, then, that the role of the international comparisons in setting “benchmarks” for student achievement requires special scrutiny, for good benchmarks must not only portray American students’ achievement fairly in comparison to students in other nations, but also must assess academic goals that we truly want to hold for our students.

I will raise two interrelated sets of questions about these issues. One set of questions concerns whether the international studies of educational achievement have been designed and managed to produce fair comparisons of student achievement across nations. In examining this problem, I will discuss the extent to which the various tests used in cross-national comparisons are aligned with curricular content emphasized in the nations participating in the comparisons and the extent to which the samples used in making comparisons provide the kind of level playing field required for a fair benchmarking process. Here, I will argue that steady progress is being made despite difficult challenges.


A second set of questions concerns the validity of international comparisons as benchmarks of student achievement in the United States. Here I will inquire more deeply into the curricular content of the achievement tests used in the international studies, discuss the wisdom of comparing achievement at the various age levels sampled in international comparisons, and quibble with the ways the results of these comparisons have been reported and interpreted, not only in the popular press, but also among responsible researchers. My point in this section will be that a variety of issues need attention before international comparisons can be used as clear and unambiguous benchmarks for educational achievement in the United States.

Fielding the Cross-National Studies

Three chapters in this volume discuss the difficulties associated with fielding “fair” cross-national surveys of student achievement and the progress that has been made in this area since the earliest surveys were mounted. Linn’s chapter discusses the problems associated with developing achievement tests for the surveys. Chromy discusses problems associated with selecting and realizing samples of students. Hambleton discusses the translation or adaptation of research instruments for use in multiple countries. Overall, each of these chapters notes particular methodological problems faced by the researchers conducting cross-national surveys, but each also communicates a sense that significant progress has been made in addressing these problems.

Consider Linn’s chapter on the achievement tests used in cross-national surveys. It is apparent from his discussion that constructing achievement tests for cross-national comparisons poses difficult problems, in large part because national curricula differ across participating nations. But Linn’s chapter also shows the increasing care that researchers have given to the task of aligning tests to national curricula in successive cross-national studies. In the latest studies, for example, achievement tests were constructed only after extensive examination of national curricula and detailed consultation with curriculum experts from participating countries. From this perspective, it appears that sound efforts have been made to ensure “fairness” in testing, both by giving analysts from particular nations detailed knowledge of how every test item aligns with their national curricular goals and by allowing analysts from different nations to analyze achievement results using only the items that meet an alignment standard for their own country.
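The second safeguard, rescoring a test using only the items judged to match one country's curriculum, can be illustrated with a short Python sketch. It is not the procedure used in any particular survey; the item identifiers, the alignment judgments, and the student responses are all hypothetical.

```python
from statistics import mean


def percent_correct(responses, items):
    """Average percent correct over `items`, first per student, then across students.

    responses: list of dicts mapping item id -> 1 (correct) or 0 (incorrect)
    items: the item ids to include in the score
    """
    per_student = [100 * mean(r[i] for i in items) for r in responses]
    return mean(per_student)


# Hypothetical test items and one country's hypothetical alignment judgments.
all_items = ["A1", "A2", "A3", "A4"]
aligned_for_country = {"A1": True, "A2": True, "A3": False, "A4": True}

# Hypothetical scored responses for two students from that country.
students = [
    {"A1": 1, "A2": 1, "A3": 0, "A4": 1},
    {"A1": 1, "A2": 0, "A3": 0, "A4": 1},
]

aligned_items = [i for i in all_items if aligned_for_country[i]]
print(percent_correct(students, all_items))      # score on the full test
print(percent_correct(students, aligned_items))  # score on aligned items only
```

Comparing the two scores shows how much a country's result depends on items its curriculum does not cover.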

Linn also details important progress in constructing and scaling achievement tests over time. The latest studies use elaborate content matrices to choose test content, and they use complex matrix sampling designs and item response theory to allow researchers to place all respondents on the same achievement scale (and subscales) even though a given respondent has answered only a subset of test items. Developments in scaling, in particular, allow researchers to conduct much more sophisticated analyses of the achievement test data from cross-national surveys. In the latest studies, for example, researchers can examine variations in student achievement within nations more completely and in a much more fine-grained way than previously possible. All of this further enhances the fairness of cross-national comparisons by allowing researchers to examine differences in student achievement among groups of students and/or across curricular domains that are more or less aligned to national standards.
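How item response theory places students who took different subsets of items on one scale can be sketched with the simplest such model, the one-parameter (Rasch) model, in which the chance of a correct answer depends only on the gap between a student's ability (theta) and an item's difficulty (b). This Python sketch is a simplification chosen for illustration, not the scaling procedure of the surveys discussed here, which is considerably more elaborate. It assumes item difficulties are already known (in practice they are estimated from the pooled data), and the items, booklets, and responses are hypothetical.

```python
import math


def p_correct(theta, b):
    """Rasch model: probability that ability `theta` answers an item of difficulty `b` correctly."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def estimate_theta(responses, difficulties, tol=1e-8, max_iter=100):
    """Maximum-likelihood ability estimate using only the items a student actually took.

    responses: dict item id -> 1/0 for the items administered to this student
    difficulties: dict item id -> difficulty b (assumed known)
    """
    score = sum(responses.values())
    if score == 0 or score == len(responses):
        # All-wrong or all-right patterns have no finite maximum-likelihood estimate.
        raise ValueError("ML estimate is infinite for zero or perfect scores")
    theta = 0.0
    for _ in range(max_iter):
        probs = [p_correct(theta, difficulties[i]) for i in responses]
        gradient = score - sum(probs)                   # first derivative of log-likelihood
        information = sum(p * (1 - p) for p in probs)   # minus the second derivative
        step = gradient / information                   # Newton-Raphson step
        theta += step
        if abs(step) < tol:
            break
    return theta


# Hypothetical item difficulties on a common scale.
difficulties = {"A": -1.0, "B": 0.0, "C": 1.0, "D": 2.0}

# Two hypothetical booklets with overlapping items (matrix sampling).
student_1 = {"A": 1, "B": 1, "C": 0}  # booklet 1: items A, B, C
student_2 = {"B": 1, "C": 1, "D": 0}  # booklet 2: items B, C, D

theta_1 = estimate_theta(student_1, difficulties)
theta_2 = estimate_theta(student_2, difficulties)
print(round(theta_1, 3), round(theta_2, 3))
```

Both students answered two of three items correctly, but because the second student's booklet was harder, the model places that student exactly one unit higher on the shared scale.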

In combination with Linn’s chapter, Hambleton’s chapter shows that progress also has been made in adapting achievement tests and other data collection protocols for use in the many different nations involved in the cross-national studies. Hambleton, for example, lists the various steps now being used by responsible testing agencies to develop tests for use in cross-national settings, where language and culture are important considerations in test construction. Linn describes the care taken in the most recent cross-national surveys to pretest items and examine item parameters in different national populations. All of this should lead the consumer and user of cross-national data to the conclusion that—despite enormous difficulties—careful instrument development procedures can be (and are being) used to improve the validity and appropriateness of cross-national survey data.

Finally, Chromy’s chapter discusses issues of sampling in the cross-national surveys of student achievement, tracing the various sampling designs used in different studies and the strategies used to ensure that these sampling designs are realized in different nations. His chapter describes a process in which sampling procedures of increasing rigor were developed over time, not only through more careful delineation of sampling plans, but also through more careful development of procedural manuals, reporting forms, and other approaches that enhance the comparability of data across participating nations and that allow analysts to take into account deviations from the uniform sampling procedures when these occurred. In fact, the development of these procedures, and careful monitoring of sample realization, is what now allows analysts (like Smith, this volume) to observe (and take into account) the ways in which deviations from the standard sampling plan affect inferences about national differences in student achievement.

The discussion to this point, then, suggests that much progress has been made in developing sound procedures for fielding cross-national surveys of student achievement. This is no mean feat, because the problems here are formidable. The most difficult part of any research effort is the sheer mounting of the data collection effort, and all the more so when that effort is large and complex. Such problems have been especially pressing in large-scale, cross-national surveys because many of the participating countries have lacked the required “technical” infrastructure to mount complex survey research efforts prior to their participation in the surveys. In fact, the maintenance of the research infrastructure required to mount complex surveys in nations that historically have lacked such capacity is one reason to consider mounting large-scale, cross-national surveys on a frequent basis, for delaying successive waves of research in some nations runs the risk of allowing investments in the research infrastructure to be eroded.¹

Reporting and Interpreting the Results

The progress just reported addresses some of the criticisms made about early cross-national comparisons of student achievement, especially complaints that the U.S. didn’t face a “level” playing field in such comparisons. But a number of problems remain to be addressed before we can conclude that international comparisons of student achievement provide us with truly useful benchmarks for student achievement in the United States. In this section, for example, I discuss how the results of cross-national studies can be analyzed to better inform the overall debate about educational standards in this country, and I point to possibilities for future studies that might provide even more useful information than we now gain from the cross-national surveys. Overall, my message is that deeper and more probing analyses of the cross-national data are needed if they are to be used in a truly informative debate about education standards in this country.

Issues Related to Achievement Tests

One problem I want to address is the extent to which the achievement tests used in cross-national surveys supply the kinds of “benchmarks” for student achievement that we want in the United States. It is well known that the average test scores for American students in cross-national surveys rarely lie at the top of the cross-national performance distribution and that our students frequently perform more in the middle of the pack (or below), depending on the test. What we do not know from most published reports of these comparisons, however, is the kinds of academic performances being measured on cross-national achievement tests or the extent to which these tests reflect our desired standards for student learning. In fact, Linn’s chapter (this volume) presents some fascinating insights into test content and format that call into question the extent to which the tests used in cross-national surveys provide the most useful benchmarks for student learning in American schools.

Even the most casual observer probably knows that current discussions of academic standards in American education increasingly present an ambitious set of goals for what we want students to know and be able to do at different points in their education careers. The emergence of these ambitious standards, however, has been only partly driven by the results of international surveys of student achievement. Equally important to this development has been a sea change in how instructional psychologists and psychometricians in the United States think about school learning. Increasingly, American educators are becoming concerned not simply with the extent to which items on achievement tests adequately sample various content domains in the school curriculum, but also with the level of “cognitive demand” of test items and the types of performance these items are designed to elicit from students.

Linn’s discussion (this volume) reflects this interest, and he therefore spends considerable time discussing not only how cross-national surveys arrive at their tables of curricular content, but also how items are constructed to reflect more ambitious levels of “cognitive demand” and more authentic forms of academic performance. In this regard, Linn warns us that despite much progress, the achievement tests used in the most recent cross-national studies continue to include a preponderance of multiple-choice items that have a fairly low level of cognitive demand (e.g., knowledge of simple facts and procedures as opposed to the application of knowledge in nonroutine problem-solving situations). Still, Linn does note that newer items increasingly are being included in cross-national achievement tests—particularly “constructed response” items that present a higher level of cognitive demand and a more “authentic” demonstration of what students know and are able to do. Nevertheless, as Linn points out, the use of newer item formats is inherently limited by restrictions on testing time in cross-national surveys and by the need to increase the sheer number of test items in particular content domains to enhance test reliability. This inherent tradeoff, Linn argues, explains why cross-national achievement tests still have a preponderance of conventional, multiple-choice test items pitched at lower levels of cognitive demand.

If Linn’s comments about item formats and cognitive demand suggest that the achievement tests currently used in cross-national surveys don’t fully reflect more ambitious views of academic standards for students, his comments about the curricular content included in such tests are even more eye-opening. Earlier, I discussed the problems faced by test developers seeking to match the content of cross-national achievement tests to varying national curricula around the world. In discussing this problem, Linn notes two potential test construction strategies that can be used to build “fair” achievement tests in cross-national settings. One strategy is to include test items representing the union of curriculum objectives in all national curricula; an alternative is to include only items occurring at the intersection of national curricula. In point of fact, however, even the most current cross-national achievement tests are not based on either approach. Instead, for reasons having to do with the greater capacity of U.S. agencies to provide test items, and because of restrictions on test length, Linn reports that most of the achievement tests used in cross-national surveys have a distinctly American content and item-format bias.
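The contrast between the union and intersection strategies can be made concrete with Python's built-in set operations. The countries and curriculum topics below are hypothetical placeholders, not data from Linn's chapter.

```python
# Hypothetical curriculum objectives for three hypothetical countries.
curricula = {
    "Country A": {"fractions", "decimals", "linear equations", "geometry proofs"},
    "Country B": {"fractions", "decimals", "probability"},
    "Country C": {"fractions", "decimals", "linear equations", "statistics"},
}

# Union strategy: every objective taught anywhere. The item pool is broad,
# but each country's students meet items their curriculum never covered.
union_pool = set.union(*curricula.values())

# Intersection strategy: only objectives taught everywhere. Fair to every
# country, but the resulting test is narrow.
intersection_pool = set.intersection(*curricula.values())

print(sorted(union_pool))
print(sorted(intersection_pool))  # ['decimals', 'fractions']

# Share of the union pool that each country's curriculum actually covers.
for country, topics in curricula.items():
    coverage = len(topics & union_pool) / len(union_pool)
    print(f"{country}: {coverage:.0%} of union-pool objectives are in its curriculum")
```

Even in this toy case the intersection shrinks to two topics, which suggests why neither pure strategy is easy to apply to real tests with fixed testing time.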

In light of these arguments, it is worth revisiting the problem of how to use the cross-national surveys to set standards for American education. One thing should be clear from the discussion thus far. If we seek information about how American students are performing in relationship to ambitious standards for academic content and cognitive demand, the achievement tests used in most cross-national surveys don’t provide the appropriate information. Instead, such tests continue to reflect American curriculum content as it now stands, and they continue to contain items that reflect a lower level of cognitive demand. From this perspective, comparisons of average student performance in the United States to average student performance in other nations are not, in and of themselves, an especially good yardstick for judging progress toward our most ambitious vision of educational standards. Instead, an appropriate international benchmarking process would more thoroughly investigate national curricula outside the United States and develop more challenging achievement tests.² Seen this way, the goal of bringing the average performance of American students on current tests up to the national averages found in “higher performing” countries serves only as a useful starting point in achieving higher educational standards in American schools, for even if we achieved this goal, we would still not know whether we had met our most ambitious goals for student learning.

Issues Related to Reporting Test Scores

Despite these caveats, many analysts and observers continue to treat cross-national comparisons of student performance as a reasonable standard for judging the performance of our education system. Moreover, many scholars (including myself) would argue that such comparisons, although limited, provide useful insights into how our education system functions, especially in comparison to others. But as the chapters in this volume suggest, there are ways in which these comparisons can be made even more informative.


Consider, for example, a frequently noted, but easily addressed, complaint about the use of achievement data in making cross-national comparisons. Many careful observers, including Raudenbush and Kim (this volume), argue that too much emphasis is placed on comparing country means on achievement. In this argument, a focus on mean differences in achievement across countries is seen as concealing as much as it reveals (see also Berliner & Biddle, 1995, and Bracey, 1997). It is my view that the focus on country means is largely a legacy of the relatively primitive scaling procedures used in the earliest cross-national surveys, for as Linn (this volume) reports, it was not until the use of more sophisticated scaling techniques that the possibility of detailed reporting of within-country variation in student achievement scores emerged. As a result, the focus on mean scores in cross-national comparisons could be seen as a simple case of cultural lag, that is, a phenomenon in which the average analyst has yet to catch up with the new possibilities for data analysis resulting from changes in test scaling procedures. If that is the case, more sophisticated reporting should begin to appear in the near future.

Issues Related to Sampling

The question, of course, is what form such sophisticated reporting should take, for more complicated forms of data analysis abound. One obvious approach, appearing more and more frequently in publications originating in the United States, is to report not only mean achievement across countries, but also the dispersion of scores around national means and the uncertainty in estimates that results from this dispersion. Such statistics show much better than simple rankings of mean scores whether differences in country means are, in fact, statistically significant, and how large those differences are relative to the total dispersion in test scores (for a discussion, see Raudenbush and Kim, this volume).
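A minimal Python sketch of this kind of reporting follows. For each country, it computes the standard deviation and standard error alongside the mean, then tests whether two means differ and expresses the gap in standard-deviation units. The scores are made up, and the standard errors assume simple random sampling. The surveys discussed in this volume use clustered samples, for which design-based variance estimates (for example, jackknife replication) are typically larger.

```python
import math
from statistics import mean, stdev


def summarize(scores):
    """Mean, standard deviation, and standard error of the mean (simple random sampling assumed)."""
    m = mean(scores)
    sd = stdev(scores)
    se = sd / math.sqrt(len(scores))
    return m, sd, se


def compare(scores_a, scores_b):
    """Difference in means, its z statistic, and the difference in pooled-SD units."""
    m_a, sd_a, se_a = summarize(scores_a)
    m_b, sd_b, se_b = summarize(scores_b)
    diff = m_a - m_b
    z = diff / math.sqrt(se_a**2 + se_b**2)          # |z| > 1.96: significant at the 5% level
    pooled_sd = math.sqrt((sd_a**2 + sd_b**2) / 2)   # simple average of the two variances
    effect_size = diff / pooled_sd
    return diff, z, effect_size


# Made-up scores for two hypothetical countries.
country_a = [500, 520, 480, 510, 490]
country_b = [505, 525, 485, 515, 495]

diff, z, effect_size = compare(country_a, country_b)
print(f"difference = {diff:.1f}, z = {z:.2f}, effect size = {effect_size:.2f} SD")
```

With these tiny samples a 5-point gap is far from significant. With thousands of students per country, the same gap might be statistically significant yet still small relative to the spread of scores within each country, which a ranking of means alone would not reveal.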

Beyond that, it appears that cross-national comparisons of achievement also could be analyzed in ways that are more sensitive to differences in the composition of national samples. We know, for example, that the country-level samples in cross-national surveys conducted to date have varied in terms of the percentage of students participating in different curricular programs (or “tracks”), in terms of age and gender composition, in terms of socioeconomic and ethnic composition, and/or in terms of the percentage of students living in different residential locations (e.g., urban, rural). Raudenbush and Kim (this volume) argue that these differences can, and often do, affect student achievement and that reasonable comparisons of student achievement across nations need to: (a) demonstrate such effects and (b) adjust for differences in sample composition when reporting on cross-national differences in achievement (see also Baker, 1993; Berliner & Biddle, 1995; Floden, this volume; and Westbury, 1992).
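One simple way to adjust for sample composition is direct standardization: compute a country's mean within subgroups, then reweight those subgroup means to a common reference composition shared by all countries. The Python sketch below illustrates that general technique only; it is not necessarily the method recommended in the chapters cited, and the subgroups, means, and population shares are hypothetical.

```python
def weighted_mean(subgroup_means, shares):
    """Combine subgroup means using population shares that sum to 1."""
    if abs(sum(shares.values()) - 1.0) > 1e-9:
        raise ValueError("shares must sum to 1")
    return sum(subgroup_means[group] * share for group, share in shares.items())


# Hypothetical subgroup means for one country, on a common achievement scale.
country_means = {"low SES": 460.0, "high SES": 540.0}

# The country's own sample composition yields the ordinary, unadjusted mean.
country_shares = {"low SES": 0.40, "high SES": 0.60}
unadjusted = weighted_mean(country_means, country_shares)

# A hypothetical reference composition, applied to every country alike,
# yields a composition-adjusted mean that can be compared across countries.
reference_shares = {"low SES": 0.25, "high SES": 0.75}
adjusted = weighted_mean(country_means, reference_shares)

print(f"unadjusted: {unadjusted:.1f}, adjusted: {adjusted:.1f}")
```

The arithmetic is straightforward. Whether the adjusted or the unadjusted figure is the appropriate benchmark is a separate, substantive question.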

It should be noted that these recommendations involve more than just technical matters. The problem of whether or not to adjust country means for sample composition cuts to the very heart of setting standards for student learning in American society. For example, many critics of the cross-national surveys have argued that focusing on mean differences across nations obscures many of the unique challenges faced by the American educational system. In this view, cross-national comparisons need to take into account that the education system in the United States is called on to educate more children in poverty than the education systems in many of the “top-achieving” countries, that the United States has more ethnic and linguistic diversity than many countries, and that students in the United States live in more diverse residential locations than students in some of the “top-performing” countries. In this view, reporting unadjusted means does a real disservice to our nation’s embattled educators, who are working against great odds to produce the results they do.

I believe this view should be treated with great caution, however. For one thing, a case can be made that a focus on unadjusted means represents what we truly desire: that all students in our country achieve at the highest levels. In fact, my own sympathies lie with this latter view, although I also favor more careful, disaggregated displays of achievement data for a number of reasons. First, an examination of achievement among subpopulations in the United States gives us a much better sense of how American society currently distributes human capital among its members and of the pernicious patterns of inequality that still exist in our nation. Moreover, an examination of achievement patterns across subgroups need not undermine our concerns about educating all students well. In fact, a careful look at subgroup differences in achievement tells us precisely where we are succeeding and where we are not. Thus, although I favor disaggregated presentations of cross-national data, I do not favor this approach because I want to defend our educational system. Rather, I think such data are more informative and more telling in their description of educational outcomes in American society. In fact, I believe such analyses need to be extended beyond the analysis of U.S. data. Attention to the ways in which different educational systems in the world distribute human capital among their members helps analysts in the United States reflect more thoughtfully on how our own system functions and on its consequences for academic learning across broad segments of the population.

There is one further way in which issues of sampling interact with the problem of setting standards for achievement in the United States. As Chromy (this volume) and Raudenbush and Kim (this volume) discuss, cross-national surveys have varied in how they define the samples to be selected for cross-national comparisons. In some studies, samples have been selected on the basis of students’ locations in the graded educational system, but in others, students were selected to represent specified age cohorts. Raudenbush and Kim argue in particular that sampling students who are in attendance at certain grade levels presents a host of problems, including the fact that patterns of promotion from grade to grade and patterns of school leaving (i.e., dropping out) vary across nations, presenting potentially intractable problems of selection bias in cross-national comparisons of student achievement. As a result, Raudenbush and Kim press for more consideration of age-based, household sampling.
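To make the selection-bias argument concrete, the following Python sketch simulates two hypothetical countries whose age cohorts have identical achievement distributions but differ in how many low achievers have left school or fallen behind the sampled grade. The 450-point threshold, the leaving rates, and the score scale are assumptions chosen for illustration, not features of any actual survey.

```python
import random
import statistics


def simulate_cohort(n: int, leave_rate_low: float, seed: int = 0) -> list[tuple[float, bool]]:
    """Return (score, still_in_target_grade) pairs for one hypothetical age cohort.

    Assumed rule: a student scoring below 450 has left school or fallen behind
    the sampled grade with probability `leave_rate_low`.
    """
    rng = random.Random(seed)
    cohort = []
    for _ in range(n):
        score = rng.gauss(500, 100)
        # Draw the leaving decision for every student, so two countries
        # simulated with the same seed share identical scores.
        draw = rng.random()
        in_grade = not (score < 450 and draw < leave_rate_low)
        cohort.append((score, in_grade))
    return cohort


def age_based_mean(cohort: list[tuple[float, bool]]) -> float:
    """Mean over the whole age cohort, as a household sample would see it."""
    return statistics.fmean(score for score, _ in cohort)


def grade_based_mean(cohort: list[tuple[float, bool]]) -> float:
    """Mean over students found in the target grade, as a school sample would see it."""
    return statistics.fmean(score for score, in_grade in cohort if in_grade)


for name, rate in [("X", 0.0), ("Y", 0.6)]:
    c = simulate_cohort(5000, rate, seed=1)
    print(f"country {name}: age-based {age_based_mean(c):.1f}, "
          f"grade-based {grade_based_mean(c):.1f}")
```

In the sketch, the age-based mean is the same in both countries, while the grade-based mean makes the country with more school leaving look stronger, which is the kind of selection bias Raudenbush and Kim describe.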

Raudenbush and Kim’s analysis of these issues is informative, and their advice is worth considering to the extent that it doesn’t undermine researchers’ ability to also examine school effects on student achievement outcomes. But again, the issue of what populations to sample in cross-national surveys is more than a technical issue and goes to the heart of the standards we want to hold for student achievement in American society. A number of observers, including Berliner and Biddle (1995), have argued that cross-national comparisons of achievement reflect on more than the simple efficiency of schools; they also reflect on the relative societal “press” for academic achievement at different stages of the life course in different societies. In this view, the performance of American students on cross-national surveys of achievement reflects not only the performance of schools in the United States, but also the very different strategies that parents and communities in the United States use to pass on human, social, and cultural capital to children, especially in comparison to the strategies used in many “high-performing” nations. In many countries around the world, emphasis is placed on school achievement early in the life course, especially when early achievement is required to advance in education systems that are more stratified and selective than our own, and that educate fewer students at higher system levels than we do. The United States, in contrast, maintains an education system that is very open—offering many “second chances” for slow starters—all of which allows for parenting strategies that emphasize early investments in human, social, and cultural capital that are only loosely related to the narrow goal of acquiring school knowledge.

To the extent that these observations are true (and they have been a stable feature of arguments about the American education system in comparative sociology for decades), cross-national comparisons of achievement among school-aged populations may not adequately reflect patterns of ultimate achievement in the United States. Following this argument, American society might have developed a pattern of education that promotes a slower pace of achievement. To the extent that we can tolerate (and afford) this educational strategy, a better way to think about educational standards in the United States would be to compare educational achievements at later ages, for example, at age 22 or so, after many American students have had an opportunity to complete postsecondary education. Here, in fact, one could use household samples to great effect and also gain a much better sense of what American students (as well as students from other nations) end up knowing at a more realistic “school-leaving” age than the one most recently defined in the Third International Mathematics and Science Study (TIMSS).³

The bigger point, of course, is that we shouldn’t let the age group comparisons available in current cross-national studies drive the standards-setting process in American education. Instead, we should use our own sense of desirable standards for learning at particular ages to define the strategy for selecting samples in cross-national studies, and this might involve sampling older students on a household basis. In this regard, the International Adult Literacy Survey, conducted by Statistics Canada, represents a welcome addition to the portfolio of cross-national surveys of educational achievement. One would hope, however, that future surveys of older populations—especially a realistic sample of “school leavers” suited to the American context—would include achievement assessments in more subject areas than just literacy.

All of this raises an interesting possibility for the design of future cross-national surveys. For one, cooperating agencies conducting this research might consider expanding the age groups sampled in such studies, including not only an older sample of school leavers, but also preschool populations.⁴ In fact, an expansion of the age groups sampled in cross-national surveys would give us a much better picture of educational achievement across the life course in different societies, providing crucial information about patterns of achievement as these unfold prior to entry into schooling, at critical junctures during the school-age years, and at a more realistic end point than the one typically defined in current and past cross-national surveys. Moreover, analyses of achievement across the life course would give us a much better sense of how patterns of schooling in different countries affect the distribution of human capital in society, especially as people move across the life course. Currently, we get some sense of how human capital is distributed at various stages of the life course within the United States in the longitudinal studies program of the National Center for Education Statistics. But I know of no systematic program of cross-national research dealing with this critical question.

CROSS-NATIONAL SURVEYS AND THE STUDY OF SCHOOL IMPROVEMENT

To this point, I have been discussing the use of cross-national surveys to set benchmarks for student achievement in the United States. In that discussion, I suggested that with additional developments in test design and some changes in sampling design, international assessments could serve more usefully as benchmarks for student achievement in this country. But as we have seen, many educational researchers and policy analysts want cross-national surveys to be useful for more than setting benchmarks. First, they view the cross-national surveys as an opportunity to learn more about education systems around the world, especially how alternative systems are structured and how they function. In addition, many advocates want cross-national surveys to provide good information about how to improve our own education system. In particular, there is a notion (by no means universally held in the research community) that cross-national surveys might allow us to look at education systems in nations that do better than the United States in cross-national achievement comparisons, identify the practices in these countries that lead to superior results, and then import these practices into our own system as a means of educational improvement.

Buried in all of this are many important questions about the cross-national surveys of achievement conducted to date and their promise for building sound knowledge about education systems. For one, we might ask what we have learned from three decades of investment in large-scale, cross-national surveys of student achievement, especially in comparison to the three-decade-old program of longitudinal studies supported by the National Center for Education Statistics (NCES).⁵ More importantly, we might ask how survey research can be used to study issues of educational improvement by probing more deeply into the idea that the world’s varied education systems provide us with a “natural laboratory” for examining the effects of alternative educational arrangements on student achievement and thereby informing education policy in the United States. Here, as Smith (this volume) notes, we confront sticky issues of causal inference from survey data, as well as issues related to how we might build theories of educational practice from cross-national research.

The Scholarly Yield of Cross-National Surveys of Achievement

One justification for conducting cross-national surveys of student achievement is that they will yield important insights about how education systems work: ideas that might or might not inform educational improvement, but that will move our basic understanding of educational processes forward. In fact, this justification figured centrally in a statement of the goals for cross-national research held by BICSE. As the Board noted, “A[n] . . . important purpose of cross-national research is the development of knowledge . . . [that] enriches and expands [our] understanding of the world and its complexities” (National Research Council, 1993, p. 14). BICSE also argued that achieving this goal would require the “collect[ion] of cross-national data at societal levels over reasonably long periods of time . . . to facilitate the identification of worldwide, regional, and national trends and permit the analysis of sources and effects of cross-national variation in education organization, policy, and practice” (p. 14). The question, then, is whether investments of research dollars in cross-national research have indeed contributed to the goal of building sound knowledge about education systems, and if so, what these insights have been.

My personal view on this question, which arises from interest and experience in conducting research on schooling in the United States, is that the cross-national surveys of achievement have led to a number of important insights that have relevance, not only to researchers in the field of comparative education, but also to those interested in educational processes in the United States. In fact, like the BICSE members who contributed to the 1993 monograph, I see an important “cross-walk” occurring between studies of educational achievement conducted solely in a U.S. setting and studies conducted cross-nationally.

One area where cross-national surveys have contributed important insights is in analyses of how socioeconomic origins and educational achievement are related. As Buchmann (this volume) points out, analyses of this issue have been central to educational research since publication of the Coleman Report (1966) in the United States and the Plowden Report in Great Britain (Central Advisory Council for Education, 1967). However, an important study by Heyneman and Loxley (1983) added new insights into the nature of this relationship. Using data from the early cross-national surveys of student achievement, they suggested that the relationship between socioeconomic status and educational achievement was not uniform across societies. In fact, their analyses showed that the relationship between these variables was weaker in less developed countries than in more developed countries. More recently, Baker, Goesling, and LeTendre (2000) used data from TIMSS to argue that this is no longer the case and that the relationship between socioeconomic status and achievement in less developed countries now approximates the relationship found in more developed countries. This is precisely the kind of progress in basic knowledge that BICSE (National Research Council, 1993) argued would result from repeated cross-national surveys in education: a major generalization derived from research in a particular kind of country (i.e., advanced industrial nations) is qualified by cross-national research, and then repeated cross-national surveys qualify these findings yet again, uncovering emerging trends in world societies.

An even more telling contribution of cross-national surveys to educational research can be found in Floden’s discussion (this volume) of the concept of opportunity to learn (OTL). Although the origins of this concept can be traced to Carroll’s (1963) studies of foreign language learning in the United States, the use of this concept in analyzing and explaining student achievement has been greatly advanced in repeated cross-national surveys of student achievement. In the cross-national studies, this concept first was used simply to control for differences in national curricula when comparing student achievement across nations, but in successive waves of the cross-national survey work, the concept of OTL figured more and more centrally as the single most important explanatory variable in the data. Moreover, the centrality of OTL in explaining student achievement has not gone unnoticed in studies of student achievement conducted exclusively in American settings. In fact, as Floden describes, conceptions of OTL now are being used to explain within- and among-school differences in student achievement in the United States.⁶

The evolution of this concept in repeated cross-national studies, coupled with the more careful curriculum analyses discussed by Linn (this volume), has had an added benefit for research on schooling. It is now leading to important conceptions of how curricula are organized and how this organization affects student learning—the kind of “synthetic” theories of education that Smith (this volume) discusses. In the latest TIMSS work, for example, Schmidt and colleagues (Schmidt, McKnight, & Raizen, 1997; Schmidt, McKnight, Cogan, Jakwerth, & Houng, 1999) have developed fascinating and important ideas about the fragmentation of the U.S. mathematics and science curricula and used these ideas to great effect as explanations for the performance of American students on TIMSS achievement tests. The importance of these ideas to the discussion here is that they gain credence—and power—precisely because they have been developed in a cross-national context where patterns of curricular organization other than our own can be glimpsed and where curriculum coverage can be viewed as a central determinant of student achievement. Despite the fact that Schmidt and colleagues’ ideas about the American curriculum are still new, they have already begun to have an important influence on educational research and in debates about how to improve educational practice in the United States. They are, therefore, precisely the kinds of insights that BICSE sought to achieve in endorsing continued support for cross-national surveys in education.

Yet another example of the contributions of cross-national research to educational analysis resulted from the addition of videotaped case studies of teaching to the TIMSS portfolio. This work has been important both conceptually and methodologically. On the conceptual front, the video studies have pioneered a view of teaching as fundamentally a cultural activity, whose components compose a system of culturally embedded and interrelated practices (see Stigler & Hiebert, 1999). This insight takes us well beyond a mechanical vision of teaching as a set of technical procedures that can be easily packaged and repackaged, toward a much more nuanced and realistic understanding of the constraints that national culture places on instructional change. The TIMSS video studies also are a methodological advance, allowing teaching events to be studied and restudied by observers who are not physically present, using different coding schemes for understanding the events that transpire in a given setting. Already, the use of video studies of teaching is attracting the attention of NCES and the U.S. Department of Education’s Planning and Evaluation Services, and we can expect to see more such studies conducted in American settings in the near future.

The larger point, of course, is that the cross-national studies are meeting one of the goals espoused for them by BICSE. These studies have become an important source of defining ideas about the nature of schooling in the United States, and about its consequences. In fact, the contributions made by this line of work—contributions that I would argue have been facilitated by the United States’ repeated participation in such studies—rank alongside contributions made by, and make important contributions to, the national longitudinal studies and other large-scale evaluations conducted by the U.S. Department of Education. Thus, as a source of exciting scholarly ideas, continued support for the cross-national surveys of achievement seems warranted.

Cross-National Surveys and School Improvement

In many circles, the ambitions held for cross-national surveys go well beyond a contribution to basic knowledge. Many advocates of these surveys also hold that careful cross-national research will yield important insights about how to improve schools in the United States. The logic of this assertion is well stated by BICSE (National Research Council, 1993), where it is argued that the cross-national studies take advantage of “a natural worldwide laboratory of education systems” (p. 4) and that a “comparison of natural variation [across systems of education] is usually a feasible way to study the effects of differing [educational] policies and practices,” especially given “that many people are reluctant to conduct controlled experiments with our children’s education” (p. 3). Of course, these observations were offered with appropriate cautions, especially mention of the National Research Council’s recommendation for greater use of controlled experiments in educational research (National Research Council, 1992). But the general thrust of the argument, and a position one suspects is widely shared in the educational research community, is that the cross-national surveys are a straightforward source of good ideas about school improvement in the United States (see, for example, Smith’s discussion of this issue in this volume).

Two issues need to be addressed in thinking about this claim. The first has to do with how causal inferences can be made from nonexperimental data and the extent to which cross-national surveys are making progress in confronting this problem. Clearly, without good data on which to make causal inferences, the claim that we can identify why some education systems are more effective than others cannot be sustained. The second problem has to do with the theories of comparison that we bring to bear in cross-national research. As I will discuss, those who believe that we can use practices developed in other countries to the same effect in the United States are operating under an assumption that causal processes operate in the same way in all societies. As we shall see, however, this is only one of several possibilities.

Let’s begin with the problem of making causal inferences from nonexperimental data. By the typical standards of school effects research, one could argue that cross-national surveys took a real step backward with the design of TIMSS. For example, there is a large literature in this research area demonstrating how estimates from education production functions lead to faulty inferences in the absence of pretest data on achievement outcomes and good measures of home background and socioeconomic status. Yet, in contrast to the Second International Mathematics and Science Studies, TIMSS included no pretest measures of achievement and inadequate measures of home background (on the latter point, see Buchmann in this volume). The obvious recommendation for future cross-national studies, then, is to return to a design that includes achievement testing at two points in time and to take advantage of the advances in the measurement of home background and socioeconomic status described by Buchmann. With these steps, cross-national surveys at least will be able to yield credible education production functions, even if they cannot produce the kinds of sound causal inferences gained from randomized experiments.⁷
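The cost of dropping pretests can be illustrated with a small simulated education production function. In this Python sketch, all parameters are hypothetical: students with higher prior achievement are more likely to be exposed to some practice, so regressing a single posttest on the practice overstates its effect, while adding the pretest as a covariate recovers the effect built into the simulation. Ordinary least squares is written out by hand only to keep the example dependency-free; it is not the estimator used in any particular study.

```python
import random


def _centered(v: list[float]) -> list[float]:
    m = sum(v) / len(v)
    return [a - m for a in v]


def _dot(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def ols_one(y: list[float], x: list[float]) -> float:
    """Slope from regressing y on x (with intercept): no pretest control."""
    xc, yc = _centered(x), _centered(y)
    return _dot(xc, yc) / _dot(xc, xc)


def ols_two(y: list[float], x: list[float], z: list[float]) -> float:
    """Coefficient on x from regressing y on x and z (with intercept)."""
    xc, zc, yc = _centered(x), _centered(z), _centered(y)
    sxx, szz, sxz = _dot(xc, xc), _dot(zc, zc), _dot(xc, zc)
    sxy, szy = _dot(xc, yc), _dot(zc, yc)
    # Solve the 2x2 normal equations for the x coefficient.
    return (szz * sxy - sxz * szy) / (sxx * szz - sxz ** 2)


def simulate(n: int, true_effect: float, seed: int = 0):
    """Hypothetical data: prior achievement drives both exposure and posttest."""
    rng = random.Random(seed)
    pre, practice, post = [], [], []
    for _ in range(n):
        p = rng.gauss(0, 1)                               # pretest score
        x = 0.8 * p + rng.gauss(0, 1)                     # exposure to the practice
        y = true_effect * x + 1.0 * p + rng.gauss(0, 1)   # posttest score
        pre.append(p)
        practice.append(x)
        post.append(y)
    return pre, practice, post


pre, practice, post = simulate(20000, true_effect=0.3, seed=1)
print("without pretest:", round(ols_one(post, practice), 3))
print("with pretest:   ", round(ols_two(post, practice, pre), 3))
```

Controlling for the pretest works here only because, by construction, prior achievement is the sole confounder. With real survey data, as the text notes, even a well-specified production function falls short of the causal leverage of a randomized experiment.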

For our purposes, an even more telling discussion of how cross-national research can be used to inform issues of school improvement is provided by Raudenbush and Kim (this volume). Near the end of their commentary, Raudenbush and Kim caution readers not to assume that causal inferences about relationships among school characteristics and student achievement based on between-nation analyses apply to how such relationships might unfold in the U.S. setting. In their view, between-nation analyses can be used to form hypotheses about school improvement in the United States, but these hypotheses need to be tested explicitly through within-nation analyses of U.S. data. This is an extremely important point that requires further discussion, especially because it goes to the very heart of how to use cross-national data and findings to inform educational improvement decisions in the United States.

Consider, for example, one of the central findings from TIMSS, one that many researchers, policy makers, and practitioners think has immediate relevance to improving schooling in the United States. Schmidt and colleagues (1997, 1999) have argued that curriculum characteristics (especially curriculum coherence, focus, and rigor) account for much of the cross-national variation in student achievement in the TIMSS data. But as Raudenbush and Kim (pp. 290-291) point out, their analyses do not demonstrate that greater curricular coherence, focus, and rigor explain differences in student achievement within the United States. Therefore, there is a need for analyses in which researchers first identify the variations in curriculum coherence, focus, and rigor that U.S. students actually experience and then estimate the effects of these real variations on student achievement within the United States. The ways in which such analyses enhance causal inferences about educational processes in the United States are discussed in detail by Raudenbush and Kim; suffice it to say that there is no a priori reason to expect that statistical relationships identified in between-nation analyses will necessarily be present in within-nation analyses.
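Raudenbush and Kim's caution can be illustrated with deliberately artificial data. In the Python sketch below, three hypothetical countries each have a few students with a "curriculum focus" index and an achievement score (all numbers invented). Comparing a between-country slope (country means only) with a pooled within-country slope is one simple way to see that the two levels of analysis can disagree; it is not a reconstruction of the analyses by Schmidt and colleagues or by Raudenbush and Kim.

```python
from statistics import fmean

# Hypothetical (curriculum_focus_index, achievement) pairs for three countries.
data = {
    "P": [(1, 510), (2, 500), (3, 490)],
    "Q": [(4, 540), (5, 530), (6, 520)],
    "R": [(7, 570), (8, 560), (9, 550)],
}


def slope(x: list[float], y: list[float]) -> float:
    """Least-squares slope of y on x."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def between_slope(data: dict) -> float:
    """Slope across country means, ignoring variation inside countries."""
    xs = [fmean(x for x, _ in pts) for pts in data.values()]
    ys = [fmean(y for _, y in pts) for pts in data.values()]
    return slope(xs, ys)


def pooled_within_slope(data: dict) -> float:
    """Slope after removing each country's own means (within-country variation only)."""
    dx, dy = [], []
    for pts in data.values():
        mx, my = fmean(x for x, _ in pts), fmean(y for _, y in pts)
        dx += [x - mx for x, _ in pts]
        dy += [y - my for _, y in pts]
    return slope(dx, dy)


print("between-country slope:", between_slope(data))
print("pooled within-country slope:", pooled_within_slope(data))
```

With these numbers the between-country slope is +10 points per unit of the index and the pooled within-country slope is -10, so a policy inferred from the first would misfire if the second described how the practice actually operates inside a country.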

There are, of course, several potential problems with the call for within-nation analyses of survey data to study the effects of educational practices appearing in other countries. One problem is that researchers might not be able to find sufficient variation in such practices in the U.S. education system, preventing a true test of hypotheses arising out of a cross-national context. For example, what would happen in the curriculum case above if no school system in the United States had a curriculum that even approximated the kind of curricular coherence, rigor, and focus characteristic of “high-performing” nations?⁸ Under these conditions, we might be forced to actually create school conditions that approximate arrangements in other countries before testing their effects on student achievement in the United States. From this perspective, the work of groups like the New Standards Project and the First in the World Consortium seems to be a necessary first step in translating at least some cross-national findings into practice in the United States, with sound causal inferences awaiting true randomized experiments.

This discussion raises another important issue in cross-national research: the approach to societal comparison that researchers use and how that approach should inform study design in cross-national surveys. In the United States, discussions of the policy ramifications of cross-national surveys seem to be guided by two relatively simplistic, and related, assumptions. One is that education in all societies works in roughly equivalent ways, leading to a second assumption: that practices imported from other countries will work in the United States in the same way they worked in other countries. The evidence from cross-national surveys, however, suggests that this will not necessarily be the case. Consider, once again, the findings of Schmidt and colleagues regarding curriculum effects on student achievement. The data presented in Schmidt et al. (1999) strongly suggest the presence of interactions (that is, curriculum effects that are not uniform across countries), demonstrating the need for caution about these simplistic assumptions.

In light of this, there is a real need for policy analysts and researchers to think more explicitly about the assumptions they are making regarding comparative research. As Tilly (1984) shows in his short and insightful monograph on comparative cross-national research, we can make several assumptions when developing societal comparisons. One might be that all societies are unique and cannot easily be compared, implying that processes occurring in one society might not easily (or ever) be duplicated in others (this view is close to the one developed by Bempechat et al., this volume). Another assumption might be that there are certain “types” of societies, and that processes occurring within groups of similar societies can be duplicated, but relationships occurring in societies classified as being in one group cannot be duplicated in societies classified as being in other groups. This approach places a premium on measuring societal characteristics and on investigating how societal characteristics condition relationships among variables at constituent system levels. Yet another assumption would hold that national societies are embedded within a larger “world system” of societies (a system in which national societies increasingly are engaged in social relationships with and influenced by one another). In this view, processes occurring within societies often depend less on unique circumstances within societies than on a given society’s location in a worldwide system of international relationships, where national societies hold unequal statuses in a dense network of international relationships and participate in an increasingly uniform, worldwide culture.

All three of these perspectives have figured centrally in cross-national research on education. The work of Heyneman and Loxley cited earlier, for example, is an instance of research that examines “types” of societies and that cautions against generalizing about educational processes across nations at different levels of economic development. Another example is the interesting work of Stevenson and Baker (1991) on the effects of educational governance regimes on consistency of content coverage in schools. In contrast, the TIMSS video studies and the qualitative case studies recently included as companions to cross-national surveys (as discussed by LeTendre, this volume) are consistent with a more holistic form of cross-national analysis, in which societies are seen as relatively unique, and educational practices are seen as deeply embedded in national culture and therefore not easily transported across national boundaries. The work of Meyer, Ramirez, and colleagues exemplifies a third approach to comparison, one derived from a “world-systems” viewpoint on education, where educational developments within countries are seen as resulting not so much from internal social and cultural circumstances, but rather from a given society’s position in a global cultural and social system (Meyer, Kamens, & Benavot, 1992; Ramirez & Boli, 1987).

The larger point is that judgments about the “validity” of data and findings from cross-national surveys depend to some extent on the assumptions one makes about appropriate forms of cross-societal comparison. For example, to the extent that we believe there are “types” of societies, a key concern becomes which types of societies to include in the research, and how these societies differ on system-level properties such as governance regimes, economic development, ethnic homogeneity, and school system types. In this view, the validity of cross-national studies, and the degree to which their results are informative, depend crucially on two things: whether the types of societies one needs to compare in testing one’s theory of societal processes are present in the sample in sufficient numbers to perform such a test, and whether adequate measures of societal properties have been developed for comparing system-level properties. In fact, attention to theory-driven thinking at this level of analysis, as well as discussion of how to measure the societal-level properties critical to this research agenda, seems oddly lacking in this volume. As a result, readers of this volume would do well to revisit the arguments presented by BICSE (National Research Council, 1993, pp. 20-21), which explicitly attended to this issue.

More prevalent in this volume, but only barely so, is attention to issues of research design and reporting that arise from an assumption that national societies are unique and need to be understood on their own terms. This assumption has fostered the demand for qualitative case study research in cross-national comparisons of educational systems. As LeTendre discusses in this volume, well-conducted case studies can contribute in important ways to cross-national surveys by capturing the unique, culturally embedded nature of educational practices in nations. But LeTendre’s discussion also shows that a great deal of ambiguity remains within the research community on three points: how to use the information derived from case studies in relation to surveys, the extent to which insights from case studies should drive survey design, and how conclusions from case studies can be reported so that various members of the research and policy communities find them “valid.” In fact, the contrast is striking between LeTendre’s account of the uncertainties and misunderstandings surrounding the use of qualitative data in the latest TIMSS work and BICSE’s elegant statement of the role such work can play in cross-national survey research (National Research Council, 1993, pp. 22-23). It shows that we have a long way to go before the use of qualitative research is optimized in cross-national surveys of student achievement.

Thus, the papers in this volume suggest that we still need to make progress in articulating the theories of comparison we think should guide cross-national surveys of achievement. We might, for example, need to go beyond the simple assumption that all societies work in the same way, and in doing so, also develop a more realistic set of assumptions about how the findings from cross-national research can be applied to problems of school improvement. In this regard, it is interesting to note that the practitioner community in American education seems to be doing just this, carefully recreating practices imported from other nations and testing them in their own educational settings.⁹ But this real progress in applying cross-national findings to problems of educational improvement is not much reflected in the current volume, except perhaps in Raudenbush and Kim’s advice that hypotheses derived from cross-national comparisons should be carefully tested within the United States and in Smith’s cautions about making inferences from cross-national studies to guide education policy. One hopes, therefore, that BICSE will pay more attention to this problem in its future discussions of the validity of large-scale cross-national research and articulate more clearly how cross-national findings can be used to stimulate school improvement in the United States.

CONCLUSION

Having considered the purposes that various constituencies hold for cross-national surveys of educational achievement, and some of the problems associated with achieving these purposes, I will now address directly the questions posed at the outset of this chapter. The first question is:

• Looking at the history of large-scale, cross-national surveys of student achievement, what progress has been made in conducting studies that are increasingly valid and increasingly informative?

The answer to this question, as I suggested at the outset, depends on the purposes one hopes to achieve through such studies. If the purpose is to use cross-national surveys to set standards for achievement in U.S. education, I would argue that a great deal of progress has been made in designing studies that are increasingly informative. Advances in test construction and scaling, coupled with better standardization of sampling procedures, have given us achievement tests that have better content validity with respect to national curricula and samples that are more standardized across nations. All of this helps produce comparisons that are fair. Moreover, because of these developments, we can now take better account of within-country variation in achievement, place confidence intervals around measures of central tendency, and do better analyses of subgroup performance, all at a more fine-grained level of curricular detail than in previous studies.

These technical advances are welcome and enhance the utility of cross-national comparisons. However, they do not guarantee that the tests of achievement used in the cross-national surveys are “valid” indicators of the educational standards to which we aspire, or that the cross-national comparisons based on these studies give us a valid picture of where the United States stands in terms of meeting these standards. As I argued in the body of this paper, despite advances, the cross-national achievement tests used most recently still do not reflect our most ambitious learning goals for students, and the ages at which students are tested in the cross-national surveys might not reflect the goals we actually hold for our education system.

Concerning the goal of using cross-national surveys to inform the process of school improvement in the United States, the picture is less clear. Cross-national studies certainly continue to make important contributions to our understanding of educational issues. The recent contributions of Schmidt and colleagues (1997, 1999) on the nature of curriculum organization in different countries, as well as the insights from the TIMSS video studies of teaching practice, represent two particularly stellar accomplishments in this area. Moreover, various segments of the practitioner community in American education seem to be developing interesting and sophisticated strategies for applying the findings of cross-national surveys to the school improvement process, as the efforts of the New Standards Project, the First in the World Consortium, and other groups suggest.

However, it is my view that the scientific community has yet to articulate a sound logic for how to link the findings from cross-national surveys to issues of school improvement. There are too few within-nation tests of hypotheses developed from cross-national comparisons. There are no clearly articulated perspectives on how to measure the features of national education systems in ways that guide cross-national comparative work or inform the sampling of societies for cross-national comparison. And there is too little clarity about what constitutes the valid use of qualitative case study data and how such data can (or cannot) be used alongside survey data to improve our understanding of educational processes in different societies. In this sense, much more thinking is needed if we are to clearly articulate the role of cross-national studies in improving schools in the United States.

One thing is clear, however. Sound information about how to change American schooling in ways that improve student learning cannot be based on cross-national surveys alone, or even on surveys that seek to assess hypotheses drawn from cross-national analyses using within-nation analyses of survey data. Instead, to truly understand how practices imported from other countries might affect student achievement in the United States, it appears that we will have to recreate these practices in American settings through careful intervention and then investigate the effects of these practices in carefully controlled experimental work. That is a logical—and needed—addition to the cross-national research agenda, and one that I believe BICSE should support.

The second question is:

• What opportunities lie ahead for improving the quality of such studies, both methodologically and in terms of information yield?

This is a difficult question to answer in the absence of information about levels of funding for future cross-national research. Certainly, advances in computerized adaptive testing are worth exploring as a means of improving cross-national achievement tests, especially if this approach can increase the information yielded per item and thereby reduce the number of items required. If so, the testing time saved might be used to develop more items that assess “authentic” forms of academic performance at higher levels of cognitive demand, bringing the tests used in cross-national surveys more in line with our most ambitious standards for student learning. The challenges here are enormous, however, and the resources required to make such advances could be beyond budgetary reach.

I would also like to see an expansion in the age groups included in the cross-national surveys, not only to include an older population of school leavers, but also to include a group of preschool students. The inclusion of such populations could allow for the kinds of investigations into achievement across the life course that I believe are truly needed to understand the role of schooling in the distribution of human capital in societies, and to better understand how this distribution varies across nations with different educational ideals and/or systems of education. The absence of data on what students know before they enter schooling makes it particularly difficult to assess the true contribution of schooling to learning. So does the lack of pretest and posttest measures of achievement and of adequate data on home background. At a minimum, one easy recommendation for improving cross-national surveys is to ensure that achievement is measured at two points in time in each age group under study and that state-of-the-art measures of home background are included. The inclusion of younger and older age groups, while desirable for a full analysis of the role of schooling in the distribution of cognitive development in societies, might confront too many budgetary and technical problems to prove feasible. Even so, the use of household samples and an upward redefinition of the age at which we consider individuals to be “school leavers” would be welcome additions to cross-national comparisons of achievement.

Within the realm of achievable improvements, I would also encourage the continued use of qualitative case study research as a companion to survey work. I would, however, recommend that work on this front proceed slowly. It should start with a clarification (or at least a sustained discussion) of three things: the approach to comparison that underlies the use of such research methods, how data from these efforts will be used to inform survey research design and to interpret survey results, and how those data will be reported to the public.

I would also encourage more thought about ways to characterize societies—as entities worthy of study in and of themselves. Very little attention was given in this volume to how to improve the ways in which we conceptualize and measure properties of different education systems or the societies in which they are embedded, yet a clear understanding of these issues is the key to any good, comparative, cross-national theory of educational processes. Lacking good theories and explicit attention to the development of measures at the societal level of analysis, I fear that much of what passes as cross-national comparison will be based on hunch, myth, and uninformed secondary data analysis, rather than carefully crafted cross-national theories of education.

All of this leads to the third question:

• How important is it to have international surveys of student achievement on a regular basis and with participation of a constant set of countries?

The evidence on this point seems fairly clear. It is precisely because the cross-national surveys have been conducted continuously over a span of 30 years that this body of work has made the progress it has. Consider as examples of this point our changing understanding of the relationship between socioeconomic status and achievement, or the increasingly sophisticated conceptualization and measurement of opportunity to learn. These developments suggest the importance of continuing the cross-national surveys using samples of nations that at least approximate the kinds of nations used in prior studies.

As to whether cross-national studies strictly require a panel (that is, a constant set) of nations, I am uncertain. For all intents and purposes, however, voluntary participation by societies around the world has been constant enough to produce a sample that comes close to being a panel of nations.¹⁰ To argue for a panel of nations is to give weight to a goal of cross-national surveys that was not discussed much in this volume: looking at changes in educational processes over time and thinking about how and why educational processes change in different kinds of societies. To the extent that there are strong theories about this, I would urge that a panel be formed that includes the kinds of societies needed to test those theories. Lacking such theories, however, voluntary participation in the cross-national surveys seems sufficient.

Finally, I am uncertain about how frequently cross-national surveys of student achievement should be conducted. On one hand, I think such studies should be mounted no more than once a decade, especially because this seems to be the amount of time the research community takes to fully digest the last round of studies and formulate new and better theories to test in the next wave. Too frequent a cycle of surveys, I fear, will simply routinize the work, leading to studies that gather the same data over and over again, and to data analyses that only partially digest any one set of findings. On the other hand, I am mindful of the need to maintain a research infrastructure in nations that might otherwise lack one should funding for large-scale, cross-national surveys disappear. As a result, I can propose one alternative to a once-a-decade approach to cross-national surveys. That alternative would be to conduct studies focused on one or a few academic subjects at intervals shorter than a decade (say, every three years or so), rather than mounting one large, multisubject study each decade. In this design, there would be time for conceptual work between cycles of subject-matter testing, but there also would be a constant stream of work and data for the cross-national research community.

In closing, let me reiterate a point I hope I made clear throughout this essay. Despite the caveats I raised about cross-national surveys of student achievement, and despite my concerns about their validity for setting educational standards and informing the process of school improvement in the United States, I firmly believe that such studies meet many of the goals their advocates hold for them. The cross-national surveys have helped stimulate a national and public debate about educational standards in the United States, and have done so on a regular basis. They have also pointed toward some very interesting designs for educational improvement, which in turn have given rise to promising school improvement efforts in the United States. What this line of work needs in order to improve, in my view, is no more or less than what any other well-conceived and well-conducted research program needs: more time to develop methods and theories that can be tested and revised on a regular basis. What the papers in this volume contribute to this process are some very good insights into the theories and methods that can guide this development.

NOTES

1.	Thanks to Larry Suter of the National Science Foundation for this insight.
2.	In fact, this was precisely the approach used by the New Standards project described in National Research Council (1995).
3.	The difficult problems of selection bias present in this age cohort in TIMSS are discussed in more detail in both Raudenbush and Kim (this volume) and Smith (this volume).
4.	What would we conclude, for example, if it were found that the education system in the United States worked to reduce initial (preschool) dispersions in achievement among different social groups and led to “average” or above-average levels of achievement in international comparisons at age twenty-two? Would we not argue that our system was in fact achieving its main purposes, despite room for improvement? And would it not be interesting to compare, across a well-chosen sample of nations, both dispersions in achievement as these unfolded over the life course and average levels of achievement at different time points?
5.	I am thinking here of the line of studies that originated in the 1970s with the National Longitudinal Study of the High School Class of 1972, that progressed through High School and Beyond, and that has continued with the National Education Longitudinal Study and the Early Childhood Longitudinal Study.
6.	In fact, the development of carefully constructed measures of OTL was pioneered in the cross-national surveys, where elaborate lists of curricular content were prepared and teachers were asked to report, on self-administered questionnaires, the extent to which they “covered” such content. Such elaborate measures of OTL (or content coverage) have begun to appear in large-scale surveys of student achievement in the United States, especially in studies sponsored by the U.S. Department of Education’s Planning and Evaluation Services (see, for example, the Instructional Dimensions Study, the Sustaining Effects Study, and Prospects), where the measures show statistically significant relationships with student achievement. Good measures of curriculum coverage in schools have not, however, been a hallmark of NCES-sponsored longitudinal studies.
7.	Raudenbush and Kim (this volume) raise an additional set of concerns about the various strategies used by comparative educationists to make causal inferences about system-level change using cross-national surveys, including cautions about missing data problems and how these problems affect cross-national analyses of cohort differences in achievement and system-level analyses of changes in achievement over time. I do not discuss these forms of analysis or Raudenbush and Kim’s critique of them in this chapter, except to note that these kinds of analyses, and the problems discussed by Raudenbush and Kim, are central to the important goal of cross-national work discussed by BICSE (National Research Council, 1993), namely, using repeated surveys to examine trends in schooling within and between nations of the world.
8.	I thank Bill Schmidt for suggesting this as a real possibility.
9.	I am thinking here again of the New Standards project and the work of the First in the World Consortium.
10.	I thank David Baker for this insight.

REFERENCES

Baker, D. P. (1993). Compared to Japan, the U.S. is a low achiever . . . really: New evidence and comment on Westbury. Educational Researcher, 22(3), 18-20.

Baker, D. P., Goesling, B., & LeTendre, G. K. (2000). Social class, school quality, and national development: A cross-national analysis of the “Heyneman-Loxley” effect. Unpublished manuscript, Pennsylvania State University, College of Education.

Berliner, D. C., & Biddle, B. J. (1995). The manufactured crisis: Myths, fraud and the attack on America’s public schools. Reading, MA: Addison Wesley.

Bracey, G. W. (1997). Setting the record straight: Responses to misconceptions about public education in the United States. Alexandria, VA: Association for Supervision and Curriculum Development.

Carroll, J. (1963). A model for school learning. Teachers College Record, 64, 723-733.

Central Advisory Council for Education. (1967). Children and their primary schools. Vols. 1-2: The Plowden Report. London: Her Majesty’s Stationery Office.

Coleman, J. S., Campbell, E. Q., Hobson, C. J., McPartland, J., Mood, A. M., Weinfeld, F. D., & York, R. L. (1966). Equality of educational opportunity. Washington, DC: U.S. Department of Health, Education, and Welfare.

Heyneman, S. P., & Loxley, W. A. (1983). The effect of primary school quality on academic achievement across twenty-nine high- and low-income countries. American Journal of Sociology, 88, 1162-1194.

Husén, T. (1987). Policy impact of IEA research. Comparative Education Review, 31(1), 129-136.

Meyer, J. W., Kamens, D., & Benavot, A. (1992). School knowledge for the masses: World models and national primary curriculum categories in the twentieth century. London: Falmer Press.

National Commission on Excellence in Education. (1983). A nation at risk: The imperative for educational reform. Washington, DC: U.S. Government Printing Office.

National Research Council. (1992). Assessing evaluation studies: The case of bilingual education strategies. Panel to Review Evaluation Studies of Bilingual Education, M. M. Meyer & S. E. Fienberg, eds. Committee on National Statistics, Commission on Behavioral and Social Sciences and Education. Washington, DC: National Academy Press.

National Research Council. (1993). A collaborative agenda for improving international comparative studies in education. Board on International Comparative Studies in Education, D. M. Gilford, ed. Commission on Behavioral and Social Sciences and Education. Washington, DC: National Academy Press.

National Research Council. (1995). International comparative studies in education: Descriptions of selected large-scale assessments and case studies. Board on International Comparative Studies in Education, Commission on Behavioral and Social Sciences and Education. Washington, DC: National Academy Press.

Ramirez, F. O., & Boli, J. (1987). The political construction of mass education: European origins and worldwide institutionalization. Sociology of Education, 60, 2-17.

Schmidt, W. H., McKnight, C. C., Cogan, L. S., Jakwerth, P. M., & Houang, R. T. (1999). Facing the consequences: Using TIMSS for a closer look at U.S. mathematics and science education. Boston: Kluwer Academic.

Schmidt, W. H., McKnight, C. C., & Raizen, S. A. (1997). A splintered vision: An investigation of U.S. science and mathematics education. Boston: Kluwer Academic.

Stevenson, D., & Baker, D. (1991). State control of the curriculum and classroom instruction. Sociology of Education, 64, 1-10.

Stigler, J., & Hiebert, J. (1999). Teaching is a cultural activity. American Educator, Winter, 4-11.

Tilly, C. (1984). Big structures, large processes, huge comparisons. New York: Russell Sage Foundation.

Westbury, I. (1992). Comparing American and Japanese achievement: Is the United States really a low achiever? Educational Researcher, 22(3), 18-24.
