
Suggested Citation:"1 What is TIMSS?." National Research Council. 1999. Global Perspectives for Local Action: Using TIMSS to Improve U.S. Mathematics and Science Education. Washington, DC: The National Academies Press. doi: 10.17226/9605.


CHAPTER ONE
What Is TIMSS?

Since the early 1960s, education research organizations in the United States and other countries have conducted several major international comparisons of student performance in mathematics and science. For example, the 1981 Second International Mathematics Study (SIMS) measured mathematics achievement among 13 year olds in 14 industrialized and 6 developing nations. It also examined curricula, classroom processes, teacher preparation, and the attitudes of teachers and students toward mathematics (McKnight et al., 1989). Similarly, in 1991 the International Assessment of Educational Progress assessed the mathematics and science skills of samples of 9 and 13 year olds from the United States and 19 other countries. In this and earlier assessments, the scores of U.S. students generally fell in the lower part of the distribution of scores for the students sampled (Lapointe et al., 1992; U.S. Department of Education, 1992).


The Third International Mathematics and Science Study (TIMSS), which was conducted over the course of several years in the mid-1990s, was by far the largest and most ambitious international assessment of student performance in mathematics and science. Like previous studies, TIMSS set out to assess how well students in different countries are able to solve mathematical and scientific problems at different stages of their education. In addition, TIMSS sought to set these achievement data in a much richer context than had been available before. It gathered an extensive variety of information about curricula, teaching practices, and the influences on teachers and students both inside and outside the classroom.

The data provided by TIMSS, along with information from previous international comparisons, have been an extremely valuable resource. They have called attention to factors associated with student achievement, thus identifying promising areas for future study. They have provided deep insights into different ways of teaching and learning, making it possible to reexamine conventional U.S. practices. By opening a window onto the educational systems of other countries, TIMSS has revealed new possibilities for U.S. education.

For example, information from TIMSS and from previous studies has made it possible to answer questions that have immediate implications for teaching and learning. Do U.S. students know as much about mathematics and science as students in other countries? Are U.S. curricula as demanding or as well structured as the curricula in other countries? What do U.S. teachers actually do in the classroom, and how does this compare with what they say they are doing? How much support do teachers receive for mathematics and science education? How much time do students spend working in outside jobs, doing homework, and watching television? All of these questions and many more can be addressed using information gathered by TIMSS.

However, even a data set as extensive as that offered by TIMSS cannot answer many important questions in education. In particular, the educational system is so complex that it is difficult to link cause and effect conclusively. The TIMSS data cannot be used to select one or two education changes, such as revamping the curriculum or increasing the amount of homework students do, that will guarantee higher student performance. Nor was TIMSS designed as an experimental study, in which students are randomly assigned to carefully balanced groups, the groups are treated differently, and the effects of those differences are then measured.

It also is important to recognize what TIMSS did not set out to study. It did not, for example, gather information about educational financing at the local and class levels. TIMSS also gathered less information about students in their last year of secondary school and their teachers than it did for students in populations 1 and 2 (the two younger groups described below). It did not assess the performance of students in college or ask whether U.S. college students eventually catch up with their international peers in areas where they have fallen behind. Finally, TIMSS gathered more information about mathematics than about science in many areas; for example, classes were videotaped only in mathematics, and some questionnaires were distributed only to mathematics teachers and students.

THE POPULATIONS STUDIED

TIMSS focused on students at three stages of their education: midway through elementary school, midway through lower secondary school, and at the end of upper secondary school (U.S. Department of Education, 1997a, p. 6). The selection of students therefore considered both their ages and their grade level.

At the elementary school level, TIMSS assessed the performance of students in the two adjacent grades containing the most 9 year olds (Table 1-1). In the case of the United States, this "population 1" group was drawn from grades three and four. Twenty-six countries participated in this part of the study (Table 1-2). In the United States, achievement data were collected from a sample of 3,819 third graders and 7,296 fourth graders in 189 public and private elementary schools (Martin et al., 1997, p. A-14; Mullis et al., 1997, p. A-16).

At the lower secondary school level, students were studied in the two adjacent grades containing the most 13 year olds. In the United States, this "population 2" group encompassed grades seven and eight. Forty-one countries participated in this part of the study. In the United States, a sample of 3,886 seventh graders and 7,087 eighth graders in 185 public and private junior high and middle schools was tested (Beaton et al., 1996a, p. A-14; 1996b, p. A-14).

The third population studied consisted of students in their final year of secondary school. Because secondary schools conclude at different ages in different countries, the students in this population were not all the same age. In the United States, the students in this population 3 group were seniors in high school. A sample of about 11,000 high school seniors from 211 public and private high schools participated in the assessment of general knowledge in mathematics and science (Mullis et al., 1998, p. B-19). Twenty other countries also participated fully in this part of TIMSS. In addition, two sets of 16 countries, including in both cases the United States, tested smaller groups of students in physics and advanced mathematics.

Most of the testing occurred two to three months before the end of the 1994–95 school year. In each country, the tests were translated into the primary language or languages of instruction. All testing in the United States was done in English.

TABLE 1-1 Groups of Students Studied in TIMSS

Group	Definition	In the United States
Population 1	Students in the pair of adjacent grades containing the most 9 year olds	Grades three and four
Population 2	Students in the pair of adjacent grades containing the most 13 year olds	Grades seven and eight
Population 3	Students in their final year of secondary school, regardless of age	Grade 12

TABLE 1-2 Countries That Participated in the TIMSS Student Performance Assessments

Population 1	Population 2	Population 3: Mathematics and Science Literacy	Population 3: Advanced Mathematics	Population 3: Physics
Australia	Australia	Australia	Australia	Australia
Austria	Austria	Austria	Austria	Austria
—	Belgium (Flemish)	—	—	—
—	Belgium (French)	—	—	—
—	Bulgaria	—	—	—
Canada	Canada	Canada	Canada	Canada
—	Colombia	—	—	—
Cyprus	Cyprus	Cyprus	Cyprus	Cyprus
Czech Republic	Czech Republic	Czech Republic	Czech Republic	Czech Republic
—	Denmark	Denmark	Denmark	Denmark
England	England	—	—	—
—	France	France	France	France
—	Germany	Germany	Germany	Germany
Greece	Greece	—	Greece	Greece
Hong Kong	Hong Kong	—	—	—
Hungary	Hungary	Hungary	—	—
Iceland	Iceland	Iceland	—	—
Iran, Islamic Rep.	Iran, Islamic Rep.	—	—	—
Ireland	Ireland	—	—	—
Israel	Israel	—	—	—
—	—	Italy	Italy	—
Japan	Japan	—	—	—
Korea	Korea	—	—	—
Kuwait	Kuwait	—	—	—
Latvia	Latvia	—	—	Latvia
—	Lithuania	Lithuania	Lithuania	—
Netherlands	Netherlands	Netherlands	—	—
New Zealand	New Zealand	New Zealand	—	—
Norway	Norway	Norway	—	Norway
Portugal	Portugal	—	—	—
—	Romania	—	—	—
—	Russian Fed.	Russian Fed.	Russian Fed.	Russian Fed.
Scotland	Scotland	—	—	—
Singapore	Singapore	—	—	—
—	Slovak Republic	—	—	—
Slovenia	Slovenia	Slovenia	Slovenia	Slovenia
—	South Africa	South Africa	—	—
—	Spain	—	—	—
—	Sweden	Sweden	Sweden	Sweden
—	Switzerland	Switzerland	Switzerland	Switzerland
Thailand	Thailand	—	—	—
United States	United States	United States	United States	United States
Note: Dashes indicate that the country did not participate in that part of the assessment. Source: U.S. Department of Education, 1996, 1997b, 1998.

Worldwide, more than a half million students from some 15,000 schools participated in the TIMSS achievement tests, including approximately 33,000 U.S. students from more than 500 schools.

RANGE OF DATA

Much of the media coverage of the TIMSS results focused on the achievement comparisons, and the United States' ranking among nations is what many people still remember best about TIMSS. However, the achievement data were just one part of TIMSS.

TIMSS used five different methods to collect data: student achievement tests, questionnaire responses, curriculum analyses, videotapes of classroom instruction, and case studies of policy issues (Table 1-3).

TIMSS Achievement Tests

The half-million students who participated in TIMSS took tests that were an hour and a half long (U.S. Department of Education, 1997a, p. 7). The tests included both multiple-choice problems and free-response exercises that asked students to answer in their own words. Each student answered only a subset of the total set of questions, which allowed the tests as a whole to cover more content than if every student had answered every question. A smaller number of students in many countries also completed hands-on performance assessments designed to gauge their skills in particular areas of mathematics and science.

The content to be tested in each subject and at each grade level was determined through a consensus process involving all of the participating countries. An international analysis of curricula was conducted so that the development of the assessments could reflect the curricula of participating countries. Pilot testing of the assessments further reduced any bias toward or against particular countries.

TABLE 1-3 Areas in Which Data Were Gathered in TIMSS

Data Gathered	Pop. 1 Math.	Pop. 1 Science	Pop. 2 Math.	Pop. 2 Science	Pop. 3 M&S Literacy
Achievement tests	X	X	X	X	X
Teacher questionnaires	X	X	X	X
Student questionnaires			X	X
Administrator questionnaires	X	X	X	X	X
Curriculum analyses	X	X	X	X	X
Videotaped lessons			X
Note: More information was gathered for mathematics than for science, and more information was collected at the population 2 level (seventh and eighth grades in the United States) than at either the population 1 (third and fourth grades) or the population 3 (final year of high school) level. The curriculum analysis covered all grades, not just those sampled in the TIMSS achievement tests. Only mathematics classes were videotaped, and only in three countries: the United States, Germany, and Japan. Case studies of selected features of the educational systems were conducted in those same three countries.

To avoid the statistically meaningless distinctions implied by a strict ranking, U.S. publications describing the TIMSS achievement results divide participating countries into three bands: those that performed significantly better than the United States, those whose performance was statistically indistinguishable from that of the United States, and those that performed significantly worse than the United States. The results of the achievement tests are described in Chapter 2.

Questionnaires

Students, teachers, and administrators at the schools that participated in TIMSS answered questionnaires about important aspects of education. Students answered questions about their mathematics and science classes and about their attitudes toward these subjects. Teachers answered questions about their teaching practices, their backgrounds, and their instructional goals as well as their attitudes toward science and mathematics. School administrators were asked about school policies and practices, curriculum, staffing levels, and the availability of instructional resources, including science laboratories.

Curriculum Analyses

Researchers analyzed more than 1,000 mathematics and science textbooks and official curriculum guides from participating countries to determine what TIMSS researchers termed the "intended curriculum" (Beaton et al., 1996a, p. A-1). For each of these documents the subject-matter content, sequencing of topics, and expectations for student performance were coded. Questionnaires distributed to education experts within each country supplemented the curriculum analyses.

Videotapes of Classes

In the United States, Germany, and Japan, between 50 and 100 eighth-grade mathematics classes in each country were videotaped (Stigler and Hiebert, 1997, pp. 14–21; Stigler et al., 1999, p. 9). The tapes were digitized, transcribed, and translated, giving researchers virtually instant access to any part of the lessons. The tapes then were coded for the occurrence of various events, teaching strategies, and content elements, so that the lessons could be analyzed quantitatively. In addition, teacher questionnaires concerning the specific class sessions videotaped were collected, so that stated intentions and the actual teaching evident in the classroom could be compared.

The students and teachers in most of these tapes were guaranteed confidentiality, and those tapes are seen only by researchers. Several "public use" tapes also were collected in each country as examples to help communicate the results of the study (U.S. Department of Education, 1997c; Stigler et al., 1999, p. 9). Teachers and students who appear in these tapes agreed to have their lessons made available for public viewing.


Case Studies

Also in the United States, Germany, and Japan, teams of bilingual researchers conducted case studies of educational policies and practices (Stevenson and Nerison-Low, 1997; Stevenson, 1998, pp. 524–529). About 20 researchers, all of whom were familiar with the culture in which they worked, spent two to three months in three metropolitan areas in each country conducting interviews, conversations, and classroom observations. They interviewed students, teachers, parents, policymakers, education authorities, and others engaged in the education enterprise. A computer network linked all the researchers and enabled them to store and retrieve verbatim transcripts, observational records, and other field notes. The case studies focused on four topics: education standards, teacher education and teachers' working conditions, dealing with differences in student ability, and the place of school in adolescents' lives.

CRITICISMS OF TIMSS

Because international comparisons of student performance inevitably call attention to U.S. educational practices, all such comparisons have received intense scrutiny. TIMSS has been no exception.

Questions and criticisms of TIMSS and other international comparisons have fallen into several broad categories (Bracey, 1996). The first concerns whether comparable groups of students in each country are included in a study. For example, if one country tested only groups of students who would be expected to score higher on a test, its results may be skewed higher compared with results from a country that tested a more representative group of students.

The designers of TIMSS took a number of steps to avoid this selection bias (Beaton et al., 1996a, pp. A-9 through A-19). First, criteria were established to ensure that the schools selected and the students tested achieved certain participation rates. Countries could exclude a small percentage (less than 10 percent) of certain kinds of schools or students who would be very difficult or too resource intensive to test (e.g., schools for students with special needs or schools that were very small or located in extremely remote locations). Most countries excluded a much smaller percentage of schools and students than specified, and countries that did not meet this criterion were noted in the results.

Of the remaining schools, countries had to achieve participation rates of at least 85 percent for both the schools and the students selected (or a combined rate of at least 75 percent) to satisfy the sampling guidelines. Within each school, countries had to use random procedures to select the classes to be tested, and all of the students in the selected classes participated in the TIMSS testing. An international committee scrutinized this selection and testing process to ensure that the participating students were randomly selected to represent all students in their respective nations.
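The participation rule can be read as two alternative routes to meeting the guidelines. The short Python sketch below illustrates that reading for a hypothetical country. It assumes that the "combined rate" is the product of the school and student participation rates, which is our interpretation rather than a definition given in this chapter; the function name and all example rates are invented for illustration and are not TIMSS data.

```python
def meets_participation_guidelines(school_rate: float, student_rate: float) -> bool:
    """Hypothetical check of a country's sample against the participation thresholds.

    Rates are fractions between 0 and 1. Assumption: the "combined" rate is
    the product of the school and student participation rates.
    """
    # Route 1: at least 85 percent of sampled schools AND 85 percent of sampled students.
    both_rates_ok = school_rate >= 0.85 and student_rate >= 0.85
    # Route 2: a combined (product) participation rate of at least 75 percent.
    combined_ok = school_rate * student_rate >= 0.75
    return both_rates_ok or combined_ok


# Hypothetical rates, not actual TIMSS results.
print(meets_participation_guidelines(0.86, 0.86))  # True: both rates reach 85 percent
print(meets_participation_guidelines(0.80, 0.95))  # True: combined rate is about 0.76
print(meets_participation_guidelines(0.80, 0.90))  # False: combined rate is about 0.72
```

Under this reading, very high student participation can offset a school participation rate somewhat below 85 percent, but only up to a point.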

When nations did not meet the established standards, these exceptions were noted in analyses of the results. For instance, of the 26 nations that participated in the population 1 assessment, 17 met or came close to meeting all of the selection standards for the study. The other 9 countries did not meet the standards, for example because the percentage of schools, teachers, or students declining to participate exceeded the sampling guidelines. These nations and the problems they had meeting the guidelines are identified in the published results.

A related set of criticisms of TIMSS involves the assessments of students at the end of their secondary education. For populations 1 and 2, virtually all children in all but a handful of TIMSS countries are enrolled in school and are therefore eligible to take the test. By the final year of secondary school, however, some young people are no longer enrolled, and those students were not tested in TIMSS. Furthermore, because secondary school ends at different points in different countries, the average age of these students varied from country to country, and some have asked whether it is fair to compare students of different ages (Rotberg, 1998). Finally, because testing occurred toward the end of the school year, questions also have arisen about whether the U.S. seniors were motivated to do well on the test.

The average age of the U.S. students tested in population 3—18.1 years—was somewhat less than the average age of all the students in this population who took the test—18.7 years (Forgione, 1998). However, the mathematics and science literacy assessment at the population 3 level sought to measure knowledge that should have been learned several years earlier, lessening the effect of age differentials. Finally, one objective of this part of TIMSS was to assess performance when students in each country are deemed ready to enter the adult world, and differences in age are one measure of how this determination varies across countries.

Another major criticism of TIMSS and other international achievement tests is that the results depend largely on the sequence of topics within each country's overall curriculum and do not reflect the quality of those curricula or teaching practices. The first half of this point is certainly valid. As shown in Chapter 4, what students are taught does have a direct impact on their performance, and one of the goals of TIMSS was to explore this connection between curriculum and student knowledge.

TIMSS was designed, however, in such a way as to minimize the effects of curriculum differences. Extensive information on curriculum was factored into the tests' design so that they reflected the mathematics and science curriculum of all TIMSS countries and did not overemphasize what is taught in only a few. Questions on the test also were divided into separate subcategories so that performance in specific areas of mathematics and science could be compared with the detailed curriculum in different countries.

A related criticism of TIMSS suggests that widespread access to higher education in the United States reduces the importance of the subpar high school achievement results. But a substantial portion of high school graduates do not attend college: fewer than two-thirds of 1994 U.S. high school graduates were enrolled in a college or university the following fall. Many students who attend college never obtain a degree, and many of those who do take little mathematics or science in college. Furthermore, many students who do go to college need to take remedial courses in mathematics or science; one in four freshmen in 1995 took remedial math.

A final and much broader objection to TIMSS is that the countries compared are so different culturally that comparisons of student performance have little relevance (Bracey, 1997, 1998). For example, according to this line of argument, the extensive academic work that many Asian students do outside school to prepare for high school and college entrance exams makes comparisons with U.S. students meaningless. Critics of international comparisons also point to such intangibles as creativity, motivation, perseverance, flexibility, and entrepreneurial skill as positive outcomes of U.S. education that international comparisons cannot measure.

As with the other differences among countries, cultural differences are part of what TIMSS set out to study (Baker, 1997a, 1997b). TIMSS gathered data on a wide variety of cultural influences, such as the amount of time students spend working, watching television, and doing homework; the background and experiences of teachers; and student and teacher attitudes toward mathematics and science. Each of these factors is a potential explanation for differences in student understanding of mathematics and science, as are differences in curriculum and instruction.

Moreover, differences in culture do not invalidate comparisons of students. TIMSS set out to measure basic skills that people must use throughout their lives, such as reasoning, applying knowledge, and designing multistep solutions. Parents, educators, and policymakers are legitimately interested in how these skills vary from country to country.

In general, the TIMSS results are broadly consistent with the findings of earlier and more limited comparisons of international academic performance (Stedman, 1997). Younger students in the United States tend to do better in international comparisons than do older students. In particular areas, U.S. students perform much more poorly than do students in other countries, and this poor performance persists across the various grades tested. For example, at both the fourth and the eighth grades the comparatively weakest part of U.S. students' performance in science was in the physical sciences, a finding that also applies in the last year of high school.

WHAT OTHER STUDIES ARE UNDER WAY?

TIMSS generated a huge body of data. Even today some of the basic studies from TIMSS have not yet been released, and reanalyses of data already released will continue for years to come.

At the same time, new international studies are now being planned that will extend the results from TIMSS. The most directly related follow-up study is the TIMSS Repeat Project, or TIMSS-R, which gathered achievement data very similar to the TIMSS data for the upper grade of population 2 (eighth grade in the United States) in 1999. Because TIMSS tested students in 1994–95, the students in population 1 for the original TIMSS will be in population 2 for TIMSS-R, making it possible to compare the progress of different groups of students over time. TIMSS-R also will include background questionnaires for students, teachers, and schools to investigate instructional practices and aspects of the learning environment. In 1998, 31 countries conducted field tests for the study, and 8 additional countries planned to join the main data collection stage.

As a separately funded project, the U.S. Department of Education is sponsoring a videotape project to extend the TIMSS videotape study of eighth-grade mathematics teaching in the United States, Japan, and Germany. The new videotape study will encompass additional countries as well as an analogous taping and analysis of eighth-grade science teaching.

Another closely related project is the Program for International Student Assessment (PISA) being conducted by the Organization for Economic Cooperation and Development (OECD). PISA will measure students' knowledge, skills, and competencies in three areas—reading, mathematics, and science. The overall strategy is to collect in-depth information on student outcomes in one of these three domains every three years, with a minor focus on the other two content domains. The major focus for the first survey, which will take place in the year 2000, is on reading, with a minor focus on mathematics and science. The major focus in 2003 will be mathematics, and in 2006 it will be science. The subjects of this study will be nationally representative samples of 15 year olds, the highest age at which school enrollment in OECD countries is essentially universal. About 25 OECD countries are expected to participate, and they likely will be joined by a number of other countries.

These studies and the continuing analysis of results from TIMSS will provide a continual flow of new information about mathematics and science education in the United States and in countries around the world. The challenge, which is taken up in the next four chapters of this report, is to use this information to help guide improvements in the curricula, teaching, and educational environments experienced by all students.