The Challenge and the Charge
In the United States, where much educational decision making is undertaken at the state or local level, the availability of a variety of curricula is both expected and desired. However, the many products and approaches to curricula are likely to vary in quality and effectiveness. Consequently, state and local decision makers need valid, informative, credible, and cost-efficient evaluation data on curricular effectiveness, along with guidance on interpreting and using these data. National policy makers and agencies, as well as commercial publishers, that support the development of curricula must also be assured that the funds expended for such purposes result in curricula and associated resources that demonstrably enhance learning. Methodologically sound evaluations of those materials are essential.
However, no single method of evaluation is sufficient on its own. Evaluation necessarily involves value judgments and requires careful consideration of evidence. Well-conducted evaluations depend on the availability and distribution of resources, are expensive to undertake, and reflect contextual opportunities and constraints. Thus, decision makers need a flexible evaluation framework that offers a highly reliable and informative means of curricular review and fits local goals and expectations. Moreover, curricular decisions must be reexamined periodically, and curricula need to be revised on the basis of data and professional judgment. Curriculum evaluations must therefore accommodate local expectations, values, and resources.
To address this issue, a committee (hereafter referred to as “we”) was assembled by the National Research Council (NRC) in spring 2002. Our assignment was to collect the evaluation studies of certain mathematics curricula developed by for-profit companies, with National Science Foundation (NSF) funds, or through a combination of the two, and to assess their quality. This report presents our conclusions and provides recommendations for improving the evaluation process.
NEED FOR THIS STUDY
Between 1990 and 2007, the NSF will have devoted an estimated $93 million, including funding for revisions, to 13 mathematics projects to “stimulate the development of exemplary educational models and materials (incorporating the most recent advances in subject matter, research in teaching and learning, and instructional technology) and facilitate their use in the schools” (NSF, 1989, p. 1). As these NSF-supported materials, which were informed by the publication of the National Council of Teachers of Mathematics (NCTM) Standards (NCTM, 1989), gained visibility, publishers also produced curriculum materials aligned with NCTM Standards or developed alternative approaches based on other standards.
These standards were viewed as a promising new approach for translating and infusing research results into classroom practice across the United States. Although each NSF-supported curriculum underwent individual evaluations, little emphasis was placed on reaching consensus about the particular aspects of the curricula to be analyzed or methods to be used. Furthermore, until these curricula had been used for a significant amount of time, no meta-analysis of NSF efforts as a whole in supporting new mathematics curricula could be undertaken.
In 1999, the U.S. Department of Education convened a Panel on Exemplary Programs in Mathematics whose recommended curriculum programs generated much controversy (Klein et al., 1999). The Panel’s criteria included documented evidence of a curriculum’s effectiveness, and part of the controversy concerned the quality of this evidence. Because the NSF-supported materials have now been on the market longer and additional evaluation studies have been conducted, it is timely to reexamine the adequacy of the evaluations.
Such examination is essential because several factors indicate that the conditions that motivated NSF funding of those curriculum projects may still persist (McKnight et al., 1987; Schmidt, McKnight, and Raizen, 1996). The United States may not be meeting its own mathematical needs in producing students who are capable, interested, and successful in the following areas:
Attaining high school diplomas with adequate levels of mathematical knowledge and reasoning to function as an informed and critical citizenry (Adelman, 1999);
Undertaking study at two-year colleges without undue fiscal burdens imposed by the need for remedial mathematics activities (Adelman, 1999);
Pursuing advanced mathematics at the research level in mathematics and science (Lutzer, 2003); and
Pursuing mathematically intensive careers in technology fields, statistics, and “client disciplines”—engineering, chemistry, and, increasingly, fields such as biology, economics, and social sciences (NRC, 2003).
In addition, concerns about the preparation of all students across the spectrum of academic achievement (Campbell et al., 2002) make such examination, evaluation, and critique of mathematics curricula necessary.
Currently, too many deliberations on mathematics curricular choices lack a careful and thorough review of the evaluations of mathematics curricula. Because of the cumulative nature of mathematics topics, a weak curriculum can limit and constrain instruction beyond the K-12 years. It can discourage students from entering mathematically intensive fields or hobble the progress of those who pursue them. International studies have heightened American awareness that our mathematics performance has deteriorated, especially in the 8th and 12th grades. Even the performance of the most advanced students has suffered (Takahira et al., 1998).
The need to examine the effectiveness of curricular reform intensified with the release of the 2003 National Assessment of Educational Progress (NAEP) report, known as the Nation’s Report Card, which showed significant improvements in mathematics achievement while reading scores remained constant. Average 4th-grade performance increased by nine points, and average 8th-grade performance increased by five points. Closer examination shows that the percentage of students performing below the basic level declined by 12 percentage points at the 4th grade and by 5 percentage points at the 8th grade. Most of the remaining gains occurred in the percentage of students identified as proficient, the second-highest achievement level. These gains were distributed quite evenly across ethnic groups and social classes. However, interpreting scores across successive assessments raises methodological issues, and the factors that produced these gains are not known. Determining the extent to which the gains can be attributed to curricular reform requires sound, sophisticated evaluation designs, which establishes an additional need for this report.
TIMELINESS OF THE REPORT
This report is timely because a review of evaluations providing evidence on the effectiveness of mathematics curricula must be undertaken after the curriculum materials have been used under a variety of conditions and once the materials are in final editions rather than preliminary forms. A premature review would contribute to the unrealistic perception that education can be fixed easily and quickly. An early review could also contribute to vacillation among approaches, wasted funding, and skepticism among practitioners who have come to expect each new reform to displace current efforts.
This review is also timely because of the federal No Child Left Behind Act of 2001, which specifies that all educational programs should demonstrate effectiveness based on “scientifically based research.” Publishers, decision makers, and researchers are now seeking clear guidelines for determining whether their curriculum development programs meet this standard. Such guidelines must be informed by, and built on, the evaluation data currently available. As committee members, we believe that funding decisions should be predicated on a realistic, honest assessment of the quality of the current knowledge base. Given this legislative mandate, we sought to define the phrase “scientifically established as effective” as it applies to mathematics curricula. Our deliberations also have been informed by the use of the phrase “scientific research in education,” as articulated in the NRC report of the same name (NRC, 2002).
COMMITTEE CHARGE AND APPROACH
Our committee was assembled in June 2002 with the following charge:
The Mathematical Sciences Education Board will nominate a committee of experts in mathematics assessment, curriculum development, curriculum implementation, and teaching to assess the quality of studies about the effectiveness of 13 sets of mathematics curriculum materials developed through NSF support and 6 sets of commercially generated curriculum materials. A committee will collect available studies that have evaluated NSF-supported development and commercially generated mathematics education materials and establish initial criteria for review of the quality of those studies. The committee will receive input from two workshops of mathematics educators, mathematicians, curriculum developers, curriculum specialists, and teachers. The product will be a consensus report to NSF summarizing the results of the workshops, presenting the criteria and framework for reviewing the evidence, and indicating whether the currently available data are sufficient for evaluating the efficacy of these materials. If these data are not sufficiently robust, then the steering committee would also develop recommendations about the design of a subsequent project that could result in the generation of more reliable and valid data for evaluating these materials.
Originally, we were to review evaluation data on the effectiveness of NSF-supported mathematics curriculum materials only. Our charge was amended to include evaluation data on the effectiveness of six sets of commercially generated mathematics materials. The expanded scope reflected the expectation that the methods used to evaluate commercially generated materials, and the data thus derived, might differ from those used to evaluate the NSF-supported materials. By extending our investigation to commercially generated curricula, we anticipated learning about different evaluation techniques that might be incorporated into a curriculum evaluation framework. Investigating these alternative approaches might also benefit the broader community of people who evaluate mathematics curricula.
Our goal in writing this report is twofold. First, we examine the evidence currently available from evaluations of the effectiveness of mathematics curricula. Second, we suggest ways to improve the evaluation process that will enhance the quality and usefulness of evaluations and help guide curriculum developers and evaluators in conducting better studies. To determine whether the corpus of evaluations was “sufficient for reviewing the efficacy of the materials,” we examined both their methods and their conclusions, evaluating the quality of evidence and argument. We also distinguished between studies that were at least minimally methodologically adequate and those whose methodology lacked sufficient rigor or relevance. Finally, “in order to make recommendations about the design of a subsequent project,” we summarized inferences that could be drawn from the patterns of findings in those “at least minimally methodologically adequate” studies, inferences that would inform the design and conduct of subsequent evaluations and an evaluation framework. However, to stay within the limits of our charge, we do not report summary data at the level of particular programs. Instead, we report at the level of program type and use the summary data as a means to investigate the quality and stability of the evaluations. Furthermore, we recognize that design weaknesses in some evaluation studies render the summary statements only tentative. In this way, we sought to fulfill our charge by advancing “the design of a subsequent project that could result in the generation of more reliable and valid data for evaluating these materials.”
Establishing clearer guidelines for curricular evaluation becomes increasingly important as the number of U.S. publishing companies decreases through mergers, acquisitions, and purchases by international publishing conglomerates. This consolidation is likely to affect curriculum development, review, revision, and adoption. Also needed are criteria that enable researchers and policy makers to monitor and document the effects of these changes on the curricular options that become available to U.S. educators and students. Representatives of these companies told us that they would welcome clear statements of their responsibilities for curriculum evaluation.
Thus, this report considers issues of policy, practice, and measurement in an integrated fashion. Policy makers should understand the real demands of practice and how those demands affect evaluations. They need expert advice on developing a plan of action that serves the needs of all constituents and is reliable, strategic, and feasible. At the same time, educational practice is complex and subject to multiple forces; it operates within multiple levels of organization, governance, and regulation. Practitioners, most of whom are mathematics teachers, are responsible for implementing mathematics curricula, and their professional preparation, knowledge, and experience are essential in selecting effective materials. Curricular evaluation must consider not only the quality of the materials but also a realistic assessment of the practice conditions in which these innovations are implemented. Accordingly, our efforts address both the intended curriculum and the enacted curriculum.1 Finally, undertaking these studies within a scientific approach to educational research requires clearly articulating the tenets that underlie evaluations of curricular effectiveness.
Chapter 2 begins with a discussion of the methods used to collect relevant evaluation studies. It describes the resulting database, methods, and criteria used to review these studies and to decide which evaluation studies should be included in the report. This chapter also describes the initial study characteristic coding system that was used to create and analyze the large database.
The database and study characteristics were then used to develop a framework for curriculum evaluation in mathematics, which is presented in Chapter 3. Based on the framework, we identified four major classes of evaluation studies: content analysis, comparative, case, and synthesis studies. We divided into four subgroups, each of which studied one class in depth. The subgroups refined the methodology to create a decision tree for mapping studies into categories for further examination. Discussions of these categories, together with the refined methodology, appear in three chapters: Chapter 4 details content analysis studies, Chapter 5 details comparative studies, and Chapter 6 details case and synthesis studies. The entire committee subsequently reviewed and discussed these subgroup reports and revised them to relate to each other and to the framework.
Our conclusions and recommendations are listed in Chapter 7.