Read "Grading the Nation's Report Card: Evaluating NAEP and Transforming the Assessment of Educational Progress" at NAP.edu

Suggested Citation:"Executive Summary." National Research Council. 1999. Grading the Nation's Report Card: Evaluating NAEP and Transforming the Assessment of Educational Progress. Washington, DC: The National Academies Press. doi: 10.17226/6296.

Executive Summary

The National Assessment of Educational Progress (NAEP) is the only continuing measure of the achievement of the nation's students in key subject areas. Also known as "the nation's report card," NAEP has provided periodic data regarding what American students know and can do for nearly 30 years. Throughout that time, NAEP results have been increasingly used by policy makers, educators, and the public as indicators of the nation's educational health. The NAEP program is sponsored by the U.S. Department of Education and administered by the National Center for Education Statistics (NCES). Since 1989, NAEP policy has been determined by the nonpartisan, independent National Assessment Governing Board (NAGB).

When NAEP was first administered in the late 1960s, and through the early 1980s, results were presented on a question-by-question basis; reports indicated the percentages of students who were able to answer each question correctly. Results were presented for the nation, for regions of the country, and for major demographic subgroups. Progress (or the lack thereof) was monitored by tracking changes over time in the percentages of students who correctly answered each question.

In the early 1980s, partly in response to the growing national concern about the quality and international competitiveness of the nation's educational system, reflected in such reports as the 1983 A Nation at Risk, NAEP was redesigned. As a result of the implementation of innovative design and analysis strategies, the program began reporting results based on performance on the entire assessment, rather than on a question-by-question basis. Results were presented as numerical scores (on a scale, for example, of 0 to 500) that summarized student achievement across a subject area for the nation, for demographic subgroups, and over time.

The utility of NAEP summary scores that answer questions such as "How well are American fourth graders achieving in mathematics?" and "How much has the science achievement of female students improved over time?" was recognized by the Alexander-James Panel in 1987 when it recommended that the NAEP program begin collecting and reporting state-level results. This enabled states to evaluate their students' achievement relative to the nation and to each other and to track their own progress in state-level education reform.

The congressional legislation that established the state NAEP program also mandated standards-based reporting of NAEP results; it stated that NAEP results should be presented both as overall scores and in terms of percentages of students who meet established standards for performance. Thus, in the 1990s, most NAEP assessments have reported summary scores and the percentages of students performing at or above basic, proficient, and advanced levels of performance. Recognizing the likely political ramifications of state-level and standards-based reporting, this same legislation established the National Assessment Governing Board, the independent body charged with determining policy for the NAEP program and overseeing standard-setting and the development of the frameworks that delineate what will be assessed in each of NAEP's subject areas.

These events in NAEP's history are evidence of the perceived utility of NAEP as a measure of student achievement. Indeed, through the 1990s, pressures on NAEP to do more and more beyond its established purposes have risen. Various educators and policy makers have suggested, for example, that NAEP be used as a lever for education reform, as an anchor for other assessments, as an accountability tool, and as an international assessment tool. In response to the many varied and competing demands on NAEP, NAGB and NCES currently are implementing a second redesign of NAEP intended to focus its purposes, streamline its design, and enhance its utility to its constituents.

It is against this backdrop of change and pressure on NAEP that the National Research Council's Committee on the Evaluation of National and State Assessments of Educational Progress conducted its congressionally mandated evaluation of the program. The committee examined NAEP's mission and measurement objectives; sampling, design, and analysis strategies; framework and assessment development and achievement-level-setting processes; and the reporting and utility of NAEP's results.

The committee focused its efforts on improving the utility of NAEP assessment results. It is clear that Americans want the kinds of information about the achievement of the nation's students currently provided by NAEP summary scores and achievement-level results. However, users of NAEP not only want to know about the overall achievement of students and their performance in relation to established standards for achievement; they also want and need information that helps them know what actions to take in response to NAEP results. In this report the committee provides a series of conclusions and recommendations, which focus on enabling the U.S. Department of Education, NCES, NAGB, and the NAEP program to provide more useful information about student achievement and the nation's educational systems to the community of educators, policy makers, and the public who can have an impact on education.

The primary messages of the report are highlighted below. Each is presented as a summary conclusion based on the committee's observations and analyses of the current National Assessment of Educational Progress, accompanied by a summary recommendation for action that can contribute to a satisfactory resolution of some of the issues facing the current assessment program. If implemented, these recommendations will greatly enhance the utility and information value of the NAEP assessments; if left unaddressed, NAEP's effectiveness and future prospects for success will be undermined.

CREATING A COORDINATED SYSTEM OF INFORMATION TO ASSESS EDUCATIONAL PROGRESS

Summary Conclusion 1. The current NAEP assessment has served as an important but limited monitor of academic performance in U.S. schools. Neither NAEP nor any other large-scale assessment can adequately measure all aspects of student achievement. Furthermore, measures of student achievement alone cannot meet the many and varied needs for information about the progress of American education.

In an attempt to satisfy the multiple needs of diverse users, the NAEP program has adopted varied, and often conflicting, objectives without changing its basic features. As a result, NAEP now has a complex and costly design and operational structure. This proliferation of users and uses is indicative of NAEP's perceived value as a social indicator and, in some sense, suggests that the NAEP program has been weighed down by its success.

In general, successful indicator systems not only perform a monitoring function, but also help users understand results. Indeed, an examination and analysis of the purposes ascribed to NAEP is consistent with this observation; users want NAEP to:

Provide descriptive or "barometer" information. Stakeholders want NAEP to serve as a monitor of American students' academic performance and progress.
Serve an evaluative function by helping NAEP users know whether students' performance is "good enough." The establishment of performance standards in NAEP potentially allows policy makers and others to judge whether observed performance measures up to externally defined goals.
Provide interpretive information to help NAEP's users better understand achievement results and begin to investigate their policy implications.

Both historically and currently, NAEP serves as a good barometer of student achievement. However, the interpretive and evaluative functions are currently not well achieved by NAEP. The question is how to accomplish these functions without further burdening NAEP. A solution for enhancing the interpretive function lies in a broader conceptualization of progress in American education.

Summary Recommendation 1. The nation's educational progress should be portrayed by a broad array of education indicators that includes but goes beyond NAEP's achievement results. The U.S. Department of Education should integrate and supplement the current collections of data about education inputs, practices, and outcomes to provide a more comprehensive picture of education in America. In this system, the measurement of student achievement should be reconfigured so that large-scale surveys are but one of several methods used to collect information about student achievement.

STREAMLINING NAEP'S DESIGN

Summary Conclusion 2. Many of NAEP's current sampling and design features provide important, innovative models for large-scale assessments. However, the proliferation of multiple independent data collections—national NAEP, state NAEP, and trend NAEP—is confusing, burdensome, and inefficient, and it sometimes produces conflicting results.

NAEP has many strong features. Its frameworks and sample assessment materials have the potential to stimulate national debate about teaching and learning. The assessment items and tasks have served as important guides and benchmarks for state and local assessment development efforts. NAEP's sampling, scaling, and analysis procedures serve as important models for the measurement community.

However, several factors suggest that NAEP's design should be simplified: recent discrepancies between results from trend NAEP and main NAEP assessments; the burden on states and schools that is created by participating in multiple data collection efforts; and the inherent inefficiencies associated with the ongoing administration of assessments for every trend line that the NAEP program supports. Exploration and implementation of methods to merge the trend NAEP and main NAEP assessments, and to streamline the data collections for the national and state components of main NAEP, are clearly warranted.

Summary Recommendation 2. NAEP should reduce the number of independent large-scale data collections while maintaining trend lines, periodically updating frameworks, and providing accurate national and state-level estimates of academic achievement.

IMPROVING PARTICIPATION AND ASSESSMENT OF ALL STUDENTS IN NAEP

Summary Conclusion 3. NAEP has the goal of reporting results that reflect the achievement of all students in the nation. However, many students with disabilities and English-language learners have been excluded from the assessments. Some steps have been taken recently to expand the participation of these students in NAEP, but their performance remains largely invisible.

Historically, the NAEP program has done little to understand the special testing needs and achievements of students who have disabilities or for whom English is a second language. Although some successful steps to enhance the participation of these students in NAEP assessments have been implemented, the performance of many of them is not included in NAEP's overall results. In addition, inconsistent criteria for identifying these students and for including them in the assessments potentially influence overall results in unknown ways.

Summary Recommendation 3. NAEP should enhance the participation, appropriate assessment, and meaningful interpretation of data for students with disabilities and English-language learners. NAEP and the proposed system for education indicators should include measures that improve understanding of the performance and educational needs of these populations.

PROVIDING MORE COMPLETE AND INFORMATIVE PORTRAYALS OF STUDENT ACHIEVEMENT

Summary Conclusion 4. The current assessment development process for main NAEP, from framework development through reporting, is designed to provide broad coverage of subject areas in a large-scale survey format. However, the frameworks and assessment materials do not capitalize on contemporary research, theory, and practice in ways that would support in-depth interpretations of student knowledge and understanding. Large-scale survey instruments alone cannot reflect the scope of current frameworks or of more comprehensive goals for schooling.

As NAEP's frameworks and assessments have evolved and changed, so has scientific understanding of the nature of student learning as well as understanding of the complex nature of curriculum. Unfortunately, many of the changes in NAEP instrumentation over the last 30 years reflect only minimally the changes in certain critical areas of scientific knowledge. In fact, the core assumptions related to cognition and curriculum that underlie NAEP's assessment design have remained relatively unchanged while research and theory in these areas have advanced substantially. NAEP's consensus-based frameworks and the assessments based on those frameworks focus on covering the breadth of subject-area content. However, they do not fully capitalize on current research and theory about what it means to understand concepts and procedures, and they are not structured to capture critical differences in students' levels of understanding. Thus, they do not lead to portrayals of student performance that deeply and accurately reflect student achievement.

The development of such portrayals will require the use of multiple methods for measuring achievement that go beyond current large-scale assessment formats. The NAEP program has been a leader among large-scale testing initiatives with respect to developing and applying innovative procedures to assess more complex aspects of achievement, but it is clear that large-scale survey methods alone are not adequate for assessing complex aspects of achievement described in current frameworks. Nor are they adequate for assessing broader conceptualizations of achievement that are consonant with the more comprehensive goals for schooling that will be prominent in the 21st century.

Summary Recommendation 4. The entire assessment development process should be guided by a coherent vision of student learning and by the kinds of inferences and conclusions about student performance that are desired in reports of NAEP results. In this assessment development process, multiple conditions need to be met: (a) NAEP frameworks and assessments should reflect subject-matter knowledge; research, theory, and practice regarding what students should understand and how they learn; and more comprehensive goals for schooling; (b) assessment instruments and scoring criteria should be designed to capture important differences in the levels and types of students' knowledge and understanding both through large-scale surveys and multiple alternative assessment methods; and (c) NAEP reports should provide descriptions of student performance that enhance the interpretation and usefulness of summary scores.

SETTING REASONABLE AND USEFUL ACHIEVEMENT STANDARDS

Summary Conclusion 5. Standards-based reporting is intended to be useful in communicating student results, but the current process for setting NAEP achievement levels is fundamentally flawed.

Although reporting student achievement in relation to clearly defined performance standards fulfills a highly desired evaluative role for NAEP, the current achievement levels have not yet realized their potential impact on the education community. This committee, as well as the U.S. General Accounting Office, the National Academy of Education, and other evaluators, have judged the current achievement-level-setting model and results to be flawed. It is clear that the current processes are too cognitively complex for the raters, and there are notable inconsistencies in the judgment data by item type. Furthermore, NAEP achievement-level results do not appear to be reasonable compared with other external information about students' achievement.

Summary Recommendation 5. The current process for setting achievement levels should be replaced. New models for setting achievement levels should be developed in which the judgmental process and data are made clearer to NAEP's users.

The implementation of these recommendations, and more specific recommendations described in the body of the report, will require changes in the design and operations of the NAEP program and many other data collections of NCES. Most notably, the successful implementation of these recommendations will require that the design of NAEP's measures of student achievement adhere much more closely to the principle that assessment design should closely match the intended purpose of the assessment. It should not be assumed that large-scale assessments are the primary means by which the achievements of the nation's students are measured; the use of multiple alternative types of surveys and assessments will be required.

Large-scale assessments should remain as important components of the NAEP program; we recommend that the core subjects of reading, mathematics, science, and writing continue to be assessed in part using large-scale survey methods and that the measurement of trends continue in these subject areas. But we also recommend that multiple assessment strategies become a much more prominent component of the NAEP program and be used to measure, for example: achievement in subject areas not assessed frequently enough to establish trend lines; subject areas (or portions of subject areas) in which not all students receive instruction (e.g., fine arts, advanced mathematics); aspects of student achievement not well addressed by large-scale survey methods (e.g., scientific investigation and problem-solving strategies); and the accomplishments of students with disabilities and English-language learners. NAEP Report Cards should include results from the array of methods used to assess achievement in a subject area.

The development of an improved NAEP within a coordinated system of indicators is a major task and has cost implications. Streamlining NAEP's design may result in cost savings. The costs of implementing the coordinated system of indicators are likely to be substantial, as are the costs for improving the participation and assessment of English-language learners and students with disabilities. Use of multiple methods to assess student achievement in NAEP's subject areas will require reallocation of funds currently devoted to the development of the current large-scale survey assessments. However, substantial efforts to these ends will result in better descriptive, evaluative, and interpretive information about American students' academic achievement and educational progress broadly conceived.