GRADING THE NATION'S REPORT CARD: Evaluating NAEP and Transforming the Assessment of Educational Progress

Introduction

CHANGING SOCIOPOLITICAL CONTEXT OF NAEP

Since its establishment in the late 1960s, the National Assessment of Educational Progress (NAEP) has become a very significant part of America's educational landscape. NAEP has earned a reputation as the nation's best measure of student achievement in key subject areas over time, and, increasingly, its results get the attention of the press, the public, and policy makers as indicators of the nation's educational health. However, over its 30-year history, the sociopolitical context in which NAEP exists has changed significantly. Partly in response to this changed context, many major changes have been made in NAEP; it has become an exceedingly complex entity, reflecting the desires and needs of multiple constituencies.

Perhaps the most critical feature of the changing context has been a deep and increasingly public concern about the quality of education in the United States. Concern about the condition of U.S. schools and levels of students' achievement began in 1957 with the launching of Sputnik and has been amplified by numerous documents and reports, such as A Nation at Risk (National Commission on Excellence in Education, 1983), a watershed publication in promoting public awareness of the shortcomings of American education. Public concern has led to increased investment in education at all levels.

The world has changed substantially over the last three decades, witnessing the fall of the Iron Curtain, the "triumph" of capitalism over communism, and the shift to a highly competitive global economy. Discussions of workforce readiness, especially the international competitiveness of America's workforce, permeate
the media. The increasingly intense focus on the results of large-scale assessments, including those from programs such as NAEP, reflects the desire to know how the United States stands in comparison to past performance and, most especially, in comparison to international competitors. Such comparisons have become a routine part of America's economic, social, and political rhetoric. And comparisons are not limited to contrasts between the United States and other industrialized countries; they include comparisons of states with each other and with international benchmarks. Increasingly, states want indicators of the quality of their education systems, partly to evaluate the return on investments made to support education reform since the mid-1980s.

Today, a key focus of the concern is a debate on questions of accountability. Citizens, educators, and policy makers—at levels from local school districts to the federal bureaucracy—want to know whether the substantial investments that have been made in education are reaping rewards. Accountability has become the goal of educational policy makers, the business community, and the public; this focus on accountability is closely tied to burgeoning awareness of the changing nature of commerce and the emergence of internationalism. Large-scale, high-stakes assessment programs have become the proposed means to that end.

The focus on U.S. academic achievement was further heightened by the promulgation of national education goals during the early 1990s. Objectives such as "being first in the world in mathematics and science by the year 2000" (P.L. 103-227, Goals 2000: Educate America Act, 1994), regardless of how unrealistic they may be, have served to raise the political ante.
The national education goals, together with the development of national standards in multiple curriculum areas, have been a dominant force in shaping American educational policy during this decade. For example, various federal policies and legislation have been enacted promulgating a top-down strategy for systemic reform. Examples include legislation that requires states to adopt more rigorous standards for curriculum and student achievement in order to obtain federal funds (P.L. 103-328, Improving America's Schools Act, Title 1, 1994). Although states are free to set their own standards, federal review of those standards requires that they must be rigorous and aligned with various national standards, such as the Curriculum and Evaluation Standards for School Mathematics (National Council of Teachers of Mathematics, 1989) and the National Science Education Standards (National Research Council, 1996a). Several changes in the NAEP program, including the introduction of a state assessment program and standards-based reporting, are a direct outgrowth of this confluence of forces, and there is little doubt that NAEP has been exceedingly responsive at both the federal and state levels. As a result, it has achieved prominence as the country's primary vehicle for monitoring levels of educational achievement. In fact, many groups want more NAEP—more often, more subjects, and with faster reporting—albeit at less cost. The popularity of the nation's national assessment program is a blessing, but also a curse: much of NAEP's
current complexity is a product of these pressures, and its capacity for change may be limited by its prominence.

COMMITTEE CHARGE

It is in this context of the sociopolitical and educational changes of the past 30 years, and of the challenges NAEP faces as a result of those changes, that this committee has conducted an evaluation of NAEP. Our charge, levied by Congress, includes evaluation of the national assessment, the state program, the student performance standards, and the extent to which the results are reasonable, valid, and informative to the public (P.L. 103-382). It is also with a congressional mandate for ongoing evaluation of NAEP that we conduct this work.

In many important ways, our evaluation research builds on the work of previous evaluators. The National Academy of Education reviewed the NAEP administrations in 1990, 1992, and 1994 (National Academy of Education, 1992, 1993, 1996). The Technical Review Panel for NAEP conducted evaluation and other research during this same period. Some of the work of the NAEP Validity Studies Panel also is evaluative in nature. In addition, analysts from NAEP's sponsoring and cooperating agencies, contractors, and advisers conduct research on an ongoing basis on the psychometric properties of NAEP, its use, and the value of its results.

We build on this broad base of information in this report. We reiterate and synthesize the results of prior evaluators and researchers. We discuss earlier findings and recommendations as a conceptual foundation for what we hope is a unique and important contribution to the reconceptualization of NAEP's measures of student achievement and to a broadening of the definition of "the assessment of educational progress." We rely on earlier work and on our own research to provide a unifying vision for assessing educational progress and charting NAEP's future.
We began this work in 1996 with an analysis of the policy directives of the National Assessment Governing Board (NAGB) for future NAEP assessments; we reviewed the May 1996 draft of NAGB's policy statement, entitled Policy Statement on Redesigning the National Assessment of Educational Progress (National Assessment Governing Board, 1996) in an earlier committee report (National Research Council, 1996b). We deliberated about and prepared a volume on standard setting (Applied Measurement in Education, 1998). We commissioned a series of papers on NAEP's mission and measurement objectives and on varied sampling, data collection, and analysis issues (National Research Council, 1999). Our evaluation culminates in this report and with suggestions for advancing the agenda it lays out.
HISTORY AND CURRENT STATUS OF NAEP

In 1963 Francis Keppel, then U.S. Commissioner of Education, appointed a committee to explore options for assessing the condition and progress of American education. The committee's chair, Ralph Tyler, described the need for a base of information to help public officials make decisions about education (Tyler, 1966:95):

[D]ependable information about the progress of education is essential. … Yet we do not have the necessary comprehensive and dependable data; instead, personal views, distorted reports, and journalistic impressions are the sources of public opinion. This situation will be corrected only by a careful, consistent effort to obtain data to provide sound evidence about the progress of American Education.

In 1966 the Keppel committee recommended that a battery of tests be developed to the highest psychometric standards and with the consensus of those who would use it. NAEP was conceived to provide that information base and to monitor the progress of American education (National Center for Education Statistics, 1974; Office of Technology Assessment, 1992).

NAEP's Original Design

A number of key features were recommended in the original design of the assessment (Jones, 1996). With respect to matters of content, each assessment cycle was supposed to target one or more broadly defined subject areas that corresponded to familiar components of school curricula, such as mathematics. Although the subjects to be assessed were defined by the structure of school curricula, NAEP was intended to assess knowledge and skills that were not necessarily restricted to school learning. For each subject area, panels of citizens would be asked to form consensus groups about appropriate learning objectives at each target age for that particular subject area.
Test questions or items were then to be developed bearing a one-to-one correspondence to particular learning objectives. Thus, from NAEP's beginning, there were heavy demands for content validity as a part of the entire development process.

There were also a number of interesting technical design features proposed for the assessment program. For example, multiple-choice item formats were to be discouraged in favor of short-answer items and those that asked students to perform tasks, features that would further support the content validity of the assessment.[1] Some items and tasks would need to be administered individually, whereas others could be administered to small groups.

[1] Despite this proposed design feature, throughout the 1970s and 1980s multiple-choice items were predominant in all NAEP subject-area assessments except for writing.

All test items, whether
administered individually or in group formats, would be presented by trained personnel rather than by local school personnel in order to maintain uniformly high standards of administration.

Of special note was the proposal for using a matrix-sampling design, a design that distributes large numbers of items broadly across school buildings, districts, and states but limits the number of items given to individual examinees. In essence, the assessment would be designed to glean information from hundreds of items, several related to each of many testing objectives, while restricting the amount of time that any student would have to spend responding to the assessment. The target period was proposed to be approximately 50 minutes per examinee. For each assessment cycle, test booklets would include items for each subject assessed in that cycle, with a distribution of easy, moderately difficult, and hard test items. The latter feature was intended to ensure that all respondents would have a probability of succeeding on some, but not necessarily all, of the items that they were given. Items and tasks not only would be presented in printed form but also would be read aloud by tape recording to permit even poor readers to demonstrate what they knew in subjects other than reading. This was also intended as a mechanism to pace performance so that all students would have sufficient time to work through every test item. At all ages, the multiple-choice items would include the response choice, "I don't know," to discourage guessing and nonresponse.

The populations of interest for NAEP were to be all U.S. residents at ages 9, 13, and 17, as well as young adults. This would require the selection of private and public schools into the testing sample, as well as selection of examinees at each target age who were not in school.
Results would then be tabulated and presented by age and by demographic groups within age—but never by state, state subunit, school district, school, or individual. Assessment results would be reported to show the estimated percentage of the population or subpopulation that answered each item and task correctly. And finally, only a subset of the items would be released with each NAEP report. The unreleased items would remain secure, to be administered at a later testing for determining performance changes over time, thereby providing the basis for determining trends in achievement. The agenda laid out for NAEP in the mid-1960s reflected the political and social realities of the time (National Assessment Governing Board, no date). Prominent among these was the resistance of state and local policy makers to a national curriculum; state and local leaders feared federal erosion of their autonomy and voiced concern about pressure for accountability. The designers responded by defining testing objectives for NAEP that were too expansive to be incorporated into any single curriculum. They specified that results be reported for specific test items, not in relation to broad knowledge and skill domains. Tests were developed for and administered to 9-, 13-, and 17-year-olds rather than to students at specific grade levels. These features thwarted perceptions of
the program as a federal testing initiative addressing a nationally prescribed curriculum. Indeed, NAEP's design provided nationally and regionally representative data on the educational condition of American schools while avoiding any implicit federal standards or state, district, and school comparisons. NAEP was coined the "nation's educational barometer."

Redesign of the Original Plan

As NAEP's design emerged, however, the educational landscape changed. There was a dramatic increase in the racial and ethnic diversity of the school-age population and a heightened commitment to educational opportunity for all. Schools across the United States developed new programs to respond to various federally sponsored education initiatives. The Elementary and Secondary Education Act of 1965 established mechanisms through which schools could address the learning needs of economically disadvantaged students. In the ensuing years, federal support expanded to provide additional resources for English-language learners and students with disabilities. As federal initiatives expanded educational opportunities, they fostered an administrative imperative for assessment data to help gauge the effect of these opportunities on the nation's education system.

NAEP's original design could not accommodate the increasing demands for data about these educationally important populations and issues. Age-level (rather than grade-level) testing made it difficult to link NAEP results to state and local education policies and school practices. Furthermore, its reporting scheme allowed for measurement of change on individual items, but not on the broad subject areas; monitoring the educational experiences of students in varied racial and ethnic, language, and economic groups was difficult without summary scores.
Increasingly, NAEP was asked to provide more information so that government and education officials would have a stronger basis for making judgments about the adequacy of education services; NAEP's constituents were seeking information that, in many respects, conflicted with the basic design of the program. The first major redesign of NAEP took place in 1984, when responsibility for its development and administration was moved from the Education Commission of the States to the Educational Testing Service. The design for NAEP's second generation (Messick et al., 1983) changed the sampling, objective-setting, item development, data collection, and analysis. Tests were administered by age and grade groupings; summary scores were provided for each subject area. These and other changes afforded the program much greater flexibility in responding to policy demands as they evolved. Almost concurrently, however, the earlier mentioned report, A Nation at Risk (National Commission on Excellence in Education, 1983), was issued. It warned that America's schools and its students were performing poorly. The report's publication spawned a wave of state-level education reforms. As states invested
more and more in their education systems, they sought information about the effectiveness of their efforts. State-level policy makers looked to NAEP for guidance on the effectiveness of alternative practices. The National Governors' Association issued a call for state-comparable achievement data, and a new report, The Nation's Report Card (Alexander and James, 1987), recommended that the NAEP program be expanded to provide state-level results. This set of recommendations departed dramatically from the political sensitivities that guided NAEP's inception.

As the program retooled to accommodate this change, participants in a 1989 education summit in Charlottesville, Virginia, set out to expand NAEP even further. At the summit, President George Bush and the nation's governors challenged the prevailing assumptions about national expectations for achievement in American schools. They established six national goals for education and specified the subjects and grades in which progress should be measured with respect to national and international frames of reference (Alexander, 1991). By design, these subjects and grades paralleled NAEP's structure. The governors called on educators to hold students to "world-class" standards of knowledge and skill. The governors' commitment to high academic standards included a call for the reporting of NAEP results in relation to rigorous performance standards. They challenged NAEP to describe not only what students currently know and can do, but also what young people should know and be able to do as participants in an education system that holds its students to high standards.

Current NAEP

The program that resulted is the NAEP we know today. It is a large and complex program. Current NAEP includes two distinct assessment programs with different instrumentation, sampling, administration, and reporting practices.
The two assessments are referred to as trend NAEP and main NAEP. Trend NAEP is a collection of test items in reading, writing, mathematics, and science that have been administered many times over the last three decades. As the name implies, trend NAEP is designed to document changes in academic performance over time. During the current decade, trend NAEP will have been administered in 1990, 1992, 1994, 1996, and 1999. Trend NAEP is administered to nationally representative samples of 9-, 13- and 17-year-olds. Main NAEP consists of test items that reflect current thinking about what students know and can do in the NAEP subject areas. They are based on recently developed content and skill outlines in reading, writing, mathematics, science, U.S. history, world history, geography, civics, the arts, and foreign languages. Typically, two subjects are tested at each biennial administration. Main NAEP has two components, national NAEP and state NAEP. National NAEP typically tests nationally representative samples of students in grades 4, 8, and 12. The object is to measure achievement in NAEP subject
areas in relation to current thinking about curriculum and instruction. In most but not all subjects, NAEP is administered two, three, or four times during a 12-year period, which makes it possible to examine changes in performance over a decade.

National NAEP also occasionally includes assessment studies that do not rely exclusively on large-scale assessments; these are referred to as special studies. Special studies are designed to gather information on important aspects of achievement not well addressed by large-scale assessment methods; for example, recent studies focused on oral reading fluency and extended writing performance. The data from these studies are not used to measure trends in performance, and they usually include a wide range of data on curriculum and instruction in tested subjects.

State NAEP assessments are administered to state-representative samples of students in states that elect to participate in the state assessment program. State NAEP uses the same large-scale assessment materials that are used in national NAEP. State NAEP is administered in grades 4 and 8 (not in high school) and in reading, writing, mathematics, and science (although not always in both grades in each of these subjects).

To recapitulate, current NAEP consists of two assessments, trend NAEP and main NAEP. Main NAEP includes both national and state-level administrations. Figure I-1 depicts the components of the current NAEP assessments, and Table I-1 summarizes the features of each of these components. Table I-2 provides a schedule of NAEP administrations from 1990 through 2002.

FIGURE I-1 The components of the current NAEP assessments.
TABLE I-1 Components and Features of Current NAEP

Component | Purpose | Sample | Assessment Design
Main NAEP: National NAEP | Measure national-level achievement in 9 subject areas specified in national education goals; measure short-term trends | Grades 4, 8, and 12 [a] | Assessments based on recently developed frameworks
Main NAEP: State NAEP | Measure state-level achievement in reading, writing, mathematics, science; measure short-term trends | Grades 4 and 8 [b] | Same assessments as national NAEP
Trend NAEP | Measure long-term trends in student achievement in reading, writing, mathematics, and science | 9-, 13-, and 17-year-olds in reading, mathematics, and science; grades 4, 8, and 11 in writing | Assessment is based on collections of items that have been administered many times over the past 20-30 years

[a] All three grades are assessed in most, but not all, subject areas.
[b] Both grades have not always been assessed in each subject area.

Current Governance

NAEP's complex design is mirrored by an increasingly complex governance structure. In 1988, amendments to the authorizing statute for NAEP established the current management and governance structure. Under this structure, the commissioner of education statistics, who leads the National Center for Education Statistics (NCES) in the U.S. Department of Education, retains responsibility for NAEP operations and technical quality control. NCES procures test development and administration services from cooperating private companies; currently, these are the Educational Testing Service and WESTAT. The program is governed by the National Assessment Governing Board, appointed by the secretary of education but independent of the department. The board, authorized to set policy for NAEP, is designed to be broadly representative of NAEP's varied audiences.
It selects the subject areas to be assessed and ensures that the content and skill outlines, or NAEP frameworks, that specify goals for assessment are produced through a national consensus process. During the 1990s, NAGB contracted with the Council of Chief State School Officers for this consensus development. In addition, NAGB establishes performance standards for each subject and grade tested, in consultation with its contractor for this
TABLE I-2 Administration Schedule for Current NAEP Assessments, 1990-2002

Year | National NAEP [a] | State NAEP [b] | Trend NAEP [c]
1990 | Reading; Mathematics; Science | Mathematics (8) | Reading; Writing; Mathematics; Science
1992 | Reading; Writing; Mathematics | Reading (4); Mathematics (4, 8) | Reading; Writing; Mathematics; Science
1994 | Reading; U.S. History; Geography | Reading (4) | Reading; Writing; Mathematics; Science
1996 | Mathematics; Science | Mathematics (4, 8); Science (8) | Reading; Writing; Mathematics; Science
1997 | Arts (grade 8 only) | — | —
1998 | Reading; Writing; Civics | Reading (4, 8); Writing (8) | —
1999 | — | — | Reading; Writing; Mathematics; Science
2000 | Mathematics; Science | Mathematics (4, 8); Science (4, 8) | —
2001 | U.S. History; Geography | — | —
2002 | Reading; Writing | Reading (4, 8); Writing (4, 8) | —

[a] All national NAEP assessments are administered at grades 4, 8, and 12, unless otherwise indicated.
[b] Grades at which state NAEP is administered are indicated in parentheses.
[c] Trend NAEP assessments are administered at ages 9, 13, and 17 in reading, mathematics, and science, and in writing at grades 4, 8, and 11.

SOURCE: Data from National Assessment Governing Board.
task, the American College Testing Program. NAGB also develops guidelines for NAEP reporting.

CURRENT CONTEXT AND DEMANDS

As previously noted, NAEP was envisioned in the 1960s as a fairly straightforward indicator, a barometer of academic achievement for the nation, large geographic regions, and major demographic subgroups. Since that time, several related changes in the sociopolitical and educational landscape have occurred. First, as noted above, there has been increased federal, state, and local funding of education and growing public attention to education and demands for accountability, followed by an expansion of state involvement in education with increased responsibility for the disbursement of state (and often federal) funds. Second, there has been a marked increase in the racial and ethnic diversity of the school-age population and strong national commitment to providing educational opportunities to all children, including English-language learners and students with disabilities. Third, there has been the emergence of new knowledge, primarily through research on cognition, about how students learn and what they understand in various disciplines. And fourth, there has been the emergence of standards-based education reform and the need for measures of progress against stringent educational goals. We discuss these changes in turn below and in the chapters that follow.

Educational indicators and data sources. The increased demands for accountability have led to a proliferation of educational indicators (e.g., of student achievement, school resources, teacher preparation), both within and beyond NCES, that are often disconnected from each other. Also, in addition to national-level data, many policy makers want indicator information at the state and local levels.
Such demands are also frequently accompanied by the expectation that the indicators be tied to information that helps provide context for and even explains the indicator results. For NAEP, this has led to increased desires that it be used as a source of information to help explain why achievement results are what they are.

Participation. The increased diversity of the student population and the national commitment to participation have led to pressures on the NAEP program to take steps to include all students in the assessment, including students with disabilities and English-language learners, and to provide modes of assessment that capture the knowledge and skills of all members of this increasingly diverse U.S. student population.

Cognitive theory and curriculum. NAEP's original purpose as an indicator of what students know and can do in key subject areas led to assessments that were highly content- and curriculum-based—that is, they test students' knowledge in a discipline but reveal little about how they think and learn. The program
is increasingly called on to incorporate current findings from disciplinary research and cognitive and developmental research in NAEP assessments so as to reflect broader conceptualizations of achievement at the same time that it is asked to provide measures of progress over time—measures that require some level of constancy in assessment content.

Standards. Finally, NAEP is expected to reflect both current curriculum and practice and the goals of standards-based educational reform. For example, NAEP is expected to determine and measure what students know, as well as what students should know to meet the nation's far-reaching educational goals.

These major changes have combined to produce an audience for the program that is much more diverse than that envisioned by NAEP's originators. This audience now includes policy makers at national, state, and local levels, reformers, parents, teachers, and researchers, all seeking to use NAEP for many, varied, and often conflicting purposes. The response to the pressures brought on by these multiple users has led the program to add more and more components. The result is that the NAEP assessment program faces difficult decisions about trade-offs between purposes and uses of the assessment, assessment design, and available program funds, which must be addressed and resolved in any future (re)design.

OVERVIEW OF THE REPORT

In the remaining chapters of the report, we examine current NAEP and make recommendations for action that can contribute to a satisfactory resolution of some of these issues. In several instances, it is the committee's view that current problems and issues, if left unaddressed, are likely to undermine NAEP's effectiveness and future prospects for success.
Chapter 1 examines the information needs of NAEP's users and looks at the extent to which the program does and does not satisfy the many and varied needs for data and judgments about the progress of American education. In Chapter 1, we also propose a coordinated system of indicators for assessing educational progress and for providing context for improved understanding of NAEP's student achievement results. We discuss the implementation of such a system within NCES. Chapters 2 through 5 focus on NAEP's assessments of student achievement. Chapter 2 discusses NAEP's sampling, data collection, analysis, and reporting designs. Chapter 3 documents and evaluates NAEP's efforts to include and meaningfully assess students with disabilities and English-language learners. Chapter 4 evaluates NAEP's frameworks and assessment materials and the extent to which they lead to data that support clear and useful inferences about the academic capabilities of the school-age population. And Chapter 5 documents recent efforts to set reasonable and useful performance standards for NAEP. Each of these chapters provides background information and evidence that the
committee considered during its evaluation, as well as specific conclusions and recommendations related to the chapter's broader topic. Chapter 6 provides suggestions for timelines, strategies, and priorities for implementing recommendations presented in Chapters 2 through 5.