Suggested Citation:"4 Frameworks and the Assessment Development Process: Providing More Informative Portrayals of Student Performance." National Research Council. 1999. Grading the Nation's Report Card: Evaluating NAEP and Transforming the Assessment of Educational Progress. Washington, DC: The National Academies Press. doi: 10.17226/6296.

4 Frameworks and the Assessment Development Process: Providing More Informative Portrayals of Student Performance

Summary Conclusion 4. The current assessment development process for main NAEP, from framework development through reporting, is designed to provide broad coverage of subject areas in a large-scale survey format. However, the frameworks and assessment materials do not capitalize on contemporary research, theory, and practice in ways that would support in-depth interpretations of student knowledge and understanding. Large-scale survey instruments alone cannot reflect the scope of current frameworks or of more comprehensive goals for schooling.

Summary Recommendation 4. The entire assessment development process should be guided by a coherent vision of student learning and by the kinds of inferences and conclusions about student performance that are desired in reports of NAEP results. In this assessment development process, multiple conditions need to be met: (a) NAEP frameworks and assessments should reflect subject-matter knowledge; research, theory, and practice regarding what students should understand and how they learn; and more comprehensive goals for schooling; (b) assessment instruments and scoring criteria should be designed to capture important differences in the levels and types of students' knowledge and understanding both through large-scale surveys and multiple alternative assessment methods; and (c) NAEP reports should provide descriptions of student performance that enhance the interpretation and usefulness of summary scores.

INTRODUCTION

Frameworks and the assessments that are based on them are central to the entire enterprise of NAEP. The framework documents describe the knowledge and skills to be assessed in each NAEP subject area, and the assessments represent the collection of measures (items, tasks, etc.) from which inferences about student performance in the subject area will be derived. Together they form the basis for describing student achievement in NAEP.

In this chapter we describe and evaluate NAEP's frameworks and the assessment development process for main NAEP. We use the term assessment development process here in a very broad sense, to describe the entire scope of activity from framework development through final assessment construction, scoring, and reporting. As background, we first provide an overview of the major steps in the development of an operational NAEP assessment, using the development of the 1996 NAEP science assessment for illustration. We then examine the conclusions and recommendations of previous evaluation panels most pertinent to our subsequent discussion. Our evaluation of NAEP's frameworks and assessment development process follows; in this discussion we make arguments for:

(1) determining the kinds of inferences and conclusions about student performances that are desired in reports of NAEP results, and then using this vision of student achievement to guide the entire assessment development process;
(2) improving assessment of the subject areas as described in current frameworks and including an expanded conceptualization of student achievement in future frameworks and assessments;
(3) using multiple assessment methods, in addition to large-scale surveys, to improve the match of assessment purpose with assessment method;
(4) enhancing use of assessment results, particularly student responses to constructed-response items, performance-based tasks, and other alternative assessment methods, to provide interpretive information that aids in understanding overall NAEP results; and
(5) improving coherence across the many steps in the assessment development process as an essential prerequisite to successfully accomplishing goals 1 through 4.

In Chapter 1 we described the importance of enhancing NAEP's interpretive function by integrating its measures of student achievement with a larger system of indicators for assessing educational progress. This would provide an essential context for better understanding NAEP's achievement results in a given subject area. The focus in that discussion was on the collection and integration of data on relevant student-, school-, and system-level variables in ways that can elucidate student achievement and answer questions about "why the results are what they are."

In this chapter we discuss the analysis of students' responses to assessment items and tasks as another strategy for enhancing NAEP's interpretive function. By capitalizing on the currently unexploited sources of rich information contained in student responses (and patterns of responses), we describe how NAEP could answer questions about what students know and can do at a level of detail not currently reflected by summary scores. This type of interpretive information, gleaned from students' responses, provides insights about the nature of students' understanding in the subject areas. When combined with the broader-scale interpretive information that emerges from the coordinated system of indicators described in Chapter 1, qualitative and quantitative summaries of student achievement can help educators and policy makers begin to answer the key question that is asked when achievement results are released: "What should we do in response to these results?"

OVERVIEW OF NAEP'S CURRENT ASSESSMENT DEVELOPMENT PROCESS

When this committee began its evaluation in spring 1996, the 1996 main NAEP science assessment was the focus, largely because the science achievement-level-setting process was undertaken concurrently with the term of this evaluation and because the science assessment included an unprecedented number and variety of constructed-response items and hands-on tasks. However, because each NAEP subject area has unique features, in terms of the content and structure of the domain and the methods used to assess the domain, it was necessary and useful to consider other NAEP subject-area assessments as well. Thus, although our evaluation maintains an emphasis on the 1996 science assessment, we have also considered NAEP's mathematics and reading assessments in some depth, since these subject areas are among the most important to educators and policy makers. Simultaneous consideration of science, mathematics, and reading also permits attention to issues that cut across subject areas, as well as those that are subject-specific.

The development of NAEP's frameworks and assessments is a complex multistep process. For any given subject area, the entire sequence of activities—from framework development, through assessment development and administration, to the reporting of initial results—spans approximately five years, barring funding interruptions or other changes in scheduling. An overview of the sequence of activities in the framework and assessment development process, based on the 1996 science assessment, is portrayed in Figure 4-1. The impressive effort that is mounted by the National Assessment Governing Board (NAGB), the National Center for Education Statistics (NCES), and their subcontractors each time a NAEP assessment is developed and administered is often looked to as a model for framework and assessment development by states, districts, and other developers of large-scale assessments.

FIGURE 4-1 A generalized overview of NAEP's assessment development process.

Under NAGB's auspices, frameworks for the main NAEP assessments are developed by a planning committee (primarily subject-area experts: teachers, curriculum specialists, and disciplinary researchers) and a steering committee (a broad group of education administrators, policy makers, and subject-area experts) through a unique, broad-based consensus process. Through this consensus process, the planning and steering committee members reach a level of agreement about what students should know and be able to do in the subject area. Although there is never complete agreement among committee members about the scope and content of the frameworks, in general the outcome of the consensus process has been that the framework strikes a balance between reflecting current practice and responding to current reform recommendations.

Most NAEP frameworks specify that the subject-area assessments be constructed around two or more dimensions. In science, two major dimensions are "fields of science" and "ways of knowing and doing," which are supplemented by two underlying dimensions, "nature of science" and "themes." In reading, the major dimensions are "reading stance" and "reading purpose"; in mathematics, two primary dimensions, "content" and "mathematical abilities," are supplemented with a dimension designated "mathematical power." For each dimension, the frameworks also describe the proportions and types of items and tasks that should appear on the final version of the NAEP assessments. (See Figures 4-2, 4-3, and 4-4 for diagrammatic representations of the current main NAEP frameworks in science, reading, and mathematics.)

Following the development of the framework, test and item specifications are generated, also under the auspices of NAGB. These specifications, which provide a detailed blueprint for assessment development, are typically developed by a small subgroup of the individuals involved in the development of the framework, along with a subcontractor with experience in the development of specifications for large-scale assessments.

The framework and specifications documents thus serve as guides for the development of assessment materials in each subject area. Item development and field-test administration and scoring are currently carried out by staff at the Educational Testing Service (ETS, under contract to NCES) in consultation with an assessment development committee of subject-area experts, some of whom have been involved in the development of the framework. Items and draft scoring rubrics are developed by the committee, ETS staff, and external item writers identified by ETS and by the committee. Items are developed to include a mix of multiple-choice and a variety of constructed-response items and performance tasks as specified in the framework and specifications. ETS staff and assessment development committee members review and edit all assessment materials, which are also reviewed for potential sources of bias. When time has permitted, some of the more complex performance-based items have been piloted with two to three classes, and students have been interviewed about the items and their responses to the items. It has not, however, been universal practice to pilot items before formal field testing.

FIGURE 4-2 The 1996 main NAEP science framework matrix. NOTE: Nature of Science: the historical development of science and technology, and the habits of mind that characterize these fields, and the methods of inquiry and problem solving. Themes: the "big ideas" of science that transcend scientific disciplines and induce students to consider problems with global implications.

SOURCE: National Assessment Governing Board (no date, d:13).

Field tests are administered to samples of students by WESTAT and scored by National Computer Systems (NCS). ETS staff and development committee members participate in the selection of items for the final version of the assessment and the revision of scoring rubrics based on the initial wave of incoming student responses. Constructed-response items are then scored by trained readers. ETS documents state that items or sets of items (in the case of reading passages or hands-on science tasks) are selected for the final assessment based on their fit with the framework, their fit with preliminary achievement-level descriptions, and their general statistical properties (e.g., level of difficulty, item-test correlations).
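As an illustration only, the "general statistical properties" mentioned above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not ETS's actual procedure: it assumes items scored 0 or 1, takes item difficulty to be the proportion of students answering correctly, and takes the item-test correlation to be the Pearson correlation between an item's scores and the total score on the remaining items. The function names and the response data are hypothetical.

```python
from statistics import mean, pstdev


def item_difficulty(scores):
    """Proportion of students who answered the item correctly (0/1 scores)."""
    return mean(scores)


def pearson(x, y):
    """Pearson correlation of two equal-length lists.

    Assumes both lists have nonzero spread; a constant list would
    cause a division by zero.
    """
    mx, my = mean(x), mean(y)
    cov = mean((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / (pstdev(x) * pstdev(y))


def item_test_correlation(responses, item):
    """Correlate one item's scores with the total on the remaining items."""
    item_scores = [row[item] for row in responses]
    rest_totals = [sum(row) - row[item] for row in responses]
    return pearson(item_scores, rest_totals)


# Hypothetical field-test data: rows are students, columns are items.
responses = [
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
    [1, 1, 1, 1],
]

for item in range(4):
    column = [row[item] for row in responses]
    print(item, item_difficulty(column), round(item_test_correlation(responses, item), 2))
```

An item with a very high or very low proportion correct, or with a weak correlation with the rest of the test, would be a candidate for closer review under the kind of statistical screening the text describes.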

Final assessment forms are again reviewed by the assessment development committee prior to administration by WESTAT to a nationally representative sample of students (generally a year after the field test was administered). Scoring is once again managed by NCS, with ETS staff and the assessment development committee overseeing any necessary revisions of the scoring guides prior to scoring by the trained readers.

FIGURE 4-3 The 1992-1998 main NAEP reading framework matrix. SOURCE: National Assessment Governing Board (no date, b:16-17).

FIGURE 4-4 The 1996 main NAEP mathematics framework matrix. NOTE: Mathematical Power: consists of mathematical abilities within the broader context of reasoning and with connections across the broad scope of mathematical content and thinking. Communication is both a unifying thread and a way for students to provide meaningful responses to tasks.

SOURCE: National Assessment Governing Board (no date, a:11).

Subsequent analysis of the results and production of the initial report (known as the Report Card) leads to the release of overall summary score results approximately 12 to 18 months after the administration of the assessment. Achievement-level setting and the release of achievement-level results also occur within the same time period, since it is NAGB's goal to include these results in the initial report. Following the release of initial summary score and achievement-level results, a series of follow-up reports that provide univariate analyses of student achievement in relation to contextual variables are released, and public-use NAEP datasets are made available to those who have site licenses.

NAGB's current plans call for NAEP final assessments to be readministered periodically (at 4-year intervals for reading, writing, mathematics, and science; see Table I-1). Because some assessment materials are released to the public after each administration of a final assessment, a new round of item development and field testing is conducted to replace those materials. The new materials and the revised final assessment are intended to reflect the goals of the original framework and specifications. Thus, the same framework serves as the basis for a series of assessments over time.

SELECTED FINDINGS FROM PREVIOUS NAEP EVALUATIONS

Our examination of NAEP's frameworks and the assessment development process has benefited greatly from the previous evaluations conducted by the National Academy of Education (NAE) and from a range of design initiatives and validity studies conducted by NAGB and NCES themselves. The NAE evaluations were mandated in NAEP's authorizing legislation and focused on the quality, validity, and utility of the NAEP assessments that were included as part of the trial state assessment program between 1990 and 1994 (the 1990 and 1992 mathematics assessments and the 1992 and 1994 reading assessments). Several major areas of observation and evaluation from the NAE studies are integral to discussions we present later in this chapter.

Framework Consistency with Disciplinary Goals

In general, the NAE panel found the NAEP frameworks for the 1990 and 1992 mathematics assessments and the 1992 and 1994 reading assessments to be reasonably well balanced with respect to current disciplinary reform efforts and common classroom practices in reading and mathematics. In reading, the panel concluded that the framework and the assessments were consistent with current reading research and practice, incorporating innovations in assessment technology such as interesting and authentic reading passages, longer testing time per passage, and a high proportion of constructed-response items (National Academy of Education, 1996:9). However, in their evaluation of the 1994 reading assessment, the panel contended that there were important aspects of reading not captured in the current reading framework, most notably differences in students' prior knowledge about the topic of their reading and contextual factors associated with differences in students' background, experiences, and interests (DeStefano et al., 1997).

In mathematics, the panel concluded that the 1990 frameworks and assessments reflected much of the intent of the Curriculum and Evaluation Standards for School Mathematics of the National Council of Teachers of Mathematics (1989) and that appropriate steps were taken to bring the 1992 assessment materials even more in line with those widely accepted standards. They did recommend, however, that the current content-by-process matrix, which requires items to be classified in a single content category and a single process category, be replaced with a model that better represents the integrated nature of mathematical thinking (National Academy of Education, 1992:20, 1993:69).

Fit of Items to Frameworks and Specifications

Analyses conducted for the NAE panel show that for the 1990 and 1992 mathematics assessments, the fit of the items to major dimensions of the framework was reasonable, particularly in the content categories. When a group of mathematics experts classified the items in the 1990 grade 8 mathematics assessment on the basis of the content and mathematical ability categories specified in the framework (see Figure 4-4), their classifications matched NAEP's classifications in content areas for 90 percent of the items, and they matched mathematical ability category classifications for 69 percent of the items (Silver et al., 1992). Nearly identical results were obtained when a similar study was conducted using the 1992 grade 4 items (Silver and Kenney, 1994). The lower congruence of classifications in the mathematical ability categories was judged to result from the fact that many items appeared to tap skills from more than one ability, making the classification of items into a single ability category a difficult task.
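The match rates reported above are simple percent-agreement figures. The following Python sketch shows how such a figure could be computed; it is an illustration under stated assumptions rather than the method documented in the cited studies. The function name and the ten-item classification lists are hypothetical and do not reproduce the study data.

```python
def percent_agreement(official, expert):
    """Percentage of items on which two classifications assign the same category."""
    if len(official) != len(expert):
        raise ValueError("both classifications must cover the same items")
    matches = sum(o == e for o, e in zip(official, expert))
    return 100.0 * matches / len(official)


# Hypothetical classifications for ten items (not data from the NAE studies).
official_content = ["number", "number", "measurement", "geometry", "geometry",
                    "data", "data", "algebra", "algebra", "measurement"]
expert_content = ["number", "number", "measurement", "geometry", "geometry",
                  "data", "data", "algebra", "algebra", "geometry"]

official_ability = ["conceptual", "procedural", "problem solving", "conceptual",
                    "procedural", "problem solving", "conceptual", "procedural",
                    "problem solving", "conceptual"]
expert_ability = ["conceptual", "procedural", "procedural", "conceptual",
                  "problem solving", "problem solving", "conceptual",
                  "procedural", "conceptual", "conceptual"]

print(percent_agreement(official_content, expert_content))
print(percent_agreement(official_ability, expert_ability))
```

Percent agreement of this kind forces each item into exactly one category per dimension, which is the constraint the text identifies as the source of the lower match rate for mathematical abilities.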

For the 1992 grade 4 reading assessment, a group of reading experts judged the item distribution across "reading purposes" to be a reasonable approximation of the goals specified in the framework, but they noted that the assessment was lacking in items that adequately measured the personal response and critical stance categories of the "reading stance" dimension (Pearson and DeStefano, 1994). The panel reiterated the lack of clarity in the stance dimension following the evaluation of the 1994 reading assessment (DeStefano et al., 1997), positing that the assessment of this dimension, as currently carried out, added little to the interpretive value of NAEP results.

Use of Constructed-Response and Other Performance-Based Items

Across the assessments that it evaluated, the NAE panel repeatedly applauded NAEP's continued move to include increasing numbers and variations of constructed-response and other performance-based item types, and it encouraged further development and inclusion of such items as mechanisms for assessing aspects of the framework not easily measurable through more constrained item formats. They also recommended that special studies should be used to assess aspects of the frameworks not easily captured in the range of item types administered in a large-scale survey assessment format (National Academy of Education, 1992:28-29, 1993:69-72, 1996:25-28).

Continuity Across Framework and Assessment Development Activities

Recognizing the complex, multistep nature of the NAEP assessment development process, the NAE panel recommended that mechanisms be implemented to ensure continuity throughout the process. The panel suggested that the mechanism could be a set of subject-specific oversight committees that monitor all steps of the process, from framework development to reporting, in order to ensure that the intentions of the framework developers were reflected in the assessment materials and in reports of NAEP results (National Academy of Education, 1992:30).

Time Allotted for Assessment Development

The NAE panel repeatedly noted the severe time constraints placed on the NAEP assessment development process, observing that "due to short authorization and funding cycles on one hand and time-consuming federal clearance procedures on the other, the actual development of the frameworks and assessment tasks has been squeezed into unconscionably short time frames" (National Academy of Education, 1996:27). The panel noted that such time constraints are antithetical to the iterative design and development processes required to develop innovative assessment tasks that measure aspects of student achievement not well measured through more constrained item formats.

A Broader Definition of Achievement

In their fifth and final evaluation report, Assessment in Transition: Monitoring the Nation's Educational Progress (National Academy of Education, 1997), the NAE panel provided arguments for the reconceptualization of the NAEP assessment domains to include aspects of achievement not well specified in the current frameworks or well measured in the current assessments. They recommended that particular attention be given to such aspects of student cognition as problem representation, the use of strategies and self-regulatory skills, and the formulation of explanations and interpretations. The NAE panel contended that consideration of these aspects of student achievement is necessary for NAEP to provide a complete and accurate assessment of achievement in a subject area.

THE COMMITTEE'S EVALUATION

Our evaluation of NAEP's frameworks and the assessment development process is organized around four topics: (1) an examination of the existing frameworks and assessment development process for main NAEP, (2) an argument for a broader conceptualization of student achievement in future NAEP frameworks and assessments, (3) a recommendation for the use of a multiple-methods strategy in the design of future NAEP assessments, and (4) a discussion of the types of portrayals of student achievement that can enable NAEP to better meet its interpretive function.

Two underlying themes regarding the assessment development process emerged during the course of our evaluation. These serve as a foundation for the discussion in this chapter and are central to the successful implementation of the process improvements we recommend.

First, we contend that the entire assessment development process must be guided by a clear understanding of the kinds of inferences and conclusions about student achievement that one wants to find in reports of NAEP results. For example, assume that the developers of a science framework determine that it is essential to describe and understand students' abilities to design and conduct scientific investigations. A primary goal of the framework should be to describe the kinds of inferences about students' knowledge and skills in scientific investigation that will eventually be made in reports of results. The method of assessment should then be appropriate for eliciting student performance in designing and conducting investigations. Scoring rubrics should capture critical differences in student responses that provide information needed to make the inferences about that performance.

However, for many large-scale assessment development efforts, including NAEP, too often the focus is not on the kinds of information that eventually are to be provided in the reports of results. Too often the focus of framework development is on the development of broad content outlines that include nearly everything that could be assessed in a subject area. Too often the focus of assessment development is on the production of large numbers of items that match categories of framework dimensions in very general ways. Too often scoring rubrics are designed for ease of training readers and scoring responses. Instead, the focus should be on defining what kinds of inferences about achievement are to be provided in reports and then designing a connected system of frameworks, assessments, and scoring rubrics so that they lead to the collection of the information from students' responses necessary to make such inferences.

The second theme is closely related to the first. In order for desired inferences about student achievement to guide the assessment development process, there must be a high degree of continuity from one step to another in the process, from the conceptualization of the framework, to the development of assessment materials and scoring rubrics, through the reporting of results. Too often the intentions of the developers of the framework can be diluted, and even unrealized, if there is not sufficient attention to carrying out the inferential goals described in the framework throughout the entire assessment development process. We discuss strategies for improving the coherence across the steps of the process later in this chapter.

NAEP's Frameworks and Assessment Development Process

In this section we evaluate NAEP's existing frameworks and the current assessment development process. We discuss (1) the content of main NAEP's frameworks in science, mathematics, and reading; (2) the fit of items to the framework dimensions in the 1996 NAEP science assessment; (3) assessment of the knowledge and skills described in the frameworks; (4) coherence across the assessment development process; and (5) the time frame available for completing assessment development activities.

Frameworks for Main NAEP Science, Mathematics, and Reading Assessments

Science. The consensus process through which NAEP's frameworks are developed has led to the production of comprehensive documents that cover a broad range of the content knowledge and skills within the potential subject-area domain. The committee observes that the framework for the 1996 science assessment continues this trend, specifying very broad and detailed coverage of subject-matter content across life, physical, and earth sciences, along with a range of process skills (Science Assessment and Exercise Specifications for the 1996 NAEP, National Assessment Governing Board, no date, c). These process skills are among those that are accorded high importance in national science standards documents, including those developed by the National Academy of Sciences, the American Association for the Advancement of Science's Project 2061 effort, and the National Science Teachers Association's Scope, Sequence, and Coordination framework. The process skills, defined in the NAEP framework as "ways of knowing and doing," are conceptual understanding, scientific investigation, and practical reasoning. The science framework also includes two additional dimensions that are consonant with ideas promoted in the standards documents; these dimensions cut across the framework's content-by-process matrix:

"Themes"—systems, patterns of change, and models—are described in the framework as the "big ideas" of science that transcend the scientific disciplines and enable students to consider problems with global implications (Science Framework for the 1996 NAEP, National Assessment Governing Board, no date, d:28).
"The nature of science" includes the "historical development of science and technology, and the habits of mind that characterize those fields, and the methods of inquiry and problem-solving" (p. 15).

Thus, the framework for the 1996 NAEP science assessment includes both broad and detailed content coverage and the process skills that are accorded importance in national science curriculum standards. The structural matrix that summarizes the major components of the 1996 NAEP science assessment framework appears in Figure 4-2.

Mathematics. In mathematics, the framework also prescribes both broad content coverage and skills deemed to be important in national curriculum standards. NAEP has been attentive to ongoing input from the disciplinary and education communities and from previous evaluations in its revision of the mathematics framework in preparation for the 1996 mathematics assessment. The 1990-92 mathematics framework was modified for the 1996 assessment to include "mathematical power" as a component of the domain (see Figure 4-4). Mathematical power includes reform-endorsed measures of students' abilities "to reason in mathematical situations, to communicate perceptions and conclusions drawn from a mathematical context, and to connect the mathematical nature of a situation with related mathematical knowledge and information gained from other disciplines or through observation" (Mathematics Framework for the 1996 NAEP; National Assessment Governing Board, no date, a:37). In response to the NAE panel's evaluation of the 1990-92 mathematics framework, the 1996 framework dispensed with the rigidly structured content area-by-mathematical-ability matrix as a guide for specifying percentages of items to be included in the assessment (this matrix assumed that any given item assessed one, and only one, of three mathematical abilities: conceptual understanding, procedural knowledge, or problem solving). The revised framework is based on a single dimension composed of five content strands that serve as the basis for specifying item percentages, but it recognizes that any given item, especially one that is complex in nature, can assess more than one aspect of mathematical ability or mathematical power (e.g., an item that assesses the content strand of geometry and spatial sense might also assess the mathematical abilities of procedural knowledge and problem solving). The goal during assessment development is to achieve a balance of coverage of mathematical abilities and mathematical power across the entire assessment, rather than focusing on developing a predetermined number of items that purport to measure each mathematical ability or aspect of mathematical power in a discrete fashion. Such a strategy supports current conceptions about the integrated nature of mathematical thinking.

Reading. The framework used for the 1998 reading assessment remains unchanged from that used to guide the development of the 1992 and 1994 reading assessments. We concur with the NAE panel's evaluation that this framework reflects current theory and an understanding of research about reading processes. It successfully delineates characteristics of good readers and the complex interaction among the reader, the text, and the context of the reading situation. As described by NAGB (Reading Framework for the National Assessment of Educational Progress: 1992-1998; National Assessment Governing Board, no date, b:9-10), the framework acknowledges a number of different aspects of effective reading and a number of variables that are likely to influence students' reading performance (see Figure 4-3).

NAEP has adequately addressed two aspects of effective reading: the extent to which "students read a wide variety of texts" and "form an understanding of what they read and extend, elaborate and critically judge its meaning" (National Assessment Governing Board, no date, b:9). This has been accomplished by including three types of texts in the assessment (literature, information, documents) and by asking questions at four levels of understanding or stances (initial understanding, developing interpretation, personal reflection and response, and demonstrating a critical stance). The NAEP reading framework also reflects current disciplinary goals for assessment by including a substantial number of extended response items whereby students are asked to write answers to comprehension questions rather than simply to recognize correct answers. However, as the NAE panel noted, there are important aspects of the reading model presented in the framework that are not captured in the organizing structure of stances and types of text.

Thus, we conclude, on the basis of studies conducted previously and on the committee's own observations, that NAEP's existing frameworks in science, mathematics, and reading generally reflect many goals of the disciplinary communities and have instituted some forward-looking, reform-oriented innovations. However, the frameworks still do not adequately reflect contemporary research and theory from cognitive science and the subject-area disciplines about how students understand and learn. Maintaining broad coverage of subject-area knowledge and skills is still a major focus of the frameworks, particularly in science and mathematics. Although breadth of coverage supports traditional assessment methodologies that result in summary scores as indicators of student achievement, it provides little insight about the level and depth of student understanding that is valued in many current views of student learning. It is also notable that none of the three frameworks reviewed specifically defines the kinds of inferences about student achievement that are most desired in reports of results. Instead, the user of the frameworks must make assumptions about the kinds of descriptions of student achievement that the framework developers intended to appear in results.

Fit of Items to the Framework Dimensions: 1996 NAEP Science

The construction of the main NAEP assessments, as is the case for most current large-scale survey assessments, has been predicated on the assumption that the goals of the framework can be measured through a broad array of discrete items (or sets of items that refer to a common reading passage or problem situation). Recognizing that some aspects of the framework are not best assessed in an objective (multiple-choice) format, NAEP has appropriately incorporated increasing numbers of short and extended constructed-response items into the assessments. The use of such items was more extensive in the 1996 NAEP science assessment than in any previous NAEP assessment (Table 4-1). Over 60 percent of the items required constructed responses, and approximately 80 percent of the students' assessment time was allocated to responding to these items. In addition, every student in the assessment was administered a hands-on task. In these tasks, students were provided with a set of materials and asked to carry out an activity according to provided instructions. They were then asked to respond to a series of discrete objective and constructed-response questions related to the activity.
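
The two proportions cited above can be checked against the 1996 Science rows of Table 4-1. The following is a minimal sketch in Python; the counts and time percentages are copied from that table, while the variable and function names are illustrative assumptions and not part of any NAEP tooling.

```python
# Counts and time shares copied from the 1996 Science rows of Table 4-1.
# MC = multiple choice, SCR = short constructed response,
# ECR = extended constructed response. Names here are illustrative only.
science_1996 = {
    4:  {"MC": 51, "SCR": 73,  "ECR": 16, "time": {"MC": 18, "SCR": 53, "ECR": 29}},
    8:  {"MC": 74, "SCR": 100, "ECR": 20, "time": {"MC": 20, "SCR": 53, "ECR": 27}},
    12: {"MC": 70, "SCR": 88,  "ECR": 30, "time": {"MC": 18, "SCR": 44, "ECR": 38}},
}


def constructed_response_shares(row):
    """Return (percent of items, percent of time) for SCR and ECR combined."""
    total_items = row["MC"] + row["SCR"] + row["ECR"]
    item_share = 100 * (row["SCR"] + row["ECR"]) / total_items
    time_share = row["time"]["SCR"] + row["time"]["ECR"]
    return item_share, time_share


shares = {grade: constructed_response_shares(row)
          for grade, row in science_1996.items()}

for grade, (item_share, time_share) in shares.items():
    print(f"Grade {grade}: {item_share:.1f}% of items, {time_share}% of time")
```

Under these figures, the constructed-response share of items exceeds 60 percent at each grade, and the share of assessment time is 80 percent or slightly above.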

The committee commissioned research to examine how well this diverse pool of multiple-choice and constructed-response items matched the major categories of the structural matrix of the NAEP science framework. The central findings of this work are summarized below. A detailed description of methods and results is presented in the volume of research papers that accompanies this report (Sireci et al., 1999).

In one study, 10 eighth-grade science teachers who were familiar with science education reform and the goals of the NAEP science framework were asked to study and discuss the framework and then classify items from the 1996 eighth-grade science assessment by content area and process area. They were also asked to indicate which, if any, of the three themes the items assessed, and to determine if each item assessed the nature of science. An item was considered to be "correctly" classified if at least 7 of the 10 teachers classified the item in the category in which the item was classified by NAEP (the assessment development committee and the ETS staff).

In general, there was a high degree of congruence between content classifications of items assigned by the eighth-grade science teachers and those assigned by NAEP. Using the "7 of 10 raters" criterion, across the three content areas (life, physical, and earth sciences), 85 percent of the items were matched to the content area in which they were classified by NAEP. For more than half of the items, all 10 teachers matched the classifications assigned by NAEP. "Correct" classifications were lower for the process dimension (60 percent). The percentages of correct classifications for conceptual understanding, practical reasoning, and scientific investigation were 70 percent, 53 percent, and 50 percent, respectively. For only 12 percent of the items were all 10 teachers' classifications congruent with those assigned by NAEP. This suggests that delineating process domains for these items is more difficult than delineating content domains. A likely reason is that many items may require students to draw on more than one cognitive skill simultaneously, an assessment feature that many in the science and education communities would support. Thus, we recommend that the science framework be revised to parallel the changes to the framework for the 1996 mathematics assessment, in which the goal was to achieve a balance of coverage across process categories in the item pool as a whole rather than presuming that each item can assess only a single process category.
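
The "7 of 10 raters" criterion described above can be expressed as a short computation. The sketch below is illustrative only: the function name, the item identifiers, and the ratings are invented for the example and are not data from the commissioned study.

```python
from collections import Counter


def matched_items(naep_labels, teacher_labels, threshold=7):
    """Fraction of items for which at least `threshold` raters chose
    the same category that NAEP assigned to the item."""
    hits = 0
    for item_id, naep_label in naep_labels.items():
        votes = Counter(teacher_labels[item_id])
        # Counter returns 0 for a category that no rater selected.
        if votes[naep_label] >= threshold:
            hits += 1
    return hits / len(naep_labels)


# Invented example: three items, ten raters each.
naep = {"item1": "life", "item2": "physical", "item3": "earth"}
ratings = {
    "item1": ["life"] * 10,                       # unanimous match
    "item2": ["physical"] * 7 + ["earth"] * 3,    # meets the 7-of-10 bar
    "item3": ["earth"] * 6 + ["life"] * 4,        # falls short of the bar
}

rate = matched_items(naep, ratings)
print(f"{rate:.0%} of items matched under the 7-of-10 criterion")
```

Because the criterion is a threshold on rater agreement, an item on which 6 of 10 raters agree with NAEP counts as unmatched, which is one reason match rates can fall quickly when an item plausibly draws on more than one category.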

The results with regard to the themes and nature of science dimensions were problematic. Approximately 50 percent of the items in the science assessment were categorized in one of the three themes by NAEP, evenly distributed across the three themes (systems, models, and patterns of change). According to the judgment of the teachers in this study, virtually all of the items were thought to be measuring one of the three themes. Of the items that NAEP classified into one of the three themes, the match between the theme identified by the teacher and that designated by NAEP was only 50 percent. Likewise, the teachers also judged that virtually all of the items were assessing the nature of science dimension, while only 16 percent of the items were classified as "nature of science" by NAEP. The teachers' interpretation of the themes and the nature of science dimensions, as described in the framework and based on their own experiences with these concepts, appears to be so broad that they view nearly every item in the assessment as measuring both of these dimensions. Although it may truly be the case that these dimensions thread through all parts of the science assessment (and in ways that are perceived differently from one subject-matter expert to the next), it is clear that these dimensions must be more clearly and narrowly defined in the framework. Inferential goals for reporting achievement in these areas must be clearly stated if these dimensions are to be successfully translated into assessment materials and have any interpretive utility.

TABLE 4-1 Items and Distribution of Assessment Time in NAEP Instruments

	Number of Items				Percent of Items			Percent of Time		
Assessment, Grade	MC	SCR	ECR	Total	MC	SCR	ECR	MC	SCR	ECR
1992 Reading
Grade 4	42	35	8	85	49%	41%	9%	28%	46%	26%
Grade 8	57	53	13	123	46%	43%	11%	25%	46%	29%
Grade 12	63	54	16	133	47%	41%	12%	25%	43%	32%
1994 Reading
Grade 4	39	37	8	84	46%	44%	10%	25%	48%	26%
Grade 8	41	55	13	109	38%	50%	12%	19%	51%	30%
Grade 12	44	62	13	119	37%	52%	11%	19%	53%	28%
1992 Mathematics
Grade 4	99	54	5	158	63%	34%	3%	43%	47%	11%
Grade 8	118	59	6	183	64%	32%	3%	44%	44%	11%
Grade 12	115	58	6	179	64%	32%	3%	44%	44%	11%
1996 Mathematics
Grade 4	80	55	9	144	56%	38%	6%	34%	47%	19%
Grade 8	93	62	7	162	57%	38%	4%	37%	49%	11%
Grade 12	91	68	7	166	55%	41%	4%	35%	52%	13%
1994 Geography
Grade 4	59	23	8	90	66%	26%	9%	41%	32%	28%
Grade 8	84	32	9	125	67%	26%	7%	44%	33%	23%
Grade 12	85	25	13	123	69%	20%	11%	43%	25%	33%
1994 History
Grade 4	62	26	6	94	66%	28%	6%	43%	36%	21%
Grade 8	101	35	12	148	68%	24%	8%	44%	30%	26%
Grade 12	104	33	19	156	67%	21%	12%	39%	25%	36%
1996 Science
Grade 4	51	73	16	140	36%	52%	11%	18%	53%	29%
Grade 8	74	100	20	194	38%	52%	10%	20%	53%	27%
Grade 12	70	88	30	188	37%	47%	16%	18%	44%	38%
NOTE: Main balanced incomplete block spiral only; excludes theme blocks and estimation blocks. MC = multiple choice; SCR = short constructed response; ECR = extended constructed response.
SOURCE: Johnson et al. (1997:4-5).

Improved Assessment of Knowledge and Skills Described in the Frameworks

As stated earlier, the science, mathematics, and reading frameworks have incorporated many aspects of the standards-based goals of the disciplinary communities.

In general, the assessment item pools are reasonably reflective of the goals for distributions of items set forth in the framework matrices, particularly in the content-area dimensions in mathematics and science.

However, the presence of standards-based goals in the frameworks and the general fit of the assessment item pools to categories in the major framework dimensions do not ensure that the goals of the framework have been successfully translated into assessment materials. Several lines of evidence indicate that NAEP's assessments, as currently constructed and scored, do not adequately assess some of the most valued aspects of the frameworks, particularly with respect to assessing the more complex cognitive skills and levels and types of students' understanding:

Across the NAEP assessments, students' responses to some short constructed-response items and many extended constructed-response tasks are often sparse or simply omitted (up to 40 percent omit rates for some extended constructed-response items). Given that it is these very items that are often intended to assess complex thinking and understanding, the assessments are failing to gather adequate information on these aspects of the framework.
Significant numbers of the scoring rubrics in the NAEP reading, mathematics, and science assessments award points for easily quantifiable aspects of the response (awarding higher scores for numbers of examples provided or reasons given, numbers of correct statements made, etc.) rather than for the quality of the response. Such quantitative rubrics do little to capture students' level of understanding. In addition, on some items, respondents can get partial credit while demonstrating no knowledge of the construct the item was designed to measure. On other items, the same level of partial credit is given to a variety of responses that suggest quite different understanding of the concepts the item was designed to measure. In many cases, rubrics are not well constructed to capture the potential complexity of student responses. Silver et al. (1998) presented a paper at the 1998 annual meeting of the American Educational Research Association that corroborates these observations. They analyzed scoring rubrics and student responses for several extended constructed-response items from the 1996 main NAEP mathematics assessment and concluded that varying levels of sophistication in the reasoning used by students to respond to the items were not reflected in the rubrics they examined.
When the NAEP science framework was developed in 1990-1991, the NAGB-appointed steering and planning committees believed that it was imperative that the assessment include measurement of student achievement via hands-on tasks. They specified that every student participating in the assessment should be administered one of these tasks. Initially this appeared to be a laudable method for promoting hands-on learning experiences in science instruction. However, the evidence is mounting that such tasks, when administered in standardized fashion as part of a large-scale survey assessment, are not an adequate way to measure student achievement in scientific investigation and related cognitive skills (Hamilton et al., 1997; Baxter and Glaser, in press). The standardized tasks in the NAEP science assessment (and other large-scale survey assessments) are necessarily highly structured, have a very heavy reading load, and appear to measure some general reasoning skills and the ability to read and follow directions at least as much as the scientific investigation skills highlighted in the framework. Also, the generalizability of similar types of science performance tasks appears to be rather low (Shavelson et al., 1993). Students' prior experience and their degree of engagement with a task set in a particular context may have a large (but probably unquantifiable) impact on their response to the task, and these impacts may vary when assessing similar aspects of achievement with a task set in a different context. The current technology for using performance-type measures in science (and in other NAEP subject areas) via the current large-scale survey assessment clearly has serious shortcomings.

These observations provide examples of ways in which current assessment items and tasks and the accompanying scoring rubrics fail to capture complex aspects of the NAEP frameworks in a satisfactory way. These challenges are certainly not unique to NAEP but are faced by virtually all large-scale survey assessments that attempt to measure even moderately complex student skills and understanding. NAEP is to be commended for developing frameworks that prescribe the assessment of some complex aspects of achievement and for taking a leadership role in exploring new methods for assessing such achievements. It is clear, however, that NAEP must continue to improve how various aspects of student achievement are assessed in the large-scale surveys. It is also clear that reliance on large-scale surveys alone is not adequate for the assessment of the more complex aspects of student achievement. More effort is needed, both by NAEP and by the assessment community, to find workable solutions to these problems. Some suggestions for how these challenges can be addressed, by improving the items and rubrics included in the large-scale surveys, as well as by broadening the range of methods used in NAEP's assessment system, are presented later in this chapter.

Specific recommendations and examples of how main NAEP's current reading assessment materials might be improved are presented in Appendix A. In this appendix, we provide a detailed analysis of a grade 8 reading passage and set of related items and scoring rubrics that were administered as part of the 1994 main NAEP reading assessment. In doing so, we illustrate how much there is still to be gained through improvements to the current large-scale assessment materials.

Improved Coherence Across the Assessment Development Process

As we stated earlier, the sequential, multistep NAEP assessment development process (framework development, item development, scoring of field test items, assembly of final forms, scoring, analysis, and reporting) can occur as somewhat discrete, fragmented events.

Efforts to reduce the fragmentation of the steps in the assessment development process in recent assessments have attempted to ensure that there is significant overlap among the experts who participate in framework development, the development of preliminary achievement-level descriptions, item development, scoring rubric development, and final form development. These experts have been given major decision-making roles, and this effort appears to have helped improve continuity. For example, during the development process for the 1996 NAEP science assessment, there was notable continuity of personnel involved in various stages of the process:

the 2 leaders of the framework development effort also oversaw the development of the assessment and exercise specifications;
5 of 11 members of the NAGB-sponsored committee that developed the preliminary achievement-level descriptions had served on the committees that developed the framework, and 5 were also serving on the assessment development committee;
5 of 13 members of the assessment development committee had also served on the committees that developed the frameworks;
many members of the assessment development committee played a large role in developing and refining scoring rubrics and rater training protocols;
members of the assessment development committee were involved as leaders at various stages of the achievement-level-setting process;
3 members of the original committees that developed the framework in 1991 continued to participate as members of the assessment development committee through the 1996 assessment scoring sessions and were leaders in the 1996 and 1997 achievement-level-setting sessions.

In response to the recommendation by the NAE panel that subject-specific oversight committees monitor all steps of the process from framework development to the reporting of results, in 1996 NCES established four subject-area standing committees for NAEP (reading and writing; mathematics and science; arts; and civics) as well as a standing committee for students with disabilities and English-language learners. The stated purpose of the committees is "to ensure continuity throughout the development of assessments." Nevertheless, our observations of two meetings of one of these committees (mathematics and science) revealed that the committee was primarily used as an ad hoc advisory committee on NAEP issues of current interest to NCES. Its members did not seem to view their function as one of ensuring continuity across various phases of the development process. To some degree, this mismatch of stated purpose with actual committee activities is understandable, as this committee was formed long after the frameworks and assessment materials had been developed, when the recommended oversight role should have been taking place. However, when frameworks are revised or redeveloped and assessment materials begin to be developed for a largely new assessment, the intended role of these committees could be realized if their activities were focused on their stated purpose and the committee members were made aware of the goal of continuity across stages of the process that they are expected to oversee.

Although increased and continuing involvement of subject-area experts is likely to enhance coherence across the assessment development process, our observations indicate that there are still some critical stages in the process during which a lack of coherence seems apparent:

Translation of the goals of the frameworks into assessment instruments and scoring rubrics. As stated earlier, current assessment items and tasks often are not well designed to measure complex aspects of student achievement described in the frameworks. Also, when items and tasks are well designed, the scoring rubrics are not consistently designed to attend to key differences in students' levels and types of understanding of the knowledge and skills specified in the framework. Rather, emphasis is often given to easily quantifiable aspects of a response with little consideration of the relevance of those distinctions to important differences in the levels of students' understanding.
Reflection of the goals of the frameworks in the reporting of results. The current NAEP frameworks provide broad and detailed descriptions of the knowledge and skills to be covered in NAEP's subject-area assessments. However, NAEP reports, with their focus on summary score reporting, do little to portray any of the texture found in the frameworks. For example, "mathematical power" was added to the mathematics frameworks for the 1996 assessment, but no results, analysis, or even mention of student performance across this dimension is found in the Report Card of mathematics results (Reese et al., 1997). If goals specified in the frameworks are successfully translated into assessment materials, then NAEP should be able to provide descriptive, sometimes qualitative, information about student performance in all key aspects of the framework.
Reflection of the preliminary achievement-level descriptions in assessment materials. The pools of items and tasks in current NAEP assessments have not been consistently constructed to measure knowledge and skills specified in the preliminary achievement-level descriptions presented in NAEP's framework documents. Although we discuss and evaluate NAEP's achievement-level setting in more detail in Chapter 5, it is important to note here that, if student performance is to be reported in relation to achievement levels, then the framework and assessment materials must be constructed with this goal in mind. The preliminary achievement-level descriptions must be integral parts of the frameworks, reflect the most valued aspects of the framework, and incorporate current, research-based understandings of levels of student performance in a discipline. The assessment must be designed to measure the knowledge and skills laid out in those descriptions, and the rubrics should be constructed to capture meaningful differences in the levels of students' understanding.

The NAE panel noted that assessment development for the NAEP reading and mathematics assessments had been squeezed into unconscionably short time frames (National Academy of Education, 1996:27). For the four new main NAEP large-scale survey assessments that have been developed since that time (the 1994 U.S. history and geography assessments; the 1996 science assessment; and the 1998 civics assessment), this has remained the case. The time from the awarding of the assessment development contract to the deadline for submission of all field test materials to the U.S. Department of Education and the Office of Management and Budget ranged from 5 to 8 months. The conception, development, piloting, review, and revision of all items and tasks to be field-tested occurs during this time, as does the initial development of scoring rubrics and necessary ancillary materials (such as kits used in the science hands-on performance tasks). In what may have been the worst-case scenario, between the time that the science assessment development contract was awarded (April 1993) and the deadline for submission (August 1993), the assessment development subcontractor (ETS) coordinated the development of over 220 multiple-choice items, 320 short constructed-response items, 125 extended constructed-response items, and 17 hands-on tasks. Concern about compressing this critical development activity increases when one keeps in mind that not only do these items and tasks serve as the pool from which the final assessment will be built, but also they will be readministered in subsequent assessments to obtain trend information.

The impact of this compressed development was confirmed during discussions with individuals involved in recent NAEP assessment development efforts. They consistently reported that more time was needed to pilot and revise items, and that items and tasks should be piloted in settings in which it is possible to determine how students' responses are related to their understanding of the content being assessed. This is particularly important for the extended constructed-response items and performance tasks. Individuals who have studied students' responses to these items have concluded that, in many cases, students did not appear to know what was expected of them in order to respond in ways that were consistent with the scoring guides. More specifically, these observations may indicate that (1) task goals were not clear to the students, (2) tasks were not worded in ways that elicit knowledge-based differences in students' responses, or (3) scoring systems did not capture those differences.

Addressing these types of issues implies more than field-testing items under assessment conditions. Standard field-testing can work well for multiple-choice items for which there are well-established statistical procedures for determining item quality, but assessment materials designed to measure more complex performances require a different development strategy. It is important to understand how students solve problems and how they perceive the goals of the task. This involves using cognitive laboratories to talk with students or groups of students about their strategies, their perceptions of the task, and their understandings of the nature of the response that is required. Particular attention should be paid to refining scoring rubrics based on pilot-test and field-test results, focusing explicitly on distinguishing among the kinds of responses that indicate differential understanding.

NAEP's redesign plans initially extended assessment development by a year in order to provide 12 months for item and task development and field-testing (as occurs now) and an added year for a dry run of the final assessment. The purpose of this dry run was to obtain statistical information that would make it possible to perform data analyses and achievement-level setting more rapidly and efficiently following the administration of the operational assessment in the following year (and thus issue reports of initial results in a timely fashion). This plan was recently abandoned, however, apparently because of the high cost of conducting the dry run of the assessment.

We urge the NAEP program to reconsider adding a year to the assessment development cycle, but to devote it to the preliminary development and small-scale piloting that is needed to produce high-quality assessment materials that can better reflect the intent of NAEP's frameworks. This additional pilot test year is particularly important for constructed-response items, performance tasks, and the array of assessment methods that we envision as important components of NAEP in the future. Additional development time is essential if these assessment materials are to capture important differences in levels of students' understanding based on the current theory and research and if such differences are to be part of the interpretive information provided in reports of results. Development of assessment materials and scoring rubrics that accomplish this is not a simple task, and the extra year of development time is critical. Field-testing could then occur in the following year, followed by the administration of the operational assessment in the year after that.

Broader Conceptualization of Student Achievement

In addition to improving the assessment of important cognitive skills presented in the current frameworks, we contend that NAEP frameworks should incorporate a broader conceptualization of achievement, and that there is considerable research on cognition, learning, and development that could inform the design, conduct, and interpretation of NAEP (see also Greeno et al., 1997; National Academy of Education, 1997; National Research Council, 1999a). NAEP's frameworks currently do not adequately capitalize on current research and theory about what it means to understand concepts and procedures, and they are not structured to capture critical differences in students' levels of understanding. They also do not adequately describe more comprehensive goals for student achievement that go beyond subject-matter knowledge and focus on the skills and abilities that will be important to an educated person in the next century (see, for example, SCANS Commission, 1991). Dimensions of achievement not adequately reflected in current frameworks and assessments include (National Academy of Education, 1997):

Problem representation: building representations of problem-solving situations and drawing inferences to be used in problem solution, including planning steps for problem solution, planning for alternative outcomes, and planning steps to be taken as a result of those outcomes.
Use of strategies: selection and execution of appropriate problem-solving steps needed to accomplish the goal (based on the understanding of the task).
Self-regulatory skills: monitoring and evaluating strategies during problem solution and implementing corrective actions.
Explanation: drawing on existing knowledge to explain concepts and principles; providing principled justification for steps taken in problem solving.
Interpretation: synthesizing and evaluating information from various perspectives, understanding the relationships of claims, evidence, and other sources of information.
Individual contributions to group problem solving: building and using knowledge resources while engaging in group problem solving; recognizing competence in others and using this information to judge and perfect the adequacy of one's own performance.

We contend, as did the NAE panel, that advances in the study of cognition provide valuable insights into problem solving, explanation, interpretation, and how complex understanding is achieved, and they can be used to inform the development of assessments that better measure these dimensions of achievement than can the current array of broadly used large-scale assessment technologies.

Theories of Cognition and Learning

The conceptualization of cognition that has emerged over the last decade views knowledge not only as residing in the "head of the individual," but also as a derivative of how individuals operate collectively in a larger set of social settings and contexts. The latter perspective construes knowledge as being "situated" and views attempts to decontextualize it or fragment it as antithetical to the distributive, socially shared perspective on knowledge.

These perspectives on the nature of knowledge and skill raise serious questions about what should be assessed and the manner of assessment. With regard to the latter, it has been argued that the assessment technologies currently in use to develop, select, and score test items and tasks, and thus to determine NAEP's summary scores and achievement-level results, treat content domains and cognition as consisting of separate pieces of information, e.g., facts, procedures, and definitions. This fragmentation of knowledge into discrete exercises and activities is the hallmark of "the associative learning and behavioral objectives traditions," which dominated American psychology for most of this century (Greeno et al., 1997). This "knowledge in pieces" view has dominated learning theory and instructional practice in America, as well as assessment and testing technology. As noted by Mislevy (1993), "It is only a slight exaggeration to describe the test theory that dominates educational measurement today as the application of 20th century statistics to 19th century psychology" (p. 19). Much of current testing technology, notwithstanding changes made in scaling methods and measurement models, is based on an underlying theory that allows tasks to be treated as independent, discrete entities that can be accumulated and aggregated in various ways to produce overall scores. This model also allows for a simple substitution of one item for another or one exercise for another based on parameters of item difficulty.

In contrast to the approach currently employed in NAEP, contemporary cognitive theorists would argue that inferences about the nature of a student's level of knowledge and achievement in a given domain should not focus on individual, disaggregated bits and pieces of information as evidenced by questions students can answer correctly. More important is the overall pattern of responses that students generate across a set of items or tasks. The pattern of responses reflects the connectedness of the knowledge structure that underlies conceptual understanding and skill in a domain of academic competence. Thus, it is the pattern of performance, over a set of items or tasks explicitly constructed to discriminate between alternative models, that should be the focus of assessment. The latter can be used to determine the level of a given student's understanding and competence within a given domain of expertise. Such information is interpretive and diagnostic, highly informative, and potentially prescriptive.
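As a minimal sketch of this idea, the Python fragment below compares one student's responses across a small family of items with the response patterns predicted by alternative models of understanding. The item names, the model labels, and the simple agreement count are hypothetical illustrations introduced here; they are not part of NAEP or of the research cited in this chapter.

```python
from typing import Dict

# Hypothetical models of understanding for a number-pattern item family.
# Each model predicts which items a student at that level answers correctly.
MODEL_PREDICTIONS: Dict[str, Dict[str, bool]] = {
    "recognizes_pattern_only": {"extend": True, "describe_rule": False, "generalize": False},
    "states_rule": {"extend": True, "describe_rule": True, "generalize": False},
    "generalizes_rule": {"extend": True, "describe_rule": True, "generalize": True},
}


def best_matching_model(responses: Dict[str, bool]) -> str:
    """Return the model whose predicted pattern agrees with the most responses."""
    def agreement(model: str) -> int:
        predicted = MODEL_PREDICTIONS[model]
        return sum(responses[item] == predicted[item] for item in predicted)

    # On ties, max() keeps the first model in dictionary order.
    return max(MODEL_PREDICTIONS, key=agreement)


# Two students with the same number of correct answers (one each)
# but different patterns of performance.
student_a = {"extend": True, "describe_rule": False, "generalize": False}
student_b = {"extend": False, "describe_rule": True, "generalize": False}

print(sum(student_a.values()), best_matching_model(student_a))
print(sum(student_b.values()), best_matching_model(student_b))
```

In this sketch the two students receive the same total score but match different models, which is the kind of interpretive information that a summary score alone would not convey.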

Another important construct derivable from a contemporary cognitive perspective is that achievement is captured less by the specific factual, conceptual, or procedural knowledge questions that one can answer, and more by the extent to which such knowledge is transferable and applicable in a variety of tasks and circumstances. To know something is not simply to reproduce it but to be able to apply or transfer that knowledge in situations that range in similarity to the originally acquired competence. A third salient feature of contemporary views of cognition is that a person's knowledge, understanding, and skill are demonstrated by the capacity to carry out significant, sustained performances. Often, such performances may extend well beyond a few minutes and can extend to days, months, and even years (in the case of student research projects). A corollary is that such performances are often dependent on collaboration with others in a group. Especially significant are group situations that emphasize distributed expertise and the sharing of knowledge across individuals to enable successful performance of a major task.

These perspectives imply, first, that an assessment program such as NAEP can be truly informative about the nature of student achievement only if (1) it includes a wide array of tasks tapping the various facets of knowledge and understanding and (2) the information gathered from this array is not merely reduced to summary scores. This is true whether the subject area is mathematics, reading, science, or history. Second, it needs to involve extended individual and group performances. Even though NAEP has increasingly included more performance tasks and constructed-response items in lieu of heavy reliance on multiple-choice items, the kinds of extended items and tasks that can be administered under the constraints of large-scale survey conditions do not reflect the level of complexity that many feel is necessary for assessing achievement in subject areas.

A NAEP that is more reflective of contemporary perspectives on cognition would (1) assess a broader range of student achievements, (2) be more concerned with describing exactly what it is that students know rather than simply attempting to quantify their knowledge, and (3) place increased emphasis on qualitative descriptions of students' knowledge as an essential supplement to quantitative scores. For example, in mathematics the goal would not be just to describe whether students could solve problems, but how they solved them or why they could not solve them. The implications of incorporating a cognitive perspective into NAEP for the types of results that can be reported are discussed in more depth in a later section of this chapter.

The arguments for including a broader conceptualization of achievement in NAEP are strengthened further when one examines the degree to which these aspects of student achievement are consistent with the more comprehensive goals for schooling that have been put forth as required skills and abilities for an educated person in the next century (Resnick, 1987; SCANS Commission, 1991; Murnane and Levy, 1996). These reports suggest, among other things, the critical importance of communication skills, reasoning skills, and the ability to work with others using technologies to accomplish meaningful tasks.

It is notable that there are a variety of skills emphasized in these reports, as well as in the dimensions of achievement that we have discussed here, that educational assessment techniques have no way to measure in a large-scale assessment setting, such as that in which NAEP is currently administered:

solving complex, meaningful problems using technological tools,
making persuasive presentations and arguments in conversation,
finding and researching questions that are worth pursuing,
figuring out what is going on in some complex situations and being able to diagnose problems with the process,
designing artifacts and systems to accomplish meaningful goals,
taking responsibility for completing a substantial piece of work,
listening to what other people are saying and being able to make sense of different viewpoints,
asking facilitative questions of other people and getting them to think about what they are doing,
understanding deeply several domains of inquiry of particular interest to the student,
reading on their own materials that relate to their interests and goals, and
working well with others to plan and carry out tasks.

Re-creating these cognitively complex performances in assessment materials may not even be possible. However, extracting data from naturally occurring student performances, by videotaping student activity and by computer-based analysis of students' written work, offers promise as an alternative means of gathering data on these aspects of achievement. Given that the skills listed above are important goals for education, it is critical that a program that assesses educational progress in America find a way to assess such aspects of student performance; we propose a general model to address this issue in a later section.

Current Knowledge: Possibilities and Limits

We have argued that the assessment of student thinking should be a clearly articulated priority for NAEP, and, insofar as possible, the frameworks and assessments should take advantage of current research and theory (both from disciplinary research and from cognitive and developmental psychology) about what it means to know and understand concepts and procedures in a subject area. This strategy should be reflected in efforts to improve the assessment of subject areas delineated in the frameworks and to assess dimensions of achievement not currently emphasized in the frameworks.

In arguing for such an approach, we recognize that achieving this objective is an incremental process predicated on the existence of well-developed theories and sufficient research on student understanding to guide assessment development activities. Such knowledge does not exist for all portions of subject-area domains or for all dimensions of achievement. For example, there are major differences in the degree to which detailed theories and fine-grained descriptions exist for student understanding and performance in various aspects of reading, mathematics, and science. However, there is sufficient extant knowledge to significantly improve the design of current assessment materials and embark on the task of developing new ones.

In reading, the current NAEP assessment was developed around theories of reading that were evident in the framework, but more needs to be done to make assessment of important aspects of the framework possible. In some cases, important intentions of the framework are not evident in the assessment itself. For instance, the current assessment ignores the interaction between the type of text and the purpose of reading the text and does not fully assess the depth of understanding needed to explore literary texts or learn about challenging subject matter. The material in Appendix A provides illustrations of improvements that can be made in reading assessment materials that begin to address some of these issues. Additional examples could be generated for the various text types and for the different reading tasks specified within the current frameworks by drawing on the considerable body of research currently available on the text structure factors influencing comprehension, on the strategies used to effectively process texts given different reading purposes, and on the evaluation of students' representation of the various elements of a given text.

In mathematics, there is a growing body of research on students' understanding of mathematical concepts (see Grouws, 1996, for several examples). Thus, it is possible to pursue a type of task and item development strategy that focuses on differentiating specific levels and types of student understanding and to do so for a number of important topics in mathematics, including many that fall within the existing NAEP mathematics frameworks. Appendix C provides a concrete example of such a process of translating results from research about student learning into assessment tasks. A set of items is presented that systematically differentiates levels of student understanding for the conceptual domain of number patterns. The example provided, like the example shown in Appendix A for reading, starts from existing NAEP materials but significantly augments how items are structured individually and collectively, thereby enhancing what can be determined about levels of students' understanding in the domain. Later in this chapter, we discuss the relevance of this example in the context of recommendations for providing more informative portrayals of student achievement in NAEP reports.

In science, research is somewhat more limited than in the areas of reading and mathematics. Nonetheless, there are detailed investigations of how students build their understanding in various conceptual areas (e.g., electricity and circuits, force and motion) and how to assess the form and scope of such understanding, especially to assist instructional decisions (Minstrell, 1991; Minstrell and Hunt, 1992; White and Fredericksen, 1998). This type of systematic knowledge of the levels at which students understand and represent physical concepts, principles, and/or situations is a starting point for developing highly informative assessment tasks that could be used in large-scale survey assessments such as NAEP. An example of how these investigations can be used as a foundation for constructing assessment materials is shown in Appendix B and in a research paper by James Minstrell (1999) in a volume that accompanies this report.

The area of science performance assessment, which was discussed earlier as problematic in NAEP's 1996 assessment, provides an especially powerful example of how the design and evaluation of innovative assessments can be informed by cognitive theory and research on the nature of subject-matter expertise. As noted earlier, a major aspect of the recent NAEP science assessment frameworks is the inclusion of scientific investigation and the use of hands-on performance tasks to assess these aspects of the domain. Such assessment innovations in large-scale surveys are highly laudable, but there are serious limitations to assessing these aspects of the domain in a large-scale format.

TABLE 4-2 Cognitive Activity and the Structure of Knowledge

Cognitive Activity	Structure of Knowledge: Fragmented (developmentally immature)	Structure of Knowledge: Meaningfully Organized (developmentally mature)
Problem representation	Surface features and shallow understanding	Underlying principles and relevant concepts
Strategy use	Undirected trial-and-error problem solving	Efficient, informative, and goal oriented
Self-monitoring	Minimal and sporadic	Ongoing and flexible
Explanation	Single statement of fact or description of superficial factors	Principled and coherent
SOURCE: Adapted from Baxter and Glaser (in press).

Baxter and Glaser (in press) have proposed an analytic framework for investigating the cognitive complexity of science assessments. Their framework juxtaposes the components of competence derived from studies of the development of expertise with the content and process demands of science subject matter. Table 4-2 from Baxter and Glaser (in press) illustrates critical aspects of cognition that are the desired targets of assessment in science (and other knowledge domains) and how these elements are typically displayed when the structure of a student's knowledge and understanding is fragmented and developmentally immature versus meaningfully organized and representative of higher levels of expertise and understanding. The cells in Table 4-2 provide capsule descriptions of the behaviors representative of a particular combination of cognitive activity and stage of knowledge structure development.

As argued by Baxter and Glaser, an analysis of the cognitive complexity of assessment tasks must take into account both the demands of the domain in which cognitive activities are manifested and their realization in the assessment situation. To capture the latter, they developed a simple content-process space (see Figure 4-5) that depicts the relative demands of the content knowledge and science process skills required for successful completion of a given science assessment task. In this space, task demands for content knowledge are conceptualized as falling on a continuum from rich to lean. At one extreme are knowledge-rich tasks that require in-depth understanding of subject-matter topics for task execution and completion. At the other extreme are tasks that are not dependent on prior knowledge or experience. Instead, performance is solely dependent on information given in the assessment situation. The task demands for process skills are also conceptualized as lying on a continuum from constrained to open. Process-constrained situations include those with step-by-step directions or highly scripted task-specific procedures for task completion. Hands-on science performance assessment tasks, as well as other innovative formats for science assessment (see Shavelson, 1997), can involve many possible combinations of content knowledge and process skills.

FIGURE 4-5 Content-process space. SOURCE: Baxter and Glaser (in press).
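One way to make the content-process space concrete is sketched below in Python. The numeric 0-to-1 scales, the 0.5 cutoff, and the example task are assumptions introduced only for illustration; Baxter and Glaser describe the two dimensions as continua and the text here does not assign them numeric values.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    content_demand: float    # hypothetical scale: 0 = knowledge-lean, 1 = knowledge-rich
    process_openness: float  # hypothetical scale: 0 = process-constrained, 1 = process-open


def quadrant(task: Task, cutoff: float = 0.5) -> str:
    """Place a task in one of the four quadrants of the content-process space."""
    content = "knowledge-rich" if task.content_demand >= cutoff else "knowledge-lean"
    process = "process-open" if task.process_openness >= cutoff else "process-constrained"
    return f"{content}, {process}"


# A hypothetical hands-on task with step-by-step directions that
# supplies all the information a student needs.
scripted = Task("scripted hands-on task", content_demand=0.2, process_openness=0.1)
print(quadrant(scripted))
```

A cutoff is a simplification: because both dimensions are continua, tasks near the middle of either scale would not fall cleanly into a single quadrant.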

The content-process space, together with the components of competence mentioned in Table 4-2, provides a framework for examining the cognitive complexity of science assessments, including those currently in use in NAEP and any that might be developed under existing or modified frameworks. Analyses of a diverse range of science assessments illustrate matches and mismatches between the intentions of test developers and the nature and extent of cognitive activity elicited in an assessment situation. Such analyses also serve to illustrate the degree of correspondence between the quality of observed cognitive activity and performance scores (Baxter and Glaser, in press). As we noted earlier, many performance assessments constructed for large-scale survey instruments such as NAEP fall into the quadrant of Figure 4-5 described as knowledge-lean and process-constrained. In part, this is due to time limitations of the testing scenario and attempts to reduce sources of bias that could influence students' performance. However, the consequence is a limitation on the nature of what can be learned about some of the more cognitively complex aspects of science achievement incorporated into the NAEP frameworks. This is not a limitation of performance assessments per se but of their design and implementation within the constraints of typical large-scale survey administration. As we argue subsequently, creating performance assessments that sample from all aspects of the space represented in Figure 4-5 is probably a desired goal and may well require different methods and modes of data collection.

The examples we have provided of the application of cognitive theory and research to the design of enhanced assessment materials are only illustrative. Although there are limits to how extensively cognitive theory and research can be applied to task and item development, in areas for which such knowledge exists it should play a central role in framework and assessment development. In portions of the subject-area domain for which little research exists, assessment development should take into account more than the content and structure of the discipline. For example, there are other sources of information about student thinking than those found in formal theory and research.

Teachers and other individuals who work intensively with students can offer informed perspectives regarding how students think about a subject. They can, for example, identify misconceptions, patterns of errors, and strategies. Furthermore, what we learn from the results of the assessment can and should be used to improve future assessments. Thus, an increased emphasis on student understanding and a broadening of the conceptualization of achievement assessed by NAEP are worthy of consideration, even if we accept the fact that assessment development cannot be grounded entirely in disciplinary and cognitive theory and research at present.

Multiple Methods for Measuring Achievement

The goals we have argued for in this chapter pose significant challenges for assessment development and assessment administration and operations. In this section we present a model for considering how the design of NAEP can evolve to accomplish these goals.

When considering the assessment of a domain, four general sets of questions should guide the framework and assessment developers:

Have we been clear about the kinds of inferences we wish to make about achievement when we report results? Have the aspects of achievement about which we want to make inferences been clearly articulated in the framework? Have we specified exactly what aspects of student achievement we intend to measure?
Assuming that the kinds of inferences to be made have been identified and articulated, what methods of assessment and types of assessment tasks provide students with appropriate opportunities to display their performance in the aspects of achievement of interest?
Is the assessment task organized and presented in a way that elicits the levels and types of student responses that are needed to support the kinds of inferences that we wish to make?
Does the scoring system capture critical aspects of student performance and permit distinguishing the relative quality of different performances?

If this set of questions is used as guidance, an assessment system designed to measure student achievement in the subject-area domains described in the frameworks—as well as the broader conceptualizations of achievement—would consist not only of the current large-scale survey assessments, but would also include a range of assessment methods—a new paradigm NAEP.

We propose that new paradigm NAEP adopt a design strategy whereby its assessments better match specific assessment technologies with the constructs to be assessed and the types of inferences to be drawn about student achievement. The types of technologies range along a continuum from large-scale survey assessments comprised primarily of multiple-choice and short constructed-response items to less-constrained, moderately open assessments (but still conducted at a single point in time) to relatively unconstrained observations of student performance obtained over longer time periods. The portion of the construct domain assessed also can range along a continuum, to some extent but not completely paralleling the continuum of assessment methods. Large-scale surveys can be used to assess individual cognitive constructs in the domain, not necessarily less complex or less important constructs, and each individual item assesses only a small slice of the domain. Families of items, such as the example in Appendix C, can assess a larger portion of the domain and levels of understanding within a cognitive construct. Moderately open assessments can assess related sets of cognitive constructs and also assess larger portions of the domain; highly open, less constrained tasks assess a range of simple and complex constructs in the domain and typically will cover large segments of the domain. The proposed continuum of assessment technologies and tasks also affords the opportunity to sample aspects of cognition and achievement that are otherwise difficult, if not impossible, to incorporate in restricted response tasks. These include some mentioned previously in this chapter such as problem representation, strategy use, self-regulation and monitoring, explanation, interpretation, argumentation, working with others, and technological tool use in problem solving.

As we stated in Chapters 1 and 2, a major component of this new paradigm NAEP is a core NAEP, consisting of large-scale survey instruments. Core NAEP would continue to track trends in achievement for both national NAEP and state NAEP in core subjects. Core subjects would include reading, mathematics, science, and writing, and any other subjects, such as U.S. history or geography, in which assessments are administered frequently enough to establish trend lines. However, core NAEP alone cannot assess all important aspects of student achievement. The second major component in our proposed design is multiple-methods NAEP, consisting of alternative surveys and assessments. These components should be used to assess (1) components of core subject area frameworks that are not well suited for assessment via large-scale surveys, (2) nontrend subject areas, (3) achievements of members of special populations who cannot participate in the large-scale surveys, and (4) achievements of students with specific instructional experiences (e.g., fine arts, advanced mathematics).

We contend that implementing a multiple-methods NAEP will be required in order to appropriately assess all aspects of the current frameworks as well as the broader conceptualizations of achievement discussed earlier in this chapter. Specifically, alternative methods will be required to assess aspects of student achievement not well assessed by large-scale surveys (e.g., performing investigations in science, solving problems in a group setting). In addition, multiple-methods NAEP is appropriate for assessing targeted samples of students with specific instructional experiences (e.g., advanced mathematics, fine arts, economics). An overview of the measures of student achievement in new paradigm NAEP is presented in Table 4-3.

Although we contend that a wider range of methodologies must have a place in new paradigm NAEP, both to appropriately assess all aspects of the current frameworks and to assess broader dimensions of achievement, we also recognize that this would not be feasible, financially or logistically, if every assessment method had to be administered to samples of students as large as those that take the current large-scale survey instruments. As one moves along the assessment continuum, smaller samples of students, and samples less fully representative of the nation, should be used. These issues, including costs associated with a multiple-methods NAEP, are considered in the next section.

Features of a Multiple-Methods Assessment System

If a multiple-methods approach were implemented, each core subject-area assessment would consist of a combination of the large-scale survey instruments and multiple alternative assessments. Insofar as possible, data from multiple-methods NAEP and core NAEP's large-scale surveys should be linked, and data from all methods administered across a subject area should be used to represent student achievement in NAEP's reports (i.e., summary scale score results from large-scale assessment surveys should not be the only source of information used to represent student achievement).

TABLE 4-3 Overview of New Paradigm NAEP

Assessment Purpose	Core NAEP (Standard Large-Scale Survey)	Multiple-Methods NAEP: Alternative Surveys	Multiple-Methods NAEP: Alternative Assessment Methods
Reporting trends using proficiency scores	X		
Reporting trends using achievement levels	X		
Assessment of students with special needs who cannot be included in standard assessments			X
Assessment of nontrend subjects		X	X
Assessment of samples of students with specific instructional experiences		X	X
Assessment of constructs not well assessed by large-scale surveys		X	X

Multiple-methods NAEP should explore such technologies as the use of clinical interviews and protocol analysis, assessment of group performance, and technology-based modes of assessment (e.g., computer-based analyses of collections of naturally occurring data on student classroom performances) as alternative methods for assessing how students think and learn. In the short term, NAEP should use alternative methods of assessment to administer components of the existing large-scale survey for which that method is not the most appropriate mode of data collection (e.g., science hands-on performance tasks).

Our recommendation for the use of multiple assessment methods is in some ways similar to one proposed by the current testing subcontractor, Educational Testing Service (ETS), in its 1997 report, NAEP Redesigned, one of several papers submitted to NCES to inform planning for the current redesign of NAEP (Johnson et al., 1997). In that document, ETS proposes a "modular" assessment design as one option for future NAEP. A key feature of the design is the proposal to administer more open, performance-based tasks to smaller samples of students, contending that the information needed to make appropriate inferences about student performances can be obtained from these more limited samples, if such samples can be linked to those taking the large-scale survey assessment. The proposal falls short, however, in one way: an assumption appears to be made that the constructed-response items and tasks that were originally developed for a large-scale assessment mode should be administered and scored as they previously had been, just to smaller samples of students. The multiple-methods approach that we are recommending should entail development of tasks and scoring rubrics that support collecting more in-depth descriptive information than what is currently gathered through even the most "open" items and tasks on the current main NAEP large-scale survey assessments.

Planning and Implementation Challenges

We make our recommendation for a multiple-methods NAEP with the recognition that full implementation of such a strategy is not immediately practical or feasible. Progress must be accomplished in three areas before multiple-methods NAEP could consist of the range of types of assessment methods that we have discussed in earlier sections of this chapter: (1) strategies for managing the costs of development, administration, scoring, and analysis of alternative surveys and assessments must be in place; (2) the research base for understanding the measurement attributes of such alternative methods must be expanded; and (3) current models used for the development of assessment materials must be changed.

Planning and implementation of a multiple-methods strategy must be undertaken with the recognition that trade-offs will be necessary to manage costs. We do not recommend that the broader array of assessment types be simply added on to the existing program. A portion of the funds currently devoted to the development, administration, and scoring of the extensive large-scale survey instruments will need to be diverted to multiple-methods NAEP. In our proposed design, some aspects of student achievement described in NAEP's frameworks would no longer be assessed via core NAEP and its large-scale survey instrumentation. The components of the current large-scale surveys that are intended to assess these aspects of achievement (extended-response questions, performance tasks) should therefore be reduced. Funds that are now devoted to developing these types of items and tasks, administering them to large samples of students, and scoring the large number of responses should be used to develop, administer, and score components of multiple-methods NAEP—to smaller samples of students.

The financial impact of a reduction in these types of items and tasks of the large-scale survey instruments is not insignificant. Detailed NAEP budgets were not available to us, but we did determine that approximately 35 percent of the budget for national NAEP is allocated for assessment development, field testing, and the scoring of student responses. On a per item basis, extended constructed-response and performance task item types require a disproportionate share of these funds. In fact, the increased representation of such items in NAEP's large-scale assessment has led to an almost geometric increase in costs (Johnson et al., 1997). Thus, it is reasonable to conclude that significant savings would result from reducing the number of these item types in NAEP's large-scale survey.

NAEP's current assessment development and operations subcontractor, ETS, has presented analyses that show that such trade-offs—in which more complex (and expensive to administer and score) assessment materials are administered to smaller samples of students—could indeed be accomplished using NAEP's current financial resources; they could even result in cost savings (Johnson et al., 1997). Such savings could then be allocated to the development of the broader range of assessment materials needed to better assess the current frameworks and to adequately assess other aspects of achievement not currently measured by NAEP.

There are also considerable challenges associated with developing, administering, scoring, analyzing, and reporting results from alternative methods, and research and development efforts to date have not provided clear and complete solutions to these challenges. Developing assessment materials to assess complex constructs has been difficult, and there are no well-established strategies for developing such materials. In addition, the reliability and generalizability of such assessments have not been as high as is desirable. Data collection scenarios for multiple-methods NAEP must also circumvent the problem of the lack of student motivation that is the likely cause of the low response rates observed on extended-response items and tasks on the current large-scale survey. We anticipate that an increased reliance on the analysis of students' classroom work products may be necessary to ameliorate the lack of motivation exhibited by some students in a low-stakes assessment such as NAEP. Accelerated research regarding the use of naturally occurring student work as a basis for the assessment of student achievement is imperative.

Successful development of multiple-methods NAEP also requires that new models for the development of assessment materials be implemented. The development processes and "machinery" used by large testing subcontractors to rapidly develop large numbers of multiple-choice and short constructed-response items are inappropriate for the development of the types of assessments we envision for multiple-methods NAEP. Iterative review and revision based on a series of tryouts and follow-up discussions with individual students or small groups of students in cognitive laboratory settings will be needed, with an emphasis on the development of smaller quantities of assessment materials that more successfully assess complex performances and levels and types of student understanding. Such cognitive laboratory tryouts during the initial stages of assessment development are currently being used in efforts to improve NAEP's background questionnaires and in the development of reading and mathematics items for the proposed voluntary national test (National Research Council, 1999b).

It will also be important to include individuals with a broader range of expertise in assessment development activities than has previously been the case. Disciplinary specialists who conduct research about student learning and cognition as well as cognitive and developmental psychologists must be represented on committees that develop the frameworks and the assessment materials if implementation of the strategies we have recommended is to be accomplished.

In addition to an exemplary design team, a successful development process relies on iteratively updating frameworks and conceptions of student thinking based on research and practice. Indeed, if, as we envision, NAEP is but one component of a larger system of data collections for assessing educational progress, then the range of contextual, interpretive information gained from this system could inform the development of the next generation of frameworks and assessments in new paradigm NAEP.

Progress in the areas described above will not be easy to achieve, and implementation of a multiple-methods NAEP will be incremental and evolutionary. For example, we anticipate that, largely for reasons of cost, multiple-methods NAEP would initially be conducted only as part of national NAEP, with the most feasible and informative components carried over to state NAEP administrations on a gradual, selected basis. However, despite the challenges posed by costs and funding reallocations, the need for an expanded research base, and the need to change assessment development models, the alternative is an unacceptable status quo: a NAEP that measures only those aspects of student achievement that can be assessed through a single, "drop-in-from-the-sky" large-scale survey and leaves other parts of the framework unaddressed. That alternative relegates NAEP to the role of an incomplete indicator of student achievement.

Portraying Student Achievement in NAEP Reports

Implementation of the committee's recommendations—to improve the translation of the goals of current frameworks into assessment materials and to evolve the frameworks to encompass broader conceptualizations of student achievement—would enable NAEP to produce broader and more meaningful descriptive information, both quantitative and qualitative. At a minimum, it would lead to an improved understanding of the current NAEP summary score results and, if capitalized on appropriately, would provide a much more useful picture of what it means to achieve in each subject area. This information would support the desires of NAEP's users for the enhanced interpretive function of NAEP discussed in Chapter 1. In this section, we further evaluate NAEP's current methods for portraying student achievement and describe how, even prior to the full implementation of the recommendations presented in this chapter, NAEP could improve the breadth and depth of how student achievement is portrayed.

NAEP's Current Portrayals of Student Achievement

A primary means by which NAEP currently describes student achievement is through summary scale scores, expressed on a proficiency scale from 0 to 300, 0 to 400, or 0 to 500. Summary scores (i.e., mean proficiencies) are reported for the overall national sample at each grade (4, 8, and 12) and for major demographic subgroups. In NAEP's 1996 mathematics and science Report Cards, the subgroups for which scale scores were reported were geographic regions, gender, race/ethnicity, level of parents' education, type of school, and socioeconomic level as indicated by a school's Title I participation and by free/reduced-price lunch eligibility (O'Sullivan et al., 1997; Reese et al., 1997). In previous Report Cards and in various follow-up reports, summary scores have been presented for additional subgroups (e.g., amount of television watching, time spent on homework). However, reporting by these types of variables in the Report Cards was recently abandoned by NAEP in an effort to streamline the reports, and because such stand-alone portrayals of student proficiency have been criticized for leading users to make inappropriate causal inferences about the effect of these single variables on student achievement.

This latter concern notwithstanding, in addition to the Report Cards, NAEP also produces a variety of briefer follow-up reports, which are generally released 12 to 30 months after the release of the Report Cards. These reports provide the results of univariate analyses in which mean proficiency scores are presented as a function of variables presumed to be related to achievement (e.g., summary scores in reading as a function of number and types of literacy materials in the home; summary scores in history as a function of amount of time spent discussing studies at home each day).

Another important means of reporting NAEP results is by the percentage of students performing at or above NAEP's basic, proficient, and advanced achievement levels. Achievement-level setting and the reporting of achievement-level results are discussed in Chapter 5.

Toward More Informative Descriptions of Student Achievement

In Chapter 1 we concluded that scores that summarize performance across items are, in general, reasonable and effective means for NAEP to fulfill the descriptive function of a social indicator. They provide a broad-brush view of the status of student achievement (albeit a more limited definition of achievement than we advocate) and do so in a way that can, when necessary, attract the attention of the public, educators, and policy makers to the results. However, summary scores should not be viewed as the only type of information needed to understand and interpret student achievement. We have argued that, in NAEP, they represent performance on only a portion of the domain described in the frameworks, and thus they provide a somewhat simplistic view of educational achievement. On their own, they do not allow NAEP to adequately fulfill one of the interpretive functions of a social indicator; that is, they do not provide information that helps NAEP's users to think about what to do in response to NAEP results. More in-depth descriptive portrayals of student achievement are needed for this function to be fulfilled.

For example, much of the current debate regarding curriculum reform focuses on what should be taught, and decisions about what to teach are not entirely the province of curriculum developers and teachers. Policy decisions are made about content coverage and emphasis at state levels. NAEP could and should provide information that would assist those who make these decisions beyond simply portraying subject-area achievement as "better than it was four years ago" or "worse in one region of the country than in another." If one is faced with making a decision whether to shift emphasis in a state mathematics curriculum framework to focus on computational skills, as has recently been the case in California, it would be useful to have specific information about students' achievement in computational skills and how it relates to their understanding of underlying concepts and their ability to apply their skills to solve problems. A single score tells very little about where students' strengths and weaknesses are, nor does it help improve student achievement, whereas a more descriptive analysis of student achievement could provide guidelines for curriculum decisions.

How can NAEP provide the kinds of information about student achievement that are needed to help the public, decision makers, and education professionals understand strengths and weaknesses in student performance and make informed decisions about education? The new paradigm NAEP that we recommend, in which assessment method is optimally matched with the assessment purpose (and the kinds of inferences to be drawn), has great potential to provide an impressive array of information from which such portrayals could be constructed. This entails a shift to more qualitative measures of student achievement, with an emphasis on describing critical features of student knowledge and understanding. In order to make progress in this direction in the short term, the following initial guidelines should be implemented:

Scoring rubrics for constructed-response items and tasks (whether included as part of the large-scale survey assessments of core NAEP or in multiple-methods NAEP) should be constructed to describe critical differences in levels and types of student understanding; for example, rubrics should not be constructed simply to capture easily quantifiable differences in numbers of correct examples given or reasons cited. Thus scale scores generated from the accumulation of student responses would be more valid reflections of the intent of both current and envisioned frameworks.
Scoring rubrics for constructed-response items and tasks should allow for the accumulation of information about more than one aspect of a student's performance. Although current scaling and analysis methodologies may not enable all such information to be reflected in summary scores, information gleaned from student responses can be used to provide informative and useful descriptions of achievement.

Assessment instruments should include families of related items, designed to support inferences about the levels of student understanding in particular portions of the frameworks. Analysis of patterns of student responses across these items can reflect the knowledge structure that underlies students' conceptual understanding, providing a richer interpretive context for understanding overall achievement results. In such a scenario, families of items serve as the unit of analysis; that is, each item is not simply a discrete source of information unconnected to other items. If we presume that these responses also contribute to summary scores, then this has implications for scaling, and appropriate modifications to existing scaling methodology would need to be explored and implemented.
Finally, in an ideal situation, the reporting of information that provides an interpretive context for understanding patterns of achievement results would be released along with the Report Card that presents summary score results for the nation and major subgroups. However, given the current pressures to release summary results on an accelerated schedule, providing interpretive analyses in the Report Cards may not be feasible, at least in the short term. NAEP's current type of univariate interpretive follow-up reports represents a first-order type of interpretive reporting. We envision much more in-depth analyses, such as those described in the example in the following section. This level of analysis undoubtedly will present challenges to NAEP's time frames for reporting, which have been focused on presenting summary score results as soon as possible after the administration of the assessment. Nevertheless, reports that provide interpretive context should be released by NCES as quickly as possible after the release of Report Cards, accompanied by the same kinds of high profile press conferences and press release packets that are used for the release of reports of national and state summary results. Although timely reporting of summary score results is a necessary and laudable goal, when these results are released in the absence of information that provides an interpretive context for helping users understand results, the value of NAEP as an indicator is much diminished.

A Successful First Step: NCTM's Interpretive Reports

A multiple-methods NAEP has the potential to provide an array of in-depth information about achievement in NAEP disciplines; still, it is a relatively easy task to glean more detailed information from the current assessments than presently occurs. Examination of data (particularly students' responses to constructed-response items) from the current assessments provides a basis for profiling student knowledge. For example, it is possible to analyze students' specific errors, examine the quality of their explanations, and interpret overall performance on relevant clusters of items in ways that characterize what students can and cannot do.

Since the first mathematics assessment, the National Council of Teachers of Mathematics (NCTM) has written interpretive reports based on the analysis of students' responses to individual NAEP items. These reports, supported by funding external to NAEP, characterize student performance at different levels of detail appropriate for different audiences. For example, the most recent monograph, reporting on the sixth NAEP mathematics assessment, administered in 1992, includes an analysis of students' understanding of basic number concepts and properties, their computational skills, and their ability to apply number concepts and skills to solving problems, based on examinations of items that assess these skills and concepts (Kenney and Silver, 1997). The report includes data across approximately 100 individual NAEP items. For some items, responses are analyzed in some detail; for others, p-values (the proportion of students answering the item correctly) are reported. The reports, however, go well beyond cataloging the results for individual items. Patterns of responses and errors are analyzed to draw conclusions about student performance on specific topics. For example, the authors of this report concluded (Kenney and Silver, 1997:137-138):

[S]tudents at all three grade levels appear to have an understanding of place value, rounding and number theory concepts for whole and rational numbers in familiar, straightforward contexts. Students' understanding improves across grade levels but falls when the contexts are unfamiliar or complex. Students at all three grade levels perform well on addition and subtraction word problems with whole and rational numbers that are set in familiar contexts and only involve one step calculation. … [S]ome students at all three grade levels attempt to solve multistep problems as though they involved single-step procedures. …

The most troubling results were the low performance levels associated with students' ability to justify or explain their answers to regular and extended, constructed-response items.

The NCTM interpretive teams have consistently documented that the most critical deficiency in students' learning of mathematics at all ages is their inability to apply the skills that they have learned to solve problems. This conclusion is consistently supported by the fine-grained analysis of student performance in virtually every content area of the mathematics framework. The analyses also provide a perspective on relations between skill acquisition and the development of understanding of fundamental concepts. These conclusions, based on interpretive analyses of students' responses, address issues that are at the core of public debate regarding curriculum choices. NAEP should help inform this debate and provide a basis for more informed policy decisions by integrating these types of analyses and reports into plans for assessments in all NAEP subject areas.

A good step in this direction was NAEP's establishment of a collaborative relationship with arts organizations to develop reports and dissemination strategies for the 1997 NAEP arts assessment. The collaboration with NCTM to conduct and report the results of interpretive analyses should be continued, and similar collaborations with organizations in NAEP's other subject areas should be established (e.g., the National Council of Teachers of English, the International Reading Association, the National Science Teachers Association, the National Council for Social Studies).

Although the NCTM interpretive teams have learned a great deal by analyzing student performance, the NAEP mathematics assessment is not specifically designed to support these kinds of within- and across-item analyses. Much could be improved in the structure of NAEP items and rubrics to better capture students' understanding in mathematics. Because the response data are not accumulated in ways that facilitate these analyses (Kenney and Silver, 1997), the interpretations are less explicit than they might be if the assessment were specifically designed to support them. The conclusions identify both specific and general areas of student weakness, but it is not possible to aggregate data to provide specific percentages of students who demonstrated understanding of core concepts or proficiency in essential skills or who meet benchmark criteria for applying concepts and skills to solve problems, because the assessments were not designed to include sets of items that ensured that this sort of analysis and reporting would be possible.
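
As a rough illustration of the item-level statistics discussed in this section, the following minimal sketch computes item p-values (proportion correct) and averages them over clusters of related items. The item identifiers, cluster labels, and response data are invented for illustration and are not drawn from NAEP; the sketch also ignores the sampling weights that an operational analysis would presumably require.

```python
from collections import defaultdict

# Hypothetical scored responses: 1 = correct, 0 = incorrect, None = omitted.
# Student IDs, item IDs, and scores are invented for illustration only.
responses = {
    "student_1": {"N1": 1, "N2": 1, "P1": 0},
    "student_2": {"N1": 1, "N2": 0, "P1": 0},
    "student_3": {"N1": 1, "N2": 1, "P1": 1},
    "student_4": {"N1": 0, "N2": None, "P1": 0},
}

# Hypothetical assignment of items to content clusters.
item_cluster = {
    "N1": "number concepts",
    "N2": "number concepts",
    "P1": "problem solving",
}


def item_p_values(responses):
    """Return the proportion of students answering each item correctly.

    Omitted responses (None) are counted as incorrect, which is one
    possible convention and an assumption of this sketch.
    """
    correct = defaultdict(int)
    presented = defaultdict(int)
    for answers in responses.values():
        for item, score in answers.items():
            presented[item] += 1
            if score == 1:
                correct[item] += 1
    return {item: correct[item] / presented[item] for item in presented}


def cluster_means(p_values, item_cluster):
    """Average the item p-values within each cluster of related items."""
    by_cluster = defaultdict(list)
    for item, p in p_values.items():
        by_cluster[item_cluster[item]].append(p)
    return {name: sum(ps) / len(ps) for name, ps in by_cluster.items()}


if __name__ == "__main__":
    p = item_p_values(responses)
    print(p)
    print(cluster_means(p, item_cluster))
```

A cluster average of this kind summarizes how difficult a group of items was; it does not by itself give the percentage of students who demonstrated understanding of a concept, which is the kind of figure the text notes the current item sets were not designed to support.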

The NCTM reports provide an example of the educationally useful and policy-relevant information that can be gleaned from students' responses in the current assessments, and they point toward the even more useful information that could be provided if assessments were developed with these analyses in mind. A first step in this assessment development strategy—the development of families of items for use in large-scale assessments—is discussed next.

Recommended Next Step: Developing Item Families

The notion of item families in NAEP was first articulated in the framework for the 1996 main NAEP mathematics assessment. However, an analysis conducted by Patricia Kenney for this committee showed that the sets of items included in the 1996 mathematics assessment exhibited few of the characteristics of either of the two kinds of families of items described in the framework (Kenney, 1999). The framework describes two types of item families: a vertical family and a horizontal family. A vertical family includes items or tasks that measure students' understanding of a single important mathematics concept within a content strand (e.g., numerical patterns in algebra) but at different levels, such as providing a definition, applying the concept in both familiar and novel settings, and generalizing knowledge about the concept to represent a new level of understanding. A horizontal family of items involves the assessment of students' understanding of a concept or principle across the various content strands in the NAEP program within a grade level or across grade levels. For example, the concept of proportionality can be assessed in a variety of contexts, such as number properties and operations, measurement, geometry, probability, and algebra. The framework also suggested that a family of items could be related through a common context that serves as a rich problem setting for the items.

In the volume of research papers that accompanies this report, Minstrell (1999) and Kenney (1999) describe strategies for developing families of items for use in future NAEP large-scale assessments of science and mathematics. One such item family in mathematics and the rationale underlying its construction is presented in Appendix C. This family of items assesses the progression of grade 4 students' understanding of numerical patterns; it was constructed using a combination of items from the 1996 main NAEP assessment, supplemented with new items that together form a coherent family. This example illustrates one way in which improved interpretations of students' achievements can be generated by making relatively modest changes to NAEP's current assessment development strategy.

We close this section by reiterating one of the chapter's underlying themes: frameworks and assessments must be designed with goals for reporting as a guide. We urge the implementation of a strategy for reporting NAEP results in which reports of summary scores are accompanied by, or at the very least quickly followed by, interpretive reports produced by disciplinary specialists and based on analyses of patterns of students' responses across families of items as well as across multiple assessment methodologies.

A VISION FOR ASSESSMENT DEVELOPMENT IN NAEP

The goals that we have set forth in this chapter are ambitious. They are very challenging from the standpoints of assessment development and assessment administration and operations. These goals (improving the assessment of more complex aspects of the current frameworks, expanding the conceptualization of NAEP's dimensions of achievement, implementing a multiple-methods design, and extracting and reporting more in-depth interpretive information from students' responses) may even seem overwhelming. However, each is critical if an already respected program is to better fulfill its mission of assessing academic achievement and be well positioned to meet the information demands of its users in the next century.

If these goals were met, what would be accomplished? What would the new paradigm NAEP look like? How would it differ from what exists now? If the recommendations presented in this chapter were implemented, NAEP would be characterized by:

an assessment development process that is guided by a vision of the kinds of inferences and conclusions about student achievement to be described in reports of NAEP results,
an assessment design in which assessment purpose is aligned with assessment method,
core NAEP subjects that are assessed using the current large-scale survey (for measurement of trends) and whatever multiple methods are best suited to assess aspects of the framework not well assessed through large-scale surveys,
nontrend subjects assessed using whatever combination of surveys and alternative assessment methods is best suited to meet the goals described in the subject area's framework,
an array of alternative assessment methods to assess the broader conceptualizations of achievement that are included in future NAEP frameworks, and
subject-specific reports of achievement results that include in-depth portrayals of student achievement gleaned from the entire array of methods used to assess a subject area; in core subjects, such reports ideally would also include summary proficiency scores from large-scale assessments and results from achievement level setting.

In Figure 4-6 we present a further-developed view of new paradigm NAEP and other measures of student achievement within the coordinated system of educational indicators that we proposed in Chapter 1.

FIGURE 4-6 Measures of student achievement, including new paradigm NAEP. NOTE: TIMSS = Third International Mathematics and Science Study; NELS = National Education Longitudinal Study; ECLS = Early Childhood Longitudinal Study.

MAJOR CONCLUSIONS AND RECOMMENDATIONS

Conclusions

Conclusion 4A. The current development of NAEP frameworks and assessments is not guided by a clear vision of the kinds of inferences to be drawn from the results. These frameworks and assessments support neither the reporting of achievement levels nor in-depth interpretations of student performance.

Conclusion 4B. There are many complex steps between framework development and reporting, and the intentions of the framework developers are often lost in this sequence of activities. Although NAEP has made progress in improving continuity from one step to another, attending to the lack of coherence across steps is still a challenge.

Conclusion 4C. Currently, NAEP focuses on the assessment of subject-area knowledge and skills but does not adequately capitalize on contemporary research, theory, and practice in ways that would support in-depth interpretations of student knowledge and understanding.

Conclusion 4D. Measuring student achievement only through NAEP's current large-scale survey precludes adequate assessment of (1) the more cognitively complex portions of the domains described in the current frameworks and (2) expanded domains represented by conceptions of achievement that are responsive to the changing demands of society.

Conclusion 4E. NAEP's current reporting metrics fail to capitalize on interpretive information that can be derived from responses to individual items or sets of items.

Conclusion 4F. Insufficient time is allotted to assessment development, which restricts activities needed for developing the kinds of materials that support more interpretive analyses and more informative reporting.

Recommendations

Recommendation 4A. The inferences to be made about student performance in NAEP reports should guide the development of NAEP frameworks. These inferential goals should also guide a coherent set of assessment development activities.

Recommendation 4B. NAEP's frameworks and assessments should capitalize on research, theory, and practice about student learning in the content domains to guide (1) the development of items, tasks, scoring rubrics, and assessment designs that better assess the more complex aspects of the content domains and (2) the development of integrated families of items that support in-depth interpretations of student knowledge and understanding.

Recommendation 4C. NAEP needs to include carefully designed targeted assessments to assess the kinds of student achievement that cannot be measured well by large-scale assessments or are not reflected in subject-area frameworks.

Recommendation 4D. NAEP reports should provide interpretive information, derived from analyses of patterns of students' responses to families of related items, in conjunction with the overall achievement results.

Recommendation 4E. More time, attention, and resources are needed for the initial stages of assessment development (task development, scoring, tryouts, and field tests) to produce a rich array of assessment materials.

Recommendation 4F. In order to accomplish the committee's recommendations, NAEP's research and development agenda should emphasize the following:

development of materials (items, tasks, families of items, and scoring rubrics) that support improved assessment of current frameworks in NAEP's large-scale survey assessment,
development of targeted assessments that tap components of the current frameworks and expanded achievement domains not well assessed via large-scale survey methods,
methods for producing and presenting more in-depth interpretive information in NAEP reports to make overall results more understandable, minimize improper or incorrect inferences, and support the needs of users who seek information that assists them in determining what to do in response to NAEP results, and
development and implementation of sampling, scaling, and analysis models that accommodate the use of families of interdependent items in the large-scale survey assessment.