4

Frameworks and the Assessment Development Process: Providing More Informative Portrayals of Student Performance

Summary Conclusion 4. The current assessment development process for main NAEP, from framework development through reporting, is designed to provide broad coverage of subject areas in a large-scale survey format. However, the frameworks and assessment materials do not capitalize on contemporary research, theory, and practice in ways that would support in-depth interpretations of student knowledge and understanding. Large-scale survey instruments alone cannot reflect the scope of current frameworks or of more comprehensive goals for schooling.

Summary Recommendation 4. The entire assessment development process should be guided by a coherent vision of student learning and by the kinds of inferences and conclusions about student performance that are desired in reports of NAEP results. In this assessment development process, multiple conditions need to be met: (a) NAEP frameworks and assessments should reflect subject-matter knowledge; research, theory, and practice regarding what students should understand and how they learn; and more comprehensive goals for schooling; (b) assessment instruments and scoring criteria should be designed to capture important differences in the levels and types of students' knowledge and understanding both through large-scale surveys and multiple alternative assessment methods; and (c) NAEP reports should provide descriptions of student performance that enhance the interpretation and usefulness of summary scores.








INTRODUCTION

Frameworks and the assessments that are based on them are central to the entire enterprise of NAEP. The framework documents describe the knowledge and skills to be assessed in each NAEP subject area, and the assessments represent the collection of measures (items, tasks, etc.) from which inferences about student performance in the subject area will be derived. Together they form the basis for describing student achievement in NAEP.

In this chapter we describe and evaluate NAEP's frameworks and the assessment development process for main NAEP. We use the term assessment development process here in a very broad sense, to describe the entire scope of activity from framework development through final assessment construction, scoring, and reporting. As background, we first provide an overview of the major steps in the development of an operational NAEP assessment, using the development of the 1996 NAEP science assessment for illustration. We then examine the conclusions and recommendations of previous evaluation panels most pertinent to our subsequent discussion. Our evaluation of NAEP's frameworks and assessment development process follows; in this discussion we make arguments for:

1. determining the kinds of inferences and conclusions about student performances that are desired in reports of NAEP results, and then using this vision of student achievement to guide the entire assessment development process;
2. improving assessment of the subject areas as described in current frameworks and including an expanded conceptualization of student achievement in future frameworks and assessments;
3. using multiple assessment methods, in addition to large-scale surveys, to improve the match of assessment purpose with assessment method;
4. enhancing use of assessment results, particularly student responses to constructed-response items, performance-based tasks, and other alternative assessment methods, to provide interpretive information that aids in understanding overall NAEP results; and
5. improving coherence across the many steps in the assessment development process as an essential prerequisite to successfully accomplishing goals 1 through 4.

In Chapter 1 we described the importance of enhancing NAEP's interpretive function by integrating its measures of student achievement with a larger system of indicators for assessing educational progress. This would provide an essential context for better understanding NAEP's achievement results in a given subject area. The focus in that discussion was on the collection and integration of data on relevant student-, school-, and system-level variables in ways that can elucidate student achievement and answer questions about "why the results are what they are."

In this chapter we discuss the analysis of students' responses to assessment items and tasks as another strategy for enhancing NAEP's interpretive function. By capitalizing on the currently unexploited sources of rich information contained in student responses (and patterns of responses), we describe how NAEP could answer questions about what students know and can do at a level of detail not currently reflected by summary scores. This type of interpretive information, gleaned from students' responses, provides insights about the nature of students' understanding in the subject areas. When combined with the broader-scale interpretive information that emerges from the coordinated system of indicators described in Chapter 1, qualitative and quantitative summaries of student achievement can help educators and policy makers begin to answer the key question that is asked when achievement results are released: "What should we do in response to these results?"

OVERVIEW OF NAEP'S CURRENT ASSESSMENT DEVELOPMENT PROCESS

When this committee began its evaluation in spring 1996, the 1996 main NAEP science assessment was the focus, largely because the science achievement-level-setting process was undertaken concurrently with the term of this evaluation and because the science assessment included an unprecedented number and variety of constructed-response items and hands-on tasks. However, because each NAEP subject area has unique features, in terms of the content and structure of the domain and the methods used to assess the domain, it was necessary and useful to consider other NAEP subject-area assessments as well. Thus, although our evaluation maintains an emphasis on the 1996 science assessment, we have also considered NAEP's mathematics and reading assessments in some depth, since these subject areas are among the most important to educators and policy makers. Simultaneous consideration of science, mathematics, and reading also permits attention to issues that cut across subject areas, as well as those that are subject-specific.

The development of NAEP's frameworks and assessments is a complex multistep process. For any given subject area, the entire sequence of activities—from framework development, through assessment development and administration, to the reporting of initial results—spans approximately five years, barring funding interruptions or other changes in scheduling. An overview of the sequence of activities in the framework and assessment development process, based on the 1996 science assessment, is portrayed in Figure 4-1. The impressive effort that is mounted by the National Assessment Governing Board (NAGB), the National Center for Education Statistics (NCES), and their subcontractors each time a NAEP assessment is developed and administered is often looked to as a model for framework and assessment development by states, districts, and other developers of large-scale assessments.

FIGURE 4-1 A generalized overview of NAEP's assessment development process.

Under NAGB's auspices, frameworks for the main NAEP assessments are developed by a planning committee (primarily subject-area experts—teachers, curriculum specialists, and disciplinary researchers) and a steering committee (a broad group of education administrators, policy makers, and subject-area experts) through a unique, broad-based consensus process. Through this consensus process, the planning and steering committee members reach a level of agreement about the subject-area knowledge and skills students should know and be able to do. Although there is never complete agreement among committee members about the scope and content of the frameworks, in general the outcome of the consensus process has been that the framework strikes a balance between reflecting current practice and responding to current reform recommendations.

Most NAEP frameworks specify that the subject-area assessments be constructed around two or more dimensions. In science, two major dimensions are "fields of science" and "ways of knowing and doing," which are supplemented by two underlying dimensions, "nature of science" and "themes." In reading, the major dimensions are "reading stance" and "reading purpose"; in mathematics, two primary dimensions, "content" and "mathematical abilities," are supplemented with a dimension designated "mathematical power." For each dimension, the frameworks also describe the proportions and types of items and tasks that should appear on the final version of the NAEP assessments. (See Figures 4-2, 4-3, and 4-4 for diagrammatic representations of the current main NAEP frameworks in science, reading, and mathematics.)

FIGURE 4-2 The 1996 main NAEP science framework matrix. NOTE: Nature of Science: the historical development of science and technology, and the habits of mind that characterize these fields, and the methods of inquiry and problem solving. Themes: the "big ideas" of science that transcend scientific disciplines and induce students to consider problems with global implications. SOURCE: National Assessment Governing Board (no date, d:13).

Following the development of the framework, test and item specifications are generated, also under the auspices of NAGB. These specifications, which provide a detailed blueprint for assessment development, are typically developed by a small subgroup of the individuals involved in the development of the framework, along with a subcontractor with experience in the development of specifications for large-scale assessments. The framework and specifications documents thus serve as guides for the development of assessment materials in each subject area.

Item development and field-test administration and scoring are currently carried out by staff at the Educational Testing Service (ETS—under contract to NCES) in consultation with an assessment development committee of subject-area experts, some of whom have been involved in the development of the framework. Items and draft scoring rubrics are developed by the committee, ETS staff, and external item writers identified by ETS and by the committee. Items are developed to include a mix of multiple-choice and a variety of constructed-response items and performance tasks as specified in the framework and specifications. ETS staff and assessment development committee members review and edit all assessment materials, which are also reviewed for potential sources of bias. When time has permitted, some of the more complex performance-based items have been piloted with two to three classes, and students have been interviewed about the items and their responses to the items. It has not, however, been universal practice to pilot items before formal field testing.

Field tests are administered to samples of students by WESTAT and scored by National Computer Systems (NCS). ETS staff and development committee members participate in the selection of items for the final version of the assessment and the revision of scoring rubrics based on the initial wave of incoming student responses. Constructed-response items are then scored by trained readers. ETS documents state that items or sets of items (in the case of reading passages or hands-on science tasks) are selected for the final assessment based on their fit with the framework, their fit with preliminary achievement-level descriptions, and their general statistical properties (e.g., level of difficulty, item-test correlations). Final assessment forms are again reviewed by the assessment development committee prior to administration by WESTAT to a nationally representative sample of students (generally a year after the field test was administered). Scoring is once again managed by NCS, with ETS staff and the assessment development committee overseeing any necessary revisions of the scoring guides prior to scoring by the trained readers.
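The item statistics mentioned above are standard classical test theory quantities. The short sketch below is our own illustration rather than a description of NAEP's documented procedures: it assumes a small, hypothetical matrix of dichotomously scored field-test responses and computes, for each item, an item difficulty (p-value) and a corrected item-test correlation (the correlation of the item score with the total score on the remaining items).

    import numpy as np

    # Hypothetical field-test data: rows are students, columns are items,
    # and entries are 1 (correct) or 0 (incorrect) for dichotomously scored items.
    responses = np.array([
        [1, 1, 0, 1],
        [1, 0, 0, 1],
        [0, 1, 0, 0],
        [1, 1, 1, 1],
        [0, 0, 0, 1],
    ])

    # Item difficulty (p-value): proportion of students answering each item correctly.
    p_values = responses.mean(axis=0)

    # Item-test correlation: correlation between each item score and the total
    # score on the remaining items (a corrected point-biserial correlation).
    n_students, n_items = responses.shape
    item_test_corr = []
    for j in range(n_items):
        rest_total = responses.sum(axis=1) - responses[:, j]
        item_test_corr.append(np.corrcoef(responses[:, j], rest_total)[0, 1])

    for j in range(n_items):
        print(f"Item {j + 1}: difficulty = {p_values[j]:.2f}, "
              f"item-test correlation = {item_test_corr[j]:.2f}")

Operational NAEP analyses are far more elaborate (polytomously scored items, IRT scaling, and sampling weights), so the sketch is meant only to make the cited statistics concrete.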

FIGURE 4-3 The 1992-1998 main NAEP reading framework matrix. SOURCE: National Assessment Governing Board (no date, b:16-17).

FIGURE 4-4 The 1996 main NAEP mathematics framework matrix. NOTE: Mathematical Power: consists of mathematical abilities within the broader context of reasoning and with connections across the broad scope of mathematical content and thinking. Communication is both a unifying thread and a way for students to provide meaningful responses to tasks. SOURCE: National Assessment Governing Board (no date, a:11).

Subsequent analysis of the results and production of the initial report (known as the Report Card) lead to the release of overall summary score results approximately 12 to 18 months after the administration of the assessment. Achievement-level setting and the release of achievement-level results also occur within the same time period, since it is NAGB's goal to include these results in the initial report. Following the release of initial summary score and achievement-level results, a series of follow-up reports that provide univariate analyses of student achievement in relation to contextual variables are released, and public-use NAEP datasets are made available to those who have site licenses.

NAGB's current plans call for NAEP final assessments to be readministered periodically (at 4-year intervals for reading, writing, mathematics, and science; see Table I-1). Because some assessment materials are released to the public after each administration of a final assessment, a new round of item development and field testing is conducted to replace those materials. The new materials and the revised final assessment are intended to reflect the goals of the original framework and specifications. Thus, the same framework serves as the basis for a series of assessments over time.

SELECTED FINDINGS FROM PREVIOUS NAEP EVALUATIONS

Our examination of NAEP's frameworks and the assessment development process has benefited greatly from the previous evaluations conducted by the National Academy of Education (NAE) and from a range of design initiatives and validity studies conducted by NAGB and NCES themselves. The NAE evaluations were mandated in NAEP's authorizing legislation and focused on the quality, validity, and utility of the NAEP assessments that were included as part of the trial state assessment program between 1990 and 1994 (the 1990 and 1992 mathematics assessments and the 1992 and 1994 reading assessments). Several major areas of observation and evaluation from the NAE studies are integral to discussions we present later in this chapter.

Framework Consistency with Disciplinary Goals

In general, the NAE panel found the NAEP frameworks for the 1990 and 1992 mathematics assessments and the 1992 and 1994 reading assessments to be reasonably well balanced with respect to current disciplinary reform efforts and common classroom practices in reading and mathematics. In reading, the panel concluded that the framework and the assessments were consistent with current reading research and practice, incorporating innovations in assessment technology such as interesting and authentic reading passages, longer testing time per passage, and a high proportion of constructed-response items (National Academy of Education, 1996:9). However, in their evaluation of the 1994 reading assessment, the panel contended that there were important aspects of reading not captured in the current reading framework, most notably differences in students' prior knowledge about the topic of their reading and contextual factors associated with differences in students' background, experiences, and interests (DeStefano et al., 1997).

In mathematics, the panel concluded that the 1990 frameworks and assessments reflected much of the intent of the Curriculum and Evaluation Standards for School Mathematics of the National Council of Teachers of Mathematics (1989) and that appropriate steps were taken to bring the 1992 assessment materials even more in line with those widely accepted standards. They did recommend, however, that the current content-by-process matrix, which requires items to be classified in a single content category and a single process category, be replaced with a model that better represents the integrated nature of mathematical thinking (National Academy of Education, 1992:20, 1993:69).

Fit of Items to Frameworks and Specifications

Analyses conducted for the NAE panel show that for the 1990 and 1992 mathematics assessments, the fit of the items to major dimensions of the framework was reasonable, particularly in the content categories.

When a group of mathematics experts classified the items in the 1990 grade 8 mathematics assessment on the basis of the content and mathematical ability categories specified in the framework (see Figure 4-4), their classifications matched NAEP's classifications in content areas for 90 percent of the items, and they matched mathematical ability category classifications for 69 percent of the items (Silver et al., 1992). Nearly identical results were obtained when a similar study was conducted using the 1992 grade 4 items (Silver and Kenney, 1994). The lower congruence of classifications in the mathematical ability categories was judged to result from the fact that many items appeared to tap skills from more than one ability, making the classification of items into a single ability category a difficult task.

For the 1992 grade 4 reading assessment, a group of reading experts judged the item distribution across "reading purposes" to be a reasonable approximation of the goals specified in the framework, but they noted that the assessment was lacking in items that adequately measured the personal response and critical stance categories of the "reading stance" dimension (Pearson and DeStefano, 1994). The panel reiterated the lack of clarity in the stance dimension following the evaluation of the 1994 reading assessment (DeStefano et al., 1997), positing that the assessment of this dimension, as currently carried out, added little to the interpretive value of NAEP results.

Use of Constructed-Response and Other Performance-Based Items

Across the assessments that it evaluated, the NAE panel repeatedly applauded NAEP's continued move to include increasing numbers and variations of constructed-response and other performance-based item types, and it encouraged further development and inclusion of such items as mechanisms for assessing aspects of the framework not easily measurable through more constrained item formats. They also recommended that special studies should be used to assess aspects of the frameworks not easily captured in the range of item types administered in a large-scale survey assessment format (National Academy of Education, 1992:28-29, 1993:69-72, 1996:25-28).

Continuity Across Framework and Assessment Development Activities

Recognizing the complex, multistep nature of the NAEP assessment development process, the NAE panel recommended that mechanisms be implemented to ensure continuity throughout the process. The panel suggested that the mechanism could be a set of subject-specific oversight committees that monitor all steps of the process, from framework development to reporting, in order to ensure that the intentions of the framework developers were reflected in the assessment materials and in reports of NAEP results (National Academy of Education, 1992:30).

Time Allotted for Assessment Development

The NAE panel repeatedly noted the severe time constraints placed on the NAEP assessment development process, observing that "due to short authorization and funding cycles on one hand and time-consuming federal clearance procedures on the other, the actual development of the frameworks and assessment tasks has been squeezed into unconscionably short time frames" (National Academy of Education, 1996:27). The panel noted that such time constraints are antithetical to the iterative design and development processes required to develop innovative assessment tasks that measure aspects of student achievement not well measured through more constrained item formats.

A Broader Definition of Achievement

In their fifth and final evaluation report, Assessment in Transition: Monitoring the Nation's Educational Progress (National Academy of Education, 1997), the NAE panel provided arguments for the reconceptualization of the NAEP assessment domains to include aspects of achievement not well specified in the current frameworks or well measured in the current assessments. They recommended that particular attention be given to such aspects of student cognition as problem representation, the use of strategies and self-regulatory skills, and the formulation of explanations and interpretations. The NAE panel contended that consideration of these aspects of student achievement is necessary for NAEP to provide a complete and accurate assessment of achievement in a subject area.

THE COMMITTEE'S EVALUATION

Our evaluation of NAEP's frameworks and the assessment development process is organized around four topics: (1) an examination of the existing frameworks and assessment development process for main NAEP, (2) an argument for a broader conceptualization of student achievement in future NAEP frameworks and assessments, (3) a recommendation for the use of a multiple-methods strategy in the design of future NAEP assessments, and (4) a discussion of the types of portrayals of student achievement that can enable NAEP to better meet its interpretive function.

Two underlying themes regarding the assessment development process emerged during the course of our evaluation. These serve as a foundation for the discussion in this chapter and are central to the successful implementation of the process improvements we recommend.

First, we contend that the entire assessment development process must be guided by a clear understanding of the kinds of inferences and conclusions about student achievement that one wants to find in reports of NAEP results. For …

… and in the development of reading and mathematics items for the proposed voluntary national test (National Research Council, 1999b). It will also be important to include individuals with a broader range of expertise in assessment development activities than has previously been the case. Disciplinary specialists who conduct research about student learning and cognition as well as cognitive and developmental psychologists must be represented on committees that develop the frameworks and the assessment materials if implementation of the strategies we have recommended is to be accomplished.

In addition to an exemplary design team, a successful development process relies on iteratively updating frameworks and conceptions of student thinking based on research and practice. Indeed, if, as we envision, NAEP is but one component of a larger system of data collections for assessing educational progress, then the range of contextual, interpretive information gained from this system could inform the development of the next generation of frameworks and assessments in new paradigm NAEP.

Progress in the areas described above will not be easy to achieve and implementation of a multiple-methods NAEP will be incremental and evolutionary. For example, we anticipate that, largely for reasons of cost, multiple-methods NAEP would initially only be conducted as part of national NAEP, with the most feasible and informative components carried over to state NAEP administrations on a gradual, selected basis. However, despite the challenges posed by costs and funding reallocations, the need for an expanded research base, and the need to change assessment development models, the alternative is an unacceptable status quo—a NAEP that measures only those aspects of student achievement that can be assessed through a single, "drop-in-from-the-sky" large-scale survey and leaves other parts of the framework unaddressed. That alternative relegates NAEP to the role of an incomplete indicator of student achievement.

Portraying Student Achievement in NAEP Reports

Implementation of the committee's recommendations—to improve the translation of the goals of current frameworks into assessment materials and to evolve the frameworks to encompass broader conceptualizations of student achievement—would enable NAEP to produce broader and more meaningful descriptive information, both quantitative and qualitative. At a minimum, it would lead to an improved understanding of the current NAEP summary score results and, if capitalized on appropriately, would provide a much more useful picture of what it means to achieve in each subject area. This information would support the desires of NAEP's users for the enhanced interpretive function of NAEP discussed in Chapter 1. In this section, we further evaluate NAEP's current methods for portraying student achievement and describe how, even prior to the full implementation of the recommendations presented in this chapter, NAEP could improve the breadth and depth of how student achievement is portrayed.

NAEP's Current Portrayals of Student Achievement

A primary means by which NAEP currently describes student achievement is through summary scale scores, expressed on a proficiency scale from 0 to 300, 0 to 400, or 0 to 500. Summary scores (i.e., mean proficiencies) are reported for the overall national sample at each grade (4, 8, and 12) and for major demographic subgroups. In NAEP's 1996 mathematics and science Report Cards, the subgroups for which scale scores were reported were geographic regions, gender, race/ethnicity, level of parents' education, type of school, and socioeconomic level as indicated by a school's Title I participation and by free/reduced-price lunch eligibility (O'Sullivan et al., 1997; Reese et al., 1997). In previous Report Cards and in various follow-up reports, summary scores have been presented for additional subgroups (e.g., amount of television watching, time spent on homework). However, reporting by these types of variables in the Report Cards was recently abandoned by NAEP in an effort to streamline the reports, and because such stand-alone portrayals of student proficiency have been criticized for leading users to make inappropriate causal inferences about the effect of these single variables on student achievement.

This latter concern notwithstanding, in addition to the Report Cards, NAEP also produces a variety of briefer follow-up reports, which are generally released 12 to 30 months after the release of the Report Cards. These reports provide the results of univariate analyses in which mean proficiency scores are presented as a function of variables presumed to be related to achievement (e.g., summary scores in reading as a function of number and types of literacy materials in the home; summary scores in history as a function of amount of time spent discussing studies at home each day).

Another important means of reporting NAEP results is by the percentage of students performing at or above NAEP's basic, proficient, and advanced achievement levels. Achievement-level setting and the reporting of achievement-level results are discussed in Chapter 5.

Toward More Informative Descriptions of Student Achievement

In Chapter 1 we concluded that scores that summarize performance across items are, in general, reasonable and effective means for NAEP to fulfill the descriptive function of a social indicator. They provide a broad-brush view of the status of student achievement (albeit a more limited definition of achievement than we advocate) and do so in a way that can, when necessary, attract the attention of the public, educators, and policy makers to the results. However, summary scores should not be viewed as the only type of information needed to understand and interpret student achievement. In NAEP, we have argued that they represent performance on only a portion of the domain described in the frameworks, and thus they provide a somewhat simplistic view of educational achievement.

On their own, they do not allow NAEP to adequately fulfill one of the interpretive functions of a social indicator—that is, they do not provide information that helps NAEP's users to think about what to do in response to NAEP results. More in-depth descriptive portrayals of student achievement are needed for this function to be fulfilled.

For example, much of the current debate regarding curriculum reform focuses on what should be taught, and decisions about what to teach are not entirely the province of curriculum developers and teachers. Policy decisions are made about content coverage and emphasis at state levels. NAEP could and should provide information that would assist those who make these decisions beyond simply portraying subject-area achievement as "better than it was four years ago" or "worse in one region of the country than in another." If one is faced with deciding whether to shift emphasis in a state mathematics curriculum framework to focus on computational skills, as has recently been the case in California, it would be useful to have specific information about students' achievement in computational skills and how it relates to their understanding of underlying concepts and their ability to apply their skills to solve problems. A single score tells very little about where students' strengths and weaknesses are, nor does it help improve student achievement, whereas a more descriptive analysis of student achievement could provide guidelines for curriculum decisions.

How can NAEP provide the kinds of information about student achievement that are needed to help the public, decision makers, and education professionals understand strengths and weaknesses in student performance and make informed decisions about education? The new paradigm NAEP that we recommend, in which assessment method is optimally matched with the assessment purpose (and the kinds of inferences to be drawn), has great potential to provide an impressive array of information from which such portrayals could be constructed. This entails a shift to more qualitative measures of student achievement, with an emphasis on describing critical features of student knowledge and understanding. In order to make progress in this direction in the short term, the following initial guidelines should be implemented:

- Scoring rubrics for constructed-response items and tasks (whether included as part of the large-scale survey assessments of core NAEP or in multiple-methods NAEP) should be constructed to describe critical differences in levels and types of student understanding; for example, rubrics should not be constructed simply to capture easily quantifiable differences in numbers of correct examples given or reasons cited. Thus scale scores generated from the accumulation of student responses would be more valid reflections of the intent of both current and envisioned frameworks.

- Scoring rubrics for constructed-response items and tasks should allow for the accumulation of information about more than one aspect of a student's performance. Although current scaling and analysis methodologies may not enable all such information to be reflected in summary scores, information gleaned from student responses can be used to provide informative and useful descriptions of achievement.

- Assessment instruments should include families of related items, designed to support inferences about the levels of student understanding in particular portions of the frameworks. Analysis of patterns of student responses across these items can reflect the knowledge structure that underlies students' conceptual understanding, providing a richer interpretive context for understanding overall achievement results. In such a scenario, families of items serve as the unit of analysis; that is, each item is not simply a discrete source of information unconnected to other items. If we presume that these responses also contribute to summary scores, then this has implications for scaling—and appropriate modifications to existing scaling methodology would need to be explored and implemented.

- Finally, in an ideal situation, information that provides an interpretive context for understanding patterns of achievement results would be released along with the Report Card that presents summary score results for the nation and major subgroups. However, given the current pressures to release summary results on an accelerated schedule, providing interpretive analyses in the Report Cards may not be feasible, at least in the short term.

NAEP's current type of univariate interpretive follow-up reports represents a first-order type of interpretive reporting. We envision much more in-depth analyses, such as those described in the example in the following section. This level of analysis undoubtedly will present challenges to NAEP's time frames for reporting, which have been focused on presenting summary score results as soon as possible after the administration of the assessment. Nevertheless, reports that provide interpretive context should be released by NCES as quickly as possible after the release of Report Cards, accompanied by the same kinds of high-profile press conferences and press release packets that are used for the release of reports of national and state summary results. Although timely reporting of summary score results is a necessary and laudable goal, when these results are released in the absence of information that provides an interpretive context for helping users understand results, the value of NAEP as an indicator is much diminished.
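To make the first two guidelines concrete, the sketch below shows one way a rubric could record more than one aspect of a response and how those records could be accumulated into descriptive summaries. The aspect names, score levels, item identifier, and data are hypothetical illustrations, not NAEP rubrics; the point is only that a multi-aspect record preserves patterns (such as correct answers accompanied by missing explanations) that a single collapsed score would hide.

    from collections import Counter
    from dataclasses import dataclass

    # Hypothetical multi-aspect rubric record for one constructed-response item:
    # the correctness of the result and the quality of the explanation are kept
    # separate rather than collapsed into a single 0-3 score.
    @dataclass
    class RubricScore:
        student_id: str
        item_id: str
        result: str        # "correct", "partially correct", or "incorrect"
        explanation: str   # "complete", "partial", or "missing"

    scores = [
        RubricScore("s01", "M12", "correct", "complete"),
        RubricScore("s02", "M12", "correct", "missing"),
        RubricScore("s03", "M12", "partially correct", "partial"),
        RubricScore("s04", "M12", "incorrect", "missing"),
    ]

    # Descriptive summary: how often each combination of result and explanation
    # occurs; this is the kind of pattern that supports interpretive reporting.
    pairs = Counter((s.result, s.explanation) for s in scores)
    for (result, explanation), count in sorted(pairs.items()):
        print(f"{result} result with {explanation} explanation: "
              f"{count / len(scores):.0%} of responses")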

A Successful First Step: NCTM's Interpretive Reports

A multiple-methods NAEP has the potential to provide an array of in-depth information about achievement in NAEP disciplines; still, it is a relatively easy task to glean more detailed information from the current assessments than presently occurs. Examination of data (particularly students' responses to constructed-response items) from the current assessments provides a basis for profiling student knowledge. For example, it is possible to analyze students' specific errors, examine the quality of their explanations, and interpret overall performance on relevant clusters of items in ways that characterize what students can and cannot do.

Since the first mathematics assessment, the National Council of Teachers of Mathematics has written interpretive reports based on the analysis of students' responses to individual NAEP items. These reports, supported by funding external to NAEP, characterize student performance at different levels of detail appropriate for different audiences. For example, the most recent monograph, reporting on the sixth NAEP mathematics assessment, administered in 1992, includes an analysis of students' understanding of basic number concepts and properties, their computational skills, and their ability to apply number concepts and skills to solving problems, based on examinations of items that assess these skills and concepts (Kenney and Silver, 1997). The report includes data across approximately 100 individual NAEP items. For some items, responses are analyzed in some detail; for others, p-values are reported.

The reports, however, go well beyond cataloging the results for individual items. Patterns of responses and errors are analyzed to draw conclusions about student performance on specific topics. For example, the authors of the 1996 report concluded (Kenney and Silver, 1997:137-138):

[S]tudents at all three grade levels appear to have an understanding of place value, rounding and number theory concepts for whole and rational numbers in familiar, straightforward contexts. Students' understanding improves across grade levels but falls when the contexts are unfamiliar or complex. Students at all three grade levels perform well on addition and subtraction word problems with whole and rational numbers that are set in familiar contexts and only involve one step calculation. … [S]ome students at all three grade levels attempt to solve multistep problems as though they involved single-step procedures. … The most troubling results were the low performance levels associated with students' ability to justify or explain their answers to regular and extended, constructed-response items.

The NCTM interpretive teams have consistently documented that the most critical deficiency in students' learning of mathematics at all ages is their inability to apply the skills that they have learned to solve problems. This conclusion is consistently supported by the fine-grained analysis of student performance in virtually every content area of the mathematics framework. The analyses also provide a perspective on relations between skill acquisition and the development of understanding of fundamental concepts.

These conclusions, based on interpretive analyses of students' responses, address issues that are at the core of public debate regarding curriculum choices. NAEP should help inform this debate and provide a basis for more informed policy decisions by integrating these types of analyses and reports into plans for assessments in all NAEP subject areas.

A good step in this direction was NAEP's establishment of a collaborative relationship with arts organizations to develop reports and dissemination strategies for the 1997 NAEP arts assessment.

The collaboration with NCTM to conduct and report the results of interpretive analyses should be continued, and similar collaborations with organizations in NAEP's other subject areas should be established (e.g., the National Council of Teachers of English, the International Reading Association, the National Science Teachers Association, the National Council for the Social Studies).

Although the NCTM interpretive teams have learned a great deal by analyzing student performance, the NAEP mathematics assessment is not specifically designed to support these kinds of within- and across-item analyses. Much could be improved in the structure of NAEP items and rubrics to better capture students' understanding in mathematics. Because the response data are not accumulated in ways that facilitate these analyses (Kenney and Silver, 1997), the interpretations are less explicit than they might be if the assessment were specifically designed to support them. The conclusions identify both specific and general areas of student weakness, but it is not possible to aggregate data to provide specific percentages of students who demonstrated understanding of core concepts or proficiency in essential skills or who met benchmark criteria for applying concepts and skills to solve problems, because the assessments were not designed to include sets of items that ensured that this sort of analysis and reporting would be possible.

The NCTM reports provide an example of the educationally useful and policy-relevant information that can be gleaned from students' responses in the current assessments, and they point toward the even more useful information that could be provided if assessments were developed with these analyses in mind. A first step in this assessment development strategy—the development of families of items for use in large-scale assessments—is discussed next.

Recommended Next Step: Developing Item Families

The notion of item families in NAEP was first articulated in the framework for the 1996 main NAEP mathematics assessment. However, an analysis conducted by Patricia Kenney for this committee showed that the sets of items included in the 1996 mathematics assessment exhibited few of the characteristics of either of the two kinds of families of items described in the framework (Kenney, 1999). The framework describes two types of item families: a vertical family and a horizontal family. A vertical family includes items or tasks that measure students' understanding of a single important mathematics concept within a content strand (e.g., numerical patterns in algebra) but at different levels, such as providing a definition, applying the concept in both familiar and novel settings, and generalizing knowledge about the concept to represent a new level of understanding. A horizontal family of items involves the assessment of students' understanding of a concept or principle across the various content strands in the NAEP program within a grade level or across grade levels. For example, the concept of proportionality can be assessed in a variety of contexts, such as number properties and operations, measurement, geometry, probability, and algebra.

The framework also suggested that a family of items could be related through a common context that serves as a rich problem setting for the items.

In the volume of research papers that accompanies this report, Minstrell (1999) and Kenney (1999) describe strategies for developing families of items for use in future NAEP large-scale assessments of science and mathematics. One such item family in mathematics and the rationale underlying its construction are presented in Appendix C. This family of items assesses the progression of grade 4 students' understanding of numerical patterns; it was constructed using a combination of items from the 1996 main NAEP assessment, supplemented with new items that together form a coherent family. This example illustrates one way in which improved interpretations of students' achievements can be generated by making relatively modest changes to NAEP's current assessment development strategy.

We close this section by reiterating one of the chapter's underlying themes: frameworks and assessments must be designed with goals for reporting as a guide. We urge the implementation of a strategy for reporting NAEP results in which reports of summary scores are accompanied by, or at the very least quickly followed by, interpretive reports produced by disciplinary specialists and based on analyses of patterns of students' responses across families of items as well as across multiple assessment methodologies.
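The following sketch is a hypothetical illustration of how responses to a vertical item family might be analyzed; it is not the Appendix C family itself. It assumes four items on a single concept, ordered from stating a definition to generalizing the concept, classifies each student by the highest level demonstrated with all lower levels also demonstrated, and then aggregates those classifications into the kind of descriptive summary an interpretive report could carry.

    from collections import Counter

    # Hypothetical vertical item family: items ordered by increasing depth of
    # understanding of a single concept (e.g., numerical patterns).
    FAMILY = ["define", "apply_familiar", "apply_novel", "generalize"]

    # Each student's record marks which items in the family were answered acceptably.
    students = {
        "s01": {"define": True,  "apply_familiar": True,  "apply_novel": False, "generalize": False},
        "s02": {"define": True,  "apply_familiar": True,  "apply_novel": True,  "generalize": True},
        "s03": {"define": True,  "apply_familiar": False, "apply_novel": True,  "generalize": False},
        "s04": {"define": False, "apply_familiar": False, "apply_novel": False, "generalize": False},
    }

    def level_of_understanding(record):
        """Highest level reached with all prerequisite levels also demonstrated."""
        level = 0
        for item in FAMILY:
            if record.get(item):
                level += 1
            else:
                break
        return level

    # Aggregate the pattern-based classifications across students.
    counts = Counter(level_of_understanding(record) for record in students.values())
    for level in range(len(FAMILY) + 1):
        label = "none demonstrated" if level == 0 else FAMILY[level - 1]
        print(f"Level {level} ({label}): {counts.get(level, 0) / len(students):.0%} of students")

Response patterns that do not fit the assumed hierarchy (such as student s03's) are classified conservatively here, but such patterns are themselves informative and could be reported separately.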

A VISION FOR ASSESSMENT DEVELOPMENT IN NAEP

The goals that we have set forth in this chapter are ambitious. They are very challenging from the standpoints of assessment development and assessment administration and operations. These goals—improving the assessment of more complex aspects of the current frameworks, expanding the conceptualization of NAEP's dimensions of achievement, implementing a multiple-methods design, and extracting and reporting more in-depth interpretive information from students' responses—may even seem overwhelming. However, each is critical if an already respected program is to better fulfill its mission of assessing academic achievement and be well positioned to meet the information demands of its users in the next century.

If these goals are implemented, what would be accomplished? What would the new paradigm NAEP look like? How would it differ from what exists now? If the recommendations presented in this chapter were implemented, NAEP would be characterized by:

- an assessment development process that is guided by a vision of the kinds of inferences and conclusions about student achievement to be described in reports of NAEP results,
- an assessment design in which assessment purpose is aligned with assessment method,
- core NAEP subjects that are assessed using the current large-scale survey (for measurement of trends) and whatever multiple methods are best suited to assess aspects of the framework not well assessed through large-scale surveys,
- nontrend subjects assessed using whatever combination of surveys and alternative assessment methods is best suited to meet the goals described in the subject area's framework,
- an array of alternative assessment methods to assess the broader conceptualizations of achievement that are included in future NAEP frameworks, and
- subject-specific reports of achievement results that include in-depth portrayals of student achievement gleaned from the entire array of methods used to assess a subject area; in core subjects, such reports ideally would also include summary proficiency scores from large-scale assessments and results from achievement-level setting.

In Figure 4-6 we present a further-developed view of new paradigm NAEP and other measures of student achievement within the coordinated system of educational indicators that we proposed in Chapter 1.

FIGURE 4-6 Measures of student achievement, including new paradigm NAEP. NOTE: TIMSS = Third International Mathematics and Science Study; NELS = National Education Longitudinal Study; ECLS = Early Childhood Longitudinal Study.

MAJOR CONCLUSIONS AND RECOMMENDATIONS

Conclusions

Conclusion 4A. The current development of NAEP frameworks and assessments is not guided by a clear vision of the kinds of inferences to be drawn from the results. These frameworks and assessments support neither the reporting of achievement levels nor in-depth interpretations of student performance.

Conclusion 4B. There are many complex steps between framework development and reporting, and the intentions of the framework developers are often lost in this sequence of activities. Although NAEP has made progress in improving continuity from one step to another, attending to the lack of coherence across steps is still a challenge.

Conclusion 4C. Currently, NAEP focuses on the assessment of subject-area knowledge and skills but does not adequately capitalize on contemporary research, theory, and practice in ways that would support in-depth interpretations of student knowledge and understanding.

Conclusion 4D. Measuring student achievement only through NAEP's current large-scale survey precludes adequate assessment of (1) the more cognitively complex portions of the domains described in the current frameworks and (2) expanded domains represented by conceptions of achievement that are responsive to the changing demands of society.

Conclusion 4E. NAEP's current reporting metrics fail to capitalize on interpretive information that can be derived from responses to individual items or sets of items.

Conclusion 4F. Insufficient time is allotted to assessment development, which restricts activities needed for developing the kinds of materials that support more interpretive analyses and more informative reporting.

Recommendations

Recommendation 4A. The inferences to be made about student performance in NAEP reports should guide the development of NAEP frameworks. These inferential goals should also guide a coherent set of assessment development activities.

Recommendation 4B. NAEP's frameworks and assessments should capitalize on research, theory, and practice about student learning in the content domains to guide (1) the development of items, tasks, scoring rubrics, and assessment designs that better assess the more complex aspects of the content domains and (2) the development of integrated families of items that support in-depth interpretations of student knowledge and understanding.

Recommendation 4C. NAEP needs to include carefully designed targeted assessments to assess the kinds of student achievement that cannot be measured well by large-scale assessments or are not reflected in subject-area frameworks.

Recommendation 4D. NAEP reports should provide interpretive information, derived from analyses of patterns of students' responses to families of related items, in conjunction with the overall achievement results.

Recommendation 4E. More time, attention, and resources are needed for the initial stages of assessment development (task development, scoring, tryouts, and field tests) to produce a rich array of assessment materials.

Recommendation 4F. In order to accomplish the committee's recommendations, NAEP's research and development agenda should emphasize the following:

- development of materials (items, tasks, families of items, and scoring rubrics) that support improved assessment of current frameworks in NAEP's large-scale survey assessment;
- development of targeted assessments that tap components of the current frameworks and expanded achievement domains not well assessed via large-scale survey methods;
- methods for producing and presenting more in-depth interpretive information in NAEP reports to make overall results more understandable, minimize improper or incorrect inferences, and support the needs of users who seek information that assists them in determining what to do in response to NAEP results; and
- development and implementation of sampling, scaling, and analysis models that accommodate the use of families of interdependent items in the large-scale survey assessment.