4 Reliability of the Achievement Levels
Pages 79-100



From page 79...
... When evaluating any standard setting, it is important to separate the standard setting process used to establish cut scores from the achievement-level descriptors used to provide meaning to those cut scores. Even though the two are intertwined, it is possible to set cut scores through procedures that meet current criteria for best practices and yet create ALDs that are not aligned with those cut scores, and vice versa.
From page 80...
... 2. Description of Mathematics Achievement Levels-Setting Process and Proposed Achievement-Level Descriptions: 1992 National Assessment of Educational Progress, Volumes 1 and 2 (ACT, Inc., 1993a, 1993b)
From page 81...
... Setting Performance Standards for Student Achievement: A Report of the National Academy of Education Panel on the Evaluation of the NAEP Trial State Assessment: An Evaluation of the 1992 Achievement Levels (National Academy of Education, 1993a)
From page 82...
... First, however, we consider the nature of the judgment task that panelists perform with the modified Angoff procedure and the resulting data.
The Judgment Task
For the modified Angoff procedure that was used in the NAEP standard setting for dichotomous items, panelists make judgments about performance at the borderline of each achievement level: that is, at the borderline of Basic and Below Basic, the borderline of Proficient and
From page 83...
... In addition, panelists usually receive feedback called "consequence data," which shows the percentages of students that would be placed into each achievement level based on the cut scores for that round. Panelists then work separately to review their probability estimates for each item and make changes as needed.
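As a rough illustration of how item-level probability judgments and consequence data relate, the sketch below sums each panelist's probability estimates into an expected number-correct score for a borderline student, averages across panelists, and then computes the percentage of students at or above the resulting cut. All ratings and student scores are hypothetical, the function names are made up for this example, and the raw-score metric is a simplification: the actual NAEP procedure reports cut scores on the NAEP scale, a mapping that is not reproduced here.

```python
from statistics import mean, stdev


def panelist_cut(ratings):
    """Expected number-correct score for a borderline student:
    the sum of one panelist's item probability estimates."""
    return sum(ratings)


def panel_summary(panel_ratings):
    """Mean and standard deviation of the panelists' cut scores."""
    cuts = [panelist_cut(r) for r in panel_ratings]
    return mean(cuts), stdev(cuts)


def percent_at_or_above(scores, cut):
    """Consequence data: percentage of students scoring at or above the cut."""
    return 100.0 * sum(s >= cut for s in scores) / len(scores)


# Hypothetical ratings: three panelists, four dichotomous items.
ratings = [
    [0.6, 0.7, 0.5, 0.8],
    [0.5, 0.6, 0.4, 0.9],
    [0.7, 0.7, 0.6, 0.8],
]
# Hypothetical student number-correct scores.
student_scores = [1, 2, 2, 3, 3, 4]

cut, spread = panel_summary(ratings)
print(f"cut = {cut:.2f}, SD across panelists = {spread:.2f}")
print(f"percent at or above cut = {percent_at_or_above(student_scores, cut):.1f}")
```

The standard deviation across panelists is the kind of quantity that interpanelist consistency statistics summarize; a smaller spread after feedback suggests the panel is converging.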
From page 84...
... The ACT reports we considered used several different statistics to estimate interpanelist consistency, which we determined were statistically robust. However, the statistics were generated through technically complex procedures and reported in metrics that are similarly difficult to explain to nonexperts. To make this information more accessible, the NAEd evaluators, in reporting interpanelist consistency statistics, converted the ACT statistics to the NAEP scale score metric.
From page 85...
... Reading
Table 4-4 shows the average cut scores for reading for each achievement level and grade at the end of each of the three rounds, and Table 4-5 shows the associated standard deviations. The cut scores varied across rounds slightly more than they did for mathematics.
From page 86...
... TABLE 4-2  Mathematics Achievement-Level Cut-Score Standard Deviations by Round

              Grade 4             Grade 8             Grade 12
Round         1     2     3       1     2     3       1     2     3
Basic         21.0  21.8  18.5    28.8  19.1  16.5    25.7  19.0  18.4
Proficient    16.9  16.9  14.7    20.7  17.5  16.6    14.0  14.1  13.9
Advanced      18.3  15.9  13.2    19.5  15.9  17.0    11.6  12.4  12.7

NOTE: The standard deviations are for the means in Table 4-1. See text for discussion.
From page 87...
... Overall, the standard deviations ranged from 7.3 to 17.1 and thus represented between 18.3 and 42.8 percent of the standard deviation for the test. Table 4-6 shows the consequence data for reading, that is, the percentage of students scoring at or above each of the recommended cut scores.
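The percentage comparison can be checked with a little arithmetic. The sketch below takes the round 3 standard deviations from Table 4-5, whose minimum and maximum appear to match the 7.3 and 17.1 quoted in the text, and backs out the test standard deviation implied by the quoted percentages. The test standard deviation is not stated in this excerpt, so the value of roughly 40 scale points is an inference, not a figure from the source.

```python
# Round 3 standard deviations from Table 4-5 (grades 4, 8, 12).
ROUND3_SD = {
    "Basic": [9.4, 17.1, 12.8],
    "Proficient": [7.3, 15.1, 11.2],
    "Advanced": [14.2, 14.7, 16.3],
}


def implied_test_sd(panel_sd, percent_of_test_sd):
    """Test SD implied by 'panel_sd is percent_of_test_sd percent of the test SD'."""
    return panel_sd / (percent_of_test_sd / 100.0)


all_sds = [sd for row in ROUND3_SD.values() for sd in row]
smallest, largest = min(all_sds), max(all_sds)

# Percentages quoted in the text for the smallest and largest values.
sd_from_low = implied_test_sd(smallest, 18.3)
sd_from_high = implied_test_sd(largest, 42.8)

print(f"range of round 3 SDs: {smallest} to {largest}")
print(f"implied test SD: {sd_from_low:.1f} and {sd_from_high:.1f}")
```

Both endpoints imply nearly the same test standard deviation, which suggests the quoted percentages were computed against a single value close to 40.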
From page 88...
... TABLE 4-5  Reading Achievement-Level Cut-Score Standard Deviations by Grade and Round

              Grade 4             Grade 8             Grade 12
Round         1     2     3       1     2     3       1     2     3
Basic         19.6  12.9   9.4    15.5  14.7  17.1    13.5  12.7  12.8
Proficient    13.9   9.3   7.3    13.5  13.8  15.1     9.3   8.1  11.2
Advanced      17.5  15.5  14.2    13.9  11.8  14.7    24.2  11.5  16.3

NOTE: The standard deviations are for the means shown in Table 4-4. See text for discussion.
From page 89...
... This study focused on comparing panelists' ratings and response probabilities. The ACT researchers sorted the mathematics item pool into nine sets of items by grade and achievement level, using a response probability of 0.65 and a set of specific decision rules. If an item had a mean rating greater than 0.65 at the Basic level, it was classified as Basic.
From page 90...
... for each item at the borderline of the cut score for the achievement level. They used a decision rule related to a response probability value of 0.65: if the student's response probability for an item classified at a particular achievement level was 0.65 or higher at the cut score for the level, then the judges' rating and the expected student performances matched.
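The decision rule described above can be sketched as a simple threshold check. In this illustration, the function names and the response probabilities are hypothetical; how the response probabilities were estimated at each cut score is not described in this excerpt and is not modeled here.

```python
RP_THRESHOLD = 0.65  # response probability value named in the text


def rating_matches(response_probability_at_cut, threshold=RP_THRESHOLD):
    """True when the expected student performance at the cut score
    meets the threshold, i.e., the judges' rating and the expected
    performance are counted as a match."""
    return response_probability_at_cut >= threshold


def match_rate(response_probabilities):
    """Fraction of items at one achievement level that count as matches."""
    matches = [rating_matches(p) for p in response_probabilities]
    return sum(matches) / len(matches)


# Hypothetical response probabilities, at the Basic cut score,
# for four items classified as Basic.
basic_items = [0.70, 0.65, 0.40, 0.90]
print(f"match rate = {match_rate(basic_items):.2f}")
```

Note that the rule as stated is inclusive: a response probability of exactly 0.65 counts as a match.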
From page 91...
... They compared the matches across achievement levels and grades and concluded that the "cutpoints on the NAEP score scale are consistent with the conceptualization of the achievement levels incorporated in the ALDs [achievement-level descriptions]
From page 92...
... In reading, the researchers conducted an additional analysis that examined the consistency of cut scores across reading purpose (literary experience, practical, and informational)
From page 93...
... The modified Angoff cut-score setting method was used for dichotomously scored items; a procedure called the boundary exemplars method was used for extended-response items: the cut scores for all three achievement levels were higher when based solely on the polytomous items than when based solely on the dichotomous items. In the ACT reports, the results of these analyses were based on generalizability theory and analysis-of-variance techniques and, as such, are unnecessarily complex for the purposes of this report. Instead, we draw from the analyses in the NAEd reports, which are expressed in the NAEP scale score metric.
From page 94...
... For this analysis, cut scores were compared for

TABLE 4-10  Mathematics Achievement-Level Cut Scores by Level and Item Type: Dichotomously Scored or Extended Response

                        Grade 4              Grade 8              Grade 12
                        Scale    % at or     Scale    % at or     Scale    % at or
Level and Item Type     Score    Above       Score    Above       Score    Above
Basic
  Dichotomous           210.4    61.0        255.6    63.0        288.9    61.0
  Extended response     266.5     6.0        302.7    17.0        344.3     9.0
Proficient
  Dichotomous           250.0    16.0        297.5    21.0        333.9    16.0
  Extended response     304.4     0.2        345.5     1.0        374.0     1.0
Advanced
  Dichotomous           282.7     2.0        333.3     3.0        366.2     2.0
  Extended response     330.8     0.01       376.8     0.03       388.0     0.2

NOTES: Cut scores for the dichotomously scored items were set with the Angoff procedure. Cut scores for the extended-response items were set with the boundary exemplars procedure.
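To make the size of the item-type effect in Table 4-10 concrete, the sketch below computes the gap between the extended-response and dichotomous cut scores for every level and grade. The scale scores are copied from the table; the dictionary layout and function name are choices made for this illustration only.

```python
GRADES = [4, 8, 12]

# Scale-score cuts from Table 4-10, listed in grade order (4, 8, 12).
CUTS = {
    "Basic": {
        "dichotomous": [210.4, 255.6, 288.9],
        "extended": [266.5, 302.7, 344.3],
    },
    "Proficient": {
        "dichotomous": [250.0, 297.5, 333.9],
        "extended": [304.4, 345.5, 374.0],
    },
    "Advanced": {
        "dichotomous": [282.7, 333.3, 366.2],
        "extended": [330.8, 376.8, 388.0],
    },
}


def cut_score_gaps(cuts):
    """Extended-response cut minus dichotomous cut, keyed by (level, grade)."""
    gaps = {}
    for level, by_type in cuts.items():
        pairs = zip(GRADES, by_type["dichotomous"], by_type["extended"])
        for grade, dichotomous, extended in pairs:
            gaps[(level, grade)] = round(extended - dichotomous, 1)
    return gaps


gaps = cut_score_gaps(CUTS)
for (level, grade), gap in gaps.items():
    print(f"{level:<10} grade {grade:>2}: +{gap} scale points")
```

Every gap is positive, so in this table the extended-response items yield the higher cut score at each level and grade.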
From page 95...
... That is, there was a difference of 32 points in the percentage of students scoring at or above the Basic level. However, for some levels and grades, identical or nearly identical cut scores resulted: for example, see the 4th and 8th grades for the Proficient level.
From page 96...
... This difference was the subject of many discussions by the Technical Advisory Committee on Standard Setting, ACT's Technical Advisory Team, NAGB's Achievement Levels Committee, and the project staff of both ACT and NAGB. Several plausible hypotheses have been put forward and many analyses have been conducted.
From page 97...
... The analyses that can be conducted with the present data cannot produce conclusive results to determine why. The NAEd evaluators suggested that NAGB not report results using achievement levels until researchers could offer an explanation for the results (Shepard et al., 1993)
From page 98...
... The two pool halves were constituted to be as equivalent as possible. Table 4-14 shows the mean of the two sets of cut scores, as well as their standard errors. All the values are expressed in NAEP scale score units. In assessing the magnitude of these standard errors, ACT compared them with the standard deviations for the tests.
From page 99...
... This range means that if the standard setting were replicated numerous times, 68 percent of the time the cut score would be in the 210-213 range; 95 percent of the time it would be in the 207-215 range (211 plus or minus (1.87 × 2)
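The ranges above follow from adding and subtracting multiples of the standard error. The sketch below reproduces that arithmetic using the cut score of 211 and the standard error of 1.87 that appear in the text, with multipliers of 1 and 2 for the 68 and 95 percent ranges. The function name is hypothetical, and because 211 is presumably a rounded mean, the computed endpoints may differ by a point from the rounded ranges quoted in the text.

```python
def replication_range(cut_score, standard_error, multiplier):
    """Range of cut scores expected across replications:
    cut_score plus or minus multiplier * standard_error."""
    half_width = multiplier * standard_error
    return cut_score - half_width, cut_score + half_width


CUT = 211   # cut score as printed in the text
SE = 1.87   # standard error as printed in the text

low68, high68 = replication_range(CUT, SE, 1)
low95, high95 = replication_range(CUT, SE, 2)

print(f"68 percent range: {low68:.2f} to {high68:.2f}")
print(f"95 percent range: {low95:.2f} to {high95:.2f}")
```

Using a multiplier of 2 rather than 1.96 follows the expression in the text.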
From page 100...
... The committee queried NAGB about its rationale for adjusting the recommended cut scores for mathematics but not for reading, and NAGB provided further information in the form of excerpts from minutes for its quarterly board meetings held in 1992.

