

3 Setting Achievement Levels: NAEP's Process
Pages 57-78

The Chapter Skim interface presents the single chunk of text algorithmically identified as the most significant on each page of the chapter.


From page 57...
... Although procedural evidence cannot guarantee the validity of the resulting achievement levels, it can invalidate the results of the standard setting. As specified in the Standards (American Educational Research Association et al., 2014, p.
From page 58...
... Since 1992, a large number of cut-score setting methods have been developed, and research shows that the choice of method has an effect on the resulting performance standards. There are no firm decision rules to guide the choice of method, and measurement experts disagree about the strengths and weaknesses of the various methods.
From page 59...
... laid out a series of questions to guide the selection of panelists, including questions related to the desired demographic profile of the standard setting panel, the inclusion of certain constituencies, how many panelists to select, and how to select panelists. These questions remain a primary concern whenever a standard setting method is implemented.
From page 60...
... For mathematics, the plan resulted in the identification of 424 individuals who were asked to be nominators of panelists for the achievement-level setting process: 100 to serve as nominators of teachers, 180 to serve as nominators of nonteacher educators, and 144 to serve as nominators of representatives of the general public.
From page 61...
... Selecting Nominees

The goal was to create six panels, one for each combination of subject area (reading, mathematics) and grade level (4, 8, and 12)
From page 62...
... TABLE 3-1  Descriptive Data for Standard-Setting Nominees, Panelists, and Nominators (in percentage)

                                    Mathematics             Reading
Characteristic                      Nominees   Panelists    Nominees   Panelists
                                    N = 424    N = 68       N = 366    N = 62
Grade Level
  4                                 31.40      33.80        39.30      35.50
  8                                 32.10      33.80        29.50      32.30
  12                                36.60      32.40        31.20      32.30
Panelist Type
  Teacher                           23.60      52.90        36.10      56.50
  Nonteacher educator               42.60      19.10        37.70      16.10
  General public                    34.00      27.90        26.20      27.40
Gender
  Male                              44.10      41.20        20.00      24.20
  Female                            55.90      58.80        80.00      75.80
Race and Ethnicity
  White                             78.30      77.90        72.70      79.00
  Black                             16.00      14.70        19.40      14.50
  Asian                              1.00       1.50         1.10       1.60
  Native American                    1.00       1.50        --         --
  Hispanic                           3.50       4.40         3.80       4.80
  No data                           --         --            3.00      --
Region of Nominator
  West                              22.40      23.05        24.90      22.50
  Central                           17.50      14.70        17.50      17.70
  Southeast                         29.20      32.40        38.30      40.30
  Northeast                         30.90      29.40        19.40      19.40
Community Type of Nominator
  Low socioeconomic status          --         22.10        15.30      24.20
  Not low socioeconomic status      --         63.20        57.70      67.70
  No data                           --         14.70        27.70       8.10
District Size of Nominator
  More than 50,000                  --         41.20        30.90      33.90
  Less than 50,000                  --         44.10        42.10      58.10
  No data/not applicable            --         14.70        27.00       8.00
Institutional Affiliation
  Public                            --         93.90        --         --
  Private                           --          6.10        --         --
From page 63...
... In 1992, little guidance existed with regard to the development and use of achievement-level descriptions for standard setting, and they were rarely used during the actual process of setting cut scores (Bourque, 2000, cited in Egan et al., 2012)
From page 64...
... NAGB formulated the policy descriptors before the standard setting was done. The standard setting panelists drafted more detailed versions as part of the standard setting process.
From page 65...
... Arriving at a mutual understanding of what each achievement level "meant" was critical to the success of the process. The panelists needed more time than had been planned to work on developing the achievement-level descriptions in order to feel comfortable with using them in rating items and setting achievement levels.
From page 66...
... The purpose of this exercise was to familiarize panelists with the test content and scoring protocols, as well as to refresh their memories of test taking under time constraints. Working in small groups of five or six, separated by content area and grade level, panelists generated a list of descriptors that reflected what they thought student performance should be at each achievement level, using the NAEP framework and their experience in taking the test.
From page 67...
... The major purpose for having panelists develop their own set of grade-specific content-based descriptions of Basic, Proficient, and Advanced was to ensure that, to the extent possible, all panelists would have both a common set of content-based referents to use during the item-rating process and a common understanding of borderline performance for each of the three achievement levels at the specified grade levels.

METHOD IMPLEMENTATION

There are various ways to implement any standard setting method, and many decisions to make about the procedures.
From page 68...
... The training involved multiple elements, including pur-

3 Borderline refers to the "cut point" or minimal competency point separating any two achievement levels.
From page 69...
... Lectures, visual aids, question-and-answer sessions, and practice were all used to provide panelists with the necessary instructions before they began the actual process of setting achievement levels. Grade-level groups were led by experienced facilitators who were trained in the process and who had spent many hours of preparation on how best to implement the process.
From page 70...
... To prepare for the first round, panelists responded to the set of assessment items they had been given and used the scoring keys and protocols to review their answers and score themselves. During round 1, panelists provided ratings for all items for all three achievement levels.
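The excerpt above describes the rating task but not how individual item ratings become cut scores. As a hedged illustration only, the sketch below assumes an Angoff-style aggregation: each panelist judges, for every item, the probability that a borderline student at a given level would answer it correctly; a panelist's cut score is the sum of those probabilities, and the panel's cut score is the average across panelists. The panelist names, the item ratings, and the aggregation rule itself are illustrative assumptions, not details taken from the chapter.

```python
# Hypothetical sketch of an Angoff-style aggregation of round-1 item ratings.
# Nothing here is taken from the NAEP process beyond the general idea that
# panelists rate every item for each achievement level (Basic, Proficient,
# Advanced); the aggregation rule is one common convention, assumed here
# for illustration.

from statistics import mean

# ratings[level][panelist] holds per-item probability judgments:
# "what fraction of borderline students at this level would answer this item correctly?"
ratings = {
    "Basic":      {"panelist_1": [0.80, 0.65, 0.40], "panelist_2": [0.75, 0.60, 0.35]},
    "Proficient": {"panelist_1": [0.90, 0.80, 0.60], "panelist_2": [0.95, 0.75, 0.55]},
    "Advanced":   {"panelist_1": [0.98, 0.95, 0.85], "panelist_2": [0.99, 0.90, 0.80]},
}

def panel_cut_scores(ratings_by_level):
    """For each level: sum each panelist's item probabilities (the expected raw
    score of a borderline student), then average across panelists."""
    cuts = {}
    for level, by_panelist in ratings_by_level.items():
        panelist_cuts = [sum(item_probs) for item_probs in by_panelist.values()]
        cuts[level] = mean(panelist_cuts)
    return cuts

print(panel_cut_scores(ratings))
# approx. {'Basic': 1.78, 'Proficient': 2.28, 'Advanced': 2.74} on this 3-item example
```

In a multi-round process like the one described, the same aggregation would simply be recomputed on each round's revised ratings.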
From page 71...
... Panelists' ratings were generally positive, and the majority indicated that they had confidence in the resulting achievement levels.
From page 72...
... 28  The instructions on what I was to do during the rating sessions were
        (anchors: Absolutely Clear / Somewhat Clear / Not at All Clear)
        Math     26.47   61.76    7.35    2.94    0.00    mean 4.13
        Reading  36.67   55.00    8.33    0.00    0.00    mean 4.28

    29  My level of understanding of the tasks I was to accomplish during the rating session was
        (anchors: Totally Adequate / Marginally Adequate / Totally Inadequate)
        Math     32.35   58.82    5.88    1.47    0.00    mean 4.24
        Reading  40.00   55.00    5.00    0.00    0.00    mean 4.35

    30  The amount of time I had to complete the tasks I was to accomplish during the rating sessions was
        (anchors: Far Too Long / About Right / Far Too Short)
        Math      0.00   14.71   80.88    2.94    0.00    mean 3.12
        Reading   1.67   23.33   66.67    6.67    1.67    mean 3.17
From page 73...
... 31  The amount of time I had to complete the tasks I was to accomplish was generally:
        (anchors: Far Too Long / About Right / Far Too Short)
        Math      0.00   13.24   80.88    1.47    2.94    mean 3.06
        Reading   0.00   20.00   76.67    3.33    0.00    mean 3.17

    32  The most accurate description of my level of confidence in the achievement-levels ratings I provided was:
        (anchors: Totally Confident / Somewhat Confident / Not at All Confident)
        Math     27.94   54.41   16.18    0.00    0.00    mean 4.12
        Reading  16.67   73.33    8.33    1.67    0.00    mean 4.05

    33  I would describe the effectiveness of the achievement-levels setting process as:
        (anchors: Highly Effective / Somewhat Effective / Not at All Effective)
        Math     29.41   54.41    7.35    7.35    0.00    mean 4.07
        Reading  23.33   60.00   15.00    1.67    0.00    mean 4.05

    34  During some of the discussions, I felt a need to defend the ratings I had made.
        (anchors: To a Great Extent / To Some Extent / Not at All)
        Math      0.00    7.35   44.12   29.41   17.65    mean 2.42
        Reading   5.00   21.67   41.67   25.00    6.67    mean 2.93

    35  During the round-2 ratings, I felt coerced to modify my ratings from the previous round.
        (anchors: To a Great Extent / To Some Extent / Not at All)
        Math      1.47    1.47   14.71   26.47   54.41    mean 1.67
From page 74...
...     Reading   1.67    0.00   20.00   28.33   45.00    mean 1.79

    37  I feel that this NAEP Achievement Levels Study provided me an opportunity to use my best judgment in selecting papers to set achievement levels for an NAEP assessment.
        (anchors: To a Great Extent / To Some Extent / Not at All)
        Math     57.35   30.88    8.82    1.47    0.00    mean 4.46
        Reading  53.33   31.67   15.00    0.00    0.00    mean 4.38

    38  I feel that this NAEP Achievement Levels Study would produce achievement levels that would be defensible.
        (anchors: To a Great Extent / To Some Extent / Not at All)
        Math     42.65   42.65    8.82    4.41    0.00    mean 4.25
From page 75...
...     recommending use of the achievement levels that resulted from this achievement levels setting activity.
        Math     55.88   42.65    1.47    0.00    mean 1.43
        Reading  65.00   28.33    6.67    0.00    mean 1.42

    NOTE: For all questions except the last (42)
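The trailing value in each row of the questionnaire table above appears to be a scale mean, but the note explaining how it was computed is truncated in this excerpt. The sketch below is one plausible reconstruction, assumed rather than taken from the report: code the five response categories from 5 (leftmost anchor) down to 1 and average over the panelists who responded. The helper name scale_mean and the coding are assumptions; with that coding, the published values for the rows shown on pages 72-74 are reproduced, for example 4.13 for the Math row of item 28.

```python
# Assumed reconstruction of the "mean" column in the panelist questionnaire table:
# code the five response categories 5 (leftmost, most positive anchor) down to 1
# and take the percentage-weighted average over respondents only (the percentages
# can sum to slightly less than 100 because of nonresponse). The coding is an
# assumption; the report's own explanatory note is truncated in the excerpt above.

def scale_mean(percentages, codes=(5, 4, 3, 2, 1)):
    """Percentage-weighted mean of the scale codes, normalized by the share
    of panelists who gave a response."""
    responded = sum(percentages)
    return sum(p * c for p, c in zip(percentages, codes)) / responded

# Item 28, Math row of the table above: 26.47 / 61.76 / 7.35 / 2.94 / 0.00
print(round(scale_mean([26.47, 61.76, 7.35, 2.94, 0.00]), 2))  # 4.13, matching the table
```

The truncated note signals that the final item (42), which shows only four response categories, is treated differently, so this reconstruction is not applied to it.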
From page 76...
... • A majority said they were given the opportunity to use their best judgment in setting achievement levels to a great extent (Item 37, see Table 3-2)
From page 77...
... The standard setting needed to be carried out in a way that would support these intended inferences. Our conclusions are based on our examination of the process for setting achievement levels and on comparing it with guidance from the Standards for Educational and Psychological Testing (American Educational Research Association et al., 1985)

