5 Developing Performance Level Descriptions and Setting Cut Scores
Pages 108-166

From page 108...
... The process of determining the cut scores involved using procedures referred to as "standard setting," which were introduced in Chapter 3. As we noted in Chapter 3, standard setting is intrinsically judgmental.
From page 109...
... In fact, although the method is still used for setting the cut scores for NAEP's achievement levels, other methods are being explored with the assessment (Williams and Schulz, 2005)
From page 110...
... Participants in the standard settings provided feedback on the performance-level descriptions, and we present the different versions of the descriptions and explain why they were revised. The results of the standard settings appear at the end of this chapter, where we also provide a description of the adapted version of the contrasting groups procedure that we used and make our recommendations for cut scores.
From page 111...
... The method also provides an opportunity to revise performance-level descriptions at the completion of the standard-setting process so they are better aligned with the cut scores.
From page 112...
... Panelists first consider the description of the basic literacy performance level and the content and skills assessed by the first question in the ordered item booklet, the easiest question in the booklet. Each panelist considers whether an individual with the skills described in the basic category would have a 67 percent chance of answering this question correctly (or stated another way, if an individual with the skills described in the basic category would be likely to correctly answer a question measuring these specific skills two out of three times)
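The 67 percent criterion can be made concrete with an item response model. The sketch below assumes a two-parameter logistic (2PL) model, which this excerpt does not specify; the item parameters `a` and `b`, the logit scale, and the function names are hypothetical and are not NALS or NAAL values. It shows only that the scale location attached to an item depends on the response probability chosen.

```python
import math


def p_correct(theta, a, b):
    """Probability of a correct answer at ability theta under an assumed 2PL model."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def rp_location(a, b, rp):
    """Ability at which the assumed 2PL success probability equals rp."""
    return b + math.log(rp / (1.0 - rp)) / a


# Hypothetical item on a logit scale (illustrative values only).
a, b = 1.2, 0.0
for rp in (0.50, 0.67, 0.80):
    theta = rp_location(a, b, rp)
    print(f"rp{int(rp * 100)}: location {theta:+.3f}, check p = {p_correct(theta, a, b):.2f}")
```

Under this assumed model the same item sits higher on the scale as the response probability rises, so a bookmark placed under one criterion is not directly comparable to one placed under another.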
From page 113...
... Usually, mean scale scores are also calculated, and the variability in panelists' judgments is examined to evaluate the extent to which they disagree about bookmark placements. At the conclusion of the standard setting, it is customary to allot time for panelists to discuss and write performance-level descriptions for the items reviewed during the standard setting.
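As a minimal sketch of the summary described here, the following computes the median, mean, and standard deviation of the scale scores at panelists' bookmarks for a single cut score. The panelist values are hypothetical, invented only for illustration, and the function name `summarize` is an assumption.

```python
from statistics import mean, median, stdev

# Hypothetical scale scores at each panelist's bookmark for one cut score.
panelist_cut_scores = [205, 212, 198, 220, 209, 215, 203]


def summarize(scores):
    """Return the median, mean, and sample standard deviation of panelists' cut scores."""
    return {
        "median": median(scores),
        "mean": round(mean(scores), 1),
        "sd": round(stdev(scores), 1),
    }


print(summarize(panelist_cut_scores))
```

The median is less sensitive than the mean to a single extreme bookmark placement, and the standard deviation gives a rough indication of how much panelists disagree.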
From page 114...
... As a result, we could not use the ALSA questions in the bookmark procedure. This created a de facto cut score between the nonliterate in English and below basic performance levels.
From page 115...
... Regardless of the rationale for the decision, it precluded our setting an overall cut score.
Participants in the Bookmark Standard Settings
Selecting Panelists
Research and experience suggest that the background and expertise the panelists bring to the standard-setting activity are factors that influence the cut score decisions (Cizek, 2001a; Hambleton, 2001; Jaeger, 1989, 1991; Raymond and Reid, 2001)
From page 116...
... In the end, we were somewhat concerned about their familiarity with adults with lower literacy skills and thought that it would be difficult for those who primarily work in college settings to make judgments about the skills of adults who would be classified at the levels below intermediate. There was a limit to the number of panelists we could include, and we tried to include those with experience working with adults whose skills fell at the levels primarily assessed on NALS and NAAL.
From page 117...
... BOOKMARK STANDARD SETTING WITH 1992 DATA
The first standard-setting session was held to obtain panelists' judgments about cut scores for the 1992 NALS and to collect their feedback about the performance-level descriptions. A total of 42 panelists participated in the session.
From page 118...
... In preparation for Round 3, each table received a summary of the Round 2 bookmark placements made by each table member as well as the medians for the table. In addition, each table received information about the proportion of the 1992 population who would have been categorized as having below basic, basic, intermediate, or advanced literacy based on the
From page 119...
... The bookmark session concluded with a group session to obtain feedback from the panelists, both orally and through a written survey.
Using Different Response Probability Instructions
In conjunction with the July standard setting, the committee collected information about the impact of varying the instructions given to panelists
From page 120...
... the extent to which panelists understand and can make sense of the concept of response probability level when making judgments about cut scores and (2) the extent to which panelists make different choices when faced with different response probability levels.
From page 121...
... Each table of panelists used the same response probability level for the second content area as they did for the first.
Refining the Performance-Level Descriptions
The performance-level descriptions used at the July standard setting consisted of overall and subject-specific descriptors for the top four performance levels (see Table 5-2)
From page 122...
... [TABLE 5-2: Performance-level descriptions used during July 2004; overall descriptions for the below basic, basic, intermediate, and advanced literacy levels. Rotated table text not recoverable from the page image.]
From page 123...
... [Rotated table: subject-area descriptions for prose literacy at the below basic, basic, and intermediate levels. Text not recoverable from the page image.]
From page 124...
... [Rotated table: subject-area descriptions for document and quantitative literacy. Text not recoverable from the page image.]
From page 125...
... [TABLE 5-3: Performance-level descriptions used during September; overall descriptions for the below basic, basic, and intermediate literacy levels. Rotated table text not recoverable from the page image.]
From page 126...
... [Rotated table: performance-level descriptions, including quantitative literacy. Text not recoverable from the page image.]
From page 127...
... [Rotated table: descriptions for the basic and intermediate literacy levels. Text not recoverable from the page image.]
From page 128...
... [Rotated table: descriptions for document and quantitative literacy. Text not recoverable from the page image.]
From page 129...
... In addition, the combined median scale score (based on the data from both tables) was calculated for each level, and impact data were provided about the percentages of adults who would fall into the below basic, basic, intermediate, and advanced categories if the combined median values were used as cut scores.[5] Panelists from both tables discussed their reasons for choosing different bookmark placements, after which each panelist independently made a final judgment about the items that separated basic, intermediate, and advanced literacy.
From page 130...
... Panelists were instructed to review the test items that would fall into each performance level (based on the Round 3 median cut scores) and prepare more detailed versions of the performance-level descriptions, including specific examples from the stimuli and associated tasks.
From page 131...
... Our analysis yields weak evidence in favor of the latter hypothesis.[6] We conducted tests to evaluate the statistical significance of the differences in bookmark placements and in cut scores. The results indicated that, for a given literacy area and performance level, the bookmark placements were tending in the right direction but were generally not statistically significantly different under the three response probability instructions.
From page 132...
... [Rotated table of sample tasks, including prose tasks. Text not recoverable from the page image.]
From page 133...
... [Rotated table of sample tasks, continued. Text not recoverable from the page image.]
From page 134...
... [TABLE 5-4 Continued: Part B, prose literacy content by level (below basic, basic, intermediate). Rotated table text not recoverable from the page image.]
From page 135...
... [Rotated table of sample tasks, continued. Text not recoverable from the page image.]
From page 136...
... [Rotated table of sample tasks, continued. Text not recoverable from the page image.]
From page 137...
... [Rotated table: Part D, quantitative literacy content by level (below basic, basic, intermediate). Text not recoverable from the page image.]
From page 138...
... [Rotated table of sample quantitative tasks, continued. Text not recoverable from the page image.]
From page 139...
... Comparison of the impact data reveals that the effects of the different response probability instructions were larger for the cut scores for the document and quantitative areas than for prose. These findings raise several questions.
From page 140...
... [TABLE 5-5a and TABLE 5-5b: July 2004 bookmark standard setting, by RP instructions (rp50, rp67, rp80). Rows: median bookmark placement, median cut score, mean cut score, standard deviation, percent below the cut score, number of panelists. Rotated table values not recoverable from the page image.]
From page 141...
... [TABLE 5-5c: July 2004 bookmark standard setting, by RP instructions (rp50, rp67, rp80). Rows: median bookmark placement, median cut score, mean cut score, standard deviation, percent below the cut score, number of panelists. Rotated table values not recoverable from the page image.]
From page 142...
... In determining the final cut scores from the bookmark procedure, we used all of the judgments from September but only the judgments from July based on the rp67 criterion. We are aware that many in the adult education, adult literacy, and health literacy fields have grown accustomed to using the rp80 criterion in relation to NALS results, and that some may at first believe that use of a response probability of 67 constitutes "lowering the standards." We want to emphasize that this represents a fundamental, albeit not surprising, misunderstanding.
From page 143...
... Changing the response probability criterion in the report may be justified by the reasons discussed above, but we acknowledge that disadvantages to this recommendation include the potential for misinterpretations and a less preferable interpretation in the eyes of some segments of the user community. In addition, use of a response probability of 67 percent for the bookmark standard-setting procedure does not preclude using a value of 80 percent in determining exemplary items for the performance levels.
From page 144...
... Comparison of the variability in cut scores in each literacy area shows that, for all literacy areas, the standard deviation for the advanced cut score was at least twice as large as the standard deviation for the intermediate or basic cut scores. Comparison of the variability in cut scores across literacy areas shows that, for all of the performance levels, the standard deviations for the quantitative literacy cut scores were slightly higher than for the other two sections.
From page 145...
... recommend reporting information about the amount of variation in cut scores that might be expected if the standard-setting procedure were replicated. The design of our bookmark sessions provided a means for estimating the extent to which the cut scores would be likely to vary if another standard setting was held on a different occasion with a different set of judges.
From page 146...
... Using the performance-level descriptions, the panelists are asked to place examinees into the performance categories in which they judge the examinees belong without reference to their actual performance on the test. Cut scores are then determined from the actual test scores attained by the examinees placed in the distinct categories.
From page 147...
... Comparison of the distribution of literacy scores for these two groups provides information that can be used in determining cut scores. This approach, while not a true application of the contrasting groups method, seemed promising as a viable technique for generating a second set of cut scores with which to judge the reasonableness of the bookmark cut scores.
From page 148...
... Second, due to the nature of the background questions, the groups were not distinguished on the basis of characteristics described by the performance-level descriptions. Instead, we used background questions as proxies for the functional consequences of the literacy levels, and, as described in the next section, aligned the information with the performance levels in ways that seemed plausible.
From page 149...
... Therefore, the bookmark cut score between these two performance levels was compared with the contrast between individuals without a high school diploma or GED certificate and those with a high school diploma or GED. Furthermore, because of a general policy expectation that most individuals can and should achieve a high school level education but not necessarily more, we expected the contrast between the basic and intermediate levels to be associated with a number of other indicators of unsuccessful versus successful
Footnote 7: We could have used discriminant function analysis to determine the cut score, but under the usual normality assumption, the maximally discriminating point on the literacy scale would be the point at which equal proportions of the higher group were below and the lower group were above.
From page 150...
... MEASURING LITERACY: PERFORMANCE LEVELS FOR ADULTS

TABLE 5-8 Comparison of Weighted Median Scaled Scores for Groups Contrasted to Determine the QCG Cut Scores for Basic Literacy

                                           Weighted Median Score [a]
Groups Contrasted                             1992       2003
Prose Literacy
  Education:
    No high school                             182        159
    Some high school                           236        229
    Average of medians                       209.0      194.0
  Self-perception of reading skills:
    Do not read well                           140        144
    Read well                                  285        282
    Average of medians                       212.5      213.0
  Contrasting groups cut score for prose: 207.1 [b]
Document Literacy
  Education:
    No high school                             173        160
    Some high school                           232        231
    Average of medians                       202.5      195.5
  Self-perception of reading skills:
    Do not read well                           138        152
    Read well                                  279        276
    Average of medians                       208.5      214.0
  Contrasting groups cut score for document: 205.1

... functioning in society available on the background questionnaire, specifically the contrast between:
· Needing a lot of help with reading versus not needing a lot of help with reading.
· Never reading the newspaper versus sometimes reading the newspaper.
From page 151...
... These QCG cut scores for prose (243.5), document (241.6)
From page 152...
... TABLE 5-9 Comparison of Weighted Median Scaled Scores for Groups Contrasted to Determine the QCG Cut Scores for Intermediate Literacy

                                           Weighted Median Score [a]
Groups Contrasted                             1992       2003
Prose Literacy
  Education:
    Some high school                           236        229
    High school diploma                        274        262
    Average of medians                       255.0      245.5
  Extent of help needed with reading:
    A lot                                      135        153
    Not a lot                                  281        277
    Average of medians                       208.0      215.0
  Read the newspaper:
    Never                                      161        173
    Sometimes, or more                         283        280
    Average of medians                       222.0      226.5
  Read at work:
    Never                                      237        222
    Sometimes, or more                         294        287
    Average of medians                       265.5      254.5
  Financial status:
    Receive federal assistance                 246        241
    Receive interest, dividend income          302        296
    Average of medians                       274.0      268.5
  Contrasting groups cut score for prose: 243.5 [b]
Document Literacy
  Education:
    Some high school                           232        231
    High school diploma                        267        259
    Average of medians                       249.5      245.0
  Extent of help needed with reading:
    A lot                                      128        170
    Not a lot                                  275        273
    Average of medians                       201.5      221.5
  Read the newspaper:
    Never                                      154        188
    Sometimes, or more                         278        275
    Average of medians                       216.0      231.5
  Read at work:
    Never                                      237        228
    Sometimes, or more                         289        282
    Average of medians                       263.0      255.0
From page 153...
... b The cut score is the overall average of the weighted medians for the groups contrasted.
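The averaging rule in footnote b can be checked against the published figures. The sketch below is a minimal illustration using the prose medians reported in Table 5-8 for the basic cut score; the data layout and the function name `qcg_cut_score` are assumptions made for this example, not the committee's software.

```python
from statistics import mean

# Weighted median scores from Table 5-8 (prose literacy, basic cut score).
# Each entry is (lower group median, upper group median) for one year.
prose_basic = {
    "education": {1992: (182, 236), 2003: (159, 229)},
    "self-perception of reading skills": {1992: (140, 285), 2003: (144, 282)},
}


def qcg_cut_score(contrasts):
    """Average the per-contrast, per-year averages of the two group medians."""
    pair_averages = [
        mean(pair)
        for by_year in contrasts.values()
        for pair in by_year.values()
    ]
    return mean(pair_averages)


print(round(qcg_cut_score(prose_basic), 1))
```

Because every contrast contributes exactly two medians, averaging the pair averages gives the same result as averaging all of the medians directly.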
From page 154...
... These QCG cut scores for prose (292.1)
From page 155...
... Procedures for Using QCG Cut Scores to Adjust Bookmark Cut Scores
Most authorities on standard setting (e.g., Green, Trimble, and Lewis, 2003; Hambleton, 1980; Jaeger, 1989; Shepard, 1980; Zieky, 2001) suggest that, when setting cut scores, it is prudent to use and compare the
From page 156...
... Like the standard-setting procedure itself, determination of final cut scores is ultimately a judgment-based task that authorities on standard setting maintain should be based on both quantitative and qualitative information.
From page 157...
... 500) recommended considering all of the results from the standard setting together with "extra-statistical factors" to determine the final cut scores.
From page 158...
... Overall, this comparison suggests that the bookmark cut scores should be lowered slightly. We designed a procedure for combining the two sets of cut scores that was intended to make only minor adjustments to the bookmark cut scores, and we examined its effects on the resulting impact data.
From page 159...
... Therefore, the bookmark cut score would be reduced from 270 to 267. Application of these rules to the remaining cut scores indicates that all of the bookmark cut scores should be adjusted except the basic cut scores for prose and document literacy.
From page 160...
... Overall, the adjustment procedure tended to produce a distribution of participants across the performance levels that resembled the distribution produced by the original bookmark cut scores. The largest changes were in
From page 161...
... In our view, the procedures used to determine the adjustment were sensible and served to align the bookmark cut scores more closely with the relevant background measures. The adjustments were relatively small and made only slight differences in the impact data.
From page 162...
... ALSA and NAAL items were not analyzed or calibrated together and hence were not placed on the same scale. We were therefore not able to use the ALSA items in our procedures for setting the cut scores.
From page 163...
... TABLE 5-12b Comparison of Impact Data for Document Literacy Based on Rounded Bookmark Cut Scores, Rounded Adjusted Cut Scores, and Rounded Confidence Interval for Cut Scores

                                      Basic          Intermediate   Advanced
Rounded [a] bookmark cut score        205            255            345
  Percent below cut score:
    1992                              16.8 [b,c]     40.8           89.2
    2003                              14.2 [c,d]     39.4           91.1
Rounded [a] adjusted cut score        205            250            335
  Percent below cut score:
    1992                              16.8           37.8           85.8
    2003                              14.2           36.1           87.7
Rounded [e] confidence interval       192-211        246-265        321-373
  Percent below cut scores:
    1992                              12.9-18.9      35.5-47.0      79.9-95.6
    2003                              10.5-16.3      33.7-46.0      81.6-96.9
See footnotes to Table 5-12a.

TABLE 5-12c Comparison of Impact Data for Quantitative Literacy Based on Rounded Bookmark Cut Scores, Rounded Adjusted Cut Scores, and Rounded Confidence Interval for Cut Scores

                                      Basic          Intermediate   Advanced
Rounded [a] bookmark cut score        245            300            355
  Percent below cut score:
    1992                              33.3 [b,c]     65.1           89.3
    2003                              27.9 [c,d]     61.3           88.6
Rounded [a] adjusted cut score        235            290            350
  Percent below cut score:
    1992                              28.5           59.1           87.9
    2003                              23.1           55.1           87.0
Rounded [e] confidence interval       226-263        283-306        343-396
  Percent below cut scores:
    1992                              24.7-42.9      55.0-68.5      85.6-97.1
    2003                              19.2-37.9      50.5-64.9      84.1-97.2
See footnotes to Table 5-12a.
From page 164...
... This percentage plus those in the below basic category would be equivalent to the 1992 below basic category.

FIGURE 5-3 Comparison of the percentages of adults in each performance level based on the bookmark cut scores and adjusted cut scores for 1992 document literacy.
              Below Basic   Basic   Intermediate   Advanced
Adjusted         16.8       21.0       48.0          14.2
Bookmark         16.8       24.0       48.4          10.8
From page 165...
... This percentage plus those in the below basic category would be equivalent to the 1992 below basic category.

FIGURE 5-5 Comparison of the percentages of adults in each performance level based on the bookmark cut scores and adjusted cut scores for 1992 quantitative literacy.
              Below Basic   Basic   Intermediate   Advanced
Adjusted         28.5       30.6       28.8          12.1
Bookmark         33.3       31.8       24.2          10.7
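The percentages in Figures 5-3 and 5-5 follow from the "percent below cut score" rows of Tables 5-12b and 5-12c for 1992: each level's share is the difference between successive cumulative percentages. The sketch below illustrates that arithmetic; the function name `level_percentages` and the data layout are assumptions made for this example.

```python
def level_percentages(percent_below):
    """Convert cumulative percent below the basic, intermediate, and advanced
    cut scores into the percent of adults in each of the four levels."""
    bounds = [0.0, *percent_below, 100.0]
    return [round(upper - lower, 1) for lower, upper in zip(bounds, bounds[1:])]


# 1992 percent below each cut score, from Tables 5-12b and 5-12c.
document_1992 = {"bookmark": (16.8, 40.8, 89.2), "adjusted": (16.8, 37.8, 85.8)}
quantitative_1992 = {"bookmark": (33.3, 65.1, 89.3), "adjusted": (28.5, 59.1, 87.9)}

levels = ("below basic", "basic", "intermediate", "advanced")
for area, data in (("document", document_1992), ("quantitative", quantitative_1992)):
    for method, cumulative in data.items():
        shares = dict(zip(levels, level_percentages(cumulative)))
        print(area, method, shares)
```

Lowering a cut score moves adults from the level below it into the level above it, which is why the adjusted cut scores place fewer adults in the basic category and more in the advanced category for document literacy.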
From page 166...
... It is therefore with some reservations that we include the advanced category in our recommendation for performance levels, and we leave it to NCES to ultimately decide on the utility and meaning of this category. With regard to the lower and upper ends of the score scale, we make the following recommendation: RECOMMENDATION 5-2: Future development of NAAL should include more comprehensive coverage at the lower end of the continuum of literacy skills, including assessment of the extent to which individuals are able to recognize letters and numbers and read words and simple sentences, to allow determination of which individuals have the basic foundation skills in literacy and which individuals do not.

