2 Setting Achievement Levels: History
Pages 35-56

The Chapter Skim interface presents what we've algorithmically identified as the most significant single chunk of text within every page in the chapter.


From page 35...
... 101), cut scores embody value judgments as well as technical and empirical considerations.
From page 36...
... EVOLUTION OF NAEP ACHIEVEMENT LEVELS
NAEP was first administered in 1969 as a way to report on the academic performance and progress of the nation's students. Reflecting concerns about potential federal intrusion into the country's decentralized education system, NAEP was initially defined "more in terms of what it would not do than what it would do" (Bourque, 1999, p.
From page 37...
... The law authorized a voluntary Trial State Assessment Program to enable the use of NAEP for cross-state comparisons. It also created NAGB, a bipartisan and broadly representative body, to set policy for NAEP, set achievement goals, and establish guidelines for reporting and disseminating NAEP results.
From page 38...
... In its 1990 policy statement, NAGB established three benchmarks (Basic, Proficient, and Advanced), which would be referred to as achievement levels. These levels, and the definition of performance for each level, are generic standards that are applied across NAEP assessments and grade levels. They are often called "policy definitions" or "policy standards." In 1993, NAGB adopted a new policy statement in which the policy definitions were revised, in part to reflect issues raised during evaluations of the standard setting in 1990 and 1992.
From page 39...
... were revised, although the cut scores were not reset. At the same time, the mathematics framework for grade 12 was also adjusted and changes were made to the ALDs.
From page 40...
... These scores are regarded as predictors of college performance. As such, they represent a move toward developing achievement levels (or benchmarks)
From page 41...
... Standard setting has long been used in the context of professional licensing and certification testing, where the focus is primarily on determining the cutoff score between "pass" and "fail." In the context of education, standard setting developed as part of the criterion-referenced testing movement through the 1960s and 1970s (see below)
From page 42...
... Gradually, there was increased focus on how to establish and describe these points.
Setting Standards
The articles in the 1978 issue of the Journal of Educational Measurement focus more on whether to set a cut score or performance standard than on how to do it.
From page 43...
... EVALUATIONS OF NAEP'S STANDARD SETTINGS
Given the potential value to the nation of setting achievement levels for NAEP and the importance of "getting it right," the procedures and
From page 44...
... 5. The process used by NAGB did not facilitate the development of consensus, either in developing descriptions or in setting cut scores.
From page 45...
... • considerations in adopting cut scores (Geisinger and McCormick, 2010; Giraud et al., 2000)
From page 46...
... Using achievement levels to summarize assessment results for large-scale educational assessments has come to be routine: the results of nearly all assessment programs administered in K-12 education are reported using achievement levels.
From page 47...
... The evaluations of the 1992 standard settings harshly criticized the modified Angoff method, noting that it presented panelists with an unreasonable cognitive task. One measurement expert described the complexity of the task and the objections to it (Haertel, 2001, pp.
From page 48...
... Training panelists to perform these tasks is key to obtaining reliable and valid results.
EVOLUTION OF ACHIEVEMENT-LEVEL DESCRIPTORS
In 1992, little guidance existed with regard to the development and use of ALDs for standard setting, and they were rarely used during the actual process of setting cut scores (Bourque, 2000, cited in Egan et al., 2012)
From page 49...
... TABLE 2-1 Comparison of Standard Setting Guidance in Successive Editions of Standards for Educational and Psychological Testing
Category: Validity
1985: When subject-matter experts have been asked to judge whether items are an appropriate sample of a universe or are correctly scored, or when criteria are composed of rater judgments, the ...
1999: When a validation rests in part on the decisions of expert judges, observers, or raters, procedures for selecting such experts and for eliciting judgments or ratings should be fully described.
2014: When a validation rests in part on the decisions of expert judges, observers, or raters, procedures for selecting such experts and for eliciting judgments or ratings should be fully described.
From page 50...
... Category: Decision Consistency
1985: If specific cut scores are recommended for decision making (for example, in differential diagnosis) ...
1999: When a test or combination of measures is used to make classification decisions, estimates ...
2014: When a test or combination of measures is used to make classification decisions, estimates ...
From page 51...
... Category: Selection and Training of Judges
1999: When cut scores defining pass-fail or proficiency categories are based on direct judgments about the adequacy of item or test performances or performance levels, the judgmental process should be designed so that judges can bring their knowledge and ...
2014: When cut scores defining pass-fail or proficiency levels are based on direct judgments about the adequacy of item or test performances, the judgmental process should be designed so the participants can bring their knowledge and experience to bear in a reasonable way (Standard 5.22)
From page 52...
... Column 1: This information should be included in the test's documentation. When relevant for test interpretation, test documents ordinarily should ...
Column 2: ... scores and the validity of their recommended interpretations, and the methods for establishing performance cut scores (Standard 7.4)
Column 3: ... information should include the validity of the cut scores or configural rules used and a description of the samples from ...
From page 53...
... SOURCE: Adapted from American Educational Research Association et al.
From page 54...
... Besides the Standards (American Educational Research Association et al., 1985, 1999, 2014), the edited volume, Educational Measurement, issued under the guidance of the National Council on Measurement in Education, provides an historical account of the psychometric considerations associated with cut scores.
From page 55...
... In particular, setting multiple performance standards had not been done in the past, and the use of different standard setting methods for multiple-choice items and constructed-response items was new. Subsequent revisions of the Standards provide more explicit guidance and standards for using these methods in achievement testing.


This material may be derived from roughly machine-read images, and so is provided only to facilitate research.