setting methods. Some researchers have suggested that variability can be minimized if multiple methods are used in setting achievement standards and a panel then uses these results to set the final standards.
A critical aspect of the standard-setting process is the collection of evidence to support the validity of the standards and the decisions that are made in using them (Kane, 2001). A first step in doing this is to examine the coherence of the standard-setting process—that is, the standard-setting methods should be consistent with the design of the assessment and the model of achievement underlying the assessment program (Kane, 2001). Evidence regarding the soundness of the design and implementation of the standard-setting study is needed; this could include reviews of the procedures used for selecting and training judges and for crafting descriptors for the achievement levels. Researchers advocate that descriptors for the achievement levels be developed before the cut scores are established so that the judges have a clear definition of each of the levels.
Evidence of the extent of internal inconsistency, or variability, in judgments is needed as well. Variability among judges can be examined at the different stages of the standard-setting process. For example, after training, judges can be asked to independently set cut scores in the first round of the process. Each judge can then be provided with information on the cut scores set by other judges; after a group discussion, the judges can be asked to review their own cut scores and make any modifications they deem necessary. The variability of the judges can also be examined after this second round. Judges can be shown impact data (demonstrating the effects of setting cut scores at particular levels, for example) and then be asked to discuss how this affects their chosen cut scores. Afterward they could have another opportunity to make modifications to their cut scores if they wish. The variability can be examined again at this point.
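The round-by-round check described above can be illustrated with a small calculation. The following is only a sketch: the cut scores, the number of judges, and the round labels are invented for illustration, and the standard deviation and range are just two of several possible summaries of variability among judges.

```python
from statistics import mean, stdev

# Hypothetical cut scores (in raw-score points) from five judges.
# These numbers are made up for illustration; they are not from any study.
rounds = {
    "round 1 (independent)": [28, 35, 31, 40, 26],
    "round 2 (after discussion)": [30, 33, 31, 36, 29],
    "round 3 (after impact data)": [31, 32, 31, 34, 30],
}


def summarize(cut_scores):
    """Return the mean, sample standard deviation, and range of one round."""
    return {
        "mean": mean(cut_scores),
        "sd": stdev(cut_scores),
        "range": max(cut_scores) - min(cut_scores),
    }


summaries = {label: summarize(scores) for label, scores in rounds.items()}

for label, s in summaries.items():
    print(f"{label}: mean={s['mean']:.1f}, sd={s['sd']:.2f}, range={s['range']}")
```

In this invented example the spread shrinks from round to round, which is the pattern one would hope to see if discussion and impact data bring judges toward a shared interpretation. A spread that stays large or grows would be a signal to review the training or the achievement-level descriptors.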
The consistency of the standards set by independent sets of judges representing the same constituencies should also be evaluated. This would require forming independent panels of comparably qualified judges to set standards under the direction of comparable leaders using the same method, procedures, instructions, and materials. The variance in the standards set by the independent panels provides a measure of the error attributable to differences among panels and standard-setting leaders (Linn, 2003). Supplementary data should also be collected on the judges’ level of satisfaction with the standard-setting process and their degree of confidence in the resulting cut scores. Surveys and interviews can provide these data.
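As a rough illustration of the replication idea, the spread of cut scores across independent panels can be computed directly. This is a sketch under stated assumptions: the three panels and their judges' cut scores are hypothetical, and taking each panel's final cut score to be the median of its judges' values is an assumption made here, not a rule stated in the text.

```python
from statistics import median, stdev

# Hypothetical final-round cut scores from three independent panels that
# used the same method, procedures, instructions, and materials.
panels = {
    "panel A": [31, 33, 30, 34, 32],
    "panel B": [35, 36, 33, 37, 34],
    "panel C": [29, 32, 31, 30, 33],
}

# Assumption: each panel's recommended cut score is the median of its judges.
panel_cuts = {name: median(scores) for name, scores in panels.items()}

# The standard deviation across panels is a rough estimate of the error
# attached to a cut score produced by any single panel.
between_panel_sd = stdev(list(panel_cuts.values()))

print(panel_cuts)
print(f"between-panel SD: {between_panel_sd:.2f}")
```

With only a few panels this estimate is itself imprecise, so it is better read as an indication of the size of the panel-to-panel error than as an exact value.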
Evidence of external validity is also needed. NCLB requires states to participate in biennial administrations of the state-level National Assessment of Educational Progress (NAEP) in reading and mathematics at grades 4 and 8. In the near future, states will also be able to participate in administrations of NAEP in sci-