the exercises, instructions, and rubrics should be piloted as part of development; and
the test forms should be piloted for timing and feasibility of the assessment process for candidates.
After preliminary versions of the assessments have been constructed, they should be field tested on representative samples of candidates. Assessment analysis is conducted after field testing and again after the assessment is administered operationally. To the extent feasible, the analysis should evaluate how well the assessment exercises function and examine responses for differential item functioning across major population groups. In particular, the criteria for this phase include the following:
the assessments should be field tested on an adequate sample that is representative of the intended candidates;
where feasible, assessment responses should be examined for differential functioning by major population groups to help ensure that the exercises do not advantage or disadvantage candidates from particular geographic regions, races, genders, cultures, or educational ideologies, or candidates with disabilities;
assessment analysis (e.g., item difficulty and discrimination) methods should be consistent with the intended use and interpretation of scores; and
clearly specified criteria and procedures should be used to identify, revise, and remove flawed assessment exercises.
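As an illustration of the item analysis named in these criteria, the following minimal sketch computes item difficulty (the proportion of candidates answering correctly) and item discrimination (here, the correlation between an item score and the total score on the remaining items), and then flags items for review. The function names, the sample data, and the thresholds `p_min`, `p_max`, and `r_min` are assumptions made for this example only; an operational program would specify its own criteria and would likely use more complete psychometric methods.

```python
from statistics import mean, pstdev

# Made-up scored responses: one row per candidate, one column per item
# (1 = correct, 0 = incorrect). Illustrative data only.
responses = [
    [1, 1, 1, 0],
    [1, 1, 1, 0],
    [1, 1, 0, 1],
    [1, 0, 0, 1],
    [1, 0, 0, 0],
    [1, 0, 0, 1],
]

def item_difficulty(rows, item):
    """Proportion of candidates who answered the item correctly."""
    return mean(row[item] for row in rows)

def item_discrimination(rows, item):
    """Correlation between the item score and the total on the other items."""
    x = [row[item] for row in rows]
    rest = [sum(row) - row[item] for row in rows]
    sx, sr = pstdev(x), pstdev(rest)
    if sx == 0 or sr == 0:
        # No variation, so the item cannot separate candidates.
        return 0.0
    mx, mr = mean(x), mean(rest)
    cov = mean((a - mx) * (b - mr) for a, b in zip(x, rest))
    return cov / (sx * sr)

def flag_items(rows, p_min=0.2, p_max=0.95, r_min=0.2):
    """Return {item index: reasons} for items outside the assumed thresholds."""
    flagged = {}
    for item in range(len(rows[0])):
        p = item_difficulty(rows, item)
        r = item_discrimination(rows, item)
        reasons = []
        if p < p_min:
            reasons.append("too hard")
        if p > p_max:
            reasons.append("too easy")
        if r < r_min:
            reasons.append("low discrimination")
        if reasons:
            flagged[item] = reasons
    return flagged

if __name__ == "__main__":
    for item, reasons in flag_items(responses).items():
        print(f"item {item}: {', '.join(reasons)}")
```

Writing the thresholds as explicit parameters reflects the criterion that the rules for identifying flawed exercises be clearly specified in advance. Flagged items would still need review by content experts before being revised or removed.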
Appropriate administration conditions, scoring processes, quality control procedures, confidentiality requirements, and procedures for handling assessment materials should be used. Clear policies on retaking the examination and on the appeals process should be communicated to candidates. In particular, the committee’s criteria include the following:
proctors and scorers should be appropriately qualified;
assessment conditions should be uniform so that all candidates test under standard conditions;
appropriate accommodations should be made for candidates with disabilities;
scorers of performance assessments and other kinds of open-ended test responses should be appropriately recruited and trained, including training in scoring responses from a culturally diverse group of candidates;