improvements in validity (assuming such improvements can be achieved) justify continued use of the GATB for job referral purposes. Expressed differently, the department needs to determine how much of an increase in predictive validity would justify continued use of the GATB given that the remaining selection error could still impose a disproportionate burden on minority-group job candidates because of their tendency, as a group, to get lower scores than majority-group candidates.2 These questions must lie at the foundation of the GATB research and development program.

The Board on Testing and Assessment finds that simply continuing the improvement program as currently outlined may be economically unwise, given the limited prospects that the program will lead to significant improvements in validity and reductions in adverse impact. The board therefore recommends that the department broaden the scope of the research and development plan to allow for modifications to the structure and format of the GATB, to explore new methods of measuring abilities for which the test is intended, and to evaluate alternative modes of delivery of test contents and responses. As detailed below, certain elements of the current program warrant continuation, while others could be deemphasized or eliminated.

NOTE ON TERMINOLOGY

The concept of “adverse impact” has its origins in federal legislation and case law that reflect the growing national commitment to equality of employment opportunity. Title VII of the 1964 Civil Rights Act outlawed employment practices that “adversely affect” an individual's status as an employee because of that employee's race, color, religion, sex, or national origin. Implementation of Title VII and subsequent legislation has been controversial because of the difficulties in pinpointing the causes of observed differences in the employment opportunities of various population groups. The core question has been whether discrepant employment outcomes--such as differential hiring, promotion, or retention rates--reflect real differences in ability or performance or whether they are primarily the result of inherently unfair or biased employment practices.

Adverse impact, defined as differential hiring rates for various groups in the population of job candidates, can occur whenever there are differences in scores on a selection device. It has long been known, for example, that the use of standardized tests of cognitive ability as a determinant of employee selection can contribute significantly to inequality in the hiring chances of majority- and minority-group candidates. This finding was reported in an earlier NRC report, which noted: “When candidates are ranked according to [ability] test score and when test results are a determinant in the employment decision, a

2 See the committee report, pages 255-258, for discussion of the effects of prediction error on low-scoring and high-scoring test takers.




Evaluation of the U.S. Employment Service Workplan for the GATB Improvement Project

comparatively large fraction of blacks and Hispanics are screened out.”3

The scientific problem has been to determine whether the adverse impact can be ascribed to imperfections in the test or to real differences in the abilities of job applicants. (The issue of whether society can or should accept any level of adverse impact, regardless of its causes, is a matter of policy, not scientific analysis.)

The 1989 NRC report addressed the scientific question with respect to the GATB by analyzing the extent to which the adverse impact it produces is caused by prediction (selection) error inherent in the test. The NRC committee found:

The GATB has modest predictive validity.

The GATB produces sizable group differences in test scores, with minority test-takers scoring lower.

As would be the case for any test with modest predictive validity and group differences in scores, the GATB produces large classification error (i.e., qualified candidates classified as unqualified and unqualified candidates classified as qualified).

The burden of qualified applicants being misclassified as unqualified falls disproportionately on minority candidates. Conversely, majority-group applicants tend to score better on the test than they do on the job and therefore benefit from errors of false acceptance.

Misclassification error occurs with any test that has less than perfect validity, and the misclassification rate increases as the validity decreases. The misclassification of qualified individuals as unqualified will always affect the lower scoring group, however defined.
On the basis of these findings, the NRC report concluded (page 7) that “the impact of selection error on minority and nonminority applicants demonstrates that in the absence of score adjustments, minority applicants who could perform successfully on the job will be screened out of the referral group in greater proportions than are equivalent majority-group applicants.”4

3 Alexandra K. Wigdor and Wendell R. Garner, eds., 1982. Ability Testing: Uses, Consequences, and Controversies. Committee on Ability Testing, National Research Council. Washington, D.C.: National Academy Press.

4 It is important to note in this context that a test with perfect predictive validity could, theoretically, produce adverse impact because of real differences in the performance of tested individuals; in this case, however, the adverse impact would not be ascribed to scientific flaws in the test.
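
The committee's findings on misclassification, and the theoretical point made in footnote 4, can be illustrated with a small simulation. The sketch below is not the NRC committee's analysis. It rests on assumptions introduced here purely for illustration: test scores and job performance are normally distributed, the same prediction line applies to every group, applicants are "selected" when their score meets a cutoff and "qualified" when their performance meets a cutoff, and one group's mean score is lower by a hypothetical gap of one standard deviation. The function name, parameter names, cutoffs, and validity values are all hypothetical.

```python
import math
import random


def simulate_group(validity, score_gap, n=100_000,
                   score_cutoff=0.0, performance_cutoff=0.0, seed=0):
    """Simulate one applicant group; return (selection_rate, false_rejection_rate).

    Hypothetical model, not taken from the NRC report:
      score       ~ Normal(-score_gap, 1)
      performance = validity * score + sqrt(1 - validity^2) * noise
    The same prediction line holds for every group, and `validity` is the
    within-group correlation between score and performance.
    """
    rng = random.Random(seed)
    noise_sd = math.sqrt(1.0 - validity ** 2)
    selected = qualified = qualified_rejected = 0
    for _ in range(n):
        score = rng.gauss(-score_gap, 1.0)
        performance = validity * score + noise_sd * rng.gauss(0.0, 1.0)
        is_selected = score >= score_cutoff
        is_qualified = performance >= performance_cutoff
        selected += is_selected
        if is_qualified:
            qualified += 1
            # A qualified applicant below the score cutoff is a false rejection.
            qualified_rejected += not is_selected
    false_rejection = qualified_rejected / qualified if qualified else float("nan")
    return selected / n, false_rejection


if __name__ == "__main__":
    groups = (("higher-scoring group", 0.0), ("lower-scoring group", 1.0))
    for validity in (0.3, 0.6, 1.0):  # assumed values, for illustration only
        for label, gap in groups:
            sel, fr = simulate_group(validity, gap)
            print(f"validity={validity:.1f} {label}: selected={sel:.3f}, "
                  f"qualified but rejected={fr:.3f}")
```

Under these assumptions, the share of qualified applicants who fall below the score cutoff is larger for the lower-scoring group and shrinks as validity rises. At a validity of 1.0 with equal cutoffs, no qualified applicant is rejected, yet the selection rates of the two groups still differ, which is the situation footnote 4 describes.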