

The National Academies | 500 Fifth St. N.W. | Washington, D.C. 20001
Copyright © National Academy of Sciences. All rights reserved.




Recommendations for Referral and Score Reporting

A particular charge of the committee is to review the use of within-group scoring in the VG-GATB Referral System. This method of scoring transforms raw scores into percentile scores referenced to particular subpopulations (black, Hispanic, and other). It was adopted to prevent the test-based referral system from adversely affecting the employment opportunities of minority applicants. The adjustments made by computing percentile scores within the specified subpopulations have the effect of erasing average group differences in reported test performance.

There are several steps in the production of within-group percentile scores. First, the raw test scores for each applicant are converted into five job family scores, based on predetermined weightings of the cognitive, perceptual, and psychomotor composites. Then each of the applicant's five job family scores is converted to a percentile score, which shows the applicant's ranking with respect to others in the same ethnic or racial subgroup on a scale of 1 to 100. That ranking is derived from norm groups constructed from samples of blacks, Hispanics, and majority-group job incumbents who took the test in a number of General Aptitude Test Battery (GATB) validity studies. In the VG-GATB system, applicants are referred to jobs in order of their percentile scores, and the scores are reported to employers without designations of the applicant's group identity. Hence a black applicant with a Job Family IV within-group score of 70 percent will have the same referral status as a white ("other") applicant with a within-group score of 70 percent, although their raw scores would be 283 and 327, respectively.
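The conversion steps described above can be sketched in code. This is an illustrative sketch only: the norm-group samples below are hypothetical (chosen so that the raw scores 283 and 327 from the example both land at the 70th percentile), not the actual USES conversion tables.

```python
# Hedged sketch of within-group percentile conversion.
# The norm samples are invented for illustration, not real GATB norms.
from bisect import bisect_right

def within_group_percentile(raw_score, norm_scores):
    """Percentile (1-100) of raw_score relative to a norm-group sample."""
    norm_scores = sorted(norm_scores)
    rank = bisect_right(norm_scores, raw_score)  # count of norms <= raw_score
    pct = round(100 * rank / len(norm_scores))
    return max(1, min(100, pct))

# Hypothetical norm samples for one job family, one list per subgroup.
norms = {
    "black": [230, 245, 255, 262, 270, 278, 283, 300, 320, 350],
    "other": [270, 285, 295, 305, 315, 321, 327, 345, 365, 390],
}

# Two applicants with very different raw scores receive the same reported
# percentile because each is ranked only against his or her own group.
print(within_group_percentile(283, norms["black"]))  # 70
print(within_group_percentile(327, norms["other"]))  # 70
```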

Within-group scoring is without question race-conscious. It is an example of what some commentators describe as an inclusionary or benign racial classification, because it was adopted by the U.S. Employment Service (USES) in order to enrich the employment opportunities of black and Hispanic job seekers (while at the same time promoting the overall quality of applicants referred to an employer). Others, chief among them the former Assistant Attorney General for Civil Rights, Wm. Bradford Reynolds, view within-group scoring as intentional racial discrimination, an abridgment of the equal protection clause of the Constitution and illegal under Title VII of the Civil Rights Act of 1964.

In its interim report (Wigdor and Hartigan, 1988) the committee concluded that, as an instrument of public policy, the "within-group referral procedure is an effective way to balance the conflicting goals of productivity and racial equity," at least as far as the individual employer is concerned. Nevertheless, the committee refrained from endorsing the way within-group percentile scores are being used in the VG-GATB Referral System because of concerns about its legal status, about the representativeness of the norm groups used in score conversions, and about potential misunderstanding by employers and applicants in interpreting the reported scores. The sole use of group-based percentile scores, in the absence of any information about the applicant's self-reported group membership or about the size of the adjustments made to minority scores, would encourage two kinds of misinterpretation on the part of employers:

1. The employer could easily assume that all individuals with the same reported score achieved the same raw score on the GATB.

2. The employer might also be led to assume that all candidates with the same percentile score on the test would have the same expected performance on the job.
We could have added a third reservation: if the VG-GATB Referral System became a very important route to employment, policy makers would have to anticipate that at least some applicants might claim minority status at the local Job Service office in order to get the benefit of preferential score adjustments, yet make no such claim at the workplace, so that the meaning of the reported score would be interpreted with reference to the majority group. Despite these reservations, we conclude this chapter with the recommendation that score adjustments, possibly within-group percentile score adjustments, continue to play a role, albeit a somewhat different role, in the VG-GATB Referral System, for reasons that emerge from our technical analyses of GATB data as well as considerations of social policy.

The analysis in the committee's interim report was based on theoretical comparisons of within-group scoring and a number of alternative referral and reporting options. It was taken as given that referral would be based on a test in which minority average scores were substantially lower than majority average scores. The assumptions that allowed the theoretical comparisons were chosen to match, as best we knew, the circumstances of Employment Service referrals. The comparisons also depended on assumptions about the validity of the test and its predictive behavior for different racial groups. We are now in a position to look again at alternative score reporting and referral models, but at this time many of the earlier assumptions can be replaced by empirical statements.

Evidence presented earlier in this report establishes that the average scores of black Job Service clients are substantially lower than those of majority clients, although the difference varies somewhat by job family. Our earlier assumption that the GATB does not predict differently for different racial groups needs some qualification in light of the analyses presented in Chapter 9. There is evidence that the GATB has somewhat lower correlations with supervisor ratings of job performance for blacks compared with whites. Nevertheless, the use of a regression equation based on the combined group of black and nonminority workers would generally not give predictions that are biased against blacks. Insofar as the total-group equation gives systematically different predictions, it is somewhat more likely to overpredict the performance of blacks than to underpredict. The degree of overprediction is slight at the lower score ranges, and somewhat larger at higher score levels. We have now made independent estimates of GATB validities (presented in Chapter 8), taking account of recent (post-1972) validity studies.
The modest relationship between GATB scores and ratings of performance on the job (our estimate is an average corrected validity of .30, with about 90 percent of the jobs studied falling in the range of .20 to .40) is one important factor for policy makers to consider in assessing various referral alternatives.

PERSPECTIVES ON TEST FAIRNESS

What makes the use of a test fair? Like most Americans, testing specialists have wrestled with questions of equity and fairness in the past two decades. A number of models for the fair use of tests have been proposed in the psychometric literature. The following discussion of fairness draws on this literature as well as more popular sources to build a framework for the analysis of score reporting and referral methods.

To illustrate various perspectives on fairness, we take as given the conditions that would apply in the proposed VG-GATB system:

1. Applicants meeting other criteria set by the employer will be referred in order of their scores on the test.
2. The test is modestly predictive of job performance, so expected performance increases with test score.
3. The applicants represent several population subgroups.
4. There are substantial subgroup differences in average scores on the test.

When can the use of a test be said to be fair to the various subgroups? The perspectives offered by psychometricians are derived from quantitative analysis of the joint distributions of group status, test score, and job performance (as indicated by a criterion measure such as supervisor ratings of performance). Since only group status and test scores are known for applicants, information about future job performance must be extrapolated from validity studies of job incumbents who have taken the test. The many definitions of fairness that have grown out of concern about the use of employment tests can be distilled for our purposes into two general approaches: fairness in predicting job performance from test score and fairness in selection, given job performance.

Fairness in Predicting Job Performance from Test Score

It can be argued that selection is fair if the predicted distribution of job performance for people with a given test score does not vary by population subgroup. We expect a white person with a test score of 70 to perform about the same as a black or Hispanic person with a test score of 70. In this conception of fairness, the focus is on prediction and whether the test predicts differently for different groups. If there is no evidence of differential prediction by group, then knowing any individual's test score is sufficient to predict job performance; the employer can make the same inferences about future job performance for all applicants.
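The no-differential-prediction condition can be checked empirically by fitting a separate regression of job performance on test score within each group and comparing the predictions the two lines give at a common score. The sketch below does this on synthetic data; the sample sizes, score means, slope, and noise level are all assumptions for illustration, not GATB estimates. By construction the two groups here share one true regression line, so the prediction gap comes out near zero.

```python
# Hedged sketch of a differential-prediction (Cleary-type) check on
# synthetic data. All parameters below are invented for illustration.
import numpy as np

rng = np.random.default_rng(0)

def simulate(n, score_mean):
    scores = rng.normal(score_mean, 20, n)
    # Performance depends on score with the same modest slope for both
    # groups, i.e. these synthetic data contain no predictive bias.
    perf = 0.05 * scores + rng.normal(0, 1.0, n)
    return scores, perf

maj_x, maj_y = simulate(2000, 327)   # higher-scoring group
min_x, min_y = simulate(2000, 283)   # lower-scoring group

# Per-group least-squares lines (slope, intercept).
b1_maj, b0_maj = np.polyfit(maj_x, maj_y, 1)
b1_min, b0_min = np.polyfit(min_x, min_y, 1)

# If predictions at a common score agree, the test is "fair" in the
# predictive sense; a systematic gap would signal differential prediction.
x = 300.0
gap = (b0_maj + b1_maj * x) - (b0_min + b1_min * x)
print(round(gap, 2))  # close to 0 by construction
```

A real check would, as the text notes, also compare predictions across the whole score range, since over- and underprediction can differ at low and high scores.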
If, however, a test is found to predict differentially (as the GATB appears to for white and black applicants), then information about group status would be necessary to make appropriate inferences from test scores. In this definition, fairness consists of the evenhandedness with which the test predicts the future job performance of various subgroups. If a given test score can be associated with the same level of future job performance for black and white applicants, that is to say, if there is no predictive bias, then the test is fair and, to the extent that one feels that selection should be based solely on predicted performance, the selection

system is fair. Note that this definition of test fairness does not address group differences in average test scores or the legal problem of adverse impact.

This definition is the classical one (Cleary, 1968) and the conception of fairness most widely accepted in the psychometric literature, at least as a minimum requirement (e.g., Petersen and Novick, 1976; American Educational Research Association et al., 1985). When testing professionals refer to test bias, it is differential prediction that they have in mind (contrary to certain popular usage, in which the claim of bias refers to group differences in average scores). The general approach also appears in the fair pay literature. In that context, fairness requires that the formula best predicting pay as a function of legally compensable factors (qualifications, experience, seniority) be the same for all groups.

Because of the existence of substantial group differences in average test scores, particularly differences between black and majority-group job applicants, many now find this definition of fairness insufficient, at least as it pertains to allotting employment opportunity. A test may be fair in predicting performance, but nevertheless predict performance rather poorly. When that is so, many able workers will be rejected by the test, including a disproportionately large number of able minority workers.

Fairness in Selection, Given Job Performance

An alternative approach to fairness focuses not on prediction equations, but on realized job performance (e.g., Darlington, 1971; Cole, 1973). Selection can be considered "performance fair" if people with a given level of performance on the job have the same distribution of test scores, no matter what population subgroup they belong to. In that case, a rule that selects workers in order of test score will select the same proportion of good workers in each population subgroup.
The question asked from this perspective is, Do workers of equal job proficiency in the several groups have the same chance of selection? At first glance, it would seem that if the use of a test is fair in the first sense, it would also be fair in the second. But it is possible to satisfy both definitions of fairness only if prediction of job performance from test score is perfect, or if all groups have the same joint distribution of test score and performance. Neither of these conditions is met in the GATB. Tests are at best only moderately good predictors of job performance. Human performance is far too complex to expect anything approaching perfect prediction. One of the consequences of prediction error is that some people who could perform well on the job but who score in the lower ranges on the test are screened out, whereas

some others who do well on the test, and hence are selected, will perform inadequately on the job. So long as there are average group differences in test scores (and these are likely to manifest themselves whenever racially or ethnically identifiable subgroups live in circumstances of comparative disadvantage), the effects of imperfect prediction will fall more heavily on these disadvantaged minorities than on other social groups.

[FIGURE 13-1 Effects of imperfect prediction when there are subpopulation differences in average test scores.]

Figure 13-1 shows why the effects of imperfect prediction fall disproportionately on groups that have lower average test scores than the majority group. It should be remembered, however, that the phenomenon is not the result of some racial or ethnic bias inherent in the test; the impact is the same for all low-scoring individuals, regardless of group identity. Not only do low scorers have a greater likelihood

of being erroneously rejected, but high scorers also have a greater likelihood of being erroneously accepted.

In the figure the horizontal line labeled "criterion cutoff" distinguishes adequate from unsatisfactory performance on the job. The vertical line labeled "test cutoff" represents the score below which no applicant will be selected. Ellipses representing the joint distribution of job and test performance for majority and minority groups are superimposed, one upon the other. Note that the white group has higher job performance and test scores on average, although there is also a good deal of overlap between the two groups. The intersection of the criterion cutoff and test cutoff creates four sectors: Sector A = successful performance on both test and criterion; Sector B = successful test performance, unsuccessful job performance; Sector C = unsuccessful performance on both test and criterion; and Sector D = successful job performance and unsuccessful test performance. Sectors B and D represent prediction error.

Because the average test and performance scores are higher for the majority group than for the minority group, more of the majority ellipse falls in Sector A (successful performance on both test and criterion). Conversely, more of the minority ellipse falls in Sector C (unsuccessful performance on both test and criterion). Now observe Sectors B and D. A larger segment of the majority ellipse than the minority ellipse can be seen to fall in B, which means that proportionally greater numbers of majority applicants will be selected but will perform unsuccessfully. And a larger segment of the minority ellipse falls in Sector D, which means that minority applicants who could have performed adequately on the job will be screened out in greater numbers. It is the Sector B and D effects that violate the conception of fairness that we have called "performance fair."
They occur despite the absence of any predictive bias in the test itself.

Richard T. Seymour, representing the Lawyers' Committee for Civil Rights Under Law at a meeting of the committee and its liaison group, made a forceful statement of this view of fairness as a function of performance (Seymour, 1988). His analysis, which is based on GATB validity data for 47 jobs, illustrates the effects of rejection errors and acceptance errors: many more of the successful black job incumbents in the validity studies would not have been referred had the test scores been the basis of referral; conversely, of the marginal job incumbents (those who received low supervisor ratings), a greater proportion of whites than blacks would have been referred had test scores been used. These effects of prediction error led him to conclude that the GATB produces "an extreme degree of racial unfairness" (Seymour, 1988):

The evidence is overwhelming that tests work differently for blacks and for whites, and that they both systematically under-predict black job performance and over-predict white job performance. [Reliance on cognitive ability tests] can only be justified as an affirmative-action program for whites, to ensure that whites are represented in desirable jobs at rates beyond the natural limits of their abilities.

As a consequence, he strongly recommends against further use of the VG-GATB Referral System.

Mr. Seymour seems not to acknowledge the two types of fairness analysis we have described when he claims (erroneously) that the GATB underpredicts black job performance and overpredicts white performance. We must reemphasize the point that the effects he describes are not inherently bound up with race or ethnicity, but rather with high and low scores. Nevertheless, the undoubted effect of imperfect prediction when social groups have different average test scores is to place the greater burden of prediction error on the shoulders of the lower-scoring group. Is this fair? In the final analysis, we think not. But there are complexities to the question that require explication.

An Example Comparing Different Concepts of Fairness

As a more concrete way of illustrating the effects pictured in Figure 13-1, we present the results of a GATB validity study on carpenters that included 91 whites and 45 blacks. The individuals in the study were already on the job. They took the GATB test and were rated by their supervisor. Arbitrary cutoffs were used to divide the groups into high and low test scorers and high and low performers on the job.
The frequency counts showing joint distributions of job and test performance for each group are shown in the table below:

Frequency Counts Showing the Joint Distributions of Test Performance and Job Performance for 91 White and 45 Black Workers

                          Test Performance
                  Whites (N = 91)     Blacks (N = 45)
Job Performance   Fail     Pass       Fail     Pass
Good               11       60          8        8
Poor               11        9         24        5

There are three different ways to convert these frequency counts to percentages, and each presents a different perspective on fairness. The first method evaluates predictive fairness. The raw data are converted to percentages so that the columns sum to 100, as shown in the table below.

Column Percentages Computed to Elucidate the Conception of Predictive Fairness

                          Test Performance
                  Whites                  Blacks
Job Performance   Fail (%)  Pass (%)      Fail (%)  Pass (%)
Good                50        87            25        62
Poor                50        13            75        38
                  (100)     (100)         (100)     (100)

Now we can see that 50 percent of white carpenters (11 of 22) who fail the test do well on the job, whereas only 25 percent of black carpenters (8 of 32) do so. And whereas only 13 percent of whites who pass the test do poorly on the job, the figure for blacks is 38 percent. When analyzed this way, the data reveal that more white test failers than black ones would do satisfactory work if given the chance, and more blacks than whites are passing the test and proving to be unsatisfactory workers. Thus the test overpredicts black job performance and is predictively unfair to whites.

The second method of converting the frequency counts illustrates performance fairness. It creates percentages in such a way that the row percentages sum to 100, as shown in the table below.

Row Percentages Computed to Elucidate the Conception of Performance-Based Fairness

                          Test Performance
                  Whites                        Blacks
Job Performance   Fail (%)  Pass (%)            Fail (%)  Pass (%)
Good                15        85    (100%)        50        50    (100%)
Poor                55        45    (100%)        83        17    (100%)

Look first at good workers who fail the test and would therefore never have been referred to the employer had a test-based system been in place (Sector D in Figure 13-1). The numbers are 15 percent for white carpenters (11 of 71) and 50 percent for black carpenters (8 of 16). For the poor workers, 45 percent of white workers who are poor performers (9 of 20) pass the test and thus are among those who would have been referred for employment (Sector B in Figure 13-1). By comparison, only 17 percent of blacks (5 of 29) who are poor workers passed the test.
Viewed this way, the percentages say that good black workers will be disproportionately screened out in a test-based referral system, and unsatisfactory white workers disproportionately screened in. The test is performance-biased against black workers.

There is a third way to look at the frequency data, and that is to compute percentages within each racial group. The effect is to show what the numbers in each cell would be for blacks and for whites if the sample size was 100 for each group, as shown in the table below.

Proportional Percentages of White and Black Workers in Each Test Performance by Job Performance Category

                          Test Performance
                  Whites                  Blacks
Job Performance   Fail (%)  Pass (%)      Fail (%)  Pass (%)
Good                12        66            18        18
Poor                12        10            53        11
                       (100%)                  (100%)

This presentation of the data also tells an important story. First, group differences in test performance and job performance are a reality. Black carpenters score substantially lower on the test, so any system of top-down referral will find proportionally more blacks below the cutoff score than whites, 71 percent compared with 24 percent. Black carpenters also perform poorly on the job in substantially greater proportions, or, put the other way, a larger percentage of whites perform satisfactorily on the job, 78 percent compared with 36 percent of black carpenters. (This numerical demonstration assumes that the supervisor ratings of performance are themselves valid.)

Second, the proportion of correct classifications is reasonably similar for the two groups; 78 percent of white carpenters were correctly classified compared with 71 percent of blacks. But the damaging prediction errors fall more heavily on the black carpenters. Of the 36 percent who performed well on the job, 18 percent (fully one-half) would not have been referred for employment under a straight rank-ordering of applicants.

Each way of looking at the data provides insights about the effects of using a test to screen job applicants. Which truth is the most important truth?
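The three percentage views can be recomputed directly from the frequency counts, which makes it easy to verify the figures quoted above. A minimal sketch:

```python
# Carpenter study counts from the frequency table above:
# keys are (job performance, test performance).
counts = {
    "white": {("good", "fail"): 11, ("good", "pass"): 60,
              ("poor", "fail"): 11, ("poor", "pass"): 9},
    "black": {("good", "fail"): 8,  ("good", "pass"): 8,
              ("poor", "fail"): 24, ("poor", "pass"): 5},
}

def pct(part, whole):
    return round(100 * part / whole)

for group, c in counts.items():
    col_fail = c[("good", "fail")] + c[("poor", "fail")]
    row_good = c[("good", "fail")] + c[("good", "pass")]
    total = sum(c.values())
    # Column percentage (predictive fairness): of test failers, how many
    # are good workers?
    print(group, "good among test failers:", pct(c[("good", "fail")], col_fail))
    # Row percentage (performance fairness): of good workers, how many
    # fail the test and would never be referred (Sector D)?
    print(group, "test failers among good workers:", pct(c[("good", "fail")], row_good))
    # Within-group cell percentage: good-but-failed workers per 100 applicants.
    print(group, "good-but-failed per 100:", pct(c[("good", "fail")], total))
```

Running the sketch reproduces the key contrasts in the tables: 50 versus 25 percent (column view), 15 versus 50 percent (row view), and 12 versus 18 percent (cell view) for white and black carpenters, respectively.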
At this point in our history, it is certain that the use of the GATB without some sort of score adjustments would systematically screen out blacks, some of whom could have performed satisfactorily on the job. Fair test use would seem to require at the very least that the inadequacies of the technology should not fall more heavily on the social groups already burdened by the effects of past and present discrimination.

REFERRAL AND SCOW REPORTING 26) EQUITY AND EFFICIENCY: COMPARISON OF FOUR REFERRAL MODELS The question of fair use of the GATE is not one that can be settled by psychometric considerations alone-but neither can referral policy be decided on the basis of equity concerns alone. If there is a strong federal commitment to helping blacks, women, and certain other minority groups move into the economic mainstream, there is also a compelling interest in improving productivity and strengthening the competitive position of the country in the world market. The underlying principle of the VG-GATB system is to make the maximization of performance the basis of the personjob match. It is a productivity-oriented referral procedure that, through the addition of score adjustments, has been made responsive to equal employment opportunity policy. In our interim report, we evaluated six possible referral rules for their effect on estimated job performance and on the proportion of minority- group members who would be referred. In the following discussion we look at four rules, including one new variant, that most clearly illustrate the available policy options. Two of the rules use linear adjustments to minority scores, different for each group, to increase minority referral rates. The four rules presented for consideration are: (1) raw-score, top-down referral; (2) within-group percentile score, top-down referral; (3) performance-based score, top-down referral; and (4) minimum com- petency referral. Raw-score, top-down referral is referral made from the total group of applicants in order of unmodified test score. This rule complements the conception of fairness as lack of differential prediction. If the predicted job performance for a given test score is the same for all population groups, then the set of applicants with highest expected productivity is obtained by referring in order of test score. 
However, given current average group score differences, the rule would produce substantial adverse impact on the lower-scoring groups. The question that policy makers must ask of the VG-GATB system is whether the gains in expected performance are sufficient to justify this impact.

Within-group percentile score, top-down referral is referral in which a percentile score is computed for each applicant by comparing the raw score for that applicant with the scores obtained by a norm group of the same racial or ethnic identity. (Equivalently, a different linear transformation is applied to the raw test score for the different groups so that the mean and the variance of test scores are the same for all groups. In the simplest case, the quantity m is added to each minority score, where m is the difference between majority and minority means.) Referral is made from the total group of applicants in order of modified test score. Given

rules of exclusion (the overall minimum correct response rate, 0.40, and the differential correct response rate, 0.15) work at cross purposes, with the result that the procedure will not necessarily reduce the between-group difference in means. This is so because the items with the smallest between-group difference in proportion correct are the very easy and the very difficult items. The minimum 0.40 rule eliminates the difficult items (Linn and Drasgow, 1987; Marco, 1988). Moreover, even without the minimum 0.40 rule, the reductions in group differences in item scores would not come close to eliminating the degree of adverse impact associated with top-down, total-group selection (Marco, 1988). In other words, if the policy goal is to eliminate adverse impact, the Golden Rule procedure, although also race-conscious, is not nearly as effective as either of the score adjustment strategies discussed above.

The Golden Rule procedure's effects on the quality of tests, however, would be detrimental. The construct validity of a test would be altered if items were selected primarily on a basis other than optimal measurement. Moreover, the predictive value of the test would be reduced for majority and minority examinees. Test reliability would also be reduced. Items of middle difficulty and items most closely associated with total score would tend to be eliminated more than easy items. As a result, the reliability of the test might be increased for lower-scoring examinees, but for middle- and high-scoring examinees, the opposite result is more likely (Marco, 1988).

We do not see the Golden Rule procedure as a viable alternative for the Department of Labor to consider. For technical and practical reasons it does not rival score adjustment strategies. Moreover, the losses in test validity incurred are not offset by the marginally improved legal attractions it offers.
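The two exclusion rules and their cross purposes can be illustrated with hypothetical item statistics (all item values below are invented for illustration). The middle-difficulty items fail the 0.15 differential rule, while the hard item, despite a small group difference, is eliminated by the 0.40 minimum rule, leaving only the easy items:

```python
# Hypothetical item statistics: (name, proportion correct overall,
# proportion correct for the majority group, for the minority group).
items = [
    ("easy-1",   0.90, 0.92, 0.86),
    ("easy-2",   0.85, 0.87, 0.80),
    ("middle-1", 0.60, 0.66, 0.46),  # gap 0.20: fails the 0.15 rule
    ("middle-2", 0.55, 0.62, 0.42),  # gap 0.20: fails the 0.15 rule
    ("hard-1",   0.30, 0.33, 0.22),  # small gap, but fails the 0.40 rule
]

MIN_P = 0.40    # overall minimum correct response rate
MAX_GAP = 0.15  # maximum between-group difference in proportion correct

kept = [name for name, p, p_maj, p_min in items
        if p >= MIN_P and (p_maj - p_min) <= MAX_GAP]
print(kept)  # ['easy-1', 'easy-2']
```

Note that hard-1 would satisfy the differential rule on its own; it is the 0.40 minimum that removes it, which is exactly the cross-purposes effect described above.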
An Alternative Referral Rule

From the perspective of fairness to all Employment Service applicants, the major drawback of the two rules that require score adjustments is that white applicants will be referred to employers in somewhat smaller numbers than they otherwise would have been. In other words, increasing the referral rates of racial and ethnic minorities will produce a concomitant reduction in the referral chances of some white applicants with higher raw test scores and somewhat greater predicted success on the job. In order to avoid that diminution in the prospects of majority-group applicants while at the same time enhancing the competitive position of minority applicants, the committee recommends the consideration of a referral rule that combines the essential features of both the raw-score,

top-down and the within-group score, top-down rules. To achieve both kinds of fairness, all applicants who would have been chosen by a straight ranking of unadjusted scores will be referred, and, in addition, all applicants whose adjusted scores qualify them will also be referred. Thus, no job seeker will be denied an opportunity that would have been available under either fairness model. Since the score adjustment is commensurate with the effects on minority groups of imperfect prediction and since no group is greatly damaged by the combined-rules approach, the legal objections raised by the Assistant Attorney General for Civil Rights to the VG-GATB testing program may be assuaged.

Although we recommend the Combined Rules Referral Plan to the serious consideration of the Department of Labor and other federal authorities in the fair employment practices area, we cannot claim that it is a panacea for the legal stalemate in which many employers find themselves. It is a compromise and as such may fail to satisfy advocates on either side of the fairness question. Depending on an employer's selection decisions, the total procedure could produce some degree of adverse impact on minority groups, although of far lesser severity than would a referral system based on unadjusted scores. At the same time, majority job seekers could claim that enrichment of the referral pool by definition dilutes their chances for selection. Policy makers at the Department of Labor will need to consider the potential legal risks of this referral strategy just as they do the risks of other referral plans.

On a practical level, if there is a burden imposed by the Combined Rules Referral Plan, it is that the local Job Service office must deal with a somewhat larger number of people to fill a job order and the employer must consider more applicants than is absolutely necessary under either rule alone.
There is some concern that this necessity might make the strategy impractical for small, low-volume offices.

Operationalizing the Combined Rules Referral Plan

For illustrative purposes, the plan is presented as it might work in a local office that has a sufficiently large number of otherwise qualified job seekers on hand to allow selectivity. The thrust of the plan is to increase the flexibility of the employer by referring either more high scorers or more minority applicants than would otherwise have been seen. An employer sends a job order for 10 job openings and asks to see 20 applicants. Twenty becomes the base number. (Although we phrase our recommendation in terms of within-group score adjustments, performance-based adjustments could be substituted with virtually identical results. Our slight preference for the within-group strategy is that it is easier to put into practice.) The referral group is

assembled in two stages. First, a list of all otherwise eligible candidates in the files is compiled on the basis of rank-ordered, total-group scores. The top 20 scorers are identified; they will be placed in the referral group. Second, the same list of candidates is reordered with minority scores converted to within-group percentile scores. Again, the top 20 scorers are identified for placement in the referral group. Thus an applicant is placed in the referral group by having a high total-group percentile score, a high within-group percentile score, or both. There will be a good deal of overlap between the stage-one and stage-two selections, so the total referral group will be less than double the baseline figure. Under the Combined Rules Referral Plan, no applicant is excluded who would have been referred if the Employment Service had made the baseline 20 referrals on just total-group or just within-group percentile scores.

To illustrate, Table 13-3 describes a situation in which the employer has two job openings and has asked for a referral ratio of 2:1. The baseline referral figure is 4. On the basis of a file search, there are 10 applicants who meet the employer's initial requirements (education, minimum cutoff score, and so on). The 10 are listed in order of total-group percentile score. A total-group referral procedure would refer the first four candidates listed. The within-group method would in this example refer three black applicants, two of whom had lower total-group scores than competing majority candidates.

TABLE 13-3 Applicants Referred Under Total-Group, Within-Group, and Combined Rules Referral Plans

                 Percentile Score             Referral Method
Applicant  Race  Total-Group  Within-Group    Total-Group  Within-Group  Combined Rules
 1         W     71           --              X            X             X
 2         W     65           --              X            --            X
 3         W     63           --              X            --            X
 4         B     60           82              X            X             X
 5         W     58           --              --           --            --
 6         W     57           --              --           --            --
 7         W     54           --              --           --            --
 8         B     51           73              --           X             X
 9         B     48           70              --           X             X
10         B     38           60              --           --            --

NOTE: X = referred; -- = not referred.
With this set of scores, the combined rules would result in a referral group augmented by two for a total of six applicants who will be referred to the employer.
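The two-stage assembly can be sketched in code. This is an illustrative sketch, not USES software: the applicant records, the field names, and the convention that an applicant with no adjusted score is ranked by the total-group percentile in both stages are all assumptions made for the example.

```python
# Illustrative sketch of the Combined Rules Referral Plan (not an actual
# USES implementation). Applicant data mirror the Table 13-3 example; a
# within-group percentile of None means no adjustment applies, so the
# total-group percentile is used in the within-group ranking.

def combined_rules(applicants, base):
    """Refer the union of the top `base` applicants ranked by total-group
    percentile and the top `base` ranked by within-group percentile."""
    by_total = sorted(applicants, key=lambda a: -a["total"])[:base]
    by_within = sorted(
        applicants,
        key=lambda a: -(a["within"] if a["within"] is not None else a["total"]),
    )[:base]
    referred = {a["id"] for a in by_total} | {a["id"] for a in by_within}
    return sorted(referred)

pool = [
    {"id": 1, "race": "W", "total": 71, "within": None},
    {"id": 2, "race": "W", "total": 65, "within": None},
    {"id": 3, "race": "W", "total": 63, "within": None},
    {"id": 4, "race": "B", "total": 60, "within": 82},
    {"id": 5, "race": "W", "total": 58, "within": None},
    {"id": 6, "race": "W", "total": 57, "within": None},
    {"id": 7, "race": "W", "total": 54, "within": None},
    {"id": 8, "race": "B", "total": 51, "within": 73},
    {"id": 9, "race": "B", "total": 48, "within": 70},
    {"id": 10, "race": "B", "total": 38, "within": 60},
]

print(combined_rules(pool, base=4))  # -> [1, 2, 3, 4, 8, 9]
```

No applicant referred under either single rule is excluded, and the referral group grows only by the non-overlap between the two rankings (here, from four applicants to six).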

Not the least of the attractions of the Combined Rules Referral Plan, in the committee's judgment, is that it places responsibility for the composition of the work force with the employer. It gives the employer the flexibility to emphasize predicted performance, racial and ethnic representativeness, or a combination of these policies according to the job in question, the affirmative action posture of the firm, or other situational factors. The Job Service is not placed in the position of appearing to relieve the employer of these decisions, an implication that some employers seem to have drawn from the VG-GATB system of referral based only on within-group scores.

Norm Groups for Within-Group Scoring

If any referral plan that incorporates the within-group score adjustment strategy is adopted, USES will need to undertake the construction of more satisfactory norm groups on which to base the score adjustments. In practice, there will be considerable variation in the applicant groups for different jobs in different localities. Evidence from the data supporting the within-group percentile tables, from employer representatives in the committee's liaison group, and from some applicant data obtained by the committee points to noticeable differences between the national norm group currently used by the Employment Service for score conversions and actual applicant groups. Differences in the means or standard deviations of the applicant groups from those of the norm group could produce quite different referral rates and validities of the within-group score for particular jobs. If, for example, an employer set qualifications for a job that are correlated with test score, then the applicants for the job would be expected to have a smaller standard deviation in test score than the norm group, and the difference between majority-group and minority-group mean scores would be expected to be lower.
The effect of using within-group scoring based on national norms would be to refer minorities in larger fractions than their representation in the applicant pool, and to reduce the validity of the test significantly, because the national norms overestimate the standard deviations of actual applicant groups. It obviously is not practical for the Employment Service to devise a different additive factor for every job in every locality. But we do recommend that norm groups be developed by job family and, if possible, by smaller, more homogeneous clusters of jobs. In addition, the score adjustment factor should be computed differently than is currently done. Currently the adjustment factor is computed as the difference between the mean scores in a given job family composite of all majority- and minority-group workers in the national norm group. The correct factor is the mean score difference between majority-group and

minority-group applicants for the same job, averaged over all jobs. Similarly, standard deviations should be computed for applicants to a particular job and then averaged over jobs. The current computation does not properly allow for differences between jobs. Suppose, for example, that there are two jobs, and applicants for the jobs scored as follows:

Job    Minority Scores    Majority Scores
1      7, 12, 17, 20      18, 22
2      15, 19             19, 23, 25, 25

The Employment Service calculation pools the scores for all jobs to obtain a difference of 7 between majority- and minority-group average scores. The difference between average scores for each job is 6.

In order to assess the effect of the current within-group referral norm groups on actual jobs, we used 72 jobs from David Synk and David Swarthout's research (U.S. Department of Labor, 1987). The differences between minority and nonminority mean test scores, expressed in majority standard deviations, showed wide variation, with a median of 0.85 and quartiles of 0.65 and 1.10. (The quartiles would be 0.74 and 0.96 if the variation were due only to sampling error; thus there is evidence of substantial real variation in the standardized population differences.) We applied the within-group referral rule to the incumbents in each job, with a selection ratio set so that 50 percent of the nonminority workers would be accepted. The median acceptance rate for minority workers was 55 percent. There is thus some evidence that the referral rule accepts minority workers at a slightly higher rate than nonminority workers. However, these are workers on the job, not applicants, and if there were greater differences between mean scores for applicants than for workers, the referral rates for minority and nonminority workers might be about the same.
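To make the arithmetic concrete, the sketch below recomputes both quantities. The assignment of individual scores to jobs and groups (minority 7, 12, 17, 20 and 15, 19; majority 18, 22 and 19, 23, 25, 25) is our reading of the two-job example; it reproduces both the pooled difference of 7 and the per-job difference of 6.

```python
# Pooled vs. per-job adjustment factors for the two-job example.
# The grouping of scores is our reading of the example in the text.
from statistics import mean

jobs = {
    1: {"minority": [7, 12, 17, 20], "majority": [18, 22]},
    2: {"minority": [15, 19], "majority": [19, 23, 25, 25]},
}

# Current USES-style factor: pool all workers across jobs, then difference.
all_minority = [s for j in jobs.values() for s in j["minority"]]
all_majority = [s for j in jobs.values() for s in j["majority"]]
pooled = mean(all_majority) - mean(all_minority)

# Recommended factor: difference within each job, averaged over jobs.
per_job = mean(mean(j["majority"]) - mean(j["minority"]) for j in jobs.values())

print(pooled, per_job)  # -> 7.0 6.0
```

The two factors disagree because the pooled calculation lets the mix of jobs (a low-scoring job with mostly minority applicants, a high-scoring one with mostly majority applicants) inflate the apparent group difference beyond the difference that exists within any single job.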
THE PROBLEM OF REPORTING SCORES

The general principle that should guide policy on reporting test scores is that the employer and the applicants should be given sufficient information to make correct inferences about a candidate's likely job performance from the test score. This information should include one or more scores, a description of the method of computing the scores, and information about the validity of the test.

We have suggested the possibility of using two scores in creating the group of applicants to be referred on a job order: a total-group percentile score and a within-group percentile score. For score-reporting purposes we again find merit in a combination of scores, because neither the

total-group nor the within-group percentile score is an entirely satisfactory means of communicating information about job applicants.

Reporting Within-Group Percentile Scores

In the VG-GATB Referral System as it now operates, the Employment Service reports the candidate's within-group percentile score to the employer with an explanation of the scoring method, but without information about which adjustment, if any, has been made to the score. The within-group percentile scores reported to the employer are potentially misleading. The purpose of the scoring method is to indicate an individual's predicted job performance with reference to other applicants within his or her own ethnic or racial group. But employers may mistakenly infer that two applicants with the same percentile score did equally well on the test, no matter what their racial or ethnic identity.

Employers are not given the conversion tables and so have no way of determining the correspondence between scores obtained within different groups. On one hand, this could lead employers to underestimate the magnitude of group differences in raw scores (for example, on certain GATB composites a raw score that places an applicant at the 50th percentile among blacks would place an applicant at the 16th percentile among whites). On the other hand, it could lead employers to underestimate the amount of overlap in test scores that exists between the groups.

The within-group percentile scores have been reported to applicants without their being informed that the percentile scores are based on different norm groups for different racial and ethnic groups. That practice is deceptive.

Reporting Total-Group Percentile Scores

Reporting total-group percentile scores is also potentially misleading, because the employer has no information about the levels of job performance that can be expected from a particular score.
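The 50th-versus-16th-percentile example can be reproduced under a simple model. The sketch below assumes normally distributed composite scores with equal spread in both groups and a one-standard-deviation gap between group means; these distributional assumptions are ours, chosen to match the example, not an official GATB conversion.

```python
# Same raw score, two norm groups: a sketch of why within-group percentiles
# are not comparable across groups. Assumes normal score distributions with
# equal standard deviations and a one-SD gap between group means
# (illustrative assumptions, not official GATB norms).
from statistics import NormalDist

majority = NormalDist(mu=0.0, sigma=1.0)
black = NormalDist(mu=-1.0, sigma=1.0)  # mean one SD below the majority mean

raw = -1.0  # a raw score exactly at the black-group mean
within_group_pct = round(100 * black.cdf(raw))
total_group_pct = round(100 * majority.cdf(raw))

print(within_group_pct, total_group_pct)  # -> 50 16
```

The same raw score is reported as the 50th percentile against one norm group and the 16th against the other, which is precisely the correspondence an employer cannot reconstruct without the conversion tables.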
It is tempting for the employer to infer that a person at the 16th percentile of whatever norm group on the test score will also be at the 16th percentile of the norm group in job performance; Employment Service literature promoting the VG-GATB Referral System indicates that the most able workers within each ethnic group are being referred. But the correspondence between test-score percentile and job-performance percentile depends on the correlation between test score and job performance. For example, if that correlation is .3, a person at the 16th percentile on the test score is expected to be at the 38th percentile on job performance. Finally,

providing a score referenced to the total group without qualifying its relevance to a particular job could have a harmful effect on minority applicants, who, on the average, score lower on the GATB. They will appear to be unqualified for the job, but their scores may have only a modest relationship to performance on the job.

Expectancy Reporting

There are methods of reporting information to employers that directly incorporate the degree of predictability of job performance from test score. One such method uses expectancies specifying the probability that a worker with a given test score will be above average in job performance. Whereas percentile scores show where an applicant is located on the test with reference to all other applicants in the relevant population, an expectancy score tells the likelihood of above-average performance given the validity of the test. The real value of this approach to scoring is that it gives the employer a much more realistic basis for comparing candidates than is possible with raw scores or percentile scores. When a test has only modest validity for predicting job performance, score differences that look enormous when expressed as percentiles are shown to predict a much closer likelihood of above-average performance on the job. Suppose we take the average GATB validity of .3. As Table 13-4 shows, extreme scores on the test distribution correspond to modest scores on the expectancy distribution, reflecting the modest predictability of job performance from test score.

TABLE 13-4 Total-Group Percentiles and Corresponding Expectancies of Above-Average Job Performance (Test Score and Job Performance Jointly Normal with Correlation .3)

Test-score percentile                        2.5    16.0    50.0    84.0    97.5
Expectancy of above-average performance (%)   27      38      50      62      73
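Under the joint-normal model of Table 13-4, the expectancy for any test percentile follows from the conditional distribution of performance given test score. A minimal sketch (the function name and the use of Python's statistics.NormalDist are ours):

```python
# Expectancy of above-average job performance, assuming test score and job
# performance are jointly normal with correlation r (the Table 13-4 model).
from statistics import NormalDist

def expectancy(test_percentile, r=0.3):
    """P(performance above average | test score at the given percentile),
    expressed as a percentage."""
    z = NormalDist().inv_cdf(test_percentile / 100.0)
    # Given test z-score z, performance is normal with mean r*z and
    # standard deviation sqrt(1 - r**2).
    return 100 * NormalDist().cdf(r * z / (1 - r * r) ** 0.5)

for p in (2.5, 16.0, 50.0, 84.0, 97.5):
    print(p, round(expectancy(p)))  # reproduces 27, 38, 50, 62, 73
```

As r approaches 1 the function approaches the identity on percentiles, while at r = 0 it returns 50 for every score; the compression of the expectancy range at r = .3 is exactly the modest predictability the table illustrates.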
Proposed Protocol for Reporting Scores

In the committee's judgment, a combination of percentile and expectancy scores will provide job applicants and prospective employers with

the best picture of the applicant's comparative suitability for the job. Our proposal is that two scores be reported for each applicant:

1. A within-group percentile score, with the corresponding norm group identified.
2. An expectancy score (derived from the total-group score) equal to the probability that an applicant will have above-average job performance.

The first score indicates how the applicant fared on the test in comparison with others in the same ethnic or racial group. This information is particularly useful to employers who are actively working to increase the representation of minority groups in their work force. The second score gives the employer a better means of comparing applicants against the criterion of job performance. And in general it will show applicants and employers alike that low scorers on the test have a reasonable chance of being above-average workers. Examples of such a reporting protocol, using a validity of .3, would look as follows:

                  Within-Group Percentile          Expectancy Score: Chance of Being
Name              (Computed for "Black" Group*)    a Better-Than-Average Worker
Grace Birley      16                               25
James Jones       50                               40
Shelton Pike      84                               50

                  Within-Group Percentile          Expectancy Score: Chance of Being
Name              (Computed for "Other" Group*)    a Better-Than-Average Worker
Nancy Rathouse    16                               40
William Cole      50                               50
Theresa Brewer    84                               60

                  Within-Group Percentile          Expectancy Score: Chance of Being
Name              (Computed for "Hispanic" Group*) a Better-Than-Average Worker
Juan Gomez        16                               33
Chester Alverez   50                               44
Olivia Gerber     84                               56

*GATB subpopulation norms exist for "black," "Hispanic," and "other" groups.

CONCLUSIONS

Fair Use of the GATB

1. Use of GATB scores in strict top-down, rank-ordered fashion is fair in the sense that a given test score predicts about the same level of job

performance for majority-group and minority-group applicants. However, it would have severe adverse impact on minority job seekers.

2. This adverse effect on minority job seekers cannot be justified on the grounds of efficiency, for at the levels of validity typical of the GATB, the efficiency losses from adjusting minority scores are slight.

3. Although the GATB does not appear to be inherently biased against minority-group test takers, the undoubted effect of imperfect prediction when social groups have different average test scores is to place the greater burden of measurement error on the shoulders of the lower-scoring group. Since black, Hispanic, and Native American minority groups have lower group means on the GATB, able workers in these groups will experience higher rejection rates than workers having the same level of job performance in the majority group when referral is based on a rank-ordering of all test scores.

4. In the judgment of the committee, fair test use requires at the very least that the inadequacies of the technology should not fall more heavily on the social groups already burdened by the effects of past and present discrimination.

5. The so-called Golden Rule procedure, a strategy for reducing group differences in test scores through the selection of test items, does not appear to be defensible technically and does not provide the intended practical remedy.

6. The committee therefore concludes that, for purposes of referral, equity and productivity will be best served by a policy of adjusting the GATB test scores of black, Hispanic, and Native American job seekers served by the Employment Service system.

Referral Rules

7. Raw-score, top-down referral gives the highest expected performance in the referred group and the lowest proportion of minority-group members referred.
At the levels of validity we find for the GATB, this referral method has an adverse impact on minority applicants that is out of all proportion to the productivity gains.

8. Within-group score, top-down referral achieves the highest proportions of minority referrals, with slight overall losses in estimated job performance. Given present GATB validities, this score adjustment strategy is an efficient way of referring workers at a given level of job performance in about the same proportion, whatever their racial or ethnic group.

9. Performance-referenced score, top-down referral (adjustments to minority scores based on the predictive validity of the test) produces results virtually identical to within-group score, top-down referral at the

validities observed for the GATB. It demonstrates similarly slight losses in efficiency and large gains in the proportion of minorities referred. However, this method is responsive to changes in test validities; with high validities, smaller score adjustments would be made and the proportion of minorities referred would be reduced. This may make it legally the more acceptable of the score adjustment strategies.

10. Both score adjustment strategies are race-conscious; both would virtually eliminate the adverse impact of the GATB on black and Hispanic subpopulations, and both adjustments would be commensurate with the far less than perfect relation between the GATB test score and job performance.

11. Minimum competency referral results in significant losses in expected job performance and would still produce markedly unequal referral rates for majority and minority applicants.

Reporting Test Scores

12. The test scores reported to employers and job seekers should allow them to make the most accurate possible judgments about likely job performance.

13. Neither the within-group percentile scores currently reported under the VG-GATB Referral System nor total-group percentile scores convey sufficient information, and both are potentially misleading.

RECOMMENDATIONS

If the Department of Labor continues to promote a test-based referral system for filling job orders, we recommend the following alterations to the current VG-GATB Referral Program.

Referral Rule

1. The committee recommends the continued use of score adjustments for black and Hispanic applicants in choosing which applicants to refer to an employer, because the effects of imperfect prediction fall more heavily on minority applicants as a group due to their lower mean test scores.
We endorse the adoption of score adjustments that give approximately equal chances of referral to able minority applicants and able majority applicants: for example, within-group percentile scores, performance-based scores, or other adjustments. Given current GATB validities, such adjustments are necessary to ensure that able black and Hispanic workers will not experience higher rejection rates than workers of the same level of job performance in the

majority group. Referral in order of within-group percentile scores is one effective way to balance the dual goals of productivity and racial equity, given the modest levels of GATB validities. Should these validities increase dramatically as testing technology improves, the performance-based rule would warrant consideration.

2. We also recommend that USES study the feasibility of what we call a Combined Rules Referral Plan, under which the referral group is composed of all those who would have been referred by the total-group or by the within-group ranking method.

Score Reporting

3. The committee recommends that two scores be reported to employers and applicants:

a. A within-group percentile score, with the corresponding norm group identified.
b. An expectancy score (derived from the total-group percentile score) equal to the probability that an applicant will have above-average job performance.

This combination of scores indicates how well an applicant performed on the test with reference to others of the same subpopulation, information that is useful to employers who are actively seeking to increase the representation of minorities in their work force under an affirmative action program. The expectancy score shows that even low scorers have a reasonable chance of success on the job and will help employers avoid placing totally unwarranted weight on small score differences.

Norm Groups

4. If the within-group score adjustment strategy is chosen, we recommend that USES undertake research to develop more adequate norming tables. The data on Native Americans are particularly weak, but all of the norming samples are idiosyncratic convenience samples. As a consequence, there is reason to doubt that the particular constant factors added to minority scores are the most appropriate ones.

5.
An attempt should be made to develop norms for homogeneous groups of jobs, at the least by job family but, if possible, by more cohesive clusters of jobs in Job Families IV and V.

6. The adjustment factor that should be computed is the mean score difference between majority-group and minority-group applicants for the same job, averaged over all jobs.