Copyright © 2009. National Academy of Sciences. All rights reserved.
Recommendations for Referral and Score Reporting
A particular charge of the committee is to review the use of within-
group scoring in the VG-GATB Referral System. This method of scoring
transforms raw scores into percentile scores referenced to particular
subpopulations (black, Hispanic, and other). It was adopted to prevent
the test-based referral system from adversely affecting the employment
opportunities of minority applicants. The adjustments made by computing
percentile scores within the specified subpopulations have the effect of
erasing average group differences in reported test performance.
There are several steps in the production of within-group percentile
scores. First, the raw test scores for each applicant are converted into five
job family scores, based on predetermined weightings of the cognitive,
perceptual, and psychomotor composites. Then each of the applicant's
five job family scores is converted to a percentile score, which shows the
applicant's ranking with respect to others in the same ethnic or racial
subgroup on a scale of 1 to 100. That ranking is derived from norm groups
constructed from samples of blacks, Hispanics, and majority-group job
incumbents who took the test in a number of General Aptitude Test
Battery (GATB) validity studies.
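The two conversion steps described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the USES procedure: the composite weights, the norm-group scores, and the function names are hypothetical, and the percentile is taken here as the share of the norm group scoring at or below the applicant. The made-up norm groups were chosen so that raw scores of 283 and 327 both land at 70, echoing the Job Family IV example in the next paragraph.

```python
from bisect import bisect_right

# Hypothetical composite weights for one job family (not the actual GATB weights).
WEIGHTS = {"cognitive": 0.5, "perceptual": 0.3, "psychomotor": 0.2}

def job_family_score(composites, weights=WEIGHTS):
    """Step 1: weighted sum of the cognitive, perceptual, and psychomotor composites."""
    return sum(weights[name] * composites[name] for name in weights)

def within_group_percentile(score, norm_scores):
    """Step 2: percent of the applicant's own norm group scoring at or below
    `score`, clipped to the 1-100 reporting scale."""
    ordered = sorted(norm_scores)
    at_or_below = bisect_right(ordered, score)
    pct = round(100 * at_or_below / len(ordered))
    return min(100, max(1, pct))

# Made-up norm groups, one per subpopulation, for illustration only.
NORMS = {
    "black": [240, 250, 255, 262, 270, 276, 283, 301, 310, 325],
    "other": [280, 290, 295, 305, 312, 320, 327, 344, 356, 370],
}

black_pct = within_group_percentile(283, NORMS["black"])
other_pct = within_group_percentile(327, NORMS["other"])
print(black_pct, other_pct)  # 70 70
```

Because each applicant is ranked only against his or her own norm group, different raw scores can map to the same reported percentile.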
In the VG-GATB system, applicants are referred to jobs in order of
their percentile scores, and the scores are reported to employers without
designations of the applicant's group identity. Hence a black applicant
with a Job Family IV within-group score of 70 percent will have the same
referral status as a white ("other") applicant with a within-group score of
70 percent, although their raw scores would be 283 and 327, respectively.
Within-group scoring is without question race-conscious. It is an
example of what some commentators describe as an inclusionary or
benign racial classification, because it was adopted by the U.S. Employ-
ment Service (USES) in order to enrich the employment opportunities of
black and Hispanic job seekers (while at the same time promoting the
overall quality of applicants referred to an employer). Others, chief
among them the former Assistant Attorney General for Civil Rights, Wm.
Bradford Reynolds, view within-group scoring as intentional racial dis-
crimination, an abridgment of the equal protection clause of the Consti-
tution and illegal under Title VII of the Civil Rights Act of 1964.
In its interim report (Wigdor and Hartigan, 1988) the committee
concluded that, as an instrument of public policy, the "within-group
referral procedure is an effective way to balance the conflicting goals of
productivity and racial equity," at least as far as the individual employer
is concerned. Nevertheless, the committee refrained from endorsing the
way within-group percentile scores are being used in the VG-GATB
Referral System because of concerns about its legal status, about the
representativeness of the norm groups used in score conversions, and
about potential misunderstanding by employers and applicants in inter-
preting the reported scores. The sole use of group-based percentile
scores, in the absence of any information about the applicant's self-
reported group membership or about the size of the adjustments made to
minority scores, would encourage two kinds of misinterpretation on the
part of employers:
1. The employer could easily assume that all individuals with the same
reported score achieved the same raw score on the GATB.
2. The employer might also be led to assume that all candidates with
the same percentile score on the test would have the same expected
performance on the job.
We could have added a third reservation, for, if the VG-GATB Referral
System became a very important route to employment, policy makers
would have to anticipate that at least some applicants might claim
minority status at the local Job Service office in order to get the benefit of
preferential score adjustments and make no such claim at the workplace,
so that the meaning of the reported score would be interpreted with
reference to the majority group.
Despite these reservations, we conclude this chapter with the recom-
mendation that score adjustments, possibly within-group percentile
score adjustments, continue to play a role, albeit a somewhat different
role, in the VG-GATB Referral System for reasons that emerge from
our technical analyses of GATB data as well as considerations of social
policy.
The analysis in the committee's interim report was based on theoretical
comparisons of within-group scoring and a number of alternative referral
and reporting options. It was taken as given that referral would be based
on a test in which minority average scores were substantially lower than
majority average scores. The assumptions that allowed the theoretical
comparisons were chosen to match, as best we knew, the circumstances
of Employment Service referrals. The comparisons also depended on
assumptions about the validity of the test and its predictive behavior for
different racial groups.
We are now in a position to look again at alternative score reporting and
referral models, but at this time many of the earlier assumptions can be
replaced by empirical statements. Evidence presented earlier in this
report establishes that the average scores of black Job Service clients are
substantially lower than those of majority clients, although the difference
varies somewhat by job family.
Our earlier assumption that the GATB does not predict differently for
different racial groups needs some qualification in light of the analyses
presented in Chapter 9. There is evidence that the GATB has somewhat
lower correlations with supervisor ratings of job performance for blacks
compared with whites. Nevertheless, the use of a regression equation
based on the combined group of black and nonminority workers would
generally not give predictions that are biased against blacks. Insofar as
the total-group equation gives systematically different predictions, it is
somewhat more likely to overpredict the performance of blacks than to
underpredict. The degree of overprediction is slight at the lower score
ranges, and somewhat larger at higher score levels.
We have now made independent estimates of GATB validities (pre-
sented in Chapter 8), taking account of recent (post-1972) validity studies.
The modest relationship between GATB scores and ratings of performance
on the job (our estimate is an average corrected validity of .30,
with about 90 percent of the jobs studied falling in the range of .20 to .40)
is one important factor for policy makers to consider in assessing various
referral alternatives.
PERSPECTIVES ON TEST FAIRNESS
What makes the use of a test fair? Like most Americans, testing
specialists have wrestled with questions of equity and fairness in the
past two decades. A number of models for the fair use of tests have
been proposed in the psychometric literature. The following discussion
of fairness draws on this literature as well as more popular sources to
build a framework for the analysis of score reporting and referral
methods.
To illustrate various perspectives on fairness, we take as given the
conditions that would apply in the proposed VG-GATB system:
1. Applicants meeting other criteria set by the employer will be
referred in order of their scores on the test.
2. The test is modestly predictive of job performance, so expected
performance increases with test score.
3. The applicants represent several population subgroups.
4. There are substantial subgroup differences in average scores on the
test.
When can the use of a test be said to be fair to the various subgroups?
The perspectives offered by psychometricians are derived from quantita-
tive analysis of the joint distributions of group status, test score, and job
performance (as indicated by a criterion measure such as supervisor
ratings of performance). Since only group status and test scores are
known for applicants, information about future job performance must be
extrapolated from validity studies of job incumbents who have taken the
test.
The many definitions of fairness that have grown out of concern about
the use of employment tests can be distilled for our purposes into two
general approaches: fairness in predicting job performance from test
score and fairness in selection, given job performance.
Fairness in Predicting Job Performance from Test Score
It can be argued that selection is fair if the predicted distribution of job
performance for people with a given test score does not vary by
population subgroup. We expect a white person with a test score of 70 to
perform about the same as a black or Hispanic person with a test score of
70. In this conception of fairness, the focus is on prediction and whether
the test predicts differently for different groups. If there is no evidence of
differential prediction by group, then knowing any individual's test score
is sufficient to predict job performance; the employer can make the same
inferences about future job performance for all applicants. If, however, a
test is found to predict differentially (as the GATB appears to for white
and black applicants), then information about group status would be
necessary to make appropriate inferences from test scores.
In this definition, fairness consists of the evenhandedness with which
the test predicts the future job performance of various subgroups. If a
given test score can be associated with the same level of future job
performance for black and white applicants, that is to say, if there is no
predictive bias, then the test is fair and, to the extent that one feels that
selection should be based solely on predicted performance, the selection
system is fair. Note that this definition of test fairness does not address
group differences in average test scores or the legal problem of adverse
impact.
This definition is the classical one (Cleary, 1968) and the conception of
fairness most widely accepted in the psychometric literature, at least as a
minimum requirement (e.g., Petersen and Novick, 1976; American Educational
Research Association et al., 1985). When testing professionals
refer to test bias, it is differential prediction that they have in mind
(contrary to certain popular usage, in which the claim of bias refers to
group differences in average scores). The general approach also appears
in the fair pay literature. In that context, fairness requires that the formula
best predicting pay as a function of legally compensable factors (qualifi-
cations, experience, seniority) be the same for all groups.
Because of the existence of substantial group differences in average
test scores, particularly differences between black and majority-group
job applicants, many now find this definition of fairness insufficient, at
least as it pertains to allotting employment opportunity. A test may be
fair in predicting performance, but nevertheless predict performance
rather poorly. When that is so, many able workers will be rejected by
the test, including a disproportionately large number of able minority
workers.
Fairness in Selection, Given Job Performance
An alternative approach to fairness focuses not on prediction equations,
but on realized job performance (e.g., Darlington, 1971; Cole,
1973). Selection can be considered "performance fair" if people with a
given level of performance on the job have the same distribution of test
scores, no matter what population subgroup they belong to. In that case,
a rule that selects workers in order of test score will select the same
proportion of good workers in each population subgroup. The question
asked from this perspective is, Do workers of equal job proficiency in the
several groups have the same chance of selection?
At first glance, it would seem that if the use of a test is fair in the first
sense, it would also be fair in the second. But it is possible to satisfy
both definitions of fairness only if prediction of job performance from
test score is perfect, or if all groups have the same joint distribution of
test score and performance. Neither of these conditions is met in the
GATB. Tests are at best only moderately good predictors of job
performance. Human performance is far too complex to expect any-
thing approaching perfect prediction. One of the consequences of
prediction error is that some people who could perform well on the job
but who score in the lower ranges on the test are screened out, whereas
some others who do well on the test, and hence are selected, will
perform inadequately on the job. So long as there are average group
differences in test scores (and these are likely to manifest themselves
whenever racially or ethnically identifiable subgroups live in circumstances
of comparative disadvantage), the effects of imperfect prediction
will fall more heavily on these disadvantaged minorities than on
other social groups.
[Figure: job performance (criterion, low to high) plotted against test score (low to high), with a horizontal criterion cutoff, a vertical test cutoff, overlapping ellipses for the black and white groups, and sectors A, B, C, and D.]
FIGURE 13-1 Effects of imperfect prediction when there are subpopulation
differences in average test scores.
Figure 13-1 shows why the effects of imperfect prediction fall
disproportionately on groups that have lower average test scores than
the majority group. It should be remembered, however, that the
phenomenon is not the result of some racial or ethnic bias inherent in
the test; the impact is the same for all low-scoring individuals, regard-
less of group identity. Not only do low scorers have a greater likelihood
of being erroneously rejected, but high scorers also have a greater
likelihood of being erroneously accepted.
In the figure the horizontal line labeled "criterion cutoff" distinguishes
adequate from unsatisfactory performance on the job. The
vertical line labeled "test cutoff" represents the score below which no
applicant will be selected. Ellipses representing the joint distribution of
job and test performance for majority and minority groups are super-
imposed, one upon the other. Note that the white group has higher job
performance and test scores on average, although there is also a good
deal of overlap between the two groups. The intersection of the
criterion cutoff and test cutoff creates four sectors: Sector A =
successful performance on both test and criterion; Sector B = success-
ful test performance, unsuccessful job performance; Sector C = unsuc-
cessful performance on both test and criterion; and Sector D = successful
job performance and unsuccessful test performance. Sectors B and D
represent prediction error.
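The four sectors amount to a simple two-way classification, which the following sketch makes explicit. It is an illustration only: the cutoff values and the worker data are hypothetical, and treating a score exactly at a cutoff as successful is an assumption.

```python
from collections import Counter

def sector(test_score, job_rating, test_cutoff, criterion_cutoff):
    """Return the Figure 13-1 sector (A, B, C, or D) for one worker."""
    passed_test = test_score >= test_cutoff        # at or above the test cutoff
    good_on_job = job_rating >= criterion_cutoff   # assumption: ties count as adequate
    if passed_test and good_on_job:
        return "A"   # successful on both test and criterion
    if passed_test:
        return "B"   # selected, but unsuccessful on the job (prediction error)
    if good_on_job:
        return "D"   # screened out, but successful on the job (prediction error)
    return "C"       # unsuccessful on both test and criterion

# Hypothetical (test score, supervisor rating) pairs, for illustration only.
workers = [(82, 4.1), (75, 2.0), (40, 3.8), (35, 1.5), (90, 4.6)]
tally = Counter(sector(t, r, test_cutoff=60, criterion_cutoff=3.0) for t, r in workers)
print(dict(tally))
```

Counting the workers who fall in B and D, separately for each group, gives the two kinds of prediction error discussed in the paragraphs that follow.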
Because the average test and performance scores are higher for the
majority group than for the minority group, more of the majority ellipse
falls in Sector A (successful performance on both test and criterion).
Conversely, more of the minority ellipse falls in Sector C (unsuccessful
performance on both test and criterion). Now observe Sectors B and D.
A larger segment of the majority ellipse than the minority ellipse can be
seen to fall in B, which means that proportionally greater numbers of
majority applicants will be selected but will perform unsuccessfully.
And a larger segment of the minority ellipse falls in Sector D, which
means that minority applicants who could have performed adequately
on the job will be screened out in greater numbers. It is the Sector B and
D effects that violate the conception of fairness that we have called
"performance fair." They occur despite the absence of any predictive
bias in the test itself.
Richard T. Seymour, representing the Lawyers' Committee for Civil
Rights Under Law at a meeting of the committee and its liaison group,
made a forceful statement of this view of fairness as a function of
performance (Seymour, 1988). His analysis, which is based on GATB
validity data for 47 jobs, illustrates the effects of rejection errors and
acceptance errors: many more of the successful black job incumbents in
the validity studies would not have been referred had the test scores been
the basis of referral; conversely, of the marginal job incumbents (those
who received low supervisor ratings), a greater proportion of whites than
blacks would have been referred had test scores been used. These effects
of prediction error led him to conclude that the GATB produces "an
extreme degree of racial unfairness" (Seymour, 1988):
The evidence is overwhelming that tests work differently for blacks and for
whites, and that they both systematically under-predict black job performance
and over-predict white job performance. [Reliance on cognitive ability tests] can
only be justified as an affirmative-action program for whites, to ensure that whites
are represented in desirable jobs at rates beyond the natural limits of their
abilities.
As a consequence, he strongly recommends against further use of the
VG-GATB Referral System.
Mr. Seymour seems not to acknowledge the two types of fairness
analysis we have described when he claims (erroneously) that the GATB
underpredicts black job performance and overpredicts white perfor-
mance. We must reemphasize the point that the effects he describes are
not inherently bound up with race or ethnicity, but rather with high and
low scores. Nevertheless, the undoubted effect of imperfect prediction
when social groups have different average test scores is to place the
greater burden of prediction error on the shoulders of the lower-scoring
group. Is this fair? In the final analysis, we think not. But there are
complexities to the question that require explication.
An Example Comparing Different Concepts of Fairness
As a more concrete way of illustrating the effects pictured in Figure
13-1, we present the results of a GATB validity study on carpenters that
included 91 whites and 45 blacks. The individuals in the study were
already on the job. They took the GATB test and were rated by their
supervisor. Arbitrary cutoffs were used to divide the groups into high and
low test scorers and high and low performers on the job. The frequency
counts showing joint distributions of job and test performance for each
group are shown in the table below:
Frequency Counts Showing the Joint Distributions of Test Performance and Job
Performance for 91 White and 45 Black Workers:
                     Test Performance
                     Whites (N = 91)     Blacks (N = 45)
Job Performance      Fail      Pass      Fail      Pass
Good                   11        60         8         8
Poor                   11         9        24         5
There are three different ways to convert these frequency counts to
percentages, and each presents a different perspective on fairness. The
first method evaluates predictive fairness. The raw data are converted to
percentages so that the columns sum to 100, as shown in the table below.
Column Percentages Computed to Elucidate the Conception of Predictive Fairness:
                     Test Performance
                     Whites                  Blacks
Job Performance      Fail (%)   Pass (%)     Fail (%)   Pass (%)
Good                    50         87           25         62
Poor                    50         13           75         38
                      (100)      (100)        (100)      (100)
Now we can see that 50 percent of white carpenters (11 of 22) who fail
the test do well on the job, whereas only 25 percent of black carpenters
(8 of 32) do so. And whereas only 13 percent of whites who pass the test
do poorly on the job, the figure for blacks is 38 percent. When analyzed
this way, the data reveal that more white test failers than black ones
would do satisfactory work if given the chance, and more blacks than
whites are passing the test and proving to be unsatisfactory workers.
Thus the test overpredicts black job performance and is predictively
unfair to whites.
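The column percentages can be checked directly from the frequency counts. The sketch below is only an arithmetic illustration; the dictionary layout and the function name are illustrative choices, and the counts are those reported for the carpenter study above.

```python
# Frequency counts from the carpenter validity study (91 whites, 45 blacks).
COUNTS = {
    "white": {"good": {"fail": 11, "pass": 60}, "poor": {"fail": 11, "pass": 9}},
    "black": {"good": {"fail": 8, "pass": 8}, "poor": {"fail": 24, "pass": 5}},
}

def column_percentages(table):
    """Among workers with the same test outcome, the percent who are good or poor
    performers, so that each test-outcome column sums to 100."""
    result = {}
    for outcome in ("fail", "pass"):
        column_total = sum(table[perf][outcome] for perf in table)
        result[outcome] = {
            perf: round(100 * table[perf][outcome] / column_total) for perf in table
        }
    return result

for group, table in COUNTS.items():
    print(group, column_percentages(table))
```

Conditioning on the test outcome in this way answers the predictive question: given a test result, how well do members of each group go on to perform?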
The second method of converting the frequency counts illustrates
performance fairness. It creates percentages in such a way that the row
percentages sum to 100, as shown in the table below.
Row Percentages Computed to Elucidate the Conception of Performance-Based Fairness:
                     Test Performance
                     Whites                          Blacks
Job Performance      Fail (%)   Pass (%)             Fail (%)   Pass (%)
Good                    15         85     (100%)        50         50     (100%)
Poor                    55         45     (100%)        83         17     (100%)
Look first at good workers who fail the test and would therefore never
have been referred to the employer had a test-based system been in place
(sector D in Figure 13-1). The numbers are 15 percent for white
carpenters (11 of 71) and 50 percent for black carpenters (8 of 16). For the
poor workers, 45 percent of white workers who are poor performers (9 of
20) pass the test and thus are among those who would have been referred
for employment (sector B in Figure 13-1). By comparison, only 17
percent of blacks (5 of 29) who are poor workers passed the test. Viewed
this way, the percentages say that good black workers will be dispropor-
tionately screened out in a test-based referral system, and unsatisfactory
white workers disproportionately screened in. The test is performance-
biased against black workers.
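The row percentages follow from the same frequency counts, this time conditioning on job performance rather than on test outcome. As before, the sketch is an arithmetic illustration with illustrative names; only the counts come from the study.

```python
# Frequency counts from the carpenter validity study (91 whites, 45 blacks).
COUNTS = {
    "white": {"good": {"fail": 11, "pass": 60}, "poor": {"fail": 11, "pass": 9}},
    "black": {"good": {"fail": 8, "pass": 8}, "poor": {"fail": 24, "pass": 5}},
}

def row_percentages(table):
    """Among workers at the same job-performance level, the percent who fail or
    pass the test, so that each job-performance row sums to 100."""
    result = {}
    for perf, row in table.items():
        row_total = sum(row.values())
        result[perf] = {outcome: round(100 * n / row_total) for outcome, n in row.items()}
    return result

for group, table in COUNTS.items():
    print(group, row_percentages(table))
```

The "good" row gives the Sector D rate (good workers who fail the test) and the "poor" row gives the Sector B rate (poor workers who pass it).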
There is a third way to look at the frequency data, and that is to
compute percentages within each racial group. The effect is to show what
the numbers in each cell would be for blacks and for whites if the sample
size was 100 for each group, as shown in the table below.
Proportional Percentages of White and Black Workers in Each Test Performance by Job
Performance Category:
                     Test Performance
                     Whites                  Blacks
Job Performance      Fail (%)   Pass (%)     Fail (%)   Pass (%)
Good                    12         66           18         18
Poor                    12         10           53         11
                           (100%)                  (100%)
This presentation of the data also tells an important story. First, group
differences in test performance and job performance are a reality. Black
carpenters score substantially lower on the test, so any system of
top-down referral will find proportionally more blacks below the cutoff
score than whites, 71 percent compared with 24 percent. Black carpenters
also perform poorly on the job in substantially greater proportions, or, put
the other way, a larger percentage of whites perform satisfactorily on the
job, 78 percent compared with 36 percent of black carpenters. (This
numerical demonstration assumes that the supervisor ratings of perfor-
mance are themselves valid.)
Second, the proportion of correct classifications is reasonably similar
for the two groups; 78 percent of white carpenters were correctly
classified compared with 71 percent of blacks. But the damaging predic-
tion errors fall more heavily on the black carpenters. Of the 36 percent
who performed well on the job, 18 percent (fully one-half) would not
have been referred for employment under a straight rank-ordering of
applicants.
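The figures cited in this third presentation can be reproduced from the frequency counts. The sketch below is an arithmetic check only; the function and key names are illustrative, and the rates are computed from the raw counts rather than by adding rounded cell percentages.

```python
# Frequency counts from the carpenter validity study (91 whites, 45 blacks).
COUNTS = {
    "white": {"good": {"fail": 11, "pass": 60}, "poor": {"fail": 11, "pass": 9}},
    "black": {"good": {"fail": 8, "pass": 8}, "poor": {"fail": 24, "pass": 5}},
}

def summary(table):
    """Within-group rates, each expressed as a whole-number percentage."""
    n = sum(sum(row.values()) for row in table.values())
    failed = table["good"]["fail"] + table["poor"]["fail"]
    good = table["good"]["fail"] + table["good"]["pass"]
    # Correct classifications: good workers who pass plus poor workers who fail.
    correct = table["good"]["pass"] + table["poor"]["fail"]
    return {
        "below_cutoff": round(100 * failed / n),
        "good_on_job": round(100 * good / n),
        "correctly_classified": round(100 * correct / n),
        "good_workers_screened_out": round(100 * table["good"]["fail"] / good),
    }

for group, table in COUNTS.items():
    print(group, summary(table))
```

The last entry shows the asymmetry in damaging errors: 15 percent of good white workers fall below the cutoff, against 50 percent of good black workers.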
Each way of looking at the data provides insights about the effects of
using a test to screen job applicants. Which truth is the most important
truth? At this point in our history, it is certain that the use of the GATB
without some sort of score adjustments would systematically screen out
blacks, some of whom could have performed satisfactorily on the job.
Fair test use would seem to require at the very least that the inadequacies
of the technology should not fall more heavily on the social groups
already burdened by the effects of past and present discrimination.
EQUITY AND EFFICIENCY:
COMPARISON OF FOUR REFERRAL MODELS
The question of fair use of the GATB is not one that can be settled by
psychometric considerations alone, but neither can referral policy be
decided on the basis of equity concerns alone. If there is a strong federal
commitment to helping blacks, women, and certain other minority groups
move into the economic mainstream, there is also a compelling interest in
improving productivity and strengthening the competitive position of the
country in the world market. The underlying principle of the VG-GATB
system is to make the maximization of performance the basis of the
person-job match. It is a productivity-oriented referral procedure that,
through the addition of score adjustments, has been made responsive to
equal employment opportunity policy.
In our interim report, we evaluated six possible referral rules for their
effect on estimated job performance and on the proportion of minority-
group members who would be referred. In the following discussion we
look at four rules, including one new variant, that most clearly illustrate
the available policy options. Two of the rules use linear adjustments to
minority scores, different for each group, to increase minority referral
rates. The four rules presented for consideration are: (1) raw-score,
top-down referral; (2) within-group percentile score, top-down referral;
(3) performance-based score, top-down referral; and (4) minimum com-
petency referral.
Raw-score, top-down referral is referral made from the total group of
applicants in order of unmodified test score. This rule complements the
conception of fairness as lack of differential prediction. If the predicted
job performance for a given test score is the same for all population
groups, then the set of applicants with highest expected productivity is
obtained by referring in order of test score. However, given current
average group score differences, the rule would produce substantial
adverse impact on the lower-scoring groups. The question that policy
makers must ask of the VG-GATB system is whether the gains in
expected performance are sufficient to justify this impact.
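The raw-score rule and the adverse impact it can produce are easy to see in a small sketch. The applicant pool below is entirely hypothetical, as are the group labels and function names; the point is only that ranking on unmodified scores refers a smaller share of the lower-scoring group.

```python
from collections import Counter

def top_down(applicants, n):
    """Refer the n applicants with the highest unmodified test scores."""
    ranked = sorted(applicants, key=lambda a: a["score"], reverse=True)
    return ranked[:n]

def referral_rates(applicants, referred):
    """Share of each group's applicants who were referred."""
    pool = Counter(a["group"] for a in applicants)
    chosen = Counter(a["group"] for a in referred)
    return {group: chosen[group] / pool[group] for group in pool}

# Hypothetical pool: group "A" has the higher average score.
pool = (
    [{"group": "A", "score": s} for s in (72, 68, 64, 59, 55, 50)]
    + [{"group": "B", "score": s} for s in (61, 52, 47, 40)]
)

referred = top_down(pool, 4)
print(referral_rates(pool, referred))  # {'A': 0.5, 'B': 0.25}
```

In this made-up pool, half of group A is referred but only a quarter of group B, even though the rule itself never looks at group membership.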
Within-group percentile score, top-down referral is referral in which a
percentile score is computed for each applicant by comparing the raw
score for that applicant with the scores obtained by a norm group of the
same racial or ethnic identity. (Equivalently, a different linear transfor-
mation is applied to the raw test score for the different groups so that the
mean and the variance of test scores are the same for all groups. In the
simplest case, the quantity m is added to each minority score, where m is
the difference between majority and minority means.) Referral is made
from the total group of applicants in order of modified test score. Given
[...]
rules of exclusion (the overall minimum correct response rate, 0.40, and
the differential correct response rate, 0.15) work at cross purposes, with
the result that the procedure will not necessarily reduce the between-
group difference in means. This is so because the items with the smallest
between-group difference in proportion correct are the very easy and the
very difficult items. The minimum 0.40 rule eliminates the difficult items
(Linn and Drasgow, 1987; Marco, 1988). Moreover, even without the
minimum 0.40 rule, the reductions in group differences in item scores
would not come close to eliminating the degree of adverse impact
associated with top-down, total-group selection (Marco, 1988). In other
words, if the policy goal is to eliminate adverse impact, the Golden Rule
procedure, although also race-conscious, is not nearly as effective as
either of the score adjustment strategies discussed above.
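The way the two exclusion rules interact can be illustrated with a rough sketch. It assumes that an item is retained only when both thresholds named above are met (an overall proportion correct of at least 0.40 and a between-group difference of no more than 0.15); the item statistics and names are hypothetical and are not drawn from any actual test.

```python
MIN_CORRECT = 0.40  # overall minimum correct response rate
MAX_GAP = 0.15      # largest allowed between-group difference in proportion correct

def retained(p_overall, p_majority, p_minority):
    """Assumed screening rule: keep an item only if both thresholds are met."""
    return p_overall >= MIN_CORRECT and abs(p_majority - p_minority) <= MAX_GAP

# Hypothetical items: (overall, majority, minority) proportions correct.
ITEMS = {
    "very easy": (0.95, 0.96, 0.90),
    "middle": (0.60, 0.66, 0.42),
    "very difficult": (0.20, 0.21, 0.15),
}

kept = [name for name, stats in ITEMS.items() if retained(*stats)]
print(kept)  # ['very easy']
```

In this made-up example the middle-difficulty item is dropped for its group difference, and the very difficult item, despite a small group difference, is dropped by the 0.40 rule, so only the very easy item survives.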
The Golden Rule procedure's effects on the quality of tests, however,
would be detrimental. The construct validity of a test would be altered if
items were selected primarily on a basis other than optimal measurement.
Moreover, the predictive value of the test would be reduced for majority
and minority examinees. Test reliability would also be reduced. Items of
middle difficulty and items most closely associated with total score would
tend to be eliminated more than easy items. As a result, the reliability of
the test might be increased for lower-scoring examinees, but for middle-
and high-scoring examinees, the opposite result is more likely (Marco,
1988).
We do not see the Golden Rule procedure as a viable alternative for the
Department of Labor to consider. For technical and practical reasons it
does not rival score adjustment strategies. Moreover, the losses in test
validity incurred are not offset by the marginally improved legal attrac-
tions it offers.
An Alternative Referral Rule
From the perspective of fairness to all Employment Service applicants,
the major drawback of the two rules that require score adjustments is that
white applicants will be referred to employers in somewhat smaller
numbers than they otherwise would have been. In other words, increasing
the referral rates of racial and ethnic minorities will produce a concomi-
tant reduction in the referral chances of some white applicants with higher
raw test scores and somewhat greater predicted success on the job.
In order to avoid that diminution in the prospects of majority-group
applicants while at the same time enhancing the competitive position of
minority applicants, the committee recommends the consideration of a
referral rule that combines the essential features of both the raw-score,
top-down and the within-group score, top-down rules. To achieve both
kinds of fairness, all applicants who would have been chosen by a straight
ranking of unadjusted scores will be referred, and, in addition, all
applicants whose adjusted scores qualify them will also be referred. Thus,
no job seeker will be denied an opportunity that would have been
available under either fairness model. Since the score adjustment is
commensurate with the effects on minority groups of imperfect prediction
and since no group is greatly damaged by the combined-rules approach,
the legal objections raised by the Assistant Attorney General for Civil
Rights to the VG-GATB testing program may be assuaged.
Although we recommend the Combined Rules Referral Plan to the
serious consideration of the Department of Labor and other federal
authorities in the fair employment practices area, we cannot claim that it
is a panacea for the legal stalemate in which many employers find
themselves. It is a compromise and as such may fail to satisfy advocates
on either side of the fairness question. Depending on an employer's
selection decisions, the total procedure could produce some degree of
adverse impact on minority groups, although of far lesser severity than
would a referral system based on unadjusted scores. At the same time,
majority job seekers could claim that enrichment of the referral pool by
definition dilutes their chances for selection. Policy makers at the
Department of Labor will need to consider the potential legal risks of this
referral strategy just as they do the risks of other referral plans.
On a practical level, if there is a burden imposed by the Combined
Rules Referral Plan, it is that the local Job Service office must deal with
a somewhat larger number of people to fill a job order and the employer
must consider more applicants than is absolutely necessary under either
rule alone. There is some concern that this necessity might make the
strategy impractical for small, low-volume offices.
Operationalizing the Combined Rules Referral Plan
For illustrative purposes, the plan is presented as it might work in a
local office that has a sufficiently large number of otherwise qualified job
seekers on hand to allow selectivity. The thrust of the plan is to increase
the flexibility of the employer by referring either more high scorers or
more minority applicants than would otherwise have been seen.
An employer sends a job order for 10 job openings and asks to see 20
applicants. Twenty becomes the base number.

[Footnote] Although we phrase our recommendation in terms of within-group score adjustments,
performance-based adjustments could be substituted with virtually identical results. Our
slight preference for the within-group strategy is that it is easier to put into practice.

TABLE 13-3 Applicants Referred Under Total-Group, Within-Group,
and a Combined Rules Referral Plan

                     Percentile Score                   Referral Method
Applicant   Race   Total-Group   Within-Group   Total-Group   Within-Group   Combined Rules
    1        W         71                            X             X               X
    2        W         65                            X             -               X
    3        W         63                            X             -               X
    4        B         60            82              X             X               X
    5        W         58                            -             -               -
    6        W         57                            -             -               -
    7        W         54                            -             -               -
    8        B         51            73              -             X               X
    9        B         48            70              -             X               X
   10        B         38            60              -             -               -

NOTE: X = Referred; - = Not referred.

The referral group is assembled in two stages. First, a list of all otherwise eligible candidates
in the files is compiled on the basis of rank-ordered, total-group scores.
The top 20 scorers are identified; they will be placed in the referral
group. Second, the same list of candidates is reordered with minority
scores converted to within-group percentile scores. Again, the top 20
scorers are identified for placement in the referral group. Thus an
applicant is placed in the referral group by having a high total-group
percentile score, a high within-group percentile score, or both. There
will be a good deal of overlap between the stage-one and stage-two
selections, so the total referral group will be less than double the
baseline figure.
Under the Combined Rules Referral Plan no applicant is excluded who
would have been referred if the Employment Service had made the
baseline 20 referrals on just total-group or just within-group percentile
scores.
To illustrate, Table 13-3 describes a situation in which the employer
has two job openings and has asked for a referral ratio of 2:1. The baseline
referral figure is 4. On the basis of file search there are 10 applicants who
meet the employer's initial requirements (education, minimum cutoff
score, and so on). The 10 are listed in order of total-group percentile
score. A total-group referral procedure would refer the first four candidates
listed. The within-group method would in this example refer three
black applicants, two of whom had lower total-group scores than competing
majority candidates. With this set of scores, the combined rules
would result in a referral group augmented by two for a total of six
applicants who will be referred to the employer.
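The two-stage procedure can be sketched in a few lines of Python, using the ten applicants of Table 13-3. This is only an illustration, not the Employment Service's procedure: the names `Applicant`, `top_n`, and `combined_rules_referral` are hypothetical, and the sketch assumes that majority applicants keep their total-group percentile when the list is reordered in stage two.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Applicant:
    number: int
    race: str                    # "W" or "B", as in Table 13-3
    total_group: int             # total-group percentile score
    within_group: Optional[int]  # within-group percentile; None for majority applicants


def top_n(applicants, n, key):
    """Return the applicant numbers of the n highest scorers under `key`."""
    ranked = sorted(applicants, key=key, reverse=True)
    return {a.number for a in ranked[:n]}


def combined_rules_referral(applicants, baseline):
    """Refer everyone who is in the top `baseline` under either ranking."""
    # Stage one: rank on total-group percentile scores.
    by_total = top_n(applicants, baseline, key=lambda a: a.total_group)
    # Stage two: minority scores are replaced by within-group percentiles.
    # Assumption: majority applicants keep their total-group percentile.
    by_within = top_n(
        applicants,
        baseline,
        key=lambda a: a.within_group if a.within_group is not None else a.total_group,
    )
    return by_total, by_within, by_total | by_within


TABLE_13_3 = [
    Applicant(1, "W", 71, None),
    Applicant(2, "W", 65, None),
    Applicant(3, "W", 63, None),
    Applicant(4, "B", 60, 82),
    Applicant(5, "W", 58, None),
    Applicant(6, "W", 57, None),
    Applicant(7, "W", 54, None),
    Applicant(8, "B", 51, 73),
    Applicant(9, "B", 48, 70),
    Applicant(10, "B", 38, 60),
]

# Two openings at a 2:1 referral ratio give a baseline of 4.
total, within, combined = combined_rules_referral(TABLE_13_3, baseline=4)
print("total-group rule: ", sorted(total))
print("within-group rule:", sorted(within))
print("combined rules:   ", sorted(combined))
```

Because the combined group is a set union, no applicant referred under either single rule can be dropped, and the group can never exceed twice the baseline.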
Not the least of the attractions of the Combined Rules Referral Plan, in
the committee's judgment, is that it places responsibility for the composition
of the work force with the employer. It gives the employer the
flexibility to emphasize predicted performance, racial and ethnic representativeness,
or a combination of these policies according to the job in
question, the affirmative action posture of the firm, or other situational
factors. The Job Service is not placed in the position of appearing to
relieve the employer of these decisions, an implication that some employers
seem to have drawn from the VG-GATB system of referral based only
on within-group scores.
Norm Groups for Within-Group Scoring
If any referral plan that incorporates the within-group score adjustment
strategy is adopted, USES will need to undertake the construction of
more satisfactory norm groups on which to base the score adjustments. In
practice, there will be considerable variation in the applicant groups for
different jobs in different localities. There is evidence from the data
supporting the within-group percentile tables, from employer representatives
in the committee's liaison group, and from some applicant data
obtained by the committee, of noticeable differences between the national
norm group currently used by the Employment Service for score conversions
and applicant groups.
Differences in means or standard deviations of the applicant groups
from the norm group could cause quite different referral rates and
validities of the within-group score for particular jobs. If, for example, an
employer set qualifications for a job that are correlated with test score,
then the applicants for the job would be expected to have a smaller
standard deviation in test score than the norm group, and the differences
between majority-group and minority-group mean score would be expected
to be lower. The effect of using within-group scoring based on
national norms would be to refer minorities in larger fractions than in the
applicant pool, and to significantly reduce the validity of the test, because
of overestimates of standard deviations.
It obviously is not practical for the Employment Service to devise a
different additive factor for every job in every locality. But we do
recommend that norm groups be developed by job family and, if possible,
by smaller, more homogeneous clusters of jobs.
In addition, the score adjustment factor should be computed differently
than is currently done. Currently the adjustment factor is computed as the
difference between the mean scores in a given job family composite of all
majority- and minority-group workers in the national norm group. The
correct factor is the mean score difference between majority-group and
minority-group applicants for the same job, averaged over all jobs.
Similarly, standard deviations should be computed for applicants to a
particular job, and then averaged over jobs. The current computation
does not properly allow for differences between jobs.
Suppose, for example, that there are two jobs, and applicants for the
jobs scored as follows:
Job    Minority           Majority
 1     7, 12, 17, 20      18, 22
 2     15, 19             19, 23, 25, 25
The Employment Service calculation pools the scores for all jobs to
obtain a difference of 7 between majority- and minority-group average
scores. The difference between average scores for each job is 6.
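The two computations can be checked with a short Python sketch. The split of the printed digits into individual scores is an assumption (job 1: minority 7, 12, 17, 20 and majority 18, 22; job 2: minority 15, 19 and majority 19, 23, 25, 25), chosen because it reproduces both figures quoted in the text; the function names are hypothetical.

```python
from statistics import mean

# Assumed reading of the two-job example (see the lead-in above).
SCORES = {
    1: {"minority": [7, 12, 17, 20], "majority": [18, 22]},
    2: {"minority": [15, 19], "majority": [19, 23, 25, 25]},
}


def pooled_difference(scores):
    """Current method: pool all jobs, then subtract the group means."""
    minority = [s for job in scores.values() for s in job["minority"]]
    majority = [s for job in scores.values() for s in job["majority"]]
    return mean(majority) - mean(minority)


def per_job_difference(scores):
    """Recommended method: subtract group means within each job, then average."""
    return mean(
        mean(job["majority"]) - mean(job["minority"]) for job in scores.values()
    )


print("pooled over jobs:  ", pooled_difference(SCORES))
print("averaged over jobs:", per_job_difference(SCORES))
```

The pooled figure is larger here because the lower-scoring job contributes mostly minority applicants and the higher-scoring job mostly majority applicants, so differences between jobs leak into the group difference.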
In order to assess the effect of the current within-group referral norm
groups on actual jobs, we used 72 jobs from David Synk and David
Swarthout's research (U.S. Department of Labor, 1987). The differences
between minority and nonminority mean test scores expressed in majority
standard deviations showed wide variation, with a median of 0.85 and
quartiles of 0.65 and 1.10. (The quartiles would be 0.74 and 0.96 if the
variation were due only to sampling error; thus there is evidence of
substantial real variation in the standardized population differences.)
We applied the within-group referral rule to the incumbents in each job,
with a selection ratio set so that 50 percent of the nonminority workers would
be accepted. The median acceptance rate for minority workers was 55
percent. There is thus some evidence that the referral rule accepts minority
workers at a slightly higher rate than nonminority workers. However, these
are workers on the job, not applicants, and if there were greater differences
between mean scores for applicants than for workers, the referral rates for
minority and nonminority workers might be about the same.
THE PROBLEM OF REPORTING SCORES
The general principle that should guide policy on reporting test scores
is that the employer and the applicants should be given sufficient
information to make correct inferences about a candidate's likely job
performance from the test score. This information should include one or
more scores, a description of the method of computing the scores, and
information about the validity of the test.
We have suggested the possibility of using two scores in creating the
group of applicants to be referred on a job order, a total-group percentile
score and a within-group percentile score. For score reporting purposes
we again find merit in a combination of scores because neither the
total-group nor the within-group percentile score is an entirely satisfactory
means of communicating information about job applicants.
Reporting Within-Group Percentile Scores
In the VG-GATB Referral System as it now operates, the Employment
Service reports the candidate's within-group percentile score to the
employer with an explanation of the scoring method, but without information
about which adjustment, if any, has been made to the score.
The within-group percentile scores reported to the employer are
potentially misleading. The purpose of the scoring method is to indicate
an individual's predicted job performance with reference to other
applicants within his or her own ethnic or racial group. But employers
may mistakenly infer that two applicants with the same percentile score
did equally well on the test, no matter what their racial or ethnic
identity. Employers are not given the conversion tables and so have no
way of determining the correspondence between scores obtained within
different groups. On one hand, this could lead employers to underestimate
the magnitude of group differences in raw scores (for example, on
certain GATB composites a raw score that places an applicant at the
50th percentile among blacks would place an applicant at the 16th
percentile among whites). On the other hand, it could lead employers to
underestimate the amount of overlap in test scores that exists between
the groups.
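To see what the 50th-versus-16th percentile example implies, the following Python sketch converts a percentile in one group into the percentile the same raw score would earn in another group. It is a simplification, not the Employment Service's conversion tables: it assumes scores in both groups are normally distributed with equal standard deviations, and the function name and the `mean_gap_sd` parameter (the gap between group means, in standard deviation units) are hypothetical.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def percentile_in_other_group(percentile, mean_gap_sd):
    """Percentile of the same raw score in a group whose mean is
    `mean_gap_sd` standard deviations higher (normality and equal
    standard deviations are assumed)."""
    z = STD_NORMAL.inv_cdf(percentile / 100)
    return 100 * STD_NORMAL.cdf(z - mean_gap_sd)


# A gap of one standard deviation maps the 50th percentile to about the 16th.
for p in (16, 50, 84):
    print(p, "->", round(percentile_in_other_group(p, mean_gap_sd=1.0)))
```

Under these assumptions a score at the 84th percentile of the lower-scoring group sits at the 50th percentile of the other group, which illustrates the overlap between the two distributions.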
The within-group percentile scores have been reported to applicants
without their being informed that the percentile scores are based on
different norm groups for different racial and ethnic groups. That practice
is deceptive.
Reporting Total-Group Percentile Scores
Reporting total-group percentile scores is also potentially misleading,
because the employer has no information about the levels of job performance
that can be expected from a particular score. It is tempting for the
employer to infer that a person at the 16th percentile of whatever norm
group on the test score will be at the 16th percentile of the norm group in
job performance; Employment Service literature promoting the VG-GATB
Referral System indicates that the most able workers within each
ethnic group are being referred. But the correspondence between test
score percentile and job performance percentile depends on the correlation
between test score and job performance. For example, if that
correlation is .3, a person at the 16th percentile on the test score is
expected to be at the 38th percentile on job performance.

TABLE 13-4 Total Group Percentiles and Corresponding Expectancy
of Above-Average Job Performance (Test Score and Job Performance
Are Jointly Normal with Correlation .3)

Percentiles    Expectancies of Above-Average Performance (%)
    2.5             27
   16.0             38
   50.0             50
   84.0             62
   97.5             73

Finally, providing a score referenced to the total group without qualifying its
relevance to a particular job could have a harmful effect on minority
applicants, who, on the average, score lower on the GATB. They will
appear to be unqualified for the job, but their scores may have only a
modest relationship to performance on the job.
Expectancy Reporting
There are methods of reporting information to employers that directly
incorporate the degree of predictability of job performance from test
score. One such method uses expectancies specifying the probability that
a worker with a given test score will be above average in job performance.
Whereas percentile scores show where an applicant is located on the test
with reference to all other applicants in the relevant population, an
expectancy score tells the likelihood of above-average performance given
the validity of the test.
The real value of this approach to scoring is that it gives the employer
a much more realistic basis for comparing candidates than is possible with
raw scores or percentile scores. When a test has only modest validity for
predicting job performance, score differences that look enormous when
expressed as percentiles are shown to predict a much closer likelihood of
above-average performance on the job. Suppose we take the average
GATE validity of .3. As Table 13-4 shows, extreme scores on the test
distribution correspond to modest scores on the expectancy distribution,
reflecting the modest predictability of job performance from test score.
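The expectancies in Table 13-4 can be reproduced with a short Python sketch. It relies only on the assumption stated in the table title, that test score and job performance are jointly normal with correlation .3, and takes "average" to mean the population mean; the function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def expectancy_above_average(test_percentile, validity):
    """Chance (in percent) of above-average job performance for an applicant
    at `test_percentile`, given a test-performance correlation `validity`."""
    z = STD_NORMAL.inv_cdf(test_percentile / 100)
    # Given test score z, performance is normal with mean validity * z
    # and standard deviation sqrt(1 - validity**2).
    return 100 * STD_NORMAL.cdf(validity * z / sqrt(1 - validity ** 2))


def expected_performance_percentile(test_percentile, validity):
    """Percentile of the predicted (regressed) job performance."""
    z = STD_NORMAL.inv_cdf(test_percentile / 100)
    return 100 * STD_NORMAL.cdf(validity * z)


for p in (2.5, 16.0, 50.0, 84.0, 97.5):
    print(p, round(expectancy_above_average(p, validity=0.3)))
```

With a validity of 0 every applicant would have a 50 percent expectancy, and as the validity approaches 1 the expectancies approach 0 or 100, so the spread of the expectancy scale itself conveys how much the test can say about job performance.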
Proposed Protocol for Reporting Scores
In the committee's judgment, a combination of percentile and expectancy
scores will provide job applicants and prospective employers with
the best picture of the applicant's comparative suitability for the job. Our
proposal is that two scores be reported for each applicant:
1. A within-group percentile score with the corresponding norm group
identified.
2. An expectancy score (derived from the total-group score) equal to the
probability that an applicant will have above-average job performance.
The first score indicates how the applicant fared on the test in
comparison with others in the same ethnic or racial group. This information
is particularly useful to employers who are actively working to
increase the representation of minority groups in their work force. The
second score gives the employer a better means of comparing applicants
against the criterion of job performance. And in general it will show
applicants and employers alike that low scorers on the test have a
reasonable chance of being above-average workers.
Examples of such a reporting protocol using a validity of .3 would look
as follows:
                    Within-Group Percentile         Total-Group Expectancy Score:
Name                Computed for "Black" Group*     Chance of Being Better-Than-Average Worker
Grace Birley                  16                                 25
James Jones                   50                                 40
Shelton Pike                  84                                 50

                    Within-Group Percentile         Total-Group Expectancy Score:
Name                Computed for "Other" Group*     Chance of Being Better-Than-Average Worker
Nancy Rathouse                16                                 40
William Cole                  50                                 50
Theresa Brewer                84                                 60

                    Within-Group Percentile         Total-Group Expectancy Score:
Name                Computed for "Hispanic" Group*  Chance of Being Better-Than-Average Worker
Juan Gomez                    16                                 33
Chester Alverez               50                                 44
Olivia Gerber                 84                                 56

*GATB subpopulation norms exist for "black," "Hispanic," and "other" groups.
CONCLUSIONS
Fair Use of the GATB
1. Use of GATB scores in strict top-down, rank-ordered fashion is fair
in the sense that a given test score predicts about the same level of job
performance for majority-group and minority-group applicants. However,
it would have severe adverse impact on minority job seekers.
2. This adverse effect on minority job seekers cannot be justified on
the grounds of efficiency, for at the levels of validity typical of the GATB,
the efficiency losses from adjusting minority scores are slight.
3. Although the GATB does not appear to be inherently biased against
minority-group test takers, the undoubted effect of imperfect prediction
when social groups have different average test scores is to place the
greater burden of measurement error on the shoulders of the lower-scoring
group. Since black, Hispanic, and Native American minority
groups have lower group means on the GATB, able workers in these
groups will experience higher rejection rates than workers having the
same level of job performance in the majority group when referral is based
on a rank-ordering of all test scores.
4. In the judgment of the committee, fair test use requires at the very
least that the inadequacies of the technology should not fall more heavily
on the social groups already burdened by the effects of past and present
discrimination.
5. The so-called Golden Rule procedure, a strategy for reducing group
differences in test scores through the selection of test items, does not
appear to be defensible technically and does not provide the intended
practical remedy.
6. The committee therefore concludes that, for purposes of referral,
equity and productivity will be best served by a policy of adjusting the
GATB test scores of black, Hispanic, and Native American job seekers
served by the Employment Service system.
Referral Rules
7. Raw-score, top-down referral gives the highest expected performance
in the referred group and the lowest proportion of minority-group
members referred. At the levels of validity we find for the GATB, this
referral method has an adverse impact on minority applicants that is out
of all proportion to the productivity gains.
8. Within-group score, top-down referral achieves the highest proportions
of minority referrals, with slight overall losses in estimated job
performance. Given present GATB validities, this score adjustment
strategy is an efficient way of referring workers at a given level of job
performance in about the same proportion, whatever their racial or ethnic
group.
9. Performance-referenced score, top-down referral (adjustments to
minority scores based on the predictive validity of the test) produces
results virtually identical to within-group score, top-down referral at the
validities observed for the GATB. It demonstrates similarly slight losses
in efficiency and large gains in the proportion of minorities referred.
However, this method is responsive to changes in test validities; with high
validities, smaller score adjustments would be made and the proportion of
minorities referred would be reduced. This may make it legally the more
acceptable of the score adjustment strategies.
10. Both score adjustment strategies are race-conscious; both would
virtually eliminate the adverse impact of the GATB on black and Hispanic
subpopulations, and both adjustments would be commensurate with the
far less than perfect relation between the GATB test score and job
performance.
11. Minimum competency referral results in significant losses in expected
job performance and would still produce markedly unequal
referral rates for majority and minority applicants.
Reporting Test Scores
12. The test scores reported to employers and job seekers should allow
them to make the most accurate possible judgments about likely job
performance.
13. Neither the within-group percentile scores currently reported under
the VG-GATB Referral System nor total-group percentile scores
convey sufficient information, and both are potentially misleading.
RECOMMENDATIONS
If the Department of Labor continues to promote a test-based referral
system for filling job orders, we recommend the following alterations to
the current VG-GATB Referral Program.
Referral Rule
1. The committee recommends the continued use of score adjustments
for black and Hispanic applicants in choosing which applicants to refer to
an employer, because the effects of imperfect prediction fall more heavily
on minority applicants as a group due to their lower mean test scores. We
endorse the adoption of score adjustments that give approximately equal
chances of referral to able minority applicants and able majority applicants:
for example, within-group percentile scores, performance-based
scores, or other adjustments.
Given current GATB validities, such adjustments are necessary to
ensure that able black and Hispanic workers will not experience higher
rejection rates than workers of the same level of job performance in the
majority group. Referral in order of within-group percentile scores is one
effective way to balance the dual goals of productivity and racial equity,
given the modest levels of GATB validities. Should these validities
increase dramatically as testing technology improves, the performance-based
rule would warrant consideration.
2. We also recommend that USES study the feasibility of what we call
a Combined Rules Referral Plan, under which the referral group is
composed of all those who would have been referred by the total-group or
by the within-group ranking method.
Score Reporting
3. The committee recommends that two scores be reported to employ-
ers and applicants:
a. A within-group percentile score with the corresponding norm
group identified.
b. An expectancy score (derived from the total-group percentile
score) equal to the probability that an applicant will have
above-average job performance.
This combination of scores indicates how well an applicant performed
on the test with reference to others of the same subpopulation, information
that is useful to employers who are actively seeking to increase the
representation of minorities in their work force under an affirmative
action program. The expectancy score shows that even low scorers have
a reasonable chance of success on the job and will help employers avoid
placing totally unwarranted weight on small score differences.
Norm Groups
4. If the within-group score adjustment strategy is chosen, we recommend
that USES undertake research to develop more adequate norming
tables.
The data on Native Americans is particularly weak, but all of the
norming samples are idiosyncratic convenience samples. As a consequence,
there is reason to doubt that the particular constant factors added
to minority scores are the most appropriate ones.
5. An attempt should be made to develop norms for homogeneous
groups of jobs, at the least by job family, but if possible by more cohesive
clusters of jobs in Job Families IV and V.
6. The adjustment factor that should be computed is the mean score
difference between majority-group and minority-group applicants for the
same job, averaged over all jobs.