Copyright © 2009. National Academy of Sciences. All rights reserved.
7
Validity Generalization Applied to
the GATB
CRITERION-RELATED VALIDITY RESEARCH AND
VALIDITY GENERALIZATION
Since the 1940s the U.S. Employment Service (USES) has conducted
some 750 criterion-related validity studies of the General Aptitude Test
Battery (GATB). The great majority of the studies used supervisor ratings
as the criterion, although a sizable minority of studies were conducted
using training criteria. The purpose of these studies was to develop
Specific Aptitude Test Batteries (SATBs) for specific jobs. SATBs consist
of a subset (2 to 4 aptitudes) of the GATB with associated cutoff scores
that best differentiate the good from the poor workers. Applicants whose
scores on the chosen aptitudes exceeded the cutoff scores would be
regarded as qualified to do the job.
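The multiple-cutoff rule described above can be sketched in a few lines of Python. The aptitude letters follow the GATB, but the cutoff values and applicant scores are hypothetical, chosen only for illustration; they are not taken from any actual USES battery.

```python
# Minimal sketch of an SATB-style multiple-cutoff rule.
# All cutoffs and scores below are hypothetical illustration values.

def is_qualified(scores, cutoffs):
    """Return True only if the applicant exceeds every cutoff in the battery."""
    return all(scores[aptitude] > cutoff for aptitude, cutoff in cutoffs.items())

# Hypothetical battery: three aptitudes (N, S, M) with made-up cutoff scores.
hypothetical_satb = {"N": 90, "S": 95, "M": 85}

applicant_a = {"G": 105, "V": 98, "N": 101, "S": 110, "M": 92}
applicant_b = {"G": 120, "V": 115, "N": 118, "S": 94, "M": 99}

print(is_qualified(applicant_a, hypothetical_satb))  # True
print(is_qualified(applicant_b, hypothetical_satb))  # False: S does not exceed 95
```

In this sketch the second applicant is screened out on a single aptitude despite higher scores elsewhere. A multiple-cutoff rule is non-compensatory in this sense, unlike a weighted regression score.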
Events following the passage of the Civil Rights Act in 1964 mandated
increased emphasis on investigations of GATB test fairness for minori-
ties. In 1967, USES initiated an effort to validate its tests for minorities.
Jobs studied for SATB development tended to be those with large
numbers of workers, in part because sufficiently large samples are easier
to obtain in populous occupations. The minimum sample size acceptable
was 50, small for the statistical task of validating prediction of perfor-
mance from test scores, but large in light of the difficulty of finding
cooperative employers who have 50 workers in a single job. Some SATB
samples, particularly in apprenticeable occupations, were considerably
larger, although they often came from multiple establishments. Although
the larger sample sizes were desirable, the comparability of the pooled
establishments is not known.
As stated in a USES memorandum to the committee describing its
testing program, by 1980, USES believed the GATB testing program to
be at a crossroads. There were now over 450 SATBs covering over 500
occupations. But there are over 12,000 jobs in the Dictionary of
Occupational Titles (DOT). The extraordinary difficulty of validating
SATBs on minorities, because of small sample sizes, precluded increas-
ing the number of occupations covered by more than two to five a year.
Even with the best methods of sample search and data collection and
analysis, it was clear that developing and validating test batteries for
each of the 12,000 occupations was a practical impossibility. Moreover,
the technology used in SATBs, requiring both selection of aptitudes and
estimation of multiple cutoffs, had been identified as obsolete, techni-
cally deficient, and premised on incorrect assumptions by outside
professional experts (see, e.g., Buros's Seventh Mental Measurements
Yearbook, 1972).
At about the same time, the methodology of meta-analysis was receiv-
ing attention in mainstream psychology. USES staff saw possibilities in
the work of John Hunter and Frank Schmidt, who were among the leaders
in developing validity generalization, a variant of meta-analysis applied to
validity coefficients, for use in personnel and industrial psychology.
The working assumption of industrial psychology prior to the late 1970s
was that the sizable observed variation in validity coefficients from one
criterion-related validity study to the next, even in apparently similar
situations, was a reflection of reality. That is, validity was thought to be
situation-specific, the sizes of validity coefficients being influenced by
subtle, undetected differences across different workplaces. Schmidt and
Hunter argued to the contrary, saying that most validation research has
been done with small samples and most studies have inadequate statistical
power to demonstrate the statistical significance of their results. The
observed variation in validities, they proposed, is due to statistical
artifacts, primarily sampling error, rather than to true differences in
validity from one situation to another.
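The sampling-error argument can be illustrated with a small simulation. The sketch below assumes a single fixed true validity of .25 (a hypothetical value) and 75 workers per study, then reports how widely the observed validities scatter even though nothing differs between the simulated workplaces. The function names are illustrative and do not come from Schmidt and Hunter's work.

```python
import math
import random

def pearson_r(xs, ys):
    """Plain Pearson correlation between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def simulate_validities(true_validity, n_workers, n_studies, seed=0):
    """Observed validity in each of n_studies samples drawn from one population."""
    rng = random.Random(seed)
    noise_weight = math.sqrt(1 - true_validity ** 2)
    observed = []
    for _ in range(n_studies):
        tests = [rng.gauss(0, 1) for _ in range(n_workers)]
        # Ratings correlate with test scores at exactly true_validity in the population.
        ratings = [true_validity * t + noise_weight * rng.gauss(0, 1) for t in tests]
        observed.append(pearson_r(tests, ratings))
    return observed

# Assumed values: true validity .25 (hypothetical), 75 workers per study.
observed = simulate_validities(true_validity=0.25, n_workers=75, n_studies=2000)
mean_r = sum(observed) / len(observed)
sd_r = math.sqrt(sum((r - mean_r) ** 2 for r in observed) / (len(observed) - 1))
print("mean observed validity:", round(mean_r, 2))
print("standard deviation of observed validities:", round(sd_r, 2))
print("range:", round(min(observed), 2), "to", round(max(observed), 2))
```

Under these assumptions the standard deviation of the observed validities should be near (1 - .25^2) / sqrt(74), or about .11, so individual studies can differ by several tenths purely by chance.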
The VG-GATB Referral System is supported by a series of USES test
research reports written by Hunter (U.S. Department of Labor,
1983b,c,d,e). Hunter analyzed the existing validity data base for the
GATB, which at the time of his analysis consisted of reports of 515
validity studies carried out by the U.S. Employment Service and coop-
erating state employment services over the period 1945-1980.
His analysis may be divided into three parts. First, in The Dimen-
sionality of the General Aptitude Test Battery (U.S. Department of
Labor, 1983b), he argues that it is unnecessary to use all nine component
aptitudes of the GATB in predicting job performance and that it is
sufficient to use two composites, called cognitive ability and psycho-
motor ability, in making predictions. Second, Test Validation for 12,000
Jobs (U.S. Department of Labor, 1983c) constructs a classification of
all jobs into five job families based on the data and things scale used in
the DOT code for each job. A different weighting of cognitive and
psychomotor ability, for prediction of job performance, is to be used
within each job family. And finally, the same report generalizes the
validities for the GATB studies within each job family to all jobs in the
job family.
In this chapter we review the first two parts of Hunter's plan:
dimension reduction and job classification. The next chapter presents
Hunter's validity generalization analysis of the 515 GATB studies and
compares his results with 264 more recent studies that suggest somewhat
different ranges of validities than the earlier studies.
REDUCTION OF NINE APTITUDES TO COGNITIVE AND
PSYCHOMOTOR FACTORS
The intention of the original GATB validity research program was to
identify, for each job studied, a combination of specific aptitudes and
minimum levels for those aptitudes, that an applicant should attain before
being referred to a job; these are the so-called SATBs prepared for each
job.
There are too many jobs in the U.S. economy, and too many new jobs
being created, for the GATB research program ever to hope to cover
more than a small fraction of them. Two kinds of problems stand in the
way. First, it is not immediately clear that a validity study done for a
particular job title in a particular plant is applicable to the same job title
in another plant; the same duties in the job description may be
performed in quite different working environments by different groups
of workers. Thus some mechanism must be discovered for generalizing
the validity results for jobs studied to jobs not studied, if the research
is to be useful.
Second, the statistical base for a single job, consisting usually of a
sample of fewer than 100 workers, is not by itself adequate to carry out
the complex estimation involved in identifying three or four of the nine
GATB aptitudes as relevant to the job and selecting minimum compe-
tency levels for the aptitudes. A good dose of job analyst's judgment must
be used in selecting and calibrating the aptitudes, since the data available
do not provide a sufficient basis for decision. Again, we wish to increase
the statistical strength of conclusions by making some sensible combina-
tion of data for different jobs.
GATB Dimensions
Faced with the need to generalize validity results from the 491 jobs
represented in the 515 studies to the other jobs in the economy, faced also
with the problem of small sample sizes that plagues the SATB approach,
Hunter's strategy (U.S. Department of Labor, 1983b) is both to reduce
the number of variables relevant to predicting job performance, and to
assume that the same prediction equations will apply across broad classes
of jobs, so that all data for jobs in the same class may be combined in
estimating the equation.
In developing his own position, Hunter describes two theories of job
performance: the specific aptitude theory and the general ability theory.
Traditional thinking in the GATB program was that job performance
would be best predicted by the specific aptitude or aptitudes measured by
the SATB and required by the job. For example, performance as a
bookkeeper would be better predicted by the numerical aptitude than by
general cognitive ability, and performance as an editor would be better
predicted by the verbal aptitude than general cognitive ability. In this
view, general intelligence has only an indirect relation to job perfor-
mance; it is mediated by specific aptitudes.
The other position, which was the dominant view early in the twentieth
century and is currently enjoying renewed popularity, is that one general
cognitive ability, commonly called intelligence, underlies the specific
abilities a person develops in school, at play, and on the job. In this view,
the validities of the SATBs that were demonstrated in 40 years of research
would be the effect of joint causation by a common prior variable, the
underlying general cognitive ability. Hunter's analysis of the dimension-
ality of the GATB brings him to a variant of the general ability interpre-
tation.
Hunter argues that, contrary to the SATB analyses, multiple regression
techniques should be used in predicting job performance from the nine
GATB aptitudes, because the nine are strongly intercorrelated (Table
7-1). However, the correlations between aptitudes, which must be known
in order to apply multiple regression, are only poorly estimated in any one
study, and a full multiple regression determining specific weights for each
aptitude cannot estimate the weights accurately enough. On the basis of
an analysis of the covariation of aptitudes across jobs, he proposes that
the nine specific aptitudes fall into three categories of general abilities:
cognitive, perceptual, and psychomotor. Although the cognitive and
psychomotor abilities are only moderately correlated with one another,
both are highly correlated with the perceptual composite (Table 7-2). As
a consequence of this overlap, Hunter says that the perceptual composite
will add little to the predictive power of the GATB; the nine GATB
aptitudes may be satisfactorily replaced by just two composite aptitudes:
cognitive ability, composed of general intelligence, verbal ability, and
numerical ability; and psychomotor ability, composed of motor coordination,
finger dexterity, and manual dexterity. (It should be noted that the
general intelligence variable is the sum of verbal aptitude, spatial
aptitude, and numerical aptitude with the computation test score removed;
it is not measured independently of the others.) Predicting performance for
a particular job thus can be reduced to appropriately weighting cognitive
ability and psychomotor ability in a combined score for predicting
performance, a much simpler task than assessing the relative weights of
nine aptitudes.

TABLE 7-1 Correlations Between Aptitudes Based on 23,428 Workers
and Aptitude Reliabilities (Decimals Omitted)

                            G    V    N    S    P    Q    K    F    M
Intelligence (G)          100
Verbal aptitude (V)        84  100
Numerical aptitude (N)     86   67  100
Spatial aptitude (S)       74   46   51  100
Form perception (P)        61   47   58   59  100
Clerical perception (Q)    64   62   66   39   65  100
Motor coordination (K)     36   37   41   20   45   51  100
Finger dexterity (F)       25   17   24   29   42   32   37  100
Manual dexterity (M)       19   10   21   21   37   26   46   52  100
Reliability                88   85   83   8?   79   75   86   76   77

SOURCE: U.S. Department of Labor. 1983. The Dimensionality of the General
Aptitude Test Battery (GATB) and the Dominance of General Factors Over Specific Factors
in the Prediction of Job Performance for the U.S. Employment Service. USES Test
Research Report No. 44. Division of Counseling and Test Development, Employment and
Training Administration. Washington, D.C.: U.S. Department of Labor, p. 18.
TABLE 7-2 Correlations Between Composites (Decimals Omitted)

                                GVN   SPQ   KFM
Cognitive composite (GVN)       100    76    35
Perceptual composite (SPQ)       76   100    51
Psychomotor composite (KFM)      35    51   100

SOURCE: U.S. Department of Labor. 1983. The Dimensionality of the General
Aptitude Test Battery (GATB) and the Dominance of General Factors Over Specific Factors
in the Prediction of Job Performance for the U.S. Employment Service. USES Test
Research Report No. 44. Division of Counseling and Test Development, Employment and
Training Administration. Washington, D.C.: U.S. Department of Labor, p. 22.
What Gets Lost in the Simplifying Process?
One obvious question to ask is whether the power of the GATB to
predict for different kinds of jobs, that is, its usefulness in classifying
applicants, is diminished by this broad-brush approach. A number of
experts have commented to the committee (e.g., Lee J. Cronbach, letter
dated July 6, 1988) on the exclusion of the perceptual composite. Hunter
argues that the perceptual ability composite (S + P + Q) could be
predicted essentially perfectly from the cognitive (G + V + N) and
psychomotor composites (K + F + M) if the composites were perfectly
measured. With the actual composites, the multiple correlation for
predicting SPQ from GVN and KFM is .80 and the perceptual composite
is dropped from all but Job Family 1. But part of the reason that GVN and
SPQ are so highly correlated is that the spatial factor S is included in both
G and SPQ.
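The multiple correlation of .80 quoted above can be checked from the composite correlations in Table 7-2. The sketch below uses the standard two-predictor formula for standardized variables; the function name is illustrative, and the calculation is a reader's check rather than Hunter's own computation.

```python
import math

def multiple_correlation(r_y1, r_y2, r_12):
    """Multiple correlation of y with two standardized predictors.

    r_y1, r_y2: correlations of y with predictors 1 and 2
    r_12:       correlation between the two predictors
    """
    r_squared = (r_y1 ** 2 + r_y2 ** 2 - 2 * r_y1 * r_y2 * r_12) / (1 - r_12 ** 2)
    return math.sqrt(r_squared)

# Composite correlations from Table 7-2.
r_spq_gvn = 0.76
r_spq_kfm = 0.51
r_gvn_kfm = 0.35

R = multiple_correlation(r_spq_gvn, r_spq_kfm, r_gvn_kfm)
print(round(R, 2))  # 0.8
```

An R of .80 corresponds to an R-squared of about .65, so roughly a third of the variance in the observed perceptual composite is left unexplained by the other two composites.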
A more general observation is that the composites do not predict the
specific aptitudes very accurately, even after adjusting for less than
perfect reliability.¹ The question remains whether the specific aptitudes
need to be included with separate weights in the regression equations for
job performance, or whether the effect of each specific aptitude is
captured sufficiently well by including the corresponding composite in the
equations predicting job performance. If the latter holds, the task of
setting aptitude weights for jobs is much simplified.
In building the case, Hunter proposes that validities of aptitudes for
jobs are constant for aptitudes in the same composite, so that it is
appropriate to use only the composites and not the separate aptitudes in
predicting performance. Thus the V and N aptitudes might have validities
(.25, .25) for one job, (.20, .20) for another job, and (.30, .30) for a
third. (The G aptitude must be treated differently.) If this is so, then
the correlation between such validities over jobs would be 1. He therefore
considers the correlations between aptitude validities over jobs (Table 7-3).
¹The reliability of a measurement is the correlation between repeated
measurements of the same individual, so, for example, if the reliability
were 1.0, repeated measurements would be exactly the same. If two variables
are not reliably measured, the correlation between them will be lower than
that between perfect measurements and may be increased by correcting for
unreliability. Note that the same correction does not apply to correlations
with intelligence, however, because it is not independently measured.

TABLE 7-3 Correlations Between Validities Over 515 Jobs
(Decimals Omitted)

                            G    V    N    S    P    Q    K    F    M
Intelligence (G)          100
Verbal aptitude (V)        80  100
Numerical aptitude (N)     81    ?  100
Spatial aptitude (S)       67   32   40  100
Form perception (P)        45   30   48   53  100
Clerical perception (Q)    57   54    ?   30   57  100
Motor coordination (K)     19   16   24    8   41   40  100
Finger dexterity (F)        9    ?   15   26   45   23   46  100
Manual dexterity (M)       -2   -7    9   14   36   19   56   62  100
Reliability                54   47   47   47   46   44   45   53   52

SOURCE: U.S. Department of Labor. 1983. The Dimensionality of the General
Aptitude Test Battery (GATB) and the Dominance of General Factors Over Specific Factors
in the Prediction of Job Performance for the U.S. Employment Service. USES Test
Research Report No. 44. Division of Counseling and Test Development, Employment and
Training Administration. Washington, D.C.: U.S. Department of Labor, p. 32.

The reliability measure in Table 7-3 is based on the sampling error in
estimating validities for individual studies. Since the average sample size
is 75, a sample validity differs from a true validity by an error with
variance approximately .013. The variance of sample validities over all
studies is about .026. Thus the variance of true validities over studies is
about .013. One way to compute reliability is the ratio of variance of true
validities to the variance of measured validities, which would be about .5
here.
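The variance arithmetic behind the reliability of about .5 can be sketched as follows. The large-sample formula for the sampling variance of a correlation, (1 - r^2)^2 / (n - 1), is a standard approximation brought in here as an assumption (the text does not say which formula was used), and the trial values of the true validity are hypothetical.

```python
def sampling_variance_of_r(true_r, n):
    """Large-sample approximation to the sampling variance of a correlation."""
    return (1 - true_r ** 2) ** 2 / (n - 1)

average_n = 75  # average sample size given in the text

# Hypothetical true validities, to show the approximation is not sensitive to them.
trial_variances = [sampling_variance_of_r(r, average_n) for r in (0.0, 0.15, 0.25)]
print([round(v, 4) for v in trial_variances])

# Figures quoted in the text.
error_variance = 0.013      # sampling-error variance of a validity
observed_variance = 0.026   # variance of sample validities over all studies

true_variance = observed_variance - error_variance
reliability = true_variance / observed_variance
print(round(true_variance, 3), round(reliability, 2))
```

For any plausible true validity the approximation gives an error variance between about .012 and .014, consistent with the figure of .013, and subtracting it from .026 leaves about half the observed variance as variance in true validities.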
Hunter suggests that the above table of correlations between validities
supports his "general ability theory," which would predict correlations of
1 between specific aptitudes in the same general ability group. He adjusts
the given correlations by the reliability correction, which increases the
within-block correlations to an average value of 1.09.
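The standard reliability correction divides an observed correlation by the square root of the product of the two reliabilities. As a minimal sketch, the code below applies it to the psychomotor block only, using the K, F, and M entries and reliabilities from Table 7-3; the function name is illustrative, and the text's figure of 1.09 is an average over all within-block correlations, not just this block.

```python
import math

def correct_for_unreliability(r_observed, reliability_x, reliability_y):
    """Standard correction for attenuation of a correlation."""
    return r_observed / math.sqrt(reliability_x * reliability_y)

# Reliabilities of the validity estimates for K, F, M (Table 7-3).
reliability = {"K": 0.45, "F": 0.53, "M": 0.52}

# Correlations between validities within the psychomotor block (Table 7-3).
observed = {("K", "F"): 0.46, ("K", "M"): 0.56, ("F", "M"): 0.62}

corrected = {
    pair: correct_for_unreliability(r, reliability[pair[0]], reliability[pair[1]])
    for pair, r in observed.items()
}
for pair, value in corrected.items():
    print(pair, round(value, 2))

average_corrected = sum(corrected.values()) / len(corrected)
print(round(average_corrected, 2))  # 1.09
```

Two of the three corrected values exceed 1, which is impossible for a genuine correlation and is a sign that the correction is being applied outside the conditions it assumes.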
This is inaccurate, however. The standard reliability correction is
inappropriate here because the errors in measuring different validity
coefficients are correlated. Thus if the sample validity for form perception
is higher than the true validity, then the sample validity for clerical
perception is likely to be higher than the true validity for that sample.
When the correlation between sample validities for form perception and
clerical perception is computed across studies, it will tend to be positive
simply because form perception and clerical perception are positively
correlated.
Suppose for example that there were no variations in true validities
between jobs. The true variance of validities would be zero. The
correlation matrix of sample validities would then be approximately the
same as the original correlation matrix between variables, because of
correlated sampling errors.
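This limiting case can be illustrated by simulation. In the sketch below, every simulated study has the same true validity for two predictors (a hypothetical .25 for both), the predictors correlate .65 as form perception and clerical perception do in Table 7-1, and each study has 75 workers. All function names are illustrative. The correlation between the two sample validities across studies should then come out near the predictor intercorrelation, although true validities never vary.

```python
import math
import random

def pearson_r(xs, ys):
    """Plain Pearson correlation between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def simulate_study(n_workers, r_pq, validity, rng):
    """Sample validities of two correlated predictors in one simulated study."""
    # Equal weights chosen so each predictor has population validity `validity`.
    weight = validity / (1 + r_pq)
    noise_sd = math.sqrt(1 - 2 * weight ** 2 * (1 + r_pq))
    p_scores, q_scores, ratings = [], [], []
    for _ in range(n_workers):
        p = rng.gauss(0, 1)
        q = r_pq * p + math.sqrt(1 - r_pq ** 2) * rng.gauss(0, 1)
        p_scores.append(p)
        q_scores.append(q)
        ratings.append(weight * (p + q) + noise_sd * rng.gauss(0, 1))
    return pearson_r(p_scores, ratings), pearson_r(q_scores, ratings)

rng = random.Random(1)
# r_pq = .65 is the P-Q entry of Table 7-1; validity = .25 is hypothetical.
studies = [simulate_study(75, 0.65, 0.25, rng) for _ in range(1000)]
p_validities = [s[0] for s in studies]
q_validities = [s[1] for s in studies]

across_studies = pearson_r(p_validities, q_validities)
print(round(across_studies, 2))
```

Because the true validities are identical in every simulated study, any correlation between the two columns of sample validities is produced entirely by correlated sampling error.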
At the other extreme, suppose the sample sizes were very large so that
the sampling variance of validities was zero. Then the correlation matrix
between sample validities would be the correlation matrix between true
validities.
TABLE 7-4 Estimated Correlations Between True Job Validities
(Decimals Omitted)

                            G    V    N    S    P    Q    K    F    M
Intelligence (G)          100
Verbal aptitude (V)        76  100
Numerical aptitude (N)     76   55  100
Spatial aptitude (S)       60   18   29  100
Form perception (P)        29   13   38   47  100
Clerical perception (Q)    50   46   60   21   49  100
Motor coordination (K)      2   -5    7  -12   37   29  100
Finger dexterity (F)       -7  -15    6   23   48   14   54  100
Manual dexterity (M)      -23  -24   -3    7   37   12   66   72  100

NOTE: Each entry is estimated by multiplying by 2 the corresponding entry in Table
7-3 and subtracting the corresponding entry in Table 7-1. A slightly more accurate estimate
would subtract from each correlation the product of the average validities of the variables,
which will be about .04.
In the present case, taking about half the variance in true validities and
half the variance in the sampling error, as in the Hunter analysis, suggests
(after complex computations) that the correlation of observed validities is
about half the correlation of the true validities plus half the correlation
between the variables. This produces an estimated matrix of correlations
between true job validities (Table 7-4), which is quite different from
Hunter's matrix using the standard correction for reliability.
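If the observed correlation is about half the true-validity correlation plus half the correlation between the variables, then solving for the true-validity correlation gives twice the observed value minus the variable correlation, which is the rule stated in the note to Table 7-4. The sketch below applies that rule to four entries taken from Tables 7-1 and 7-3; the dictionary layout and function name are illustrative only.

```python
# Entries in hundredths, as printed in the tables (decimals omitted).
table_7_1 = {("G", "V"): 84, ("G", "M"): 19, ("K", "M"): 46, ("F", "M"): 52}
table_7_3 = {("G", "V"): 80, ("G", "M"): -2, ("K", "M"): 56, ("F", "M"): 62}

def estimated_true_validity_correlation(pair):
    """Table 7-4 rule: 2 x (Table 7-3 entry) - (Table 7-1 entry)."""
    return 2 * table_7_3[pair] - table_7_1[pair]

estimates = {pair: estimated_true_validity_correlation(pair) for pair in table_7_1}
for pair, value in estimates.items():
    print(pair, value)
```

The four results (76, -23, 66, and 72) agree with the corresponding entries of Table 7-4.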
If this is the way the true validities covary, then we can expect to find
jobs with many different weightings appropriate for specific aptitudes. If
cognitive ability and psychomotor ability were sufficient to predict job
performance, then we would expect to be able to predict accurately the
validities of all aptitudes for a given job by knowing the validities for these
two composites. It is evident that the accepted composites do not predict
the validity of individual aptitudes at all accurately. The perceptual
aptitudes are not well predicted by the two composites, so that there must
be many jobs in which they would have useful validities.
Since G is composed of a mixture of cognitive and perceptual aptitudes,
let us look at the eight independently measured aptitudes. How should
they be combined so that the combined aptitudes are sufficient for use in
prediction equations? The highly correlated groups are VNQ, SP, and
KFM. Composites based on these variables would predict validities for all
variables reasonably well, and the correlations between the validities of
the composites would be relatively small. These would be useful com-
posites for classifying jobs into different groups within which different
prediction equations might apply. It is interesting to note that GVN and
KFM have negative correlations in Table 7-4, so that jobs for which GVN
has high validity tend to be jobs for which KFM has low validity and vice
versa.
Hunter and Schmidt (1982) consider models in which economic gains
from job matching are obtained by using spatial aptitude and perceptual
ability in addition to general cognitive ability. We offer this as further
evidence that the SP composite might be of value.
Although it is convenient and simplifying to consider only cognitive and
psychomotor ability in predicting job performance, the analysis support-
ing this reduction is flawed. The estimated correlations of true validities
suggest that different relative weights for specific aptitudes might signif-
icantly improve prediction of job performance.
In developing prediction equations for a specific job, it is not at all
necessary to use only the data available for that job. We know the overall
correlations between specific aptitudes. We have an estimate of joint
distribution of true validities. These collective data may be combined with
specific data available for the job to develop regression equations predict-
ing performance on the job. For jobs with no direct validity data, we
would still need indicators of the specific aptitude validities for the job,
such as provided by the five job families for Hunter's two-composite
model.
The cognitive ability composite is defined as G + V + N, where G has
already been defined as the sum of test scores on vocabulary, arithmetic
reasoning, and three-dimensional space. Thus G already includes terms
for verbal aptitude, numerical aptitude, and spatial aptitude. In terms of
original standardized test scores, GVN is approximately
    three-dimensional space + 3 × vocabulary
    + 3 × arithmetic reasoning
    + 2 × computation.
These weights have developed as a historical accident, caused by the
definition of G first and GVN second. Are these the correct variables to
include in the cognitive factor? The correlations between aptitudes
suggest that clerical perception, being highly correlated with verbal and
numerical aptitude, might be sensibly included in a cognitive factor, and
indeed this is suggested in the factor analyses of Fozard et al. (1972) and
also by the pattern of estimated correlations of true validities (Table 7-4).
If only two composites are to be used, one for cognitive ability and one for
psychomotor ability, it is necessary to establish weights for the specific
aptitudes in the composites. Since the aptitudes are highly correlated, it
does not make too much difference which weights are used, but one
would like to use weights that have some justification.
The case for rejecting the SPQ composite, because it is predicted by the
other two composites with correlation .80, is weak. It is a mathematical
truism that if several variables are highly correlated, then linear combi-
nations of some of the variables will predict other linear combinations
with high correlation. The question is whether the SPQ composite adds
usefully to the prediction of job performance, and it is known that it does
in some jobs. For the same reason, the case for rejecting specific aptitudes
is weak. Not enough is known about predicting job performance to
conclude quickly that two composites alone are sufficient, however
convenient it is to work with only two variables in classifying jobs and
constructing regression equations.
THE FIVE JOB FAMILIES
The question remains, what is the appropriate predictor for a job not
previously studied? There would be no issue if cognitive ability alone
were useful in predicting performance: validity might vary from one job
to another, but, for every job, applicants would be referred in order of
their cognitive score. But if two factors (or several factors) are to be used,
their relative weight must be decided in each job.
Constructing the Five Job Families
Hunter divides all 12,000 jobs in the Dictionary of Occupational Titles
into five job families (U.S. Department of Labor, 1983c), and a different
weighting of the two abilities is proposed for predicting job performance
within each job family. Before deciding on the specifics of the clustering
techniques, he examined five different classification schemes for their
effectiveness in predicting cognitive and psychomotor validities; each
scheme uses attributes available for any job:
1. the test development analyst's judgments;
2. the mean aptitude requirements listed for each job in the Dictionary
of Occupational Titles;
3. a five-level job complexity scale based on the DOT data-people-
things scale, organized from 1 to 5 in descending order of complexity;
4. predictors from the Position Analysis Questionnaire (PAQ) (McCormick
et al., 1972); and
5. the Occupational Analysis Pattern (OAP) structure developed by
R.C. Droege and R. Boese (U.S. Department of Labor, 1979, 1980).
All five classification schemes were reported to perform about equally
well in predicting observed validity with correlation .30, although Hunter
notes that both PAQ and OAP offer some potential improvements over
the data-people-things job complexity classification. However, since the
data-people-things classification is available for all jobs through the
Dictionary of Occupational Titles, that classification was used in validity
generalization from the GATB validity studies. The five job families used
in the VG-GATB Referral System are therefore the five complexity-based
families of the data-people-things classification, with one important
difference: the order in which they are numbered does not reflect
complexity.
Sample Jobs in the Job Families:
Family I (set-up/precision work): machinist; cabinet maker; metal
fabricator; loom fixer
Family II (feeding/offbearing): shrimp picker; cornhusking machine
operator; cannery worker; spot welder
Family III (synthesize/coordinate): retail food manager; fish and game
warden; biologist; city circulation manager
Family IV (analyze/compile/compute): automobile mechanic; radiological
technician; automotive parts counterman; high school teacher
Family V (copy/compare): assembler; insulating machine operator;
forklift truck operator
For the mean observed validities for job complexity categories, see Table
7-5.
The final step in the classification system was the development of
regression equations that predict job performance as a function of the
cognitive, perceptual, and psychomotor composites within each job
family (Table 7-6). (There are different recommended equations for
training success, but these apply to a small fraction of jobs and applicants
only.) It will be noted that the recommended regression equations differ
somewhat from the equations computed for the observed validities. The
new equations are computed from validities corrected for restriction of
range (in the worker populations studied compared with the applicant
populations for whom the predictions will be made) and for reliability of
supervisor ratings.

TABLE 7-5 Mean Observed Validities for Job Complexity Categories,
and Beta-Weights of GVN, SPQ, and KFM in Predicting Job
Performance for Jobs Within Each Category (Decimals Omitted)

Job                                   Validities       Beta-Weights          Number
Family  Complexity Level            GVN  SPQ  KFM     GVN  SPQ  KFM     r    of Jobs
I       1. Setup                     34   35   19      18   20    3    37       21
III     2. Synthesize/coordinate     30   21   13      34   -7    5    31       60
IV      3. Analyze/compile/compute   28   27   24      21    3   15    32      205
V       4. Copy/compare              22   24   30       9    5   25    33      209
II      5. Feeding/offbearing        13   15   35       5   -6   37    36       20

SOURCE: U.S. Department of Labor. 1983. Test Validation for 12,000 Jobs: An
Application of Job Classification and Validity Generalization Analysis to the General
Aptitude Test Battery. USES Test Research Report No. 45. Division of Counseling and Test
Development, Employment and Training Administration. Washington, D.C.: U.S. Department
of Labor, p. 21.

TABLE 7-6 Recommended Regression Equations for Predicting Job
Performance (JP)

Job      Complexity                                          Multiple
Family   Level        Regression Equation                    Correlation
I        1            JP = .40 GVN + .19 SPQ + .07 KFM       .59
III      2            JP = .58 GVN                           .58
IV       3            JP = .45 GVN + .16 KFM                 .53
V        4            JP = .28 GVN + .33 KFM                 .50
II       5            JP = .07 GVN + .46 KFM                 .49

SOURCE: U.S. Department of Labor. 1983. Test Validation for 12,000 Jobs: An
Application of Job Classification and Validity Generalization Analysis to the General
Aptitude Test Battery. USES Test Research Report No. 45. Division of Counseling and Test
Development, Employment and Training Administration. Washington, D.C.: U.S. Department
of Labor, p. 39.
The effect of these corrections is to increase the multiple correlation
that indicates the accuracy of the prediction by about 65 percent. Since
GVN has greater restriction of range than KFM, the corrections tend to
increase the estimated GVN validities more, and so give greater weight to
GVN in the regression equations.
Do the Five Job Families Effectively Increase Predictability?
The majority of the GATB studies (84 percent of workers studied for
job performance) fall into job complexity categories 3 and 4, which
correspond to the Job Families IV and V in the eventual VG-GATB
referral protocol. And indeed, about the same proportion of Job Service
applicants apply for jobs in those categories. From Table 7-2, the
correlation between GVN and KFM is .35. This means that the correla-
tion between the predictor of success for Job Family IV and the predictor
of success for Job Family V is .93. If we used a single predictor, say 2
GVN + KFM, it would have correlation greater than .96 with both these
predictors. Thus the ordering of applicants by the score 2 GVN + KFM
would be almost indistinguishable from the orderings by the different
predictors for Job Families IV and V, and would have correlation at least
.93 with the predictors in all job families except Job Family II (complexity
level 5), which contains only 5 percent of the jobs.
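The correlations quoted in this paragraph can be reproduced from the regression weights in Table 7-6 and the composite correlations in Table 7-2. The sketch below assumes the three composites are standardized to unit variance, which is an assumption not stated in the text but one that recovers the quoted .93; the function and variable names are illustrative.

```python
import math

# Correlations between composites from Table 7-2 (order: GVN, SPQ, KFM).
R = [
    [1.00, 0.76, 0.35],
    [0.76, 1.00, 0.51],
    [0.35, 0.51, 1.00],
]

def covariance(a, b):
    """Covariance of two weighted sums of standardized composites."""
    return sum(a[i] * R[i][j] * b[j] for i in range(3) for j in range(3))

def composite_correlation(a, b):
    """Correlation between two weighted sums of the composites."""
    return covariance(a, b) / math.sqrt(covariance(a, a) * covariance(b, b))

# Weights on (GVN, SPQ, KFM) from the regression equations in Table 7-6.
family_weights = {
    "I": (0.40, 0.19, 0.07),
    "II": (0.07, 0.00, 0.46),
    "III": (0.58, 0.00, 0.00),
    "IV": (0.45, 0.00, 0.16),
    "V": (0.28, 0.00, 0.33),
}
single_predictor = (2.0, 0.0, 1.0)  # the score 2 GVN + KFM

r_iv_v = composite_correlation(family_weights["IV"], family_weights["V"])
print(round(r_iv_v, 2))  # 0.93

with_single = {
    family: composite_correlation(single_predictor, weights)
    for family, weights in family_weights.items()
}
for family, value in with_single.items():
    print(family, round(value, 3))
```

Under this assumption the single predictor correlates about .96 or higher with the Family I, IV, and V predictors and about .93 with the Family III predictor; only Family II falls well below.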
We conclude that the job complexity classification based on data and
things fails to yield classes of jobs within which prediction of job
performance is usefully advanced by weighting the composites GVN and SPQ
and KFM separately. The only class that justified different weighting was
the small class of low-complexity jobs that included only 5 percent of the
workers. For all the rest of the jobs we would have effectively the same
predictive accuracy, and effectively the same order of referral of workers,
by using the single weighting 2 GVN + KFM.
Prediction of performance from a single factor would be expected by
the proponents of Spearman's g, a single numerical measure of intelligence.
A recent issue of the Journal of Vocational Behavior (vol. 31,
1986) is devoted to the role of g in predictions of all kinds. The general
argument offered is that g does just as well as specialized test batteries
developed, following Hull's (1928) prescription, by multiple regression.
For example, Hunter (1986) argues that the specialized test batteries
developed by the military for different groups of jobs (mechanical,
electronic, skilled services, and clerical) predict performance no better
in the category they were developed for than in other categories, and no
better than g in any category. Thorndike (1986) argues that specialized
batteries developed for optimal prediction on a set of people show
marked drops in validity when cross-validated against other groups of
people, and that a general predictor g is to be preferred unless the
regression weights are based on large groups. Jensen (1986) asserts that
practical predictive validity of psychometric tests is mainly dependent
on their g-loading, although he concedes that clerical speed and
accuracy and spatial visualization add a significant increment to the
predictive validity of the GATB for certain clerical and skilled
blue-collar occupations.
We, for our part, remain unconvinced by the USES analysis that finer
differentiation is not possible. We do acknowledge that the development
of distinct aptitudes that allow differential prediction of success in various
jobs has proven to be a thorny problem. The committee believes that the
data reported in Army and Air Force studies (Chapter 4) did in fact tend
to show slightly higher validities for the aptitude area composites (e.g.,
mechanical, electronic) than for the more general Armed Forces
Qualification Test composite, but the operative word is slightly.
However, differential prediction (in this usage meaning the ability to
predict that an individual would have greater chances of success in certain
classes of jobs and lesser chances in others, depending on the aptitude
requirements of the jobs) is critical. It is precisely what is needed for a job
counseling program to be of value for matching people more effectively to
jobs.
² Since the average differences between black and white examinees are higher for GVN
than for KFM, there is an advantage in terms of reducing adverse impact to retaining Job
Family V, which has a relatively higher loading on KFM. However, these advantages will
not be significant if referral is in order of within-group percentiles, which have the same
average for blacks and whites.
Although the technical challenge of developing job area aptitude
composites that provide differential prediction is great, the committee
believes that the continued pursuit of more sophisticated occupational
classification systems, such as that attempted in the GAP classification
scheme, is worthwhile. The potential for very large data-gathering
efforts exists if the use of the GATB is expanded. We suggest that
USES make full use of such data to vigorously pursue the possibility of
increased precision in the differential prediction of success in various
kinds of jobs.
CONCLUSIONS
1. Although it is convenient and simplifying to reduce all nine GATB
aptitudes to two composites, cognitive aptitude and psychomotor aptitude,
for predicting job performance, the USES analysis supporting this
reduction is flawed. Our analysis suggests that different relative weights
for specific aptitudes might significantly improve prediction of job perfor-
mance. And, as a matter of fairness, some individuals would look better
if measured by the specific aptitudes for a class of jobs.
2. The case for rejecting the perceptual composite is weak. The two
composites GVN and KFM do not predict the validity of the individual
aptitudes accurately. The perceptual aptitudes are not well predicted by
the two VG-GATB composites, which indicates that in some jobs the SPQ
(perceptual) composite could add usefully to the prediction of job
performance.
3. The categorization of all jobs into five job families on the basis of job
complexity ratings derived from the DOT data-people-things job classifi-
cation system fails to yield classes of jobs in which prediction of job
performance is usefully advanced by weighting the composites GVN,
SPQ, and KFM separately. Except for Job Family II, which has only 5
percent of Job Service jobs, a single weighting of 2 GVN + KFM would
have the same predictive accuracy and, with the exception of black
applicants in Job Family V, the same order of referral.
4. The present VG-GATB classification of jobs into five job families,
since it has not identified job groups with useful differences in predictive
composites, is of little value as a counseling tool. Since a given worker's
performance is predicted by essentially the same formula for all jobs, it
cannot be claimed that the worker is better suited to some jobs than to
others.
RECOMMENDATIONS
1. Since the job classification scheme currently used in the VG-GATB
Referral System has not identified job groups with useful differences in
predictive composites and is therefore of little value as a counseling tool,
we recommend that USES continue to work to develop a richer job
classification that will more effectively match people to jobs.
Establishing an effective job-clustering system is a necessary prerequi-
site for the testing program to produce substantial system-wide gains (see
Chapter 12).
Representative terms from entire chapter:
specific aptitudes