


Suggested Citation: "7 Validity Generalization Applied to the GATB." National Research Council. 1989. Fairness in Employment Testing: Validity Generalization, Minority Issues, and the General Aptitude Test Battery. Washington, DC: The National Academies Press. doi: 10.17226/1338.



7 Validity Generalization Applied to the GATB

CRITERION-RELATED VALIDITY RESEARCH AND VALIDITY GENERALIZATION

Since the 1940s the U.S. Employment Service (USES) has conducted some 750 criterion-related validity studies of the General Aptitude Test Battery (GATB). The great majority of the studies used supervisor ratings as the criterion, although a sizable minority used training criteria. The purpose of these studies was to develop Specific Aptitude Test Batteries (SATBs) for specific jobs. An SATB consists of a subset of the GATB (2 to 4 aptitudes) with associated cutoff scores that best differentiate good from poor workers. Applicants whose scores on the chosen aptitudes exceeded the cutoff scores would be regarded as qualified to do the job.

Events following the passage of the Civil Rights Act in 1964 mandated increased emphasis on investigations of GATB test fairness for minorities. In 1967, USES initiated an effort to validate its tests for minorities. Jobs studied for SATB development tended to be those with large numbers of workers, in part because sufficiently large samples are easier to obtain in populous occupations. The minimum acceptable sample size was 50: small for the statistical task of validating the prediction of performance from test scores, but large in light of the difficulty of finding cooperative employers who have 50 workers in a single job. Some SATB samples, particularly in apprenticeable occupations, were considerably larger, although they often came from multiple establishments. Although the larger sample sizes were desirable, the comparability of the pooled establishments is not known.

As stated in a USES memorandum to the committee describing its testing program, by 1980 USES believed the GATB testing program to be at a crossroads. There were by then over 450 SATBs covering over 500 occupations, but there are over 12,000 jobs in the Dictionary of Occupational Titles (DOT). The extraordinary difficulty of validating SATBs on minorities, because of small sample sizes, precluded increasing the number of occupations covered by more than two to five a year. Even with the best methods of sample search, data collection, and analysis, it was clear that developing and validating test batteries for each of the 12,000 occupations was a practical impossibility. Moreover, the technology used in SATBs, requiring both selection of aptitudes and estimation of multiple cutoffs, had been identified by outside professional experts as obsolete, technically deficient, and premised on incorrect assumptions (see, e.g., Buros's Seventh Mental Measurements Yearbook, 1972).

At about the same time, the methodology of meta-analysis was receiving attention in mainstream psychology. USES staff saw possibilities in the work of John Hunter and Frank Schmidt, who were among the leaders in developing validity generalization, a variant of meta-analysis applied to validity coefficients, for use in personnel and industrial psychology.

The working assumption of industrial psychology prior to the late 1970s was that the sizable observed variation in validity coefficients from one criterion-related validity study to the next, even in apparently similar situations, was a reflection of reality. That is, validity was thought to be situation-specific, the sizes of validity coefficients being influenced by subtle, undetected differences across workplaces.
Schmidt and Hunter argued to the contrary, pointing out that most validation research has been done with small samples and that most studies have inadequate statistical power to demonstrate the statistical significance of their results. The observed variation in validities, they proposed, is due to statistical artifacts, primarily sampling error, rather than to true differences in validity from one situation to another.

The VG-GATB Referral System is supported by a series of USES test research reports written by Hunter (U.S. Department of Labor, 1983b,c,d,e). Hunter analyzed the existing validity database for the GATB, which at the time of his analysis consisted of reports of 515 validity studies carried out by the U.S. Employment Service and cooperating state employment services over the period 1945-1980.

His analysis may be divided into three parts. First, in The Dimensionality of the General Aptitude Test Battery (U.S. Department of Labor, 1983b), he argues that it is unnecessary to use all nine component aptitudes of the GATB in predicting job performance and that it is sufficient to use two composites, called cognitive ability and psychomotor ability. Second, Test Validation for 12,000 Jobs (U.S. Department of Labor, 1983c) classifies all jobs into five job families based on the data and things scales used in the DOT code for each job; a different weighting of cognitive and psychomotor ability is to be used within each job family to predict job performance. Finally, the same report generalizes the validities of the GATB studies within each job family to all jobs in that family.

In this chapter we review the first two parts of Hunter's plan: dimension reduction and job classification. The next chapter presents Hunter's validity generalization analysis of the 515 GATB studies and compares his results with those of 264 more recent studies that suggest somewhat different ranges of validities.

REDUCTION OF NINE APTITUDES TO COGNITIVE AND PSYCHOMOTOR FACTORS

The intention of the original GATB validity research program was to identify, for each job studied, a combination of specific aptitudes, and minimum levels for those aptitudes, that an applicant should attain before being referred to the job; these are the so-called SATBs prepared for each job. There are too many jobs in the U.S. economy, and too many new jobs being created, for the GATB research program ever to hope to cover more than a small fraction of them. Two kinds of problems stand in the way. First, it is not immediately clear that a validity study done for a particular job title in a particular plant is applicable to the same job title in another plant; the same duties in the job description may be performed in quite different working environments by different groups of workers.
Thus some mechanism must be found for generalizing the validity results for jobs studied to jobs not studied if the research is to be useful. Second, the statistical base for a single job, usually a sample of fewer than 100 workers, is not by itself adequate to carry out the complex estimation involved in identifying three or four of the nine GATB aptitudes as relevant to the job and selecting minimum competency levels for them. A good dose of job analysts' judgment must be used in selecting and calibrating the aptitudes, since the data available do not provide a sufficient basis for decision. Again, we wish to increase the statistical strength of conclusions by making some sensible combination of data for different jobs.
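To make the sample-size problem concrete, the sketch below simulates many hypothetical validity studies of the same job, each with 75 workers and exactly the same true validity. The true validity of .25, the number of simulated studies, and the helper names `pearson_r` and `simulate_validities` are illustrative assumptions, not values or procedures from the USES studies; the point is only how widely single-study coefficients scatter around one true value.

```python
import math
import random
import statistics


def pearson_r(xs, ys):
    """Sample Pearson correlation of two equal-length sequences."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def simulate_validities(true_r, n, studies, seed=0):
    """Hypothetical helper: sample validities from `studies` independent studies,
    each with n workers drawn from a population whose true validity is true_r."""
    rng = random.Random(seed)
    noise_sd = math.sqrt(1.0 - true_r ** 2)
    results = []
    for _ in range(studies):
        test = [rng.gauss(0.0, 1.0) for _ in range(n)]
        # criterion = true_r * test + independent noise, so corr(test, criterion) = true_r
        criterion = [true_r * t + noise_sd * rng.gauss(0.0, 1.0) for t in test]
        results.append(pearson_r(test, criterion))
    return results


rs = simulate_validities(true_r=0.25, n=75, studies=2000)
print(f"mean {statistics.fmean(rs):.3f}  variance {statistics.variance(rs):.4f}")
print(f"smallest {min(rs):.2f}  largest {max(rs):.2f}")
print(f"first-order approximation (1 - r^2)^2 / (n - 1): {(1 - 0.25 ** 2) ** 2 / 74:.4f}")
```

Every simulated study has the same true validity, yet the coefficients spread over a wide range, with a variance close to the first-order approximation on the last line (about .012). Variation of this size between studies is what Schmidt and Hunter attribute to sampling error rather than to real differences between workplaces.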

GATB Dimensions

Faced with the need to generalize validity results from the 491 jobs represented in the 515 studies to the other jobs in the economy, and with the problem of small sample sizes that plagues the SATB approach, Hunter (U.S. Department of Labor, 1983b) adopts a two-part strategy: reduce the number of variables relevant to predicting job performance, and assume that the same prediction equations apply across broad classes of jobs, so that all data for jobs in the same class may be combined in estimating the equation.

In developing his own position, Hunter describes two theories of job performance: the specific aptitude theory and the general ability theory. Traditional thinking in the GATB program was that job performance would be best predicted by the specific aptitude or aptitudes measured by the SATB and required by the job. For example, performance as a bookkeeper would be better predicted by numerical aptitude than by general cognitive ability, and performance as an editor would be better predicted by verbal aptitude than by general cognitive ability. In this view, general intelligence has only an indirect relation to job performance, mediated by specific aptitudes. The other position, which was the dominant view early in the twentieth century and is currently enjoying renewed popularity, is that one general cognitive ability, commonly called intelligence, underlies the specific abilities a person develops in school, at play, and on the job. In this view, the validities of the SATBs demonstrated in 40 years of research would reflect joint causation by a common prior variable, the underlying general cognitive ability. Hunter's analysis of the dimensionality of the GATB brings him to a variant of the general ability interpretation.
Hunter argues that, contrary to the SATB analyses, multiple regression techniques should be used in predicting job performance from the nine GATB aptitudes, because the nine are strongly intercorrelated (Table 7-1). However, the correlations between aptitudes, which must be known in order to apply multiple regression, are only poorly estimated in any one study, and a full multiple regression determining specific weights for each aptitude cannot estimate the weights accurately enough. On the basis of an analysis of the covariation of aptitudes across jobs, he proposes that the nine specific aptitudes fall into three categories of general abilities: cognitive, perceptual, and psychomotor. Although the cognitive and psychomotor abilities are only moderately correlated with one another, both are highly correlated with the perceptual composite (Table 7-2). As a consequence of this overlap, Hunter says that the perceptual composite will add little to the predictive power of the GATB; the nine GATB aptitudes may be satisfactorily replaced by just two composite aptitudes: cognitive ability, composed of general intelligence, verbal ability, and numerical ability; and psychomotor ability, composed of motor coordination, finger dexterity, and manual dexterity. (It should be noted that the general intelligence variable is the sum of verbal aptitude, spatial aptitude, and numerical aptitude with the computation test score removed; it is not measured independently of the others.) Predicting performance for a particular job thus reduces to weighting cognitive ability and psychomotor ability appropriately in a combined score, a much simpler task than assessing the relative weights of nine aptitudes.

TABLE 7-1 Correlations Between Aptitudes Based on 23,428 Workers, and Aptitude Reliabilities (Decimals Omitted)

                            G    V    N    S    P    Q    K    F    M
Intelligence (G)          100
Verbal aptitude (V)        84  100
Numerical aptitude (N)     86   67  100
Spatial aptitude (S)       74   46   51  100
Form perception (P)        61   47   58   59  100
Clerical perception (Q)    64   62   66   39   65  100
Motor coordination (K)     36   37   41   20   45   51  100
Finger dexterity (F)       25   17   24   29   42   32   37  100
Manual dexterity (M)       19   10   21   21   37   26   46   52  100
Reliability                88   85   83   8?   79   75   86   76   77

NOTE: ? marks a digit that is illegible in the source.

SOURCE: U.S. Department of Labor. 1983. The Dimensionality of the General Aptitude Test Battery (GATB) and the Dominance of General Factors Over Specific Factors in the Prediction of Job Performance for the U.S. Employment Service. USES Test Research Report No. 44. Division of Counseling and Test Development, Employment and Training Administration. Washington, D.C.: U.S. Department of Labor, p. 18.

TABLE 7-2 Correlations Between Composites (Decimals Omitted)

                               GVN   SPQ   KFM
Cognitive composite (GVN)      100    76    35
Perceptual composite (SPQ)      76   100    51
Psychomotor composite (KFM)     35    51   100

SOURCE: U.S. Department of Labor. 1983. The Dimensionality of the General Aptitude Test Battery (GATB) and the Dominance of General Factors Over Specific Factors in the Prediction of Job Performance for the U.S. Employment Service. USES Test Research Report No. 44. Division of Counseling and Test Development, Employment and Training Administration. Washington, D.C.: U.S. Department of Labor, p. 22.
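The multiple correlation of .80 for predicting the perceptual composite from the other two, quoted in the discussion that follows, can be reproduced from the composite intercorrelations in Table 7-2 with the standard two-predictor regression formulas. This is a minimal sketch that treats the rounded table entries as exact and uses observed (not reliability-corrected) correlations; the function names are illustrative.

```python
import math


def multiple_r_two_predictors(r_y1, r_y2, r_12):
    """Multiple correlation of y on two standardized predictors,
    given the three pairwise correlations."""
    r_squared = (r_y1 ** 2 + r_y2 ** 2 - 2 * r_y1 * r_y2 * r_12) / (1 - r_12 ** 2)
    return math.sqrt(r_squared)


def beta_weights(r_y1, r_y2, r_12):
    """Standardized regression weights for the same two-predictor regression."""
    denom = 1 - r_12 ** 2
    return (r_y1 - r_y2 * r_12) / denom, (r_y2 - r_y1 * r_12) / denom


# Correlations from Table 7-2: SPQ-GVN .76, SPQ-KFM .51, GVN-KFM .35.
R = multiple_r_two_predictors(0.76, 0.51, 0.35)
b_gvn, b_kfm = beta_weights(0.76, 0.51, 0.35)
print(f"R = {R:.2f}; beta(GVN) = {b_gvn:.2f}, beta(KFM) = {b_kfm:.2f}")
```

This prints R = 0.80, beta(GVN) = 0.66, beta(KFM) = 0.28. Even at R = .80, roughly a third of the variance in SPQ (1 - .80² ≈ .35) is not accounted for by GVN and KFM.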

What Gets Lost in the Simplifying Process?

One obvious question is whether the power of the GATB to predict performance in different kinds of jobs, that is, its usefulness in classifying applicants, is diminished by this broad-brush approach. A number of experts have commented to the committee (e.g., Lee J. Cronbach, letter dated July 6, 1988) on the exclusion of the perceptual composite. Hunter argues that the perceptual ability composite (S + P + Q) could be predicted essentially perfectly from the cognitive (G + V + N) and psychomotor (K + F + M) composites if the composites were perfectly measured. With the actual composites, the multiple correlation for predicting SPQ from GVN and KFM is .80, and the perceptual composite is dropped from all but Job Family I. But part of the reason that GVN and SPQ are so highly correlated is that the spatial aptitude S is included in both G and SPQ. More generally, the composites do not predict the specific aptitudes very accurately, even after adjusting for less than perfect reliability.¹

The question remains whether the specific aptitudes need to be included with separate weights in the regression equations for job performance, or whether the effect of each specific aptitude is captured sufficiently well by including the corresponding composite. If the latter holds, the task of setting aptitude weights for jobs is much simplified. In building the case, Hunter proposes that, for any given job, aptitudes in the same composite have equal validities, so that it is appropriate to use only the composites and not the separate aptitudes in predicting performance. Thus the V and N aptitudes might both have validity .25 for one job, both .20 for another, and both .30 for a third. (The G aptitude must be treated differently.) If this is so, then the correlation between such validities over jobs would be 1.
He therefore considers the correlations between aptitude validities over jobs (Table 7-3). The reliability measure in Table 7-3 is based on the sampling error in estimating validities for individual studies. Since the average sample size is 75, a sample validity differs from the true validity by an error with variance approximately .013. The variance of sample validities over all studies is about .026. Thus the variance of true validities over studies is about .026 - .013 = .013. One way to compute reliability is as the ratio of the variance of true validities to the variance of measured validities, which would be about .5 here.

¹The reliability of a measurement is the correlation between repeated measurements of the same individual, so, for example, if the reliability were 1.0, repeated measurements would be exactly the same. If two variables are not reliably measured, the correlation between them will be lower than that between perfect measurements and may be increased by correcting for unreliability. Note, however, that the same correction does not apply to correlations with intelligence, because intelligence is not independently measured.

TABLE 7-3 Correlations Between Validities Over 515 Jobs (Decimals Omitted)

                            G    V    N    S    P    Q    K    F    M
Intelligence (G)          100
Verbal aptitude (V)        80  100
Numerical aptitude (N)     81   54  100
Spatial aptitude (S)       67   32   40  100
Form perception (P)        45   30   48   53  100
Clerical perception (Q)    57   54    ?   30   57  100
Motor coordination (K)     19   16   24    8   41   40  100
Finger dexterity (F)        9    ?   15   26   45   23   46  100
Manual dexterity (M)       -2   -7    9   14   36   19   56   62  100
Reliability (column positions not recoverable from the source): 47, 46, 44, 45, 53, 52, 47, 47

NOTE: ? marks an entry that is illegible in the source.

SOURCE: U.S. Department of Labor. 1983. The Dimensionality of the General Aptitude Test Battery (GATB) and the Dominance of General Factors Over Specific Factors in the Prediction of Job Performance for the U.S. Employment Service. USES Test Research Report No. 44. Division of Counseling and Test Development, Employment and Training Administration. Washington, D.C.: U.S. Department of Labor, p. 32.

Hunter suggests that this table of correlations between validities supports his "general ability theory," which would predict correlations of 1 between specific aptitudes in the same general ability group. He adjusts the given correlations by the reliability correction, which increases the within-block correlations to an average value of 1.09. This is inaccurate, however; an average above 1, which no true correlation can attain, is itself a sign that the correction overshoots. The standard reliability correction is inappropriate here because the errors in measuring different validity coefficients are correlated. Thus if the sample validity for form perception is higher than the true validity, then the sample validity for clerical perception is likely to be higher than its true validity as well. When the correlation between sample validities for form perception and clerical perception is computed across studies, it will tend to be positive simply because form perception and clerical perception are positively correlated.
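The reliability arithmetic above, and the difference between the two ways of correcting an observed correlation between validities, can be laid out in a few lines. In this sketch the sample size of 75 and the observed variance of .026 come from the text; the reliability of .5 for both validities is the text's rough overall figure rather than a per-aptitude value, and the form perception/clerical perception pair (validities correlating .57 in Table 7-3, aptitudes correlating .65 in Table 7-1) is used only as an example. The function names are illustrative.

```python
import math


def sampling_error_variance(n, mean_validity=0.0):
    """First-order sampling variance of a sample correlation: (1 - rho^2)^2 / (n - 1)."""
    return (1 - mean_validity ** 2) ** 2 / (n - 1)


def standard_correction(r_obs, rel_i, rel_j):
    """Classical correction for unreliability; assumes the two sets of errors are uncorrelated."""
    return r_obs / math.sqrt(rel_i * rel_j)


def correlated_error_correction(r_obs, r_predictors):
    """Approximation behind Table 7-4: with reliability about .5 and sampling errors
    correlated like the predictors themselves, true r is about 2 * r_obs - r_predictors."""
    return 2 * r_obs - r_predictors


err_var = sampling_error_variance(75)   # about .013 for n = 75 (validity taken as 0)
obs_var = 0.026                          # observed variance of validities, from the text
true_var = obs_var - err_var             # about .013
reliability = true_var / obs_var         # about .5

print(f"error {err_var:.4f}, true {true_var:.4f}, reliability {reliability:.2f}")
# Form perception vs clerical perception: validities correlate .57 (Table 7-3),
# the aptitudes themselves correlate .65 (Table 7-1).
print(f"standard correction:       {standard_correction(0.57, 0.5, 0.5):.2f}")
print(f"correlated-error estimate: {correlated_error_correction(0.57, 0.65):.2f}")
```

The classical correction pushes this pair to 1.14, the kind of out-of-range value behind Hunter's average of 1.09, whereas the correlated-error approximation gives .49, the Table 7-4 entry for the same pair.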
Suppose, for example, that there were no variation in true validities between jobs, so that the true variance of validities would be zero. The correlation matrix of sample validities would then be approximately the same as the original correlation matrix between the variables, because of correlated sampling errors. At the other extreme, suppose the sample sizes were so large that the sampling variance of validities was zero. Then the correlation matrix between sample validities would be the correlation matrix between true validities.
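The first extreme case can be checked by simulation. The sketch below draws many hypothetical studies in which two predictors correlate .65 (the form perception/clerical perception value in Table 7-1) and both have the same fixed true validity of .20 in every study, so all variation in the validities is sampling error. The parameter values and helper names are illustrative assumptions.

```python
import math
import random
import statistics


def pearson_r(xs, ys):
    """Sample Pearson correlation of two equal-length sequences."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def cholesky(m):
    """Lower-triangular matrix L with L times L-transpose equal to m (small, positive-definite m)."""
    size = len(m)
    low = [[0.0] * size for _ in range(size)]
    for i in range(size):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(m[i][i] - s)
            else:
                low[i][j] = (m[i][j] - s) / low[j][j]
    return low


def simulate_validity_pairs(r_pred, validity, n, studies, seed=1):
    """Hypothetical helper: per simulated study, the sample validities of two predictors
    that correlate r_pred and share one fixed true validity with the criterion."""
    corr = [[1.0, r_pred, validity],
            [r_pred, 1.0, validity],
            [validity, validity, 1.0]]
    low = cholesky(corr)
    rng = random.Random(seed)
    v1, v2 = [], []
    for _ in range(studies):
        x1, x2, y = [], [], []
        for _ in range(n):
            z = [rng.gauss(0.0, 1.0) for _ in range(3)]
            # Correlated normal draw: (x1, x2, y) = L z
            row = [sum(low[i][k] * z[k] for k in range(i + 1)) for i in range(3)]
            x1.append(row[0])
            x2.append(row[1])
            y.append(row[2])
        v1.append(pearson_r(x1, y))
        v2.append(pearson_r(x2, y))
    return v1, v2


v1, v2 = simulate_validity_pairs(r_pred=0.65, validity=0.20, n=75, studies=1500)
print(f"correlation of sample validities across studies: {pearson_r(v1, v2):.2f}")
```

The printed correlation lands close to the .65 correlation between the predictors, even though the true validities never vary. Any correlation between observed validities in this setting is produced entirely by correlated sampling errors.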

TABLE 7-4 Estimated Correlations Between True Job Validities (Decimals Omitted)

                            G    V    N    S    P    Q    K    F    M
Intelligence (G)          100
Verbal aptitude (V)        76  100
Numerical aptitude (N)     76   55  100
Spatial aptitude (S)       60   18   29  100
Form perception (P)        29   13   38   47  100
Clerical perception (Q)    50   46   60   21   49  100
Motor coordination (K)      2   -5    7  -12   37   29  100
Finger dexterity (F)       -7  -15    6   23   48   14   54  100
Manual dexterity (M)      -23  -24   -3    7   37   12   66   72  100

NOTE: Each entry is estimated by multiplying by 2 the corresponding entry in Table 7-3 and subtracting the corresponding entry in Table 7-1. A slightly more accurate estimate would subtract from each correlation the product of the average validities of the variables, which will be about .04.

In the present case, taking about half the variance to be in true validities and half in sampling error, as in the Hunter analysis, suggests (after complex computations) that the correlation of observed validities is about half the correlation of the true validities plus half the correlation between the variables. This produces an estimated matrix of correlations between true job validities (Table 7-4) that is quite different from Hunter's matrix using the standard correction for reliability. If this is the way the true validities covary, then we can expect to find jobs for which many different weightings of the specific aptitudes are appropriate.

If cognitive ability and psychomotor ability were sufficient to predict job performance, then we would expect to be able to predict accurately the validities of all aptitudes for a given job from the validities of these two composites. It is evident that the accepted composites do not predict the validity of individual aptitudes at all accurately. The perceptual aptitudes are not well predicted by the two composites, so there must be many jobs in which they would have useful validities.
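The "half and half" rule used to build Table 7-4 can be written out under simplifying assumptions made explicit here: the observed validity of aptitude i is its true validity v_i plus a sampling error e_i; true and error variances are equal (reliability .5, as in the text) and the same for both aptitudes; errors are uncorrelated with true validities; and the sampling errors of two aptitudes correlate approximately as the aptitudes themselves do (ρ_ij, from Table 7-1). The small product-of-validities adjustment mentioned in the table note is ignored.

```latex
\begin{aligned}
&\hat v_i = v_i + e_i, \quad
 \operatorname{Var}(v_i) = \operatorname{Var}(e_i) = \sigma^2, \quad
 \operatorname{Cov}(e_i, e_j) \approx \rho_{ij}\,\sigma^2, \quad
 \operatorname{Cov}(v_i, e_j) = 0 \\[4pt]
&\operatorname{Corr}(\hat v_i, \hat v_j)
 = \frac{\operatorname{Cov}(v_i, v_j) + \operatorname{Cov}(e_i, e_j)}
        {\sqrt{\operatorname{Var}(\hat v_i)\,\operatorname{Var}(\hat v_j)}}
 \approx \frac{\sigma^2 \operatorname{Corr}(v_i, v_j) + \rho_{ij}\,\sigma^2}{2\sigma^2}
 = \tfrac12 \operatorname{Corr}(v_i, v_j) + \tfrac12 \rho_{ij} \\[4pt]
&\Longrightarrow \quad
 \operatorname{Corr}(v_i, v_j) \approx 2\,\operatorname{Corr}(\hat v_i, \hat v_j) - \rho_{ij}
\end{aligned}
```

For form perception and clerical perception, for instance, this gives 2(.57) - .65 = .49, the corresponding entry in Table 7-4.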
Since G is composed of a mixture of cognitive and perceptual aptitudes, let us look at the eight independently measured aptitudes. How should they be combined so that the combined aptitudes are sufficient for use in prediction equations? The highly correlated groups are VNQ, SP, and KFM. Composites based on these groups would predict validities for all variables reasonably well, and the correlations between the validities of the composites would be relatively small. These would be useful composites for classifying jobs into different groups within which different prediction equations might apply. It is interesting to note that GVN and KFM have negative correlations in Table 7-4, so that jobs for which GVN has high validity tend to be jobs for which KFM has low validity, and vice versa.

Hunter and Schmidt (1982) consider models in which economic gains from job matching are obtained by using spatial aptitude and perceptual ability in addition to general cognitive ability. We offer this as further evidence that the SP composite might be of value.

Although it is convenient and simplifying to consider only cognitive and psychomotor ability in predicting job performance, the analysis supporting this reduction is flawed. The estimated correlations of true validities suggest that different relative weights for specific aptitudes might significantly improve prediction of job performance.

In developing prediction equations for a specific job, it is not at all necessary to use only the data available for that job. We know the overall correlations between specific aptitudes, and we have an estimate of the joint distribution of true validities. These collective data may be combined with the specific data available for the job to develop regression equations predicting performance on the job. For jobs with no direct validity data, we would still need indicators of the specific aptitude validities for the job, such as those provided by the five job families for Hunter's two-composite model.

The cognitive ability composite is defined as G + V + N, where G has already been defined as the sum of test scores on vocabulary, arithmetic reasoning, and three-dimensional space. Thus G already includes terms for verbal aptitude, numerical aptitude, and spatial aptitude. In terms of original standardized test scores, GVN is approximately three-dimensional space + 3 × vocabulary + 3 × arithmetic reasoning + 2 × computation. These weights are a historical accident, a consequence of defining G first and GVN second. Are these the correct variables to include in the cognitive factor?
The correlations between aptitudes suggest that clerical perception, being highly correlated with verbal and numerical aptitude, might sensibly be included in a cognitive factor; indeed, this is suggested by the factor analyses of Fozard et al. (1972) and also by the pattern of estimated correlations of true validities (Table 7-4). If only two composites are to be used, one for cognitive ability and one for psychomotor ability, it is necessary to establish weights for the specific aptitudes in the composites. Since the aptitudes are highly correlated, it does not make much difference which weights are used, but one would like to use weights that have some justification.

The case for rejecting the SPQ composite because it is predicted by the other two composites with correlation .80 is weak. It is a mathematical truism that if several variables are highly correlated, then linear combinations of some of the variables will predict other linear combinations with high correlation. The question is whether the SPQ composite adds usefully to the prediction of job performance, and it is known that it does in some jobs. For the same reason, the case for rejecting specific aptitudes is weak. Not enough is known about predicting job performance to conclude quickly that two composites alone are sufficient, however convenient it is to work with only two variables in classifying jobs and constructing regression equations.

THE FIVE JOB FAMILIES

The question remains: what is the appropriate predictor for a job not previously studied? There would be no issue if cognitive ability alone were useful in predicting performance; validity might vary from one job to another, but for every job applicants would be referred in order of their cognitive score. But if two factors (or several factors) are to be used, their relative weights must be decided for each job.

Constructing the Five Job Families

Hunter divides all 12,000 jobs in the Dictionary of Occupational Titles into five job families (U.S. Department of Labor, 1983c) and proposes a different weighting of the two abilities for predicting job performance within each family. Before deciding on the specifics of the clustering technique, he examined five classification schemes for their effectiveness in predicting cognitive and psychomotor validities; each scheme uses attributes available for any job:

1. the test development analysts' judgments;
2. the mean aptitude requirements listed for each job in the Dictionary of Occupational Titles;
3. a five-level job complexity scale based on the DOT data-people-things scale, organized from 1 to 5 in descending order of complexity;
4. predictors from the Position Analysis Questionnaire (PAQ) (McCormick et al., 1972); and
5. the Occupational Analysis Pattern (OAP) structure developed by R.C. Droege and R. Boese (U.S. Department of Labor, 1979, 1980).

All five classification schemes were reported to perform about equally well, predicting observed validity with correlation .30, although Hunter notes that both the PAQ and the OAP offer some potential improvements over the data-people-things job complexity classification. However, since the data-people-things classification is available for all jobs through the Dictionary of Occupational Titles, that classification was used in validity generalization from the GATB validity studies. The five job families used in the VG-GATB Referral System are therefore the five complexity-based families of the data-people-things classification, with one important difference: the order in which they are numbered does not reflect complexity.

Sample Jobs in the Job Families:

Family I (set-up/precision work): machinist; cabinet maker; metal fabricator; loom fixer
Family II (feeding/offbearing): shrimp picker; cornhusking machine operator; cannery worker; spot welder
Family III (synthesize/coordinate): retail food manager; fish and game warden; biologist; city circulation manager
Family IV (analyze/compile/compute): automobile mechanic; radiological technician; automotive parts counterman; high school teacher
Family V (copy/compare): assembler; insulating machine operator; forklift truck operator

For the mean observed validities for job complexity categories, see Table 7-5. The final step in the classification system was the development of regression equations that predict job performance as a function of the cognitive, perceptual, and psychomotor composites within each job family (Table 7-6). (There are different recommended equations for training success, but these apply only to a small fraction of jobs and applicants.)

It will be noted that the recommended regression equations differ somewhat from the equations computed for the observed validities. The new equations are computed from validities corrected for restriction of range (in the worker populations studied compared with the applicant populations for whom the predictions will be made) and for the reliability of supervisor ratings. The effect of these corrections is to increase the multiple correlation, which indicates the accuracy of the prediction, by about 65 percent. Since GVN has greater restriction of range than KFM, the corrections tend to increase the estimated GVN validities more, and so give greater weight to GVN in the regression equations.

TABLE 7-5 Mean Observed Validities for Job Complexity Categories, and Beta-Weights of GVN, SPQ, and KFM in Predicting Job Performance for Jobs Within Each Category (Decimals Omitted)

                                        Validities         Beta-Weights
Family  Complexity Level               GVN   SPQ   KFM     GVN   SPQ   KFM      r   Jobs
I       1. Setup                        34    35    19      18    20     3     37     21
III     2. Synthesize/coordinate        30    21    13      34    -7     5     31     60
IV      3. Analyze/compile/compute      28    27    24      21     3    15     32    205
V       4. Copy/compare                 22    24    30       9     5    25     33    209
II      5. Feeding/offbearing           13    15    35       5    -6    37     36     20

SOURCE: U.S. Department of Labor. 1983. Test Validation for 12,000 Jobs: An Application of Job Classification and Validity Generalization Analysis to the General Aptitude Test Battery. USES Test Research Report No. 45. Division of Counseling and Test Development, Employment and Training Administration. Washington, D.C.: U.S. Department of Labor, p. 21.

TABLE 7-6 Recommended Regression Equations for Predicting Job Performance (JP)

Job     Complexity                                      Multiple
Family  Level       Regression Equation                 Correlation
I       1           JP = .40 GVN + .19 SPQ + .07 KFM    .59
III     2           JP = .58 GVN                        .58
IV      3           JP = .45 GVN + .16 KFM              .53
V       4           JP = .28 GVN + .33 KFM              .50
II      5           JP = .07 GVN + .46 KFM              .49

SOURCE: U.S. Department of Labor. 1983. Test Validation for 12,000 Jobs: An Application of Job Classification and Validity Generalization Analysis to the General Aptitude Test Battery. USES Test Research Report No. 45. Division of Counseling and Test Development, Employment and Training Administration. Washington, D.C.: U.S. Department of Labor, p. 39.

Do the Five Job Families Effectively Increase Predictability?

The majority of the GATB studies (84 percent of workers studied for job performance) fall into job complexity categories 3 and 4, which correspond to Job Families IV and V in the eventual VG-GATB referral protocol, and about the same proportion of Job Service applicants apply for jobs in those categories. From Table 7-2, the correlation between GVN and KFM is .35. This means that the correlation between the predictor of success for Job Family IV and the predictor of success for Job Family V is .93. If we used a single predictor, say 2 GVN + KFM, it would have a correlation of about .96 or higher with both these predictors.
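These figures follow from treating each family's equation in Table 7-6 as a weighted sum of two standardized composites whose correlation is .35 (Table 7-2). The sketch below computes the correlation between any two such weightings; it ignores SPQ (so Family I is left out), uses the rounded published weights, and the function name `composite_corr` is illustrative.

```python
import math


def composite_corr(w1, w2, r12):
    """Correlation between a1*X1 + b1*X2 and a2*X1 + b2*X2, where X1 and X2
    are standardized variables with correlation r12."""
    (a1, b1), (a2, b2) = w1, w2
    cov = a1 * a2 + b1 * b2 + (a1 * b2 + b1 * a2) * r12
    var1 = a1 ** 2 + b1 ** 2 + 2 * a1 * b1 * r12
    var2 = a2 ** 2 + b2 ** 2 + 2 * a2 * b2 * r12
    return cov / math.sqrt(var1 * var2)


R_GVN_KFM = 0.35  # Table 7-2
# (GVN weight, KFM weight) from Table 7-6; Family I is omitted because it also uses SPQ.
families = {"III": (0.58, 0.0), "IV": (0.45, 0.16), "V": (0.28, 0.33), "II": (0.07, 0.46)}
single = (2.0, 1.0)  # the single predictor 2 GVN + KFM

print(f"IV vs V: {composite_corr(families['IV'], families['V'], R_GVN_KFM):.2f}")
for name, weights in families.items():
    print(f"Family {name} vs 2 GVN + KFM: {composite_corr(weights, single, R_GVN_KFM):.3f}")
```

The IV-versus-V correlation comes out at .93. The single weighting 2 GVN + KFM correlates about .996 with the Family IV equation, about .96 with Family V, about .93 with Family III, and only about .77 with Family II, which is why the orderings of applicants differ appreciably only for Family II.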
Thus the ordering of applicants by the score 2 GVN + KFM would be almost indistinguishable from the orderings by the different predictors for Job Families IV and V, and it would have a correlation of at least .93 with the predictors in all job families except Job Family II (complexity level 5), which contains only 5 percent of the jobs. We conclude that the job complexity classification based on data and things fails to yield classes of jobs within which prediction of job performance is usefully advanced by weighting the composites GVN, SPQ, and KFM separately. The only class that justified different weighting was the small class of low-complexity jobs that included only 5 percent of the workers. For all the rest of the jobs we would have effectively the same predictive accuracy, and effectively the same order of referral of workers,² by using the single weighting 2 GVN + KFM.

Prediction of performance from a single factor would be expected by proponents of Spearman's g, a single numerical measure of intelligence. A recent issue of the Journal of Vocational Behavior (vol. 31, 1986) is devoted to the role of g in predictions of all kinds. The general argument offered is that g does just as well as specialized test batteries developed by multiple regression, following Hull's (1928) prescription. For example, Hunter (1986) argues that the specialized test batteries developed by the military for different groups of jobs (mechanical, electronic, skilled services, and clerical) predict performance no better in the category they were developed for than in other categories, and no better than g in any category. Thorndike (1986) argues that specialized batteries developed for optimal prediction on one set of people show marked drops in validity when cross-validated against other groups of people, and that a general predictor g is to be preferred unless the regression weights are based on large groups. Jensen (1986) asserts that the practical predictive validity of psychometric tests depends mainly on their g loading, although he concedes that clerical speed and accuracy and spatial visualization add a significant increment to the predictive validity of the GATB for certain clerical and skilled blue-collar occupations. We, for our part, remain unconvinced by the USES analysis that finer differentiation is not possible.
We do acknowledge that the development of distinct aptitudes that allow differential prediction of success in various jobs has proven to be a thorny problem. The committee believes that the data reported in Army and Air Force studies (Chapter 4) did in fact tend to show slightly higher validities for the aptitude area composites (e.g., mechanical, electronic) than for the more general Armed Forces Qualification Test composite, but the operative word is "slightly." However, differential prediction (in this usage meaning the ability to predict that an individual would have greater chances of success in certain classes of jobs and lesser chances in others, depending on the aptitude requirements of the jobs) is critical. It is precisely what is needed for a job counseling program to be of value in matching people more effectively to jobs. Although the technical challenge of developing job area aptitude composites that provide differential prediction is great, the committee believes that the continued pursuit of more sophisticated occupational classification systems, such as that attempted in the GAP classification scheme, is worthwhile. The potential for very large data-gathering efforts exists if the use of the GATB is expanded. We suggest that USES make full use of such data to vigorously pursue the possibility of increased precision in the differential prediction of success in various kinds of jobs.

² Since the average differences between black and white examinees are larger for GVN than for KFM, there is an advantage, in terms of reducing adverse impact, to retaining Job Family V, which has a relatively higher loading on KFM. However, these advantages will not be significant if referral is in order of within-group percentiles, which have the same average for blacks and whites.

CONCLUSIONS

1. Although it is convenient and simplifying to reduce all nine GATB aptitudes to two composites (cognitive aptitude and psychomotor aptitude) for predicting job performance, the USES analysis supporting this reduction is flawed. Our analysis suggests that different relative weights for specific aptitudes might significantly improve prediction of job performance. And, as a matter of fairness, some individuals would look better if measured by the specific aptitudes for a class of jobs.

2. The case for rejecting the perceptual composite is weak. The two composites GVN and KFM do not predict the validity of the individual aptitudes accurately. The perceptual aptitudes are not well predicted by the two VG-GATB composites, which indicates that in some jobs the SPQ (perceptual) composite could add usefully to the prediction of job performance.

3. The categorization of all jobs into five job families on the basis of job complexity ratings derived from the DOT data-people-things job classification system fails to yield classes of jobs in which prediction of job performance is usefully advanced by weighting the composites GVN, SPQ, and KFM separately.
Except for Job Family II, which has only 5 percent of Job Service jobs, a single weighting of 2 GVN + KFM would have the same predictive accuracy and, with the exception of black applicants in Job Family V, the same order of referral.

4. Because it has not identified job groups with useful differences in predictive composites, the present VG-GATB classification of jobs into five job families is of little value as a counseling tool. Since a given worker's performance is predicted by essentially the same formula for all jobs, it cannot be claimed that the worker is better suited to some jobs than to others.
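The footnote on adverse impact and the Job Family V exception in Conclusion 3 both turn on how the order of referral is computed. Referral by within-group percentiles ranks each applicant only against applicants in the same group, so every group's average percentile is the same. A minimal sketch, assuming the mid-rank percentile convention and hypothetical group labels and scores; it illustrates the idea and is not the procedure USES actually used:

```python
from bisect import bisect_left, bisect_right


def within_group_percentiles(scores, groups):
    """Percentile rank of each score among the scores from the same group.

    Mid-rank convention: 100 * (count below + half the ties) / group size.
    Under this convention each group's mean percentile is 50 (up to rounding).
    """
    by_group = {}
    for score, group in zip(scores, groups):
        by_group.setdefault(group, []).append(score)
    for ref in by_group.values():
        ref.sort()

    percentiles = []
    for score, group in zip(scores, groups):
        ref = by_group[group]
        below = bisect_left(ref, score)
        ties = bisect_right(ref, score) - below  # includes the score itself
        percentiles.append(100.0 * (below + 0.5 * ties) / len(ref))
    return percentiles


if __name__ == "__main__":
    # Hypothetical raw composite scores for two hypothetical groups, "A" and "B".
    scores = [110, 95, 120, 80, 100, 90]
    groups = ["A", "B", "A", "B", "A", "B"]
    pct = within_group_percentiles(scores, groups)
    # Referral order: highest within-group percentile first.
    order = sorted(range(len(scores)), key=lambda i: pct[i], reverse=True)
    for i in order:
        print(groups[i], scores[i], round(pct[i], 1))
```

Because percentiles are computed within groups, a group whose raw scores are lower on average is not pushed down the referral list as a whole. This is why, under this method, the adverse-impact advantage of a KFM-heavy weighting matters little.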

RECOMMENDATIONS

1. Since the job classification scheme currently used in the VG-GATB Referral System has not identified job groups with useful differences in predictive composites and is therefore of little value as a counseling tool, we recommend that USES continue to work to develop a richer job classification that will more effectively match people to jobs. Establishing an effective job-clustering system is a necessary prerequisite for the testing program to produce substantial system-wide gains (see Chapter 12).
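Recommendation 1 and Conclusion 4 both turn on differential prediction. A job classification helps counseling only if its composites differ enough across job groups that a worker's aptitude profile favors some groups over others. A minimal sketch, in which the job families, weights, and worker profiles are all hypothetical, and predicted scores are assumed to sit on a common standardized scale across families (a simplification):

```python
def predicted_scores(profile, family_weights):
    """Weighted composite of a worker's standardized aptitude scores for each job family."""
    return {
        family: sum(w * profile[aptitude] for aptitude, w in weights.items())
        for family, weights in family_weights.items()
    }


def best_family(profile, family_weights):
    """Job family with the highest predicted score (ties go to the first family listed)."""
    scores = predicted_scores(profile, family_weights)
    return max(scores, key=scores.get)


def suitability_spread(profile, family_weights):
    """Gap between best and worst predicted family; 0 means no basis for counseling."""
    scores = list(predicted_scores(profile, family_weights).values())
    return max(scores) - min(scores)


if __name__ == "__main__":
    # Hypothetical worker profiles (standardized scores: mean 0, SD 1).
    worker_x = {"GVN": 0.0, "SPQ": 1.0, "KFM": -0.5}
    worker_y = {"GVN": 0.0, "SPQ": -0.5, "KFM": 1.0}
    # Hypothetical families whose composites differ in relative weights.
    distinct = {
        "Family A": {"GVN": 1.0, "SPQ": 1.0, "KFM": 0.0},
        "Family B": {"GVN": 1.0, "SPQ": 0.0, "KFM": 1.0},
    }
    # Every family scored by the same formula, 2 GVN + KFM.
    single = {name: {"GVN": 2.0, "SPQ": 0.0, "KFM": 1.0} for name in distinct}
    for label, weights in [("distinct", distinct), ("single", single)]:
        for name, prof in [("X", worker_x), ("Y", worker_y)]:
            print(label, name, best_family(prof, weights), suitability_spread(prof, weights))
```

With distinct weights, the two workers are routed to different families. With one formula for every family the spread is zero, so any "best" family is only a tie-break. This mirrors the point that a single formula cannot show a worker to be better suited to some jobs than to others.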
