Summary

This volume is one of a number of studies conducted under the aegis of the National Research Council/National Academy of Sciences that deal with the use of standardized ability tests to make decisions about people in employment or educational settings. Because such tests have a sometimes important role in allocating opportunities in American society, their use is quite rightly subject to questioning and not infrequently to legal scrutiny.

At issue in this report is the use of a federally sponsored employment test, the General Aptitude Test Battery (GATB), to match job seekers to requests for job applicants from private- and public-sector employers. Developed in the late 1940s by the U.S. Employment Service (USES), a division of the Department of Labor, the GATB is used for vocational counseling and job referral by state-administered Employment Service (also known as Job Service) offices located in some 1,800 communities around the country.

In recent years, the Department of Labor has begun to promote the use of the GATB throughout the Employment Service for referral to all jobs found in the U.S. economy. Spurred by the need to streamline operations because of severe staff reductions and budget cuts, and hoping as well to increase economic productivity by improving the person-job match, USES has encouraged the states to experiment with a test-based referral system that in this report is called the VG-GATB Referral System. (The "VG" stands for validity generalization, the theory used to extrapolate the empirically established validities of the GATB for predicting performance in some 500 jobs to all other jobs in the U.S. economy.) Although the
state pilot programs have tended to be a patchwork of old and new procedures, the major features of the experimental system as conceived by USES staff are as follows:

1. Virtually all registrants at Employment Service offices are to be administered the GATB.
2. Virtually all job orders are to be filled on the basis of GATB scores.
3. The group of job candidates referred to an employer are to be selected on the basis of rank-ordered test scores (plus any additional criteria such as educational or experience requirements imposed by the employer).
4. GATB scores are computed as percentile scores within each of three racial or ethnic groups: black, Hispanic, and other. The purpose of these score adjustments, which serve to erase group differences in test scores, is to mitigate the adverse effects that rank-ordering on the basis of test score would otherwise have on the employment opportunities of minority job seekers.
5. Given the dependence of this type of referral system on searching the registrant files to compile a list of job candidates from the highest score on down, computerization of the files to allow rapid data retrieval is encouraged to complement the VG-GATB Referral System.

ISSUES FOR STUDY

Faced with a Justice Department challenge on legal and constitutional grounds to one element in the VG-GATB Referral System, the use of within-group percentile scores, the Department of Labor sought the advice of the National Academy of Sciences on the future role of test-based referral in the Employment Service. A committee of experts established within the National Research Council was asked to study the issue of within-group scoring and further to undertake a thorough evaluation of validity generalization and its application to the GATB.
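The within-group percentile scoring described in feature 4 above can be illustrated with a minimal sketch. The function name, the group labels, the raw scores, and the percentile definition (share of one's own group scoring at or below the applicant) are all assumptions made for illustration; the actual USES procedure relies on norming tables that are not reproduced in this summary.

```python
from bisect import bisect_right
from collections import defaultdict


def within_group_percentiles(records):
    """Convert raw scores to percentile ranks computed separately per group.

    `records` is a list of (applicant_id, group, raw_score) tuples. The
    percentile rank used here is the percentage of the applicant's own
    group scoring at or below the applicant; this definition is an
    assumption of the sketch, not the USES norming procedure.
    """
    scores_by_group = defaultdict(list)
    for _, group, raw in records:
        scores_by_group[group].append(raw)
    for scores in scores_by_group.values():
        scores.sort()

    result = {}
    for applicant_id, group, raw in records:
        ordered = scores_by_group[group]
        at_or_below = bisect_right(ordered, raw)
        result[applicant_id] = 100.0 * at_or_below / len(ordered)
    return result


# Hypothetical raw scores for two groups with different score distributions.
records = [
    ("a1", "group_x", 90), ("a2", "group_x", 100),
    ("a3", "group_x", 110), ("a4", "group_x", 120),
    ("b1", "group_y", 80), ("b2", "group_y", 90),
    ("b3", "group_y", 100), ("b4", "group_y", 110),
]
percentiles = within_group_percentiles(records)
```

In this toy data the same raw score of 100 maps to a different percentile in each group, and the top scorer of each group receives the same percentile. That is the sense in which the adjustment erases group differences in the scores used for rank-ordering.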
The Department of Labor sought advice on whether the pivotal role envisioned for the General Aptitude Test Battery is technically justified, whether the anticipated economic benefits are realistic, and what the effects of widespread adoption of the VG-GATB Referral System might be on various constituencies of interest, including veterans, people with handicapping conditions, employers, and job seekers.

In seeking answers to these questions, the Committee on the General Aptitude Test Battery organized its work around nine topics, outlined briefly below. The text indicates the chapters of the report that contain the committee's complete statements. This overview concludes with a summary of the committee's central recommendations.
Issues in Policy, Equity, and Law (Chapters 1, 2, and 13)

The VG-GATB Referral System raises important questions of social policy. The Department of Labor's adoption of within-group scoring and the subsequent legal challenge lodged by the former Assistant Attorney General for Civil Rights echo a deep-seated ambivalence in our society about the meaning of equality. At issue is the fairness of using race-conscious mechanisms to overcome the legacy of governmentally imposed discrimination that consigns most black Americans to the margins of social acceptance and economic well-being. Employment testing raises the issue in the public arena in a very concrete way: employee selection on the basis of rank-ordered test scores will screen out a large proportion of black and Hispanic candidates and thus expose employers (and the Employment Service) to legal action under the civil rights laws on grounds of discrimination; the use of score adjustments to mitigate these adverse effects on the employment chances of minority job seekers creates vulnerabilities to charges of reverse discrimination.

The claim of omnicompetence for the GATB (that it is a valid predictor for all 12,000 jobs in the U.S. economy) raises a different set of issues. It is based in part on the idea that the test measures some attribute that underlies performance in all jobs, an attribute that is usually identified as intelligence, or g. There are dangers in promoting intelligence testing to which policy makers should be sensitive. Data from intelligence testing were misused in the early twentieth century in a way that fed the racial and ethnic prejudices of the day, and the potential for generating feelings of superiority in some groups and inferiority in others is equally great today.

Findings and Conclusions

The committee is not in a position to make definitive statements about the legality of race-conscious scoring methods.
Our aim is to explicate the issues that policy makers need to consider as they plan the future of the VG-GATB Referral System, and to offer advice on aspects of the problem that lend themselves to scientific analysis.

Is the Psychometric Quality of the GATB Adequate? (Chapters 4 and 5)

The General Aptitude Test Battery is now some 45 years old. There have been four versions of the test: Forms A and B were introduced in 1947, and Forms C and D in 1983. A testing program of this sort always
poses the danger of getting handcuffed to history; the issue is whether to make changes in the instruments as the technology advances. Altering a test can destroy its links to the research base; there is, therefore, a strong impulse toward preservation, which can ultimately result in an out-of-date test. Given the relatively advanced age of the GATB, we felt it important to look closely at the test's structure and content and its psychometric properties. We also wanted to see how it compares with other major test batteries.

Findings and Conclusions

Our study leads us to conclude that the GATB is adequate in psychometric quality, with the exception of two serious flaws that could significantly impair the usefulness of the test if it is made an important screening device throughout the Employment Service.

The first flaw is weak test security due to the availability of only two current alternate forms of the test and due to administration of the test in a variety of protocols by a variety of organizations. It must be anticipated that the forms of the test will be available outside government channels once it becomes clear that getting a job through the Employment Service depends on doing well on the test.

The second flaw is the speededness of the test. Many of the subtests have such severe time limits that an average applicant can expect to complete only one-third of the test. Such tests are eminently coachable; that is, test takers can learn strategies to improve their performance. For example, scores can be substantially increased by randomly filling in the remaining blanks in the last minute of the test. The test will not retain its validity if such coaching becomes widespread.

We did not find the GATB markedly superior or inferior to other test batteries, such as the Armed Services Vocational Aptitude Battery (ASVAB), on two dimensions of central importance: predictive validity and test reliability.
But the GATB does not compare well with the ASVAB in other ways, e.g., test security, the production of new forms, the strength of its normative data, and the severe time limits imposed even when speed of performance is not an essential aspect of the aptitude being measured.

How Well Does the GATB Predict Job Success? (Chapter 8)

The question of greatest interest about any employment test is how accurate an estimate of future job performance it allows. No test provides anything close to perfect prediction; there are many characteristics of
importance to actual job performance that tests do not assess, and others that tests do not assess very well. Nevertheless, tests can measure some relevant skills and abilities and are particularly good gauges of cognitive abilities.

The GATB is supported by some 750 criterion-related validity studies. These show the degree of relationship between GATB scores and a measure of job performance (typically supervisor ratings of job incumbents, but in some cases grades in a training course) in about 500 jobs. The committee has reanalyzed the data from these 750 studies and looked closely at the adjustments for sampling error and restriction of range that appear in USES technical manuals reporting GATB validities.

Findings and Conclusions

Our findings speak directly to the question of how central a role in Employment Service job referrals the GATB could sustain technically. In the 750 studies, the correlations of GATB-based predictors with supervisor ratings, after correction for sampling error, are in the range of .2 to .4. The average validity (corrected for criterion unreliability) of GATB aptitude composites in studies conducted since 1972 is about .25, whereas corresponding adjustments for the older studies produce an average validity of .35. These correlations are modest. In the committee's judgment, they indicate that GATB scores can provide useful screening information, but that the predictive power of the test battery is not so strong that the GATB should become the sole means of filling all job orders.

The average values reported here are lower than those appearing in USES technical reports, which are .5 or higher. One reason for the discrepancy is that the committee had access to more data; the more recent (post-1972) studies tended to produce noticeably lower validities than did the older studies.
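To see why the size of these corrections matters, the following sketch applies two standard textbook formulas: the classical correction for criterion unreliability and the Thorndike Case II correction for direct range restriction. Whether these are the exact formulas used in the USES technical reports or by the committee is an assumption, and every input value below (the observed correlation, the reliabilities, the standard-deviation ratios) is hypothetical rather than taken from the report.

```python
import math


def correct_for_criterion_unreliability(r_observed, criterion_reliability):
    """Classical disattenuation for an unreliable criterion: r / sqrt(r_yy)."""
    return r_observed / math.sqrt(criterion_reliability)


def correct_for_range_restriction(r_restricted, sd_ratio):
    """Thorndike Case II correction for direct range restriction.

    `sd_ratio` is the predictor's standard deviation among applicants
    divided by its standard deviation among the incumbents studied.
    """
    ru = r_restricted * sd_ratio
    return ru / math.sqrt(1.0 - r_restricted**2 + ru**2)


# Hypothetical inputs: one observed correlation, two sets of assumptions.
r_obs = 0.20
for reliability, sd_ratio in [(0.80, 1.0), (0.60, 1.5)]:
    r_rel = correct_for_criterion_unreliability(r_obs, reliability)
    r_full = correct_for_range_restriction(r_rel, sd_ratio)
    print(f"reliability={reliability}, sd_ratio={sd_ratio}: "
          f"corrected validity = {r_full:.2f}")
```

The same observed correlation yields a noticeably larger corrected validity when a lower criterion reliability and a larger range-restriction ratio are assumed, which is the mechanism behind the disagreement over corrected values.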
In addition, although we acknowledge that the correlations are attenuated by criterion unreliability and range restriction, the committee does not accept the magnitude of the corrections that were made for these two factors in the USES technical reports. Since these corrections have the effect of substantially increasing the estimated correlations between test scores and ratings of job performance, the committee's estimate of GATB validities is substantially lower than that in the technical reports.

Does the GATB Predict Less Well for Minority Job Seekers? (Chapter 9)

Because of the consistent differences in average group performance on standardized tests, a persistent concern about ability tests has been
that they may be biased against minority group members. The 1970 Equal Employment Opportunity Commission (EEOC) Guidelines on Employee Selection Procedures required that data be generated and results be reported separately for minority and nonminority groups, and USES conducted about 200 validity studies during the 1970s and early 1980s to explore the question. We have looked carefully at the data reported by race to see if the GATB predicts differentially by race.

Findings and Conclusions

Our analysis of the 78 studies that had at least 50 black and 50 nonminority employees shows that there were differences in both the validities and the prediction equations for blacks and nonminorities. First, the average correlations between test score and supervisor ratings were .12 for blacks and .19 for nonminorities. Second, the formula that best predicts black performance is somewhat different from that predicting the performance of majority-group applicants. However, the use of a single formula for relating GATB scores to performance criteria would not be biased against black applicants; if anything, it would slightly overpredict their performance, particularly in the higher score ranges.

This finding needs to be treated with some caution. Differential prediction analysis takes the performance measure as a given. But there may be bias against blacks in the primary criterion measure used in the studies: supervisor ratings. Usually the supervisors were white. There is some empirical evidence, and it is plausible on historical and social grounds, that supervisors will favor employees of their own race. The size of the supervisor bias has not been determined, but its possible presence counsels caution in accepting supervisor ratings as an equally accurate estimate of job performance for both groups.

Are There Scientific Justifications for Adjusting Minority Test Scores?
(Chapter 13)

In addition to the question of test bias, which is addressed by differential validity analysis (comparability of correlations) and differential prediction analysis (comparability of regression lines), there is a larger question of the evenhandedness of selection based on test scores. Our premise is that the inaccuracy of the test should not unduly affect the employment prospects of able minority workers. This premise led us to focus on the issue of selection error and specifically to ask whether there are differences among the majority and minority groups in false-acceptance and false-rejection rates.
Findings and Conclusions

Our analysis of the impact of selection error on minority and nonminority applicants demonstrates that in the absence of score adjustments, minority applicants who could perform successfully on the job will be screened out of the referral group in greater proportions than are equivalent majority-group applicants. Conversely, majority applicants who turn out not to perform successfully will be included in the referral group in greater proportions than equivalent minority applicants. This effect of selecting by rank order of scores is a function of prediction error and the existence of average group differences in test scores.

To explain: If applicants are placed by test scores alone, taking the applicants in order of test score produces workers with the highest expected supervisor ratings. Nonetheless, because prediction is imperfect, some high scorers will not perform well on the job and some low scorers could have done so. With no score adjustments, very low fractions of minority-group members will be referred for employment because minority-group members tend to score substantially lower on the GATB on average. For example, if 20 percent of the majority group were referred, only 3 percent of the minority group would be referred to a typical job handled by the Employment Service.

Yet, because the validities of test score for supervisor rating are modest, there is not so great a difference in average job performance between minority and majority applicants as there is in average test performance. Majority workers do comparatively better on the test than they do on the job, and so benefit from errors of false acceptance. Minority workers at a given level of job performance have much less chance of being selected than majority workers at the same level of job performance, and thus are burdened with higher false-rejection rates. (Note that these effects are a function of high and low test scores, not racial or ethnic identity.)
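The arithmetic behind this argument can be sketched under simplifying assumptions that are not stated in the summary itself: test scores are normally distributed with equal spread in both groups, the group means differ by one standard deviation, the validity is 0.3 (inside the .2 to .4 range reported above), a single regression line holds for both groups, and an "able" worker is one whose performance exceeds the majority-group mean. All constants and names below are hypothetical.

```python
import random
from statistics import NormalDist

VALIDITY = 0.3                 # assumed test-performance correlation
GROUP_GAP_SD = 1.0             # assumed gap in mean test score, in SD units
MAJORITY_REFERRAL_RATE = 0.20  # the 20 percent figure used in the text

# Test-score cutoff that refers the top 20 percent of the majority group.
cutoff = NormalDist().inv_cdf(1.0 - MAJORITY_REFERRAL_RATE)
# Share of the minority group scoring above that same cutoff.
minority_referral_rate = 1.0 - NormalDist(mu=-GROUP_GAP_SD).cdf(cutoff)


def referral_rate_among_able(test_mean, n, rng):
    """Simulate one group; return the share of able workers who are referred.

    Performance is modeled as VALIDITY * test + noise, using one common
    regression line for both groups (a simplification of this sketch).
    """
    noise_sd = (1.0 - VALIDITY**2) ** 0.5
    able = referred = 0
    for _ in range(n):
        test = rng.gauss(test_mean, 1.0)
        performance = VALIDITY * test + rng.gauss(0.0, noise_sd)
        if performance > 0.0:       # "able": above the majority mean
            able += 1
            if test >= cutoff:      # referred under top-down selection
                referred += 1
    return referred / able


rng = random.Random(0)
majority_able_referred = referral_rate_among_able(0.0, 100_000, rng)
minority_able_referred = referral_rate_among_able(-GROUP_GAP_SD, 100_000, rng)
print(f"minority referral rate:  {minority_referral_rate:.3f}")
print(f"able majority referred:  {majority_able_referred:.3f}")
print(f"able minority referred:  {minority_able_referred:.3f}")
```

With a one-standard-deviation gap, the minority referral rate comes out at about 3 percent, matching the example in the text. The simulation also shows able minority workers being referred at a lower rate than able majority workers, which is the higher false-rejection burden described above.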
In sum, the modest validities of the GATB cause selection errors that weigh more heavily on minority workers than on majority workers. This outcome is at odds with the nation's express commitment to equal employment opportunity for minority workers. In the committee's judgment, the disproportionate impact of selection error provides scientific grounds for the adjustment of minority scores so that able minority workers have approximately the same chances of referral as able majority workers. Others will have to decide whether the scientific reasons are compelling in the realms of public policy and law.

The committee has analyzed two score-adjustment methods: the current USES system of within-group percentile scores and a performance-based method of computing scores. Both score adjustment strategies are
race-conscious; both would virtually eliminate the adverse impact of the GATB on black and Hispanic subpopulations (at current validity levels); and both adjustments would be commensurate with the far less than perfect relation between the GATB test score and job performance.

Is the GATB Valid for Some, Most, or All Jobs? (Chapters 6 and 7)

The VG-GATB Referral System was built on the claim that the GATB is a valid predictor of job performance for all 12,000 jobs in the U.S. economy. That is a big claim, and two chapters of the report are devoted to weighing its scientific merits.

Findings and Conclusions

In the committee's judgment, it is probable that the GATB has validities for supervisor ratings in the range of .2 to .4 for a wide variety of jobs similar to those served by the Employment Service, although we have seen no evidence to justify the claim that the test battery is a valid predictor for all 12,000 jobs in the economy. We accept the general thesis of validity generalization, that the results of validity studies can be generalized to many jobs not actually studied, but we urge a cautious approach of generalizing validities only to appropriately similar jobs.

Furthermore, the policy considerations do not end with a demonstration that the GATB has some predictive power for x numbers of jobs. The question that still must be asked is how much validity is enough to make a single fallible test the central means of referring workers to jobs throughout the Employment Service. Although exclusive use of the VG-GATB Referral System would make the matching of people to jobs slightly more efficient, it would do so at the cost of depriving the low scorers of any chance at jobs that many of them could have performed successfully. Policy makers will have to decide if such a cost is warranted.
One would also want to consider whether it makes equally good sense to use a general test battery such as the GATB for jobs that do not require a great deal of prior training as well as for those that do. Should it be used for entry-level as well as experienced workers? For experienced workers or complicated jobs, other sources of information may be more valuable.

Will Increased Use of the GATB Result in Substantial Increases in Productivity? (Chapter 12)

Personnel psychologists have always made the logical assumption that matching people to jobs more effectively will increase productivity; this has
been the underlying rationale for employment testing. In recent years, some researchers have attempted to put dollar values on the performance gains from testing. The proponents of validity generalization have been particularly notable on this count. The committee's analysis provides a critique of the specific claims of dollar gains that would result from use of the VG-GATB Referral System throughout the Employment Service, claims that have been developed in USES technical reports and repeated elsewhere.

Findings and Conclusions

The often-repeated claim that use of the GATB by the Employment Service will produce a gain of $79.36 billion is unfounded on close examination. It is based on overestimates of validities, of the variability of worker productivity, and of the selectivity of employers using the Employment Service. For example, it assumes that only 1 in 10 Employment Service applicants finds a job, an assumption that, if extended to the whole economy, would produce perhaps a very productive work force, but also 90 percent unemployment.

Potential Effects of the VG-GATB Referral System (Chapters 10 and 11)

Although very little systematic information is available from the pilot studies, the committee gathered enough information to be able to suggest certain likely effects of the VG-GATB Referral System on Employment Service clients.

Findings and Conclusions

A universal testing program would have side effects whose economic and social consequences are not well established. Certain types of individual employers would benefit, although the benefits would tend to attenuate as more and more employers who compete in the same labor market adopt VG-GATB procedures. Certain types of job seekers would likewise benefit.
However, were the VG-GATB system the only mode of referral through the Employment Service, the lowest-scoring applicants would be consigned to receiving little or no assistance in finding work, when in fact many such applicants could perform satisfactorily on many jobs.

If the VG-GATB Referral System did not include the kind of score adjustments currently made to the scores of black, Hispanic, and in some cases, Native American applicants, it would have a severe adverse impact on the employment opportunities of members of those demographic groups. In the committee's judgment, the VG-GATB Referral System is
not viable without some sort of score adjustments so long as the government is committed to a policy of equal employment opportunity that looks to the effects of employment practices on racial and ethnic minority groups.

Veterans are accorded referral priority as a matter of statutory law. Because it would dramatically alter referral procedures in the Employment Service, the VG-GATB Referral System has been of some concern to veterans' organizations. The states have adopted a variety of mechanisms for incorporating veterans' preference in the VG-GATB system, the effects of which range from absolute preference to effectively no preference. The method of according veterans' preference in a test-based referral system that seems most compatible with the statutory grant of preference to qualified veterans would be the addition of some number of points before conversion of the scores to percentiles.

When Should the GATB Not Be Used? (Chapter 11)

Any policy promoting greater use of the GATB for referral should be accompanied by clear guidelines outlining when its use is not appropriate. There are specific populations, such as people with certain handicapping conditions and people who do not have a command of the English language, for whom the GATB is simply not suitable as the main referral mechanism.

There are also less clearly identifiable types of job seekers who will not be adequately served by the VG-GATB Referral System. For example, during the course of site visits to local Job Service offices, we learned that there are some communities that are extremely resistant to testing; one pilot test of the VG-GATB was discontinued because people in the area refused to use the Job Service if they had to take a test. Exclusive use of test-based referral would serve the interests of neither employers nor job seekers in such communities.
Findings and Conclusions

The GATB is not such a good predictor of job performance that traditional and alternative referral techniques should be abandoned. Its best use is to supplement current methods rather than replace them. Forcing people to take the GATB as a condition for receiving job placement services serves no one's best interests. Filling job orders automatically and solely through the VG-GATB Referral System is not a prudent use of USES resources.

For people with disabilities, the GATB is appropriate primarily as a supplement to counseling rather than as the main referral instrument. Job counselors should continue to provide their main path of referral.
SUMMARY OF CENTRAL RECOMMENDATIONS

The committee's most important recommendations, summarized here, appear in full as Chapter 14 of this report. The findings and conclusions that provide the underlying rationale for these recommendations will be found at the end of Chapters 4 through 13, as will subsidiary recommendations.

Operational Use of the VG-GATB Referral System

1. Any expansion of the VG-GATB Referral System should be accompanied by a vigorous program of research and development. Two inadequacies in the testing program must be corrected:
   a. Test Security: It is essential that measures be taken to provide for test security to ensure fairness to examinees. Most important is the regular development of alternate forms of the test and frequent replacement of old forms. In addition, USES must produce, and the states must enforce, clearly specified security procedures of the kind used to maintain the confidentiality of other large-scale test batteries.
   b. Test Speededness: A research and development project should be put in place to reduce the speededness of the GATB. A highly speeded test, one that no one can hope to complete, is vulnerable to distortion from coaching. If this characteristic of the GATB is not altered, the test will not retain its validity when given a gatekeeping function that is widely recognized.
2. We recommend that no job seeker be obliged to take the GATB; every local office that uses VG-GATB referral should maintain an alternative referral path for those who choose not to take the test.
3. Because tests provide only partial information about future job performance, we recommend that Job Service offices that adopt the VG-GATB Referral System continue to use multiple criteria in choosing which applicants to refer.

Referral Methods

4.
The committee recommends the continued use of score adjustments for black and Hispanic applicants in choosing which Employment Service registrants to refer to an employer, because the effects of imperfect prediction fall more heavily on minority applicants as a group due to their lower mean test scores. We endorse the adoption of score adjustments that give approximately equal chances of referral to able minority applicants and able majority applicants: for example, within-group percentile scores, performance-based scores, or other adjustments.
5. If the within-group score adjustment strategy is chosen:
   a. We recommend that USES undertake research to develop more adequate norming tables.
   b. An attempt should be made to develop norms for homogeneous groups of jobs, at the least by job family, but if possible by more cohesive clusters of jobs in Job Families IV and V. To correctly compute within-group percentiles, USES must estimate the average difference between the majority-group scores and the minority-group scores in applicants for homogeneous groups of jobs.
6. We also recommend that USES study the feasibility of what we call a Combined Rules Referral Plan, under which the referral group is composed of all those who would have been referred either by the total-group or by the within-group ranking method.

Score Reporting

The decision concerning what kind of scores to report to employers and job applicants is separate from the choice of methods to use to create the referral pool. The uppermost concern in reporting GATB scores should be to provide the most accurate and informative estimate of future job performance possible.

7. The committee recommends that two scores be reported to employers and applicants:
   a. a within-group percentile score with the corresponding norm group identified and
   b. an expectancy score (derived from the total-group percentile score) equal to the probability that an applicant's job performance will be better than average.

Promotion of the VG-GATB Referral Program

8. Given the modest validities of the GATB for the 500 jobs actually studied, given our incomplete knowledge about the relationship between this sample and the remaining 11,500 jobs in the U.S.
economy, given the Department of Justice challenge to the legality of within-group scoring and the larger philosophical debates about race-conscious mechanisms and the known problems of using a test with severe adverse impact, and given the primitive state of knowledge about the relationship of individual
performance and productivity of the firm, we recommend that the claims for the testing program be tempered and that employers as well as job seekers be given a balanced view of the strengths and weaknesses of the GATB and its likely contribution in matching people to jobs.
9. Given the primitive state of knowledge about the aggregate economic effects of better personnel selection, we recommend that Employment Service officials refrain from making any dollar estimates of the gains that would result from test-based selection.
10. The Employment Service should make clear to employers using the VG-GATB Referral System that responsibility for the relevance of selection criteria and the effects of selection on the composition of their work force lies directly with the employer. Use of tests approved by the U.S. Employment Service does not alter this allocation of responsibility under federal civil rights law.

Pilot Studies

There is too little evidence based on controlled, rigorous studies of the effects of using the VG-GATB Referral System for the committee to be able to assure policy makers at the Department of Labor that anticipated improvements have indeed occurred; this is not to say that they have not occurred.

11. If USES decides to continue the VG-GATB Referral System, it should undertake a series of carefully designed studies to establish more solidly the efficiencies that are believed to result.
12. This research should be a cooperative effort, involving federal and State Employment Service personnel and employers. USES should encourage state Employment Security Agencies that deal with large employers (e.g., Michigan) and states that have fully articulated VG systems in place (e.g., Virginia, Utah, Oklahoma) to take a leading role in conducting studies to demonstrate the efficacy of the VG-GATB Referral System.
13.
We also recommend that the employer community, as a potentially major beneficiary of an improved referral system, take an active part in the effort to evaluate the VG-GATB Referral System.

Special Populations

Veterans

14. If government policy is to strike a balance between maximizing productivity and preference for veterans in employment referral through the VG-GATB Referral System, the Employment Service should adjust
veterans' VG-GATB scores by adding a veterans' bonus of some number of points before conversion to percentiles. Unadjusted expectancy scores should also be reported to employers and job seekers. It should be noted on the referral slip that the percentile score has been adjusted for veterans' preference.
15. The Employment Service should continue to meet the needs of disabled veterans through individualized counseling and placement services.

People with Handicapping Conditions

16. For applicants with handicapping conditions, we recommend the continued use of job counselors to make referrals.
17. Measures should be taken to ensure that no job order is filled automatically and solely through the VG-GATB system. Job counselors who serve handicapped applicants, disabled veterans, or other populations with special needs must have regular access to the daily flow of job orders.
18. To ensure that handicapped applicants who can compete with tested applicants are given that opportunity, the GATB should be used when feasible to assess the abilities of handicapped applicants. But the test should be used to supplement decision making, not to take the place of counseling services.
19. Because special expertise in assessing the capabilities of people with handicaps is necessary and available, we recommend that the Department of Labor encourage closer coordination between state rehabilitation agencies and the state Employment Service agencies. States should consider placing state rehabilitation counselors in local employment service offices that serve a sizable population of handicapped people.
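As an illustration of the Combined Rules Referral Plan named in recommendation 6, the sketch below forms the referral group as the union of the candidates who would be referred under total-group ranking and those who would be referred under within-group ranking. The field names, the candidate data, and the choice of referring the top `k` under each rule are hypothetical; the summary does not specify how many candidates each rule would refer or how ties would be handled.

```python
def top_k(candidates, key, k):
    """Return the ids of the k candidates with the highest value of `key`."""
    ranked = sorted(candidates, key=lambda c: c[key], reverse=True)
    return {c["id"] for c in ranked[:k]}


def combined_rules_referral(candidates, k):
    """Refer everyone picked by either ranking rule (set union)."""
    by_total = top_k(candidates, "total_group_pct", k)
    by_within = top_k(candidates, "within_group_pct", k)
    return by_total | by_within


# Hypothetical candidates with both kinds of percentile score.
candidates = [
    {"id": "c1", "total_group_pct": 90, "within_group_pct": 85},
    {"id": "c2", "total_group_pct": 80, "within_group_pct": 75},
    {"id": "c3", "total_group_pct": 60, "within_group_pct": 88},
    {"id": "c4", "total_group_pct": 50, "within_group_pct": 70},
]
referral_group = combined_rules_referral(candidates, k=2)
print(sorted(referral_group))
```

Because the plan takes a union, the referral group can be larger than `k` whenever the two rules disagree; in this toy data, three candidates are referred rather than two.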
PART I
BACKGROUND AND CONTEXT

Part I provides the setting for the committee's study. Chapter 1 describes the difficult policy issues that officials need to consider as they decide on the future of the General Aptitude Test Battery and of the score adjustments used to mitigate the adverse effects of testing on minority job seekers. Chapter 2 focuses on the divergent conceptions of equity that have emerged as a product of the civil rights revolution of the 1960s and 1970s; it also traces the ambivalence present in American society, as it is reflected in government policy and law. Chapter 3 is an overview of the operations of the U.S. Employment Service, the federal-state system for bringing job seekers and employers together.