The GATB: Its Character and
Psychometric Properties
The General Aptitude Test Battery (GATB) has been in use for more
than 40 years, and for most of that time it has remained virtually
unchanged. Through the years it has been used in state Employment
Service offices for vocational counseling and referral and in addition has
been made available for testing and counseling to high schools and
technical schools, labor union apprenticeship programs, public and pri-
vate vocational rehabilitation services, and other authorized agencies.
The obvious first task for the committee was to sift through the years of
research and experience with the GATB to assess its suitability as the
centerpiece of the proposed VG-GATB Referral System. We looked
carefully at the development and norming of the instrument, its psycho-
metric properties, and evidence that it actually measures the aptitudes it
claims to measure. We also looked with some care at four other widely
used tests of vocational aptitudes in order to get a sense of the relative
quality of the GATB.
This chapter describes the test and summarizes our analysis of its
psychometric properties (a more detailed discussion appears in Jaeger,
Linn, and Tesh, Appendix A). Chapter 5 addresses the two shortcomings
that the committee feels must be dealt with if the GATB is to
assume a central role in the Employment Service system of matching
people to jobs: namely, the highly speeded nature of the test, which
makes it vulnerable to coaching, and the paucity of available test forms,
which makes it vulnerable to compromise.
ANALYSIS OF THE GENERAL APTITUDE TEST BATTERY
DEVELOPMENT OF THE GATB
In the period 1942-1945, the U.S. Employment Service (USES) de-
cided to develop a "general" aptitude battery that could be used for
screening for many occupations. Drawing on the approximately 100
occupation-specific tests developed since 1934, USES staff identified a
small number of basic aptitudes that appeared to have relevance for many
jobs (U.S. Department of Labor, 1970:17):
1. Intelligence (G), defined as general learning ability;
2. Verbal aptitude (V), the ability to understand the meanings of words
and language;
3. Numerical aptitude (N), the ability to perform arithmetic operations
quickly and accurately;
4. Spatial aptitude (S), the ability to think visually of geometric forms
and to comprehend the two-dimensional representation of three-dimen-
sional objects;
5. Form perception (P), the ability to perceive pertinent detail in
objects or in pictorial or graphic material;
6. Clerical perception (Q), the ability to perceive pertinent detail in
verbal or tabular material, a measure of speed of perception that is
required in many industrial jobs even when the job does not have verbal
or numerical content;
7. Motor coordination (K), the ability to coordinate eyes and hands or
fingers rapidly and accurately in making precise movements;
8. Finger dexterity (F), the ability to move fingers and manipulate small
objects with the fingers rapidly and accurately; and
9. Manual dexterity (M), the ability to move the hands easily and
skillfully.
Four of the nine aptitudes (clerical perception, motor coordination,
finger dexterity, and manual dexterity) involve speed of work as a major
component.
From the USES inventory of job-specific tests, those providing the best
measure of each of the nine basic aptitudes (based on several statistical
criteria) were selected for inclusion in the new General Aptitude Test
Battery, which became operational in 1947. The operational edition of the
GATB, B-1002, was produced in two forms, A and B. Form A was
reserved for the use of Employment Service offices; Form B was used for
validation research and for retesting and was made available to other
authorized users for vocational counseling and screening. It was not until
1983 that two additional forms, Forms C and D, of GATB edition B-1002
were introduced.
THE STRUCTURE OF THE GATB
The General Aptitude Test Battery consists of 12 separately timed
subtests, which are combined to form nine aptitude scores. Eight of the
subtests are paper-and-pencil tests, and the remainder are apparatus
tests. Two of the paper-and-pencil subtests (name comparison and mark
making), as well as all four subtests that require manipulation of objects,
are intended to measure aptitudes that involve speed of work as a major
component. Each subtest is scored as number correct, with no correction
for guessing.
The following descriptions of the subtests in Forms A and B of the
GATB are based on material in Section III of the Manual for the USES
GATB (U.S. Department of Labor, 1970:15-16). Examples of various
item types are drawn from a pamphlet published by the Utah Department
of Employment Security.
Subtest 1: Name Comparison
This subtest contains two columns of 150 names. The examinee
inspects each pair of names, one from each column, and indicates
whether the names are the same or different. There is a time limit of 6
minutes, or 2.40 seconds per item. This is a measure of the aptitude of
clerical perception, Q.
Sample Item:
Which pairs of names are the same (S) and which are different
(D)?
1. W. W. Jason ........... W. W. Jason
2. Johnson & Johnson ..... Johnson & Johnsen
3. Harold Jones Co. ...... Harold Jones and Co.
Subtest 2: Computation
This subtest consists of arithmetic exercises requiring addition, sub-
traction, multiplication, or division of whole numbers. The items are
presented in multiple-choice format with four alternative numerical
answers and one "none of these." There are 50 items to be answered in
6 minutes, or 7.20 seconds per item. This is one of two measures of
numerical aptitude, N.
Sample Item:
Add (1)   766     (A) 677   (C) 777
           11     (B) 755   (D) 656
                  (E) none of these
Subtest 3: Three-Dimensional Space
This subtest consists of a series of exercises, each containing a stimulus
figure and four drawings of three-dimensional objects. The stimulus figure
is pictured as a flat piece of metal that is to be bent, rolled, or both. Dotted
lines indicate where the stimulus figure is to be bent. The examinee
indicates which one of the four drawings of three-dimensional objects can
be made from the stimulus figure. There are 40 items with four options
each, to be completed in 6 minutes, or 9.00 seconds per item. This subtest
is one of three measures of intelligence, G, and the only measure of spatial
aptitude, S.
Sample Item:
At the left in the drawing below is a flat piece of metal. Which
object to the right can be made from this piece of metal?
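[Figure: a flat piece of metal at the left, followed by four drawings of three-dimensional objects; drawing not reproduced.]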
Subtest 4: Vocabulary
Each item in this subtest consists of four words. The examinee
indicates which two of the four words have either the same or opposite
meanings. There are 60 items, each having six response alternatives (all
possible pairs from four). The time limit is 6 minutes, or 6.00 seconds
each. This subtest is one of three measures of intelligence, G, and the only
measure of verbal aptitude, V.
Sample Items:
Which two words have the same meaning?
(a) open (b) happy (c) glad (d) green
Which two words have the opposite meaning?
(a) old (b) dry (c) cold (d) young
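The count of six response alternatives follows from choosing 2 of the 4 words in an item. A minimal sketch in Python, using the words of the first sample item purely for illustration:

```python
from itertools import combinations

# The four words of the first sample item above.
words = ["open", "happy", "glad", "green"]

# Every unordered pair of words is one response alternative.
alternatives = list(combinations(words, 2))

for first, second in alternatives:
    print(first, "/", second)

print(len(alternatives))  # number of response alternatives per item
```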
Subtest 5: Tool Matching
This subtest consists of a series of exercises containing a stimulus
drawing and four black-and-white drawings of simple shop tools. Dif-
ferent parts of the tools are black or white. The examinee indicates which
of the four black-and-white drawings is the same as the stimulus drawing.
There are 49 items with a time limit of 5 minutes, or 6.12 seconds per item.
This is one of two measures of form perception, P.
Sample Item:
At the left in the drawing below is a tool. Which object to the
right is identical? Variations exist only in the distribution of
black and white in each drawing.
[Drawing not reproduced.]
Subtest 6: Arithmetic Reasoning
This subtest consists of a number of arithmetic problems expressed
verbally. There are five alternative answers for each item, with the fifth
being "none of these." There are 25 items with a time limit of 7
minutes, or 16.80 seconds per item. This subtest is one of three
measures of intelligence, G, and one of two measures of numerical
aptitude, N.
Sample Item:
A man works [illegible] hours a day, 40 hours a week. He earns $1.40
an hour. How much does he earn each week?
(A) $40.00
(B) $44.60
(C) $50.60
(D) $56.00
(E) none of these
Subtest 7: Form Matching
This subtest presents two groups of variously shaped line drawings.
The examinee indicates which figure in the second group is exactly the
same size and shape as each figure in the first or stimulus group. Total
test time is 6 minutes, or 6.00 seconds per item. This subtest is one of
two measures of form perception, P.
Sample Item:
For questions 9 through 12 find the lettered figure exactly like
the numbered figure.
[Figure: numbered line drawings (9 through 12) and a group of lettered line drawings; drawing not reproduced.]
(The actual test would have 25 or more items within a group.)
Subtest 8: Mark Making
This subtest consists of a series of small empty boxes in which the
examinee is to make the same three pencil marks, working as rapidly as
possible. The marks to be made are short lines, two vertical and the third
a horizontal line beneath them. There are 130 boxes to be completed
in 60 seconds, or 0.46 seconds per item. This subtest is the only measure
of motor coordination, K.
Subtest 9: Place
The equipment used for Subtests 9 and 10 consists of a rectangular
pegboard divided into two sections, each containing 48 holes. The upper
section contains 48 cylindrical pegs. In Subtest 9, the examinee moves the
pegs from the holes in the upper part of the board and inserts them in the
corresponding holes in the lower part of the board, moving two pegs
simultaneously, one in each hand. This performance (moving 48 pegs) is
done three times, with the examinee working rapidly to move as many of
the pegs as possible during the time allowed for each of the three trials, 15
seconds or 0.31 second per peg. The score is the number of pegs moved,
summed over the three trials. There is no correction for dropped pegs.
This test is one of two measures of manual dexterity, M.
Subtest 10: Turn
For Subtest 10, the lower section of the board contains the 48
cylindrical pegs. The pegs, which are painted in two colors (one end red
and the other end white), all show the same color. The examinee moves
a wooden peg from a hole, turns the peg over so that the opposite end is
up, and returns the peg to the hole from which it was taken, using only the
preferred hand. The examinee works rapidly to turn and replace as many
of the 48 cylindrical pegs as possible during the time allowed, 30 seconds.
Three trials are given for this test. The score is the number of pegs the test
taker attempted to turn, summed over the three trials. The time allowed
is 0.63 second per peg and there is no correction for errors. This subtest
is one of two measures of manual dexterity, M.
Subtest 11: Assemble
The equipment used for Subtests 11 and 12 consists of a small
rectangular board (finger dexterity board) containing 50 holes and a rod to
one side, and a supply of small metal rivets and washers. In Subtest 11,
the examinee takes a small metal rivet from a hole in the upper part of the
board with the preferred hand and at the same time removes a small metal
washer from a vertical rod with the other hand; the examinee puts the
washer on the rivet and inserts the assembled piece into the correspond-
ing hole in the lower part of the board using only the preferred hand. The
examinee works rapidly to move and assemble as many rivets and washers
as possible during the time allowed. There is one scored trial of 90 seconds,
or 1.80 seconds per rivet. The score is the number of rivets moved; there
is no correction for dropped rivets or for moving rivets without washers.
This subtest is one of two measures of finger dexterity, F.
Subtest 12: Disassemble
The equipment for this subtest is the same as that described for Subtest
11. The examinee removes the small metal rivet of the assembly from a
hole in the lower part of the board, slides the washer to the bottom of the
board, puts the washer on the rod with one hand and the rivet into the
corresponding hole in the upper part of the board with the other
(preferred) hand. The examinee works rapidly to move and replace as
many rivets and washers as possible during the time allowed. There is one
timed trial of 60 seconds, or 1.20 seconds per rivet. The score is the
number of rivets moved; there is no correction for dropped rivets or
washers. This subtest is one of two measures of finger dexterity, F.
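The per-item time allowances quoted in the subtest descriptions above can be checked with a few lines of Python. The sketch below uses only the item counts and time limits given in the text; the function names are illustrative. Form Matching is left out because the text gives no item count for it (its quoted 6.00 seconds per item over 6 minutes would imply 60 items).

```python
# (subtest, number of items, time limit in seconds, quoted seconds per item)
# All values come from the subtest descriptions above. For the multi-trial
# apparatus subtests (Place, Turn), the time limit is for a single trial.
SUBTESTS = [
    ("Name Comparison", 150, 360, 2.40),
    ("Computation", 50, 360, 7.20),
    ("Three-Dimensional Space", 40, 360, 9.00),
    ("Vocabulary", 60, 360, 6.00),
    ("Tool Matching", 49, 300, 6.12),
    ("Arithmetic Reasoning", 25, 420, 16.80),
    ("Mark Making", 130, 60, 0.46),
    ("Place", 48, 15, 0.31),
    ("Turn", 48, 30, 0.63),
    ("Assemble", 50, 90, 1.80),
    ("Disassemble", 50, 60, 1.20),
]


def seconds_per_item(items, time_limit):
    """Average time available per item, in seconds."""
    return time_limit / items


def check_quoted_rates(subtests, tolerance=0.01):
    """Return names of subtests whose quoted rate is off by more than tolerance."""
    return [
        name
        for name, items, limit, quoted in subtests
        if abs(seconds_per_item(items, limit) - quoted) > tolerance
    ]


for name, items, limit, quoted in SUBTESTS:
    rate = seconds_per_item(items, limit)
    print(f"{name:<24} {rate:6.2f} s/item (quoted {quoted:.2f})")

mismatches = check_quoted_rates(SUBTESTS)
```

A tolerance is used rather than exact rounding because the quoted figures are not all rounded the same way (15/48 = 0.3125 is quoted as 0.31, while 30/48 = 0.625 is quoted as 0.63).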
HOW GATB SCORES ARE DERIVED
There are more than 750 items on the GATB altogether. But an
applicant's score is not simply the sum of the correct answers on each
subtest. The generation of GATB scores from subtest scores involves a
number of conversion procedures intended to provide the scores with
meaning and to suitably standardize and weight subtest scores in the
various forms of the test. This section describes the mechanics of
producing GATB scores under traditional procedures and under the new
VG-GATB Referral System (U.S. Department of Labor, 1970, 1984c). It
also looks briefly at the development of GATB norms and the equating of
test forms, both of which influence the conversions made.
Obtaining GATB Scores
There are three steps in obtaining GATB scores under the traditional
procedures:
1. The first step is to calculate the number of items correct for each of
the 12 subtests. There is no penalty for wrong answers.
2. The second step is to convert each raw score so that it is referenced
to the norming population. The specific conversion depends on which
aptitude the subtest score will be used for (arithmetic reasoning has a
different value for G, intelligence, than for N, numerical aptitude), the
form of the GATB that was administered, and the type of answer sheet
used. There is a conversion table for each subtest for each form of the
GATB. Three of the subtests are components of two different aptitudes
and hence have two conversion tables for each form. Each raw score will
go through two or three transformations in becoming an aptitude score.
3. The third step is to sum the converted scores into aptitude scores.
The conversion tables used to produce aptitude scores are designed to
accomplish three things: first, to put all aptitude scores on a single
measurement scale having a mean of 100 and a standard deviation of 20 in
the norming population; second, to make scores on all operational forms
of the test comparable with one another (so that a score of 109 on the
verbal subtest in Form A means the same as a score of 109 on the verbal
subtest in Form B); and third, to weight the components of an aptitude
score when it consists of more than one subtest.
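The first of these aims, a common scale with a mean of 100 and a standard deviation of 20 in the norming population, can be illustrated with a simple linear rescaling. This is only a sketch under stated assumptions: the operational conversions are table lookups that also handle form equating and component weighting, and none of those tables is reproduced here. The function name and the raw scores below are hypothetical.

```python
from statistics import mean, pstdev

TARGET_MEAN = 100.0  # scale mean stated in the text
TARGET_SD = 20.0     # scale standard deviation stated in the text


def make_linear_conversion(norming_raw_scores):
    """Build a raw-to-scale conversion from a norming sample (hypothetical helper)."""
    raw_mean = mean(norming_raw_scores)
    raw_sd = pstdev(norming_raw_scores)
    if raw_sd == 0:
        raise ValueError("norming sample has no score variation")

    def convert(raw_score):
        # Express the raw score in norming-sample SD units, then rescale.
        z = (raw_score - raw_mean) / raw_sd
        return TARGET_MEAN + TARGET_SD * z

    return convert


# Invented raw scores standing in for a norming sample.
norming_sample = [12, 15, 18, 20, 22, 25, 28]
to_scale = make_linear_conversion(norming_sample)
scaled = [to_scale(x) for x in norming_sample]
```

By construction, the rescaled norming sample has mean 100 and standard deviation 20, and a raw score equal to the norming mean maps to 100.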
The new VG-GATB Referral System, in which all jobs are clustered
into one of five job families, and in which percentile scores are computed
on the basis of group identity (black, Hispanic, other), requires two
further steps:
4. The conversion of aptitude scores to "B" scores. There are two
aspects to the process: the aptitudes are reduced to three composites- a
cognitive composite (G + V + N); a perceptual composite (S + P + Q);
and a psychomotor composite (K + F + M) and the composites are
accorded different relative weights for each of the five job families
according to their importance in predicting job performance in each
family. There is a conversion table for each of the three composites, and
each table has conversions for each of the five job families, for a total of
[...]
a function of the time interval between test administrations for the G
aptitude. The stability coefficients of the GATB perceptual aptitudes are
somewhat smaller than those of the cognitive aptitudes, but again,
compare well to those of corresponding aptitudes in other test batteries.
The stability coefficients of psychomotor aptitudes F and M are
substantially smaller than those of other aptitudes assessed by the GATB
and, if these aptitude scores were to be used individually for making
selection or classification decisions, would be regarded as unacceptably
small. However, this is probably not a problem for the VG-GATB system,
since referral decisions are based on composites of aptitude scores.
Although direct estimates of the stability of the operational GATB
aptitude composites (such as KFM) are not available, the estimated
stability coefficient over a time interval of two weeks or less for a
unit-weighted composite of abilities K, F, and M is 0.81. This value is
sufficiently large not to preclude interpretation of scores for individual
examinees. (Additional information on the reliability of the GATB can be
found in Appendix A.)
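To illustrate why a composite can be more stable than its components, the sketch below applies the standard formula for the reliability of a unit-weighted sum, assuming uncorrelated measurement errors. The component coefficients and intercorrelations are invented for illustration; they are not GATB values, and this is not necessarily the procedure used to obtain the 0.81 estimate.

```python
def composite_reliability(sds, reliabilities, correlations):
    """Reliability of a unit-weighted sum of component scores.

    sds            -- standard deviation of each component
    reliabilities  -- reliability (e.g., stability) coefficient of each component
    correlations   -- dict mapping index pairs (i, j), i < j, to the observed
                      correlation between components i and j
    Assumes measurement errors are uncorrelated across components.
    """
    k = len(sds)
    # Variance of the sum: component variances plus twice each covariance.
    composite_var = sum(sd ** 2 for sd in sds)
    for i in range(k):
        for j in range(i + 1, k):
            composite_var += 2 * correlations[(i, j)] * sds[i] * sds[j]
    # Error variance of the sum: sum of the component error variances.
    error_var = sum(sd ** 2 * (1 - r) for sd, r in zip(sds, reliabilities))
    return 1 - error_var / composite_var


# Hypothetical inputs: three aptitudes on the 100/20 scale.
example = composite_reliability(
    sds=[20, 20, 20],
    reliabilities=[0.70, 0.70, 0.70],
    correlations={(0, 1): 0.40, (0, 2): 0.40, (1, 2): 0.40},
)
print(round(example, 3))
```

With these invented inputs the composite coefficient exceeds each component's 0.70, which is the general pattern the paragraph above relies on.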
CONSTRUCT VALIDITY ISSUES
In this section we report on evidence that bears on USES claims that
the subtests of the GATB measure the aptitudes with which they are
identified in the GATB Manual (U.S. Department of Labor, 1970) and
nothing more. In particular, the committee conducted an exhaustive
review of the literature on convergent validity, which reports the strength
of relationships between subtests of the GATB and corresponding
subtests of other test batteries. Evidence of strong positive relationships
between measures purportedly of the same construct is supportive of
construct validity claims for all related measurement instruments. Thus
the claim that the subtests of the GATB measure the aptitudes attributed
to them (e.g., intelligence, verbal aptitude, spatial aptitude) would be
enhanced by data of this sort and weakened if small to moderate
correlations between corresponding subtests were to be found. (A de-
tailed discussion of convergent validity findings can be found in Appendix
A.)
Chapter 14 of Section III of the GATB Manual (U.S. Department of
Labor, 1970), entitled "Correlations with Other Tests," is a primary
source of convergent validity evidence. That chapter contains correlation
matrices resulting from studies of the GATB and a variety of other
aptitude tests and vocational interest measures. Results for 64 studies are
reported. Since the publication of the GATB Manual, correlations
between various GATB aptitudes or subtests and corresponding subtests
of other test batteries have been provided in studies by Briscoe et al.
(1981), Cassel and Reier (1971), Cooley (1965), Dong et al. (1986),
Hakstian and Bennett (1978), Howe (1975), Kettner (1976), Kish (1970),
Knapp et al. (1977), Moore and Davies (1984), O'Malley and Bachman
(1976), and Sakolosky (1970). The sizes and compositions of examinee
samples used in these studies are diverse, as are the aptitude batteries
with which GATB subtests and aptitudes were correlated. They range
from 40 ninth-grade students who completed both the GATB and the
Differential Aptitude Test Battery (DAT), to 1,355 Australian army
enlistees who completed the GATB and the Australian Army General
Classification Test. However, in 8 of 13 studies (many of which considered
several independent samples of examinees), the samples consisted of
high school students.

TABLE 4-1 Summary Statistics for Distributions of Convergent
Validity Coefficients for the Cognitive GATB Aptitudes (G, V, and N),
the Perceptual GATB Aptitudes (S, P, and Q), and the Psychomotor
Aptitudes (K, F, and M)

          Number of             First               Third
Aptitude  Studies     Minimum   Quartile   Median   Quartile   Maximum
G         51          .45       .67        .75      .79        .89
V         59          .22       .69        .72      .78        .85
N         53          .43       .61        .68      .75        .85
S         19          .30       .58        .62      .70        .73
P          8          .38       .44        .47      .57        .65
Q         16          .24       .38        .50      .60        .76
K          1          .58       .58        .58      .58        .58
F          2          .37       .37        .39      .41        .41
M          1          .50       .50        .50      .50        .50
Distributions of convergent validity coefficients for the GATB cognitive
and perceptual aptitudes are summarized in Table 4-1 and, for ease of
visual comparison, are depicted in Figure 4-2. As can be seen, the
distributions for the cognitive aptitudes of the GATB (G, V, and N)
provide moderately strong support for claims that these aptitudes are
appropriately named and measured, with median coefficients of .75, .72,
and .68, respectively. The results are based on more than 50 studies of
each aptitude. Corresponding results for the perceptual aptitudes of the
GATB (S, P, and Q) are somewhat less convincing. Data for the
psychomotor aptitudes are so meager (because the GATB is one of very
few tests that attempt to measure them) that judgment on their convergent
validity must be withheld.
FIGURE 4-2 Distributions of convergent validity coefficients for GATB cognitive
aptitudes (G, V, and N) and GATB perceptual aptitudes (S, P, and Q). The
number of studies (n) on which the results are based is indicated for each
aptitude. [Plot not reproduced: for each aptitude, the minimum, first quartile,
median, third quartile, and maximum are shown on a scale from 0.0 to 1.0.]
Although the median convergent validity coefficient observed for the
spatial aptitude (S) was respectably large, the corresponding median
values for the form perception (P) and clerical perception (Q) aptitudes
were smaller than would be desired. The three-dimensional-space subtest
is said to measure both intelligence and spatial aptitude and might
therefore require greater reasoning ability and inferential skill than is
typical of measures of spatial aptitude found in other batteries. The name
comparison subtest of the GATB appears to tap only a subset of the skills
typically associated with clerical perception.
COMPARISON WITH THE ASVAB AND OTHER TEST BATTERIES
The GATB is one of a number of test batteries used in this country for
vocational counseling or employee selection and classification. In order to
gauge the relative quality of the GATB, the committee reviewed four of
the more widely used of these tests: the Armed Services Vocational
Aptitude Battery (ASVAB), the Differential Aptitude Test, the Employee
Aptitude Survey, and the Wonderlic Personnel Test. For purposes of this
report, we limit our discussion largely to the ASVAB testing program,
since it provides the closest parallel to the way the VG-GATB would
function and might reasonably be considered an appropriate model should
the Employment Service proceed with test-based referral as a major
component of its employment program.
The ASVAB is the cognitive abilities test battery used to select and
classify applicants for military service in the enlisted ranks. It is admin-
istered annually to approximately 1 million applicants for military service,
as well as to an equal number of students in the tenth through twelfth
grades and postsecondary students. (The latter administrations provide
Service recruiters with the names of prospects and provide the schools
with a vocational aptitude test battery for their students at no cost.)
The ASVAB is the most recent in a series of tests, beginning with the
Army General Classification Test of the World War II era, used for initial
screening of potential entrants into military service, for purposes of
classification and assignment, or for both. Introduced in the late 1960s for
use in the DOD Student Testing Program, the ASVAB was officially
adopted in 1976 as the DOD enlistment screening and classification
battery.
In the 13 years of its operational use, new forms of the ASVAB have
been introduced at about four-year intervals. ASVAB Forms 5, 6, and 7
made up the first operational test battery; Form 5 was designated for use
in the student testing program and the latter two in the Enlistment Testing
Program. For enlistment processing, Forms 6 and 7 were replaced by
Forms 8, 9, and 10 in 1980; by Forms 11, 12, and 13 in 1984; and by Forms
15, 16, and 17 in 1989. (In 1984, Form 14 replaced Form 5 as the current
form for school administrations). The three forms introduced in 1980
included certain significant changes in the test battery, including the
deletion of the spatial abilities subtest. The 1984 and 1989 batteries were
developed to be parallel to their predecessor. Among the reasons for this
cycle of new forms is the need to maintain the integrity of the test battery
in the all-volunteer environment. The pressures on military recruiters to
meet enlistment quotas must be balanced by close attention to test
security.
ASVAB Test Parts
The ASVAB includes 10 separately timed subtests and takes about
three hours to administer. There are eight power subtests (tests for which
speed of work has no influence on an examinee's score) and two speeded
subtests. The test parts are:
1. General science (GS);
2. Arithmetic reasoning (AR);
3. Word knowledge (WK);
4. Paragraph comprehension (PC);
5. Numerical operations (NO) (speeded);
6. Coding speed (CS) (speeded);
7. Auto and shop information (AS);
8. Mathematical knowledge (MK);
9. Mechanical comprehension (MC); and
10. Electronics information (EI).
Four of the subtests (AR, WK, PC, and MK) make up the Armed
Forces Qualification Test (AFQT). The AFQT, which is considered a
general measure of trainability, is used to determine eligibility for
enlistment. In addition, each Service has developed its own set of
aptitude composites from the ASVAB subtests, which are used to qualify
applicants for various career fields. For example, the Army uses a
selector composite termed "combat," which includes the ASVAB
subtests AR + CS + AS + MC.
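The way composites draw on subsets of the ten subtests can be sketched as follows. The subtest groupings come from the text; the unit weighting, the function name, and the applicant's scores are assumptions made for illustration, since the text does not give the operational scoring formulas.

```python
# Subtest groupings stated in the text.
AFQT_SUBTESTS = ("AR", "WK", "PC", "MK")
ARMY_COMBAT_SUBTESTS = ("AR", "CS", "AS", "MC")


def composite(scores, subtests):
    """Unit-weighted sum of the named subtest scores (weighting is an assumption)."""
    missing = [code for code in subtests if code not in scores]
    if missing:
        raise KeyError(f"missing subtest scores: {missing}")
    return sum(scores[code] for code in subtests)


# Invented subtest scores for one hypothetical applicant.
applicant = {
    "GS": 52, "AR": 55, "WK": 48, "PC": 50, "NO": 53,
    "CS": 47, "AS": 60, "MK": 51, "MC": 58, "EI": 49,
}

afqt_sum = composite(applicant, AFQT_SUBTESTS)           # 55 + 48 + 50 + 51
combat_sum = composite(applicant, ARMY_COMBAT_SUBTESTS)  # 55 + 47 + 60 + 58
print(afqt_sum, combat_sum)
```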
Speededness of the ASVAB
The eight power subtests of the ASVAB appear not to be speeded.
This is documented in the ASVAB Technical Supplement (U.S. Depart-
ment of Defense, 1984b), which presents a study showing the propor-
tions of eleventh- and twelfth-grade students omitting the last item for
each of the eight ASVAB power subtests. Higher omit rates were
generally shown by the younger students and for the arithmetic
reasoning and word knowledge subtests. However, none of these omit
rates was particularly high. On average, about 7 percent of twelfth-
grade students omitted the last item of the eight subtests. This evidence
permits the assertion that the ASVAB subtests so labeled are indeed
predominantly power tests.
ASVAB Normative Data
Until 1980, the aptitude levels of military recruits were established with
reference to a normative base representing all males serving in the Armed
Services during 1944 (Uhlaner and Bolanovich, 1952). In 1980, the
Department of Defense, in cooperation with the Department of Labor,
undertook a study called Profile of American Youth to assess the
vocational aptitudes of a nationally representative sample of youth and to
develop current norms for the ASVAB. Subsequent forms of the ASVAB
have been calibrated to this 1980 Youth Population, making it the only
vocational aptitude battery with nationally representative norms.
The 1980 Youth Population norms were based on a sample of 9,173
people between the ages of 18 and 23 who were part of the nationally
representative National Longitudinal Survey of Youth Labor Force
Behavior. The sample included 4,550 men and 4,623 women and con-
tained youth from rural as well as urban areas and from all major census
regions. Certain groups (blacks, Hispanics, and economically disadvantaged
whites) were oversampled to allow more precise analysis than
would otherwise be possible (U.S. Department of Defense, 1982).
ASVAB Reliabilities
Reliability data are available for the form of the ASVAB administered
to high school students both for the individual subtests and for the
aptitude composites. The reliability estimates reported are alternate-form
reliability coefficients. This approach combines the measure of temporal
stability previously presented for the GATB with the administration of
two forms of the same test so that the risk of distortion due to memory
effects can be avoided.
The alternate-form reliabilities for subtests from ASVAB Forms 8, 9,
and 10 range from .57 to .90 with a median of .79 (U.S. Department of
Defense, 1984b). As would be expected, the reliabilities for the aptitude
composites are higher; the academic composites ranged from .88 to .94,
and the mechanical and crafts composites ranged from .84 to .95 (U.S.
Department of Defense, 1984a).
In comparison, the alternate-form reliabilities for the GATB cognitive
aptitudes are close to .90 and for the perceptual aptitudes are in the low
.80s (U.S. Department of Labor, 1986).
ASVAB Validities
The ASVAB Test Manual (U.S. Department of Defense, 1984c) pre-
sents tables of validity coefficients for military training, separately by
eight career fields. In all, 11 validity coefficients were provided by the
Army, 47 by the Navy, 50 by the Marines, and 70 by the Air Force. Those
Services reporting validities that were corrected for restriction of range
computed the corrections using the 1980 Profile of American Youth
Population. Validities were reported for both the AFQT and the aptitude
or selector composite used to place recruits in each of the eight career
fields.
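The text does not say which correction formula the Services applied. As an illustration only, the sketch below implements the common univariate correction for direct range restriction (often called Thorndike's Case II), in which the reference-population standard deviation would play the role of the unrestricted value. The function name and the numbers are hypothetical.

```python
from math import sqrt


def correct_for_range_restriction(r_restricted, sd_unrestricted, sd_restricted):
    """Univariate correction for direct range restriction on the predictor.

    r_restricted     -- validity observed in the selected (restricted) group
    sd_unrestricted  -- predictor SD in the reference population
    sd_restricted    -- predictor SD in the selected group
    """
    if sd_unrestricted <= 0 or sd_restricted <= 0:
        raise ValueError("standard deviations must be positive")
    u = sd_unrestricted / sd_restricted
    r = r_restricted
    return (r * u) / sqrt(1 - r ** 2 + (r ** 2) * (u ** 2))


# Invented example: observed validity .30, selected-group SD half the population SD.
corrected = correct_for_range_restriction(0.30, sd_unrestricted=10.0, sd_restricted=5.0)
print(round(corrected, 3))
```

When the two standard deviations are equal the function returns the observed validity unchanged, which is a quick sanity check on the formula.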
There are difficulties in trying to interpret these data. The training
criterion is problematic when self-paced instruction is used or when
courses are graded pass/fail rather than along a numerical continuum. In
addition, training criteria are dependent on the detail of records main-
tained by the particular training school, which differs by occupational
specialty and by Service.
There are also difficulties in trying to summarize the data, largely
because of differences in what each Service reported. Both the Army and
the Navy reported uncorrected and corrected validities for the AFQT and
the selector composites. The Air Force reported only selector validities,
uncorrected, whereas the Marine Corps reported only corrected validi-
ties, but for both AFQT and the selector composites.
Nevertheless, enough data are presented to make an estimate of
ASVAB validities. As is true of the GATB, there is a broad range of
observed validities; there are examples of marginal predictive power
and a few cases of dramatically high prediction: the Navy selector
composite for cryptologic technician produces uncorrected validities of
.60. Over all combinations, we estimate the weighted mean validity of the
AFQT for training to be .33 (uncorrected) and for the selector composites
to be .37 (uncorrected). These correlations are at the same general level
of predictive efficiency as the mean validities we estimate for the GATB
against a training criterion and, as might be expected, somewhat higher
than the validities for a performance criterion (supervisor ratings) (see
Chapter 8).
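A weighted mean validity of the kind estimated above is conventionally a sample-size-weighted average of the individual coefficients. The sketch below assumes that weighting scheme, which the text does not spell out; the function name and the example studies are hypothetical.

```python
def weighted_mean_validity(validities, sample_sizes):
    """Sample-size-weighted mean of a set of validity coefficients."""
    if not validities or len(validities) != len(sample_sizes):
        raise ValueError("need one sample size per validity coefficient")
    total_n = sum(sample_sizes)
    if total_n <= 0:
        raise ValueError("total sample size must be positive")
    return sum(r * n for r, n in zip(validities, sample_sizes)) / total_n


# Hypothetical example: a small study with r = .20 and a larger one with r = .40.
example_mean = weighted_mean_validity([0.20, 0.40], [100, 300])
```

Here the result is about .35 rather than the unweighted .30, because the larger study counts three times as heavily as the smaller one.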
One trend in the military data that is pertinent in the context of this
study of the GATB and validity generalization is that there is a tendency
for the more job-specific selector composites to produce slightly higher
validities than the AFQT. Of the studies that reported both AFQT and
selector validities, the mean uncorrected selector validities were higher
than the AFQT validities in 11 comparisons, were equal in 3 comparisons,
and were lower in 5.
This pattern in the relative validities of selector composites and the
AFQT is confirmed in more extensive reports of the Service data.
Wilbourn and colleagues' report (1984) on the relationships of ASVAB
Forms 8, 9, and 10 to Air Force technical school grades shows compar-
atively high mean validities for both AFQT and selector composites, with
TABLE 4-2 Uncorrected Weighted Validities for Training by Selector
Composite and AFQT for Four Air Force Career Fields

Composite       Selector Composite Components   N    n       rcomp   rAFQT   n̄
Mechanical      GS, AS, MC                      19   9,185   .43     .39     483
Administrative  WK, PC, NO, CS                   7   3,170   .21     .43     453
General         WK, AR, PC                      16   9,183   .43     .41     574
Electronic      GS, AR, MK, EI                  26   6,166   .48     .35     237

NOTE: N = number of studies; n = number of examinees; n̄ = average number of
examinees; rcomp = uncorrected weighted validity of selector composite for
training; rAFQT = uncorrected weighted validity of the Armed Forces
Qualification Test for training; GS = general science; AR = arithmetic
reasoning; WK = word knowledge; PC = paragraph comprehension; NO = numerical
operations (speeded); CS = coding speed (speeded); AS = auto and shop
information; MK = mathematical knowledge; MC = mechanical comprehension;
EI = electronics information.
SOURCE: Wilbourn, James M., Lonnie D. Valentine, Jr., and Malcolm J. Ree. 1984.
Relationships of the Armed Services Vocational Aptitude Battery (ASVAB) Forms 8, 9, and
10 to Air Force Technical School Final Grades. AFHRL Technical Paper 84-08. Working
paper. Manpower and Personnel Division, Brooks Air Force Base, Texas.
the selector composites producing slightly higher coefficients in three of
the four aptitude areas (Table 4-2).
Army data are reported in McLaughlin et al. (1984). The Army reports
validities for training and for Skill Qualification Tests (job knowledge
tests), based on 92 school classes and 112 groups of test takers, each
group with 100 cases or more. The uncorrected validities of the selector
composite and the general composite (equivalent to the AFQT) in nine
occupational areas are shown in Table 4-3. In six occupational areas, the
selector composite validity was higher, in two areas the mean weighted
AFQT validity was higher, and in the remaining occupational area, the
values were the same.
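The tally just described can be reproduced from the coefficients in Table 4-3. The sketch below is an illustrative check, not part of the original analysis: the values are transcribed from the table, stored as integer hundredths so that equal coefficients compare exactly, and the variable and function names are hypothetical.

```python
# (occupational area, selector composite validity, general composite validity),
# both in hundredths, as read from Table 4-3.
TABLE_4_3 = [
    ("Clerical", 27, 39),
    ("Combat", 33, 31),
    ("Electronics", 29, 26),
    ("Field artillery", 36, 34),
    ("General maintenance", 26, 23),
    ("Mechanical maintenance", 30, 27),
    ("Operators/food", 30, 30),
    ("Surveillance/communications", 26, 34),
    ("Skilled technical", 33, 32),
]


def tally_comparisons(rows):
    """Count areas where the selector composite is higher, lower, or equal."""
    counts = {"selector_higher": 0, "general_higher": 0, "equal": 0}
    for _area, r_selector, r_general in rows:
        if r_selector > r_general:
            counts["selector_higher"] += 1
        elif r_selector < r_general:
            counts["general_higher"] += 1
        else:
            counts["equal"] += 1
    return counts


counts = tally_comparisons(TABLE_4_3)
```

The two areas in which the general composite comes out ahead are the clerical and surveillance/communications areas.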
There is some indication that the speeded nature of certain ASVAB
subtests is what causes the break in the pattern of relative validities.
According to McLaughlin et al. (1984), the reason lies in the lower
validities of the two ASVAB speeded subtests (numerical operations,
coding speed) compared with the higher validity of the two quantitative
subtests (arithmetic reasoning, mathematics knowledge). As Tables 4-2
and 4-3 show, for both the Air Force and the Army, the administrative or
clerical composite includes both speeded subtests but no test of mathe-
matics. The AFQT, which then included a half-weighted numerical
operations subtest plus a fully weighted arithmetic reasoning subtest, has
higher validity than composites with more of the speed factor and less of
the quantitative factor.
TABLE 4-3 Uncorrected Weighted Validities for Training and SQT
by Selector Composite and General Composite for Nine Army Career
Fields

Composite                    Selector Composite Components   N    n        rcomp   rGeneral   n̄
Clerical                     CS, NO, WK, PC                  16   10,368   .27     .39          648
Combat                       CS, AR, MC, AS                   8   14,266   .33     .31        1,783
Electronics                  AR, EI, GS, MK                  10    5,533   .29     .26          553
Field artillery              GS, AR, MC, MK                   2    5,602   .36     .34        2,801
General maintenance          GS, AS, MK, EI                  14    2,571   .26     .23          184
Mechanical maintenance       NO, EI, MC, AS                  18    7,073   .30     .27          393
Operators/food               NO, WK, PC, MC, AS              11    8,704   .30     .30          791
Surveillance/communications  NO, CS, WK, PC, AS               5    3,729   .26     .34          746
Skilled technical            WK, PC, MK, MC, GS              14    7,061   .33     .32          504

NOTE: rGeneral = uncorrected weighted validity of the General Composite for Training
and Skill Qualification Test. See note in Table 4-2 for identification of other components.
SOURCE: Based on McLaughlin, Donald H., Paul G. Rossmeissl, Lauress L. Wise,
David A. Brandt, and Ming-mei Wang. 1984. Validation of Current and Alternative Armed
Services Vocational Aptitude Battery (ASVAB) Area Composites: Based on Training and
Skill Qualification Test (SQT) Information on Fiscal Year 1981 and 1982 Enlisted
Accessions. Technical Report 651. Alexandria, Va.: U.S. Army Research Institute for the
Behavioral and Social Sciences.
The reason the Air Force validities are higher than those for the Army
is not clear. McLaughlin et al. (1984) suggested that the Army's adoption
in the 1970s of criterion-referenced assessment for technical training
courses (i.e., pass/fail), and the simultaneous conversion of many courses
into a self-paced mode, led to a large reduction in the psychometric
quality of available training measures for validation purposes. However,
despite any difference in overall validities for these two Services, the
appropriate selector composite is a slightly but generally better predictor
than the general composite, or AFQT.
CONCLUSIONS
GATB Properties
1. In terms of the stability of scores over time and stability between
parallel forms of the test, the GATB exhibits acceptable reliabilities. The
reliabilities of the cognitive aptitudes are particularly high and compare
well with those of other tests used for selection and classification. The
stability coefficients of the perceptual aptitudes are somewhat smaller,
but well within the acceptable range. The reliabilities of the individual
psychomotor subtests are low, although not so low for the psychomotor
composite as to preclude its use.
2. Our review of a very large number of convergent validity studies
provides moderately strong support for claims that the subtests of the
GATB measure the cognitive constructs they purport to measure. The
evidence for the perceptual aptitudes is mixed; the spatial aptitude test
bears a respectably large relationship to similarly named subtests in other
batteries, but evidence for the form perception and clerical perception
subtests is less convincing. Since most aptitude test batteries do not have
equivalent psychomotor subtests, this type of analysis is not useful in
trying to establish that the K, F, and M subtests are appropriate measures
of a psychomotor construct.
3. Only four operational forms of the GATB have been introduced in its
42-year history: Forms A and B were introduced in 1947 and were replaced
by Forms C and D in 1983. So long as the GATB was used primarily as a
counseling tool, this lack of new forms was probably no serious problem. If,
however, the VG-GATB Referral System becomes a regular part of Em-
ployment Service operations and the GATB takes on an important gatekeep-
ing function, then the frequent production of new forms, similar to the
program for developing new forms of the Armed Services Vocational
Aptitude Battery, will be essential to maintain the integrity of the GATB.
4. The scoring system for the VG-GATB seems unduly complex. It
involves so many conversions, the exact nature of which is not fully
documented, that the link between raw scores and the final within-group
percentile scores is clouded.
5. The norms for the GATB are problematic. The General Working
Population Sample, developed in the early 1950s to be representative of the
work force as it appeared in the 1940 census, is at this point a very dated
reference population. There have been enormous structural changes in the
economy and the work force in the intervening years. The more recent
norms, developed for the computation of within-group percentile scores,
are based on convenience samples that can claim neither to be nationally
representative nor scientifically drawn from populations of those who
would be applicants for homogeneous clusters of jobs.
6. Our review of the available evidence regarding test equating indi-
cates that Subtests 1 through 8 of GATB Forms A, C, and D (evidence is
lacking on Form B) are sufficiently related to one another that the scores
can be considered interchangeable after equating. The scores from the
alternate forms of the psychomotor subtests (Subtests 9 through 12),
however, should not be considered interchangeable.
Comparison with Other Test Batteries
7. On two dimensions of central importance (predictive validity for
training criteria and test reliability), the GATB compares quite well with
the other test batteries we reviewed. For example, the mean uncorrected
validities of the Armed Forces Qualification Test for a training criterion we
estimate to be about .33 across all Services (although some Services
report substantially higher validities); for the GATB, the corresponding
figure for predicting training criteria would be about .35 overall and .30 for
studies since 1972. With the exception of one subtest (arithmetic reason-
ing), GATB reliabilities are also about the same as those of other test
batteries.
8. However, if the GATB is to take on a much more important role in
Employment Service operations (if, in other words, it takes on a major
gatekeeping function like that exercised by the ASVAB), then it will need
to be supported by a similar program of research and documentation. The
areas in which the GATB program does not compare well with the best of
the other batteries (test security, the production of new forms, equating
procedures, the strength of its normative data, the integrity of its power
tests) will take on heightened significance.
RECOMMENDATIONS
1. If the VG-GATB Referral System becomes a regular part of Em-
ployment Service operations, we recommend a research and development
program that allows the introduction of new forms of the GATB at
frequent intervals. The Department of Defense program of form develop-
ment and equating research in support of the ASVAB provides an
appropriate model.
2. Test equating will become far more important should the GATB
become a central part of the Employment Service job referral system,
because such use will necessitate the regular production of new forms of
the test. The committee recommends both better documentation of
equating procedures and special attention to creating psychometrically
parallel forms of the apparatus-based subtests.
3. The USES long-term research agenda should include consideration
of a simplified scoring system for the GATB.
4. The USES long-term research agenda should give attention to
strengthening the normative basis of GATB scores. The General Working
Population Sample should be updated to represent today's jobs and
workers. In addition, more appropriate samples need to be drawn to
support any score adjustment mechanisms adopted.
5. More reliable measurement of the psychomotor aptitudes deserves a
place on the GATB research agenda.