Range Restriction Adjustments in the Prediction of Military Job Performance

Stephen B. Dunbar and Robert L. Linn

INTRODUCTION

Common practice in establishing the criterion-related validity of a test or battery of tests to be used for selection or classification involves performing a linear regression of a relevant measure of performance on the test battery and reporting various descriptive statistics that assess the magnitude of the linear relationship between predictor(s) and the criterion. Although alternatives to the familiar correlation coefficient and perhaps less familiar regression slopes and intercept exist, these alternatives make the task of characterizing criterion-related validity of the battery more cumbersome. The correlation coefficient, in particular, allows for ready comparisons of predictive validity across occupational categories as well as across different predictor and criterion measures, so that its widespread use is not surprising. In the context of military performance assessment, correlations allow for comparisons of predictive validity over time and across Services. Such comparisons are important components in the validation of selection composites that are used for prediction in a wide variety of occupational categories such as military training programs.

Whatever appeal correlations have for these and other reasons must be weighed against certain limitations, several of which are especially relevant in the context of criterion-related validity studies. When a correlation coefficient is used to make inferences about the predictive validity of a test



The National Academies | 500 Fifth St. N.W. | Washington, D.C. 20001
Copyright © National Academy of Sciences. All rights reserved.
Terms of Use and Privacy Statement



Below are the first 10 and last 10 pages of uncorrected machine-read text (when available) of this chapter, followed by the top 30 algorithmically extracted key phrases from the chapter as a whole.
Intended to provide our own search engines and external engines with highly rich, chapter-representative searchable text on the opening pages of each chapter. Because it is UNCORRECTED material, please consider the following text as a useful but insufficient proxy for the authoritative book pages.

Do not use for reproduction, copying, pasting, or reading; exclusively for search engines.

OCR for page 127
Performance Assessment for the Workplace: VOLUME II Range Restriction Adjustments in the Prediction of Military Job Performance Stephen B. Dunbar and Robert L. Linn INTRODUCTION Common practice in establishing the criterion-related validity of a test or battery of tests to be used for selection or classification involves performing a linear regression of a relevant measure of performance on the test battery and reporting various descriptive statistics that assess the magnitude of the linear relationship between predictor(s) and the criterion. Although alternatives to the familiar correlation coefficient and perhaps less familiar regression slopes and intercept exist, these alternatives make the task of characterizing criterion-related validity of the battery more cumbersome. The correlation coefficient, in particular, allows for ready comparisons of predictive validity across occupational categories as well as across different predictor and criterion measures, so that its widespread use is not surprising. In the context of military performance assessment, correlations allow for comparisons of predictive validity over time and across Services. Such comparisons are important components in the validation of selection composites that are used for prediction in a wide variety of occupational categories such as military training programs. Whatever appeal correlations have for these and other reasons must be weighed against certain limitations, several of which are especially relevant in the context of criterion-related validity studies. When a correlation coefficient is used to make inferences about the predictive validity of a test

OCR for page 127
Performance Assessment for the Workplace: VOLUME II battery for a population of applicants or enlistees, its values should ideally be estimated from a random sample of the applicant population. In most settings where predictive validity needs to be established, the only sample available for estimating the desired population values is one containing individuals, selected in part on the basis of scores on the predictor(s) in question, who have remained with a program long enough for criterion scores to be obtained. Besides the difficulties presented by the use of nonrandom samples in calculating correlations, the well-known sensitivity of correlations, slopes, and intercepts to linearity and homoscedasticity in the joint distribution of predictors and criteria is an area of concern. This concern can be magnified by selection effects. This paper provides an overview of standard procedures used to adjust correlations and regression parameters for the effects of selection, commonly referred to as corrections for range restriction. Technical issues related to the accuracy of these adjustments are considered, especially where they are likely to have implications for the types of adjustment procedures appropriate for large-scale predictive validity studies of an aptitude battery like the Armed Services Vocational Aptitude Battery (ASVAB). The paper concludes with a discussion of issues related to the implementation of a set of adjustment procedures for validation studies in the military, where the choice of the reference population, choice of selection variables for making adjustments, and choice of an analytical procedure all have important consequences for the assessment of the predictive validity of present and future versions of the ASVAB. SELECTION EFFECTS IN CORRELATION AND REGRESSION Although the effects of various types of nonrandom selection on correlation coefficients, slopes, and intercepts are well-documented in the psychometric literature (cf. Thorndike, 1949; Gulliksen, 1950; Lord and Novick, 1968), a brief review of these effects will establish the context for technical issues related to their use in studies of criterion-related validity. Figure 1, Figure 2, Figure 3 and Figure 4 illustrate the effects of the usual types of selection on the bivariate scatterplot of a selection test (X) and a performance criterion (Y). In Figure 1, the scatterplot of a sample of 5,000 observations from a bivariate normal population is shown, along with the least-squares regression line of Y on X. The correlation between X and Y in this population is .60. Figure 2 and Figure 3 demonstrate the effects of explicit selection on X and Y, respectively—explicit selection in these examples involves actual truncation of the marginal distribution of the selection variable and is clearly visible by inspection of the scatterplot. When selection is explicit on X, the Y on X least-squares regression line is unaltered because of assumed linearity. However, estimates of the correlation between X and Y are altered because of the reduced

OCR for page 127
Performance Assessment for the Workplace: VOLUME II FIGURE 1 Scatterplot of a criterion (Y) and a predictor (X) in an unselected sample (N = 5,000, ρXY = .60). variance of X in the selected sample. In Figure 2, with only the upper 50 percent of the observations on X included, the resulting correlation of .398 underestimates the population correlation by a substantial amount. When selection is explicit on Y, both the terms in the Y on X least-squares regression and the correlation coefficient are affected. In Figure 3, with only the upper 50 percent of scores on Y included, the regression line (depicted by the broken line) has a smaller slope and larger intercept, while the correlation between X and Y of .412 again underestimates the population value. While explicit selection can occur, particularly when X is a screening device like an admissions test, a more likely situation would depict X as one of several measures used to select individuals. In such a situation one might imagine a third variable, Z, as the explicit selection variable—Z can be

OCR for page 127
Performance Assessment for the Workplace: VOLUME II FIGURE 2 Explicit selection on X. thought of as a kind of composite measure that includes such factors as self or administrative selection as well as scores on other predictors and is positively correlated with X and Y under typical circumstances. That Z can be thought of as a composite measure is reflected in its designation as an ideal discriminant function separating selected and nonselected groups (Cronbach et al., 1977) or as a latent variable underlying the true selection process (Muthén and Jöreskog, 1983). In this case, both X and Y are referred to as incidental selection variables because the selection effects on the bivariate distribution of X and Y are indirect. This type of selection effect also exists when interest focuses on the predictive value of an alternative set of variables imperfectly correlated with the explicit selection variable. In such a situation, the alternative predictors represent incidental selection variables. The more subtle effects of explicit selection on Z are illustrated in Figure 4. The XY scatterplot in this figure is based on a trivariate normal popula-

OCR for page 127
Performance Assessment for the Workplace: VOLUME II FIGURE 3 Explicit selection on Y. tion distribution of X, Y, and Z in which the correlation between Y and each of X and Z is .60 and the correlation between X and Z is .90. The scatterplot contains only the upper 50 percent of the observations on Z, and, as can be seen in the figure, the only evidence for this being a selected sample is a reduction in the variability of the marginal distributions of X and Y. Even when X and the true selection variable are as highly correlated as in this example, the effects of incidental selection yield a least-squares regression line (again depicted by the broken line) with reduced slope and a correlation between X and Y (.414) that underestimates the population correlation substantially. Although the plots in Figure 1, Figure 2, Figure 3 and Figure 4 do provide an indication of the types of selection effects that can occur in practice, they do not show how differential selectivity can complicate the interpretation of correlations based on selected samples. Table 1 illustrates the effects of different degrees of range

OCR for page 127
Performance Assessment for the Workplace: VOLUME II FIGURE 4 Explicit selection on Z. restriction on the correlations of two measures with a criterion; one of the two measures represents the explicit selection variable (Z in the above discussion), while the other represents an incidental selection variable (X). The statistics in the table are based on a trivariate normal population in which the correlations between the criterion, Y, and Z and X are .50 and .45, respectively, and the correlation between Z and X is .50. The selection ratio indicates the proportion of the total sample that is selected into the validation sample on the basis of scores on the explicit selection variable Z. The standard deviation of Z is given in the second column of the table and the correlations between the predictor variables and the criterion in the third and fourth columns. With no selection on Z, the correlations shown in the table are equal to the population values of .50 and .45. However, as the selection ratio de-

OCR for page 127
Performance Assessment for the Workplace: VOLUME II TABLE 1 Illustration of the Effects of Range Restriction for Z on the Correlation of Z and X with a Criterion Measure Y Selection Ration Standard Deviation of Z Correlation with Y     Z X 1.0 1.000 .50 .45 .7 .701 .38 .37 .6 .649 .35 .36 .5 .603 .33 .35 .4 .559 .31 .34 .3 .515 .29 .33 .2 .468 .26 .32 .1 .411 .23 .31 .05 .371 .21 .30 creases and the proportion of individuals excluded from the validation sample increases, the correlations between the predictors and the criterion steadily decrease. For example, if only individuals in the top 70 percent of scores on Z are selected (selection ratio of .7) and hence available for a validation study, the estimates of the correlations of Z and X with the criterion would be expected to decrease to .38 and .37, respectively. When only the upper 30 percent are selected, the corresponding correlations become .29 and .33, while for selection of the upper 5 percent they become .21 and .30. The effects of explicit and incidental selection shown in Table 1 clearly lead to a steady decrease in the assessed predictive values of Z and X when product-moment correlations are used to describe the degree of association with the criterion. However, the table also shows this decrease to be more dramatic for the explicit selection variable, Z, than it is for X. In spite of the fact that the population correlations indicate Z to be the better predictor, the effects of explicit selection on Z lead to a misleading indication that X is the better predictor once the selection ratio drops below .6. The degree by which one is misled clearly increases as the degree of selectivity increases. Examples such as these provide a clear indication that understanding of selection effects is crucial to the interpretation of results from criterionrelated validity studies. Without risk of hyperbole, one could say that the predictive validity of a selection instrument cannot be accurately characterized unless the possible effects of sample selection are accounted for to some extent.

OCR for page 127
Performance Assessment for the Workplace: VOLUME II Coping With the Effects of Selection The subtle ways in which biases introduced by sample selection affect estimates of correlations in an entire applicant pool are difficult to account for in any exhaustive way; however, it is possible to obtain useful assessments of the degree of association between a set of predictors and a criterion even with selected samples. Although no method of adjusting for selection effects is without limitations, it is often possible to obtain a less biased indication of predictive validity either by employing an adjustment procedure or by examining alternatives to the validity coefficient. Allred (in this volume) provides an excellent review of alternatives to the validity coefficient for describing the results of a predictive validity study, many of which are less sensitive to the effects of selection. The scatterplots shown previously are the most straightforward example of an alternative approach. They provide a very detailed representation of the relationship between a predictor and a criterion measure. Such detail can be important for detecting specific characteristics of the relationship—ceiling and floor effects, marked departures from linearity, changes in the degree of criterion variability depending on predictor score (heteroscedasticity), and outliers can all be discerned from careful inspection of a scatterplot. In addition to the scatterplot, Allred shows how graphical displays of the criterion distribution at fixed levels of the predictor can offer a more concise evaluation of potential nonlinearity and heteroscedasticity than the scatterplot. The various types of expectancy tables discussed by Allred provide estimates of the criterion performance expected for any prespecified level of performance on the predictor(s) and are also useful indicators of the predictor-criterion relationship. General measures of association used in the analysis of contingency tables (cf. Bishop et al., 1975) could conceivably be applied to such expectancy tables in order to provide a numerical index analogous to the correlation coefficient—but the tables are also valuable in their own right. When scatterplots or plots of conditional distributions indicate that the relationship between predictor and criterion is approximately linear and homoscedastic, the use of the familiar summary statistics from correlation and regression is most meaningful. Although a bivariate, or in the case of multiple predictors multivariate, normal distribution of predictor(s) and criterion is not strictly required for these conditions to hold, normality is the basis for the significance tests and confidence intervals used for inferences about population correlations and regression parameters. The obvious difficulty for formal statistical inference in the context of a criterion-related validity study lies in the fact that the random sampling scheme required, given that normality conditions are satisfied, can seldom be attained in practice. The estimates of the unknown population values are biased by nonrandom sample selection—the adjustment procedures reviewed below

OCR for page 127
Performance Assessment for the Workplace: VOLUME II attempt to remove at least a portion of this bias by incorporating specific assumptions about the selection process into estimates of correlations and regression parameters. Corrections for Sample Selection Bias The most common procedures for adjusting correlations for the effects of sample selection were first introduced by Pearson (1903) for the bivariate case and later extended by Lawley (1943-1944) to the case of multiple predictors and criteria. These procedures are sometimes referred to as the Pearson-Lawley corrections and have been used extensively in validation research in certain Services for a number of years. As discussed by Lord and Novick (1968), Lawley's extension of Pearson's two- and three-variable corrections describes the relationships between complete sets of explicit and incidental selection variables based on two assumptions: (1) the regression of each incidental selection variable on any combination of the explicit selection variables is linear, and (2) the errors of estimate incurred in regressing incidental on explicit selection variables are constant (i.e., they are homoscedastic). When these conditions are satisfied, the covariances among the incidental selection variables can be expressed as where C represents the variance-covariance matrix of the variables indicated by subscripts x and z, which refer to incidental and explicit selection variables, respectively. An asterisk is used to distinguish matrices based on the selected sample from those based on the unselected population. As can be seen from the above expression, it is the relationship among explicit selection variables in the unselected population versus selected sample (Czz vs. ) that determines the size of the adjustment made by the Lawley correction procedure. For the case of no selection, Czz and are identical and the term in parentheses vanishes, yielding Cxx = . As selection affects the elements of , the elements of are adjusted accordingly. For the special case illustrated earlier, in which Z was the lone explicit selection variable and X and Y were incidental selection variables of substantive interest, the Pearson-Lawley expression for the correlation between X and Y in the unselected population is given by

OCR for page 127
Performance Assessment for the Workplace: VOLUME II where upper-case R and S designate correlations and standard deviations in the unselected population, respectively, and lower-case r and s designate corresponding quantities in the selected sample. If the explicit selection variable, Z, were known, its standard deviation in the unselected population or applicant pool is all that would be necessary in addition to selected sample statistics in order to estimate the XY correlation. Thus, the expression given above would be applicable in a situation where, for example, selection was explicit on a composite variable, but interest in the correlations between individual elements of the composite and the criterion existed. Where a number of explicit selection variables are known, the Lawley extension of the above three-variable formula would provide the appropriate adjustment under assumptions (1) and (2). Inspection of the above formula provides some insight into the concepts underlying this and other corrections for range restriction. The numerator of this formula shows that the sample rxy is being incremented by a factor related to the selection ratio and the magnitudes of the correlations between the explicit selection variable and the variables of substantive interest. As selectivity increases, the ratio of standard deviations in the unselected to selected groups becomes larger and the correction factor increases. Similarly, when the explicit selection variable is highly correlated with the variables of substantive interest, the correction factor can be quite large. In either case, the similarity of the above formula to the one used for calculating partial correlations makes clear that the adjustment for explicit selection on Z “undoes” precisely what partial correlation is designed to do. That is, instead of factoring out variance that is not considered related to the correlation between X and Y, the above equation factors in variance that is considered related to that correlation. If X itself happens to be the explicit selection variable, the Pearson twovariable correction formula yielding Rxy can be obtained by simply substituting x for z in the subscripts of the above formula and simplifying the resulting expression, yielding where upper- and lower-case quantities are defined as before. Because Z, conceived as a true selection variable, is not likely to be observed in practice, the above two-variable correction formula has been widely used, even when selection is not, strictly speaking, explicit on X. The application of these formulas is quite simple and can be illustrated by again considering the scatterplots in Figure 2 and Figure 4. Recall that these plots depicted explicit selection on X and Z, respectively. When selection is

OCR for page 127
Performance Assessment for the Workplace: VOLUME II explicit on X, the two-variable selection formula gives a corrected estimate of .577 for the correlation between X and Y in contrast to the observed rxy of .398 in the selected sample. Similarly, when selection is explicit on Z, the three-variable formula gives a corrected estimate of .590 for Rxy in contrast to the rxy of .414 in the selected sample. As expected, the corrected values in these two instances are quite close to the population correlation of .60 in that assumptions (1) and (2) are perfectly satisfied and the correct explicit selection variable is available. Perhaps a more realistic example would depict X as the only available proxy for the unobserved true selection variable and treat it as explicit even though it is incidental. When the two-variable formula is thus used in the setting depicted in Figure 4, the corrected estimate of Rxy is .540, still smaller than the population correlation. As suggested below, this undercorrection is likely to be a common occurrence when the Pearson-Lawley procedures are used in practice. Technical Considerations and the Accuracy of Pearson-Lawley Corrections The principal technical issues that have implications for the value of corrections for range restriction in practice relate to their accuracy in the presence of violated assumptions and their degree of sampling error. With regard to the former area of concern, studies by Rydberg (1963), Linn (1968), Novick and Thayer (1969), Brewer and Hills (1969), Greener and Osburn (1979, 1980), Linn et al. (1981), Dunbar (1982), Gross and Fleischmann (1983), and Booth-Kewley (1985) have addressed the general issue of bias in results of range restriction adjustments when either regression or selection assumptions are not satisfied. Gullickson and Hopkins (1976), Forsyth (1971), Bobko and Rieck (1980), Dunbar (1982), Gross and Kagen (1983), Brandt et al. (1984), and Allen and Dunbar (1990) have provided descriptions of the sampling behavior of estimates of correlations and regression parameters that have been corrected for range restriction. The principal findings of some of these studies are reviewed below. The concern regarding the effects of nonlinearity and/or heteroscedasticity in the criterion-predictor relationship is aptly expressed by Lord and Novick (1968), who argue that the Pearson-Lawley corrections can give overly optimistic indications of predictive validity when assumptions are not met. In spite of a common observation that departures from linearity and homoscedasticity are likely to occur at the same time in applied settings, most studies of the behavior of the Pearson-Lawley corrections under violated assumptions have examined nonlinearity and heteroscedasticity separately. Results of studies using both real and simulated data suggest that the effects of violated assumptions on the accuracy of the Pearson-Lawley corrections depend on the nature of the violation. Greener and Osburn

OCR for page 127
Performance Assessment for the Workplace: VOLUME II about population correlation coefficients and regression parameters when selection is severe. In other situations, either where selection is moderate or reasonable values for variances in the unselected group are difficult to determine, the two-stage procedure may have some promise. Some of the technical limitations of the Heckman approach alluded to in the above example have been studied in greater detail, with concern again centering on violated assumptions regarding the distribution of selection variables and on sampling fluctuations of the adjusted estimates. Goldberger (1980) addressed the problem of bias in the adjustment when the selection variables are not normally distributed, suggesting that it can be quite large with only modest departures from normality. When the specific departure is such that the regression of Y on X has a reduced slope in the upper range of the X scores, the two-stage adjustment tends to be conservative, just as did the Pearson-Lawley corrections in this situation (Dunbar, 1982). Unfortunately, this and other selection modeling approaches have seen limited use in empirical studies of criterion-related validity; as a consequence, there is presently little basis for judging how they might perform if used in largescale validation studies. Sampling fluctuations and the efficiency of estimates obtained in the selection modeling approaches have also received attention recently, particularly in Nelson's (1984) examination of the behavior of the Heckman two-step method, Olsen's (1980) least-squares version of the method, and a full maximum likelihood version with respect to the degree of collinearity introduced by overlap between variables describing the selection process and variables to be evaluated as predictors. Nelson's results confirm the general suspicions aroused concerning the two-step methods by the example given previously: adjustments for selection bias are most needed precisely when the two-step methods for making them are ineffectual. That is, the high degree of collinearity expected when the selection process is completely specified resulted in unacceptable sampling errors in the parameter estimates from the Heckman and Olsen procedures. Nelson's results showed the full maximum likelihood method to perform better under most circumstances in his simulation study—to the extent that this method is related in spirit to the Pearson-Lawley corrections, some preference for the latter might be inferred from these results. In any case, it appears that a more careful evaluation of some of the alternative approaches to dealing with range restriction problems needs to be made in the context of test validation research before such methods can be used with much confidence on a large scale. Some of the Services are currently engaged in such efforts—results from such studies can have direct implications regarding the expected accuracy of selection modeling approaches in the Joint-Service context (see Rossmeissl and Brandt, 1985; Dunbar, 1986). A final note on alternative procedures concerns the potential application

OCR for page 127
Performance Assessment for the Workplace: VOLUME II of certain Bayesian approaches to handling the missing data problem caused by selection. Rubin (1977), for example, developed a method for assessing the influence of nonrespondents on the results of sample surveys based on subjective notions about the characteristics of nonrespondents. These notions are formalized in terms of the parameters of prior distributions for observations in the nonrespondent population. In a test validation context, one might consider unselected examinees as analogous to the nonrespondents that Rubin's methods are designed to handle, nonrespondents with particular characteristics because of administrative as well as self selection. Herzog and Rubin (1983) and Glynn, et al. (1986) provide extended discussions and examples of how repeated imputations of missing observations based on a variety of prior distributions can be used together with available data to estimate the desired parameters of a combined population of selected and unselected examinees. The combination of such “mixture models ” with repeated imputations from a range of reasonable prior distributions can make explicit the amount of uncertainty that exists regarding the predictive validity of a test. As noted by Rubin (1977), these methods, and perhaps any method for handling selection bias, are best considered as ways of formalizing the possible effects of missing observations on outcome measures rather than as substitutes for random samples normally required for valid inferences about the characteristics of a population. IMPLICATIONS FOR PREDICTIVE VALIDITY IN A JOINT-SERVICE CONTEXT The problem of range restriction is not new to anyone concerned with establishing the criterion-related validity of selection and classification tests in the Services, although the methods for coping with it have been far from uniform over the years. Some Services have used either the Pearson or Lawley corrections routinely in reporting the results of validity studies for the ASVAB and its predecessors, while others have questioned this use, especially when validation is conducted within particular occupational categories (cf. Sims and Hiatt, 1981; and Air Force Human Resources Laboratory, 1982, for an instance of this contrast of viewpoints). Because many things can affect the magnitudes of correction factors, comparison of corrected correlations across Services is not a simple task. The choice of analytical procedures, base or reference populations to which the corrected estimates are intended to generalize, and the variables that are to serve as explicit selection measures can all influence the magnitudes of corrected estimates of predictive validity. These issues are discussed in what follows as they relate to the use of adjustment procedures for validating the ASVAB against measures of performance in current use for military jobs, and for

OCR for page 127
Performance Assessment for the Workplace: VOLUME II validating alternative measures of job proficiency (surrogate criteria) against on-site evaluations of job performance (benchmark measures). Analytical Procedure The demands on a procedure for dealing with selection effects in the Joint-Service context for military performance assessment are not typical of many settings in which test validation studies are conducted. One purpose behind Service-wide efforts in this regard is the achievement of a degree of comparability in characterizations of predictive validity across Services, across jobs of a similar nature within and between Services, and across jobs that involve different tasks and hence different combinations of ASVAB subtests for selection. Moreover, a concern exists for examining the consistency of predictions of performance criteria for groups of military trainees distinguished by sex, race or ethnicity, and level of education. Comparisons of this kind, which make the cooperative venture especially useful, can be hopelessly confounded by varying degrees of range restriction in the groups involved. This problem is duly noted with respect to comparisons across occupations in the work of Schmidt and Hunter (1977) on validity generalization, and with respect to comparisons across demographic groups in the work of Linn (1983a, 1983b) on differential prediction. An important observation regarding the purpose of corrections for range restriction with such comparisons in mind is that they are as much needed to obtain comparability as they are to provide precise estimates of population values. Because they are also to be used with a variety of criterion measures (surrogate as well as benchmark measures of job performance), their limitations need to be fully understood and appreciated. As indicated at various points in the preceding review, the standard PearsonLawley correction procedures are familiar to most personnel psychologists and appear to have been carefully evaluated in both analytical and empirical studies, many of the latter being performed with the specific problems of criterion-related validity in mind. The limitations of these procedures and the conditions under which they are likely to give misleading indications of predictive validity are well documented. In contrast to these conventional techniques, the procedures based on selection modeling are in comparative infancy. Their relationship to the conventional procedures is only partially understood and their use in the context of predictive validity studies has been all but nonexistent. In addition, the sample size requirements of the more sophisticated estimation methods accompanying some of the selection

OCR for page 127
Performance Assessment for the Workplace: VOLUME II modeling procedures may not be met by many military job classifications for which validity checks are desired. Thus, it is probably premature to endorse these newer methods in the context of ASVAB validation. Regarding their use on a routine basis, more caution is probably warranted due to the multiplicity of alternate methods and to problems in the technical implementation of some. It is also premature, however, to dismiss these methods as inappropriate for criterion-related validity studies in general—further investigations of these techniques with military performance data is to be encouraged. Reference Population Results from the application of any adjustment procedure necessarily reflect the characteristics of the group from which information about the unselected population is obtained. In cases where the Pearson-Lawley corrections are used, the source of estimates of the variances and covariances of selection variables in the unrestricted group, in addition to the selection process itself, determines the magnitude of any correction factor. Variations in these estimates, such as those caused by preexisting differences between potential enlistees opting for one Service over the others, result in correction factors of varying magnitudes and make the corrected values difficult to interpret. For these reasons, when comparison across Services is important, using the entire accession populations within Services in a given year as reference groups would be counterproductive. The corrected predictive validities for selection composites would differ from Service to Service as well as from year to year. Clearly, a reference population common to all Services reporting predictive validities for the ASVAB is desirable. Recent versions of the ASVAB have been anchored to a nationally representative sample of men and women drawn as part of the National Longitudinal Survey (NLS) of Youth Labor Force Behavior, sponsored by the Departments of Labor and Defense and usually referred to as the 1980 Youth Population (U.S. Department of Defense, 1982). Although the 1980 Youth Population does not precisely reflect current applicant pools for each Service, it does provide a frame of reference that would allow corrected correlations to be directly comparable across Services. Moreover, it does not constitute a group about which concerns over self-selection phenomena would arise, as would be the case for an accession population. In spite of the fact that the 1980 Youth Population is the closest example of any kind of normative group for dealing with the effects of range restriction, there are some limitations in using it. As a representative sample of the nation's youth, this group contains individuals who would be judged ineligible for military service on the basis of ASVAB scores used in the initial screening of enlistees and is thus atypical of a projected mobilization

OCR for page 127
Performance Assessment for the Workplace: VOLUME II population in the wider range of talent represented. With this wide range of talent, one could expect correction factors to be quite large in some cases, certainly larger than expected when interest is focused on the predictive validity of ASVAB composites for only those recruits who have passed an initial screening. In view of the fact that corrections based on the total 1980 Youth Population might be artifactually large, some portion of this group might be best chosen to meet the need of adjusting validity coefficients for comparability across Services. The effects of suggested restrictions of the Youth Population on the magnitudes of correction factors are illustrated in the example presented below. In order to provide an idea of the kinds of results to be expected when using the Pearson-Lawley correction procedures in connection with the 1980 Youth Population, an example is given in Table 5 based on training data from nine Marine Corps clerical specialties. Given in Table 5 are uncorrected and corrected correlations between the ASVAB clerical composite from Forms 8/9/10 used by the Marine Corps and final course grades in training. The Lawley corrections given in the table assume the 10 individual subtests to be explicit selection variables, and the composite and course grades to be incidental selection variables. The adjusted values given in column A were obtained by using the variances and covariances of subtest standard scores for the total Youth Population (U.S. Department of Defense, 1982). To approximate the situation in which only that portion of the Youth Population eligible for military service is used, subtest standard deviations from the total sample were adjusted for four degrees of truncation under the assumption that each subtest followed a normal distribution. The resulting reduced standard deviations were used along with the original correlations from the total sample in obtaining a variance-covariance matrix among explicit selection measures that would provide a rough indication of how much smaller correction factors might be with ineligible examinees deleted from the reference group. This procedure probably underestimates the amount by which correction factors would change if, for example, the bottom 10 percent of the AFQT distribution (the actual screening composite) were deleted from the reference group since only the standard deviations were altered in the calculations. As can be seen from the entries in Table 5, the Lawley corrections suggest that the amount of range restriction in these groups is substantial in nearly all cases. For the specialties with reasonably large sample sizes, the smallest difference between corrected and unconnected validity coefficients is .19 in the second administrative group. For other training courses, the corrections in column A based on the entire Youth Population are quite large. The range restriction phenomenon is further illustrated in the lower portion of Table 5, which contains subtest standard deviations for each training cohort and for the entire Youth Population. Inspection of these

OCR for page 127
Performance Assessment for the Workplace: VOLUME II TABLE 5 Corrected Predictive Validities of the Marine Corps Clerical Composite in Nine Clerical Training Programs Based on 1980 Youth Population and Subtest Standard Deviations in Selected and Unselected Groups Panel (a)       Corrected Validities Training Program N Uncorrected Validity A B C D E Administrative 1 632 .32 .56 .52 .50 .48 .46 Administrative 2 620 .25 .44 .41 .39 .37 .36 Communications center 332 .19 .56 .52 .50 .48 .47 Supply stock 653 .47 .69 .65 .63 .62 .60 Aviation supply 379 .30 .55 .51 .49 .47 .45 Finance records 227 .20 .65 .62 .60 .58 .57 Basic preservation 51 .28 .36 .34 .33 .32 .31 Subsistence supply 66 .11 .59 .56 .54 .52 .51 Aviation operations 87 .36 .66 .63 .61 .59 .57 Panel (b) Training Program AR WK PC NO GS CS AS MK MC EI Administrative 1 6.95 6.25 7.43 8.11 7.21 9.45 8.13 7.09 8.34 7.87 Administrative 2 7.05 6.34 7.21 6.82 7.57 9.46 8.32 7.57 8.17 7.51 Communications center 7.91 7.12 8.49 6.53 8.55 9.51 8.96 7.58 8.77 8.55 Supply stock 7.19 5.72 6.96 6.35 7.39 8.37 8.81 8.03 8.27 7.96 Aviation supply 7.21 6.16 8.17 8.31 7.13 9.08 8.36 7.34 8.34 7.76 Finance records 5.65 3.94 5.10 5.73 6.69 7.78 8.37 6.81 7.37 7.89 Basic preservation 8.05 7.93 8.98 7.13 7.95 6.46 8.12 8.13 8.31 8.59 Subsistence supply 7.05 6.87 7.14 5.75 7.48 8.13 8.00 7.03 7.98 7.98 Aviation operations 6.92 6.20 7.73 7.30 6.35 8.53 8.26 6.77 8.53 8.09 1980 Youth Population 10.25 10.05 9.66 10.65 9.69 10.10 9.92 10.77 9.55 9.86 NOTE: Corrections using the Lawley formula make use of standard deviations on 10 ASVAB subtests from Form 8, derived from: A the total 1980 sample. B the upper 95 percent of the 1980 sample. C the upper 90 percent of the 1980 sample. D the upper 85 percent of the 1980 sample. E the upper 80 percent of the 1980 sample. values shows the varying degrees of selectivity across groups. Generally speaking, the standard deviations in the training cohorts are one-half to two-thirds the size of the corresponding values in the proposed reference population. The effects of removing ineligible individuals from the Youth Population prior to adjusting the predictive validities of the clerical composite for se-

OCR for page 127
Performance Assessment for the Workplace: VOLUME II lection effects are shown in columns B through E in the upper portion of Table 5. Even after deleting lower scoring individuals, it appears that corrected values remain likely to be substantially larger than the uncorrected ones in some cases—extreme selection in some occupational specialties, such as the financial records clerk group in this instance, makes this result unavoidable. However, the relative magnitudes of the corrected validity coefficients across training programs remain the same regardless of the proportion of the reference group that is deleted. It should be noted that the regular pattern of steadily decreasing estimates of predictive validity for increasing degrees of truncation in the proposed reference population is to be expected when the selection variables (here ASVAB subtests) have symmetric distributions, as assumed in this illustration. ASVAB subtests are well known to have nonsymmetric distributions. This is likely to influence the pattern of decreasing estimates of validity when the reference population is truncated on AFQT. Indeed, Maier (1985) has shown that the effects of truncation on the multivariate corrections are more complex than the example here would indicate, in part because of skewness in the distributions of selection variables and in part because of the effects of truncation on the covariances of selection variables. Neither factor is considered in the example presented above. Selection Variables In the above example, individual ASVAB subtests served as explicit selection variables. In practice, most job training programs in the military make use of one or more composite measures in selecting recruits for training. Typically, cut scores for selection are established on the composite scale separately for high school graduates and nongraduates, making high school graduation an additional selection variable. This custom poses several problems for the usual correction procedures in the Joint-Service context. First of all, ASVAB selection composites are not uniform from Service to Service, so that using the actual selection measure would again militate against the goal of comparability in the resulting values. Here, comparability is in direct conflict with accurate specification of the selection process. If using the subtests instead of the composite measures means that the selection mechanism is incompletely specified, the Pearson-Lawley corrections might likely give conservative indications of the predictive validity of the composites. Second, the inclusion of graduation status as a dichotomous selection variable in the Pearson-Lawley formulation is reasonable in principle, but complicated by the fact that certain job training programs admit only recipients of high school diplomas. No variability in the selected sample for such groups represents a kind of limiting case for the Pearson-Lawley procedure, in which no simple correction for range restriction is feasible. It would therefore seem appropriate to consider ASVAB subtests as the only avail-

OCR for page 127
Performance Assessment for the Workplace: VOLUME II able measures that are both closely related to the selection process and common to all Services. Any systematic errors incurred as a result of this choice are likely to result in corrected validities that are smaller than they might otherwise be with more detailed specification of the selection mechanisms operating for individual training and occupational cohorts. In considering the choice of variables used in correcting for range restriction, the properties of criterion measures should not be overlooked. Selection effects are likely to increase when surrogate and benchmark measures are used as criteria. General attrition can be expected to alter the distribution of both types of measures relative to the distribution of final grades in training, and the logistic problems of collecting systematic on-site evaluations of performance could have unknown effects on the range of talent represented in the distributions of benchmark variables. If selectivity is augmented by factors such as these, both the bias error and sampling error of corrected correlations will be affected and the degree of uncertainty regarding the resulting values magnified. The possible presence of additional sources of unreliability in alternative performance measures is another area of concern. While it may be the case that all of these factors (i.e., added selectivity and measurement error) suggest an increased tendency for the Pearson-Lawley corrections to be conservative, their exact influence in individual cases is difficult to determine. To the extent that unreliability may effect corrections that are overly conservative, additional corrections for attenuation might be appropriate. These would of course increase the sampling errors of observed values even further. CONCLUDING REMARKS The effects of selection on correlation coefficients and regression parameters place the personnel psychologist on the horns of a classic statistical dilemma. To retain observed values gives an extremely misleading view of the relationship between predictor and criterion variables, but to correct observed values places one at the mercy of assumptions that will not be strictly satisfied in practice and of an added degree of uncertainty in estimating population values. While there is no inherent magic in any procedure for dealing with the effects of selection bias, neither is there an inherent sorcery. Rather, a balanced indication of the quality of a given predictor can be achieved by reporting both uncorrected and corrected validity coefficients and by careful documentation of the methods used to obtain the corrected estimates. The tenability of assumptions can be examined in individual cases through the use of graphical techniques described elsewhere, and in some cases reasonable speculation about the influence of violated assumptions can be entertained. The review of analytical techniques suggested the use of the standard Pearson-Lawley correction formulas in vali-

OCR for page 127
Performance Assessment for the Workplace: VOLUME II dation studies involving Service-wide applications. A common reference population and set of explicit selection variables also seem desirable for any degree of comparability to be achieved through corrections for restriction of range. When a summary statistic is needed to describe the predictive validity of a selection instrument in a variety of settings, these approaches can provide a more complete assessment of the relationship between that instrument and the relevant measures of on-the-job performance. REFERENCES Air Force Human Resources Laboratory 1982 Aptitude Index Validation of the Armed Services Vocational Aptitude Battery (ASVAB) Forms 5, 6, and 7. Air Force Human Resources Laboratory, Personnel Research Division Lackland Air Force Base, Tex. Allen, N.L., and S.B. Dunbar 1990 Standard error of correlations adjusted for incidental selection. Applied Psychological Measurement, 14:83-94. Bishop, Y.M.M., S.E. Fienberg, and P.W. Holland 1975 Discrete Multivariate Analysis: Theory and Practice. Cambridge, Mass.: MIT Press. Bobko, P., and A. Rieck 1980 Large sample estimators for standard errors of functions of correlation coefficients. Applied Psychological Measurement 4:385-398. Booth-Kewley, S. 1985 An Empirical Comparison of the Accuracy of Univariate and Multivariate Corrections for Range Restriction. TR-85-19. Navy Personnel Research and Development Center, San Diego, Calif. Brandt, D., D. McLaughlin, L. Wise, and P. Rossmeissl 1984 Complex Cross-Validation of the Validity of a Predictor Battery. Paper presented at the annual meeting of the American Psychological Association, Toronto. Brewer, J., and J.R. Hills 1969 Univariate selection: the effects of size of correlation, degree of skew, and degree of restriction. Psychometrika 34:347-361. Cohen, A.C. 1955 Restriction and selection in samples from bivariate normal distributions Journal of the American Statistical Association 50:884-893. Cronbach, L.J. 1982 Designing Evaluations of Educational and Social Programs. San Francisco: JosseyBass. Cronbach, L.J., D.R. Rogosa, R.E. Floden, and G.G. Price 1977 Analysis of Covariance in Non-Randomized Experiments: Parameters Affecting Bias. Occasional paper, Stanford Evaluation Consortium, Stanford University. Dunbar, S.B. 1982 Corrections for Sample Selection Bias. Unpublished doctoral dissertation, Department of Educational Psychology, University of Illinois at Urbana-Champaign. 1986 Comparing Regression Equations Across Training Programs: An Empirical Study of Prior Selection Effects and Alternative Predictive Composites ONR Technical Report 86-1: Measurement Research Group, Iowa City, Iowa. Forsyth, R.A. 1971 An empirical note on correlation coefficients corrected for restriction of range. Educational and Psychological Measurement 31:115-123.

OCR for page 127
Performance Assessment for the Workplace: VOLUME II Glynn, R.J., N.M. Laird, and D.B. Rubin 1986 Selection modeling versus mixture modeling with non-ignorable nonresponse In H. Wainer, ed., Drawing Inferences from Self-Selected Samples. New York: SpringerVerlag. Goldberger, A.S. 1980 Abnormal Selection Bias. Workshop Paper 8006. Social Systems Research Institute, University of Wisconsin-Madison. Greener, J.M., and H.G. Osburn 1979 An empirical study of the accuracy of corrections for restriction in range due to explicit selection. Applied Psychological Measurement 3:31-41. 1980 Accuracy of corrections for restriction in range due to explicit selection in heteroscedastic and non-linear distributions. Educational and Psychological Measurement 40:337346. Greenlees, J.S., W.S. Reece, and K.D. Zieschang 1982 Imputation of missing values when the probability of response depends on the variable being imputed. Journal of the American Statistical Association 77:251-261. Gross, A.L., and L. Fleischmann 1983 Restriction of range corrections when both distribution and selection assumptions are violated. Applied Psychological Measurement 7:227-237. Gross, A.L., and E. Kagen 1983 Not correcting for restriction of range can be advantageous. Educational and Psychological Measurement 43:389-396. Gullickson, A., and K. Hopkins 1976 Interval estimation of correlation coefficients corrected for restriction of range. Educational and Psychological Measurement 36:9-25. Gulliksen, H. 1950 Theory of Mental Tests. New York: John Wiley & Sons. Hausman, J.A., and D.A. Wise 1976 The evaluation of results from truncated samples: the New Jersey income maintenance experiment. Annals of Economic and Social Measurement 5:421-445. Heckman, J.J. 1974 Shadow prices, market wages, and labor supply. Econometrica 42:679-694. 1979 Sample selection bias as a specification error. Econometrica 47:153-161. Herzog, T.N., and D.B. Rubin 1983 Using multiple imputations to handle nonresponse in sample surveys In W.G. Madow, I. Olkin, and D.B. Rubin, eds., Incomplete Data in Sample Surveys, Volume 2. Theory and Bibliographies. New York: Academic Press. Kelley, T.L. 1923 Statistical Method. New York: MacMillan & Co. Lawley, D. 1943–1944 A note on Karl Pearson's selection formulae. Royal Society of Edinburgh: Proceedings, Section A 62:28-30. Linn, R.L. 1968 Range restriction problems in the use of self-selected groups for test validation. Psychological Bulletin 69:69-73. 1983a The Pearson correction formulas: implications for studies of predictive bias and estimates of educational effects in selected samples. Journal of Educational Measurement 20:1-15. 1983b Predictive bias as an artifact of selection procedures. In H. Wainer and S. Messick, eds., Principals of Modern Psychological Measurement: A Festschrift for Frederic M. Lord. Hillsdale, N.J.: Lawrence Erlbaum Associates.

OCR for page 127
Performance Assessment for the Workplace: VOLUME II Linn, R.L., D.L. Harnisch, and S.B. Dunbar 1981 Corrections for range restriction: an empirical investigation of conditions resulting in conservative corrections. Journal of Applied Psychology 66:655-663. Lord, F.M., and M.R. Novick 1968 Statistical Theories of Mental Test Scores. Reading, Mass.: Addison-Wesley. Maier, M.H. 1985 Effects of Truncating a Reference Population on Corrections of Validity Coefficients for Range Restriction. Research Memorandum 85-40. Center for Naval Analyses, Alexandria, Va. Muthén, B., and K.G. Jöreskog 1983 Selectivity problems in quasi-experimental studies. Evaluation Review 7:139-174. Nelson, F.D. 1984 Efficiency of the two-step estimator for models with endogenous sample selection. Journal of Econometrics 24:181-196. Novick, M.R., and D.T. Thayer 1969 An Investigation of the Accuracy of the Pearson Correction Formulas . Research Memorandum 69-22. Princeton, N.J.: Educational Testing Service. Olsen, R.J. 1980 A least-squares correction for selectivity bias. Econometrica 48:1815-1820. Pearson, K. 1903 Mathematical contributions to the theory of evolution—XI: On the influence of natural selection on the variability and correlation of organs. Philosophical Transactions of the Royal Society, London, Series A 200:1-66. Rossmeissl, P., and D. Brandt 1985 Modeling the Selection Process to Adjust for Restriction in Range Paper presented at the annual meeting of the American Psychological Association, Los Angeles. Rubin, D.B. 1977 Formalizing subjective notions about the effects of nonrespondents in sample surveys. Journal of the American Statistical Association 72:538-543. Rydberg, S. 1963 Bias in Selection. Stockholm: Almquist and Wiksell. Schmidt, F.L., and J.E. Hunter 1977 Development of a general solution to the problem of validity generalization Journal of Applied Psychology 62:529-540. Sims, W.H., and C.M. Hiatt 1981 Validation of the Armed Services Vocational Aptitude Battery (ASVAB) Forms 6 and 7 with Applications to ASVAB Forms 8, 9 and 10. Report 1160. Center for Naval Analyses, Alexandria, Va. Thorndike, R.L. 1949 Personnel Selection. New York: John Wiley and Sons. Tobin, J. 1958 Estimation of relationships for limited dependent variables. Econometrica 26:24-36. U.S. Department of Defense 1982 Profile of American Youth: 1980 Nationwide Administration of the Armed Services Vocational Aptitude Battery. Office of the Assistant Secretary of Defense (Manpower, Reserve Affairs and Logistics), Washington, D.C.