7
Statistical Analysis of Observational Data
Thus far we have made the case that randomized controlled experiments are the best approach available to researchers for drawing causal inferences. In the absence of experimental design, causal inference is more difficult. However, applying statistical models to observational data can be useful for understanding causal processes as well as for identifying basic facts about racial differences. Indeed, observational studies are the primary tool through which researchers have explored racial disparities and discrimination. The main goals of this chapter are to delineate the strengths and problems associated with measuring discrimination using observational studies and to identify methodological tools that are particularly promising for application in certain areas of research on discrimination.
We begin by discussing statistical decompositions of racial differences in outcomes using multivariate regressions. These decompositions are basically descriptive but are nevertheless an important tool for understanding what factors are related to observed differences as well as for measuring the magnitude of racial differences. In the next section, we outline the fundamental issues that must be addressed to draw causal inferences about racial discrimination from statistical analyses of observational data. We illustrate the main issues by laying out a statistical model that can be used to measure discrimination in hiring decisions. In our view, the hiring example is robust in the sense that it surfaces all of the conceptual issues that hamper research on discrimination across domains, including the five domains on which we focus in this report (labor markets, education, housing, criminal justice, and health care). We discuss the strengths and limitations of existing approaches to measuring discrimination across these domains and suggest how approaches prevalent in one domain might usefully be employed in others.
Even a cursory review of the literature on labor markets, education, housing, criminal justice, and health care reveals that it is quite common for researchers to employ statistical models when addressing questions of racial discrimination (see Table 4-1). Given the range of domains we examine, we do not attempt to be exhaustive in our presentation. Instead, we provide examples from individual studies in particular domains to illustrate particular methodological issues. Our intent is to summarize what we see as the most important challenges that arise in using statistical models to study racial differences in outcomes. And although we make frequent use of labor market concepts as concrete examples throughout this chapter, the fundamental statistical issues underlie the measurement of discrimination in all domains.
It should be noted that the style of exposition in this chapter is more mathematical than that in the rest of the report. This mathematical presentation is necessary to make clear what statistical decompositions of racial differences measure. It is also needed for precision regarding the role of models as descriptions of the ways in which outcomes are determined in the presence of discrimination, the role of models and assumptions in drawing causal inferences regarding discrimination from observational data, the nature of the biases that arise when those assumptions are violated, and the ways in which alternative study designs can reduce those biases.
STATISTICAL ANALYSIS FOR RESEARCH AND LITIGATION
Before we proceed, a caveat is in order. This chapter attempts to illuminate state-of-the-art statistical methods that should be used by academic researchers attempting to detect the existence and magnitude of racial discrimination in a wide variety of domains. Statistical proof of racial discrimination may often be sought in other contexts in which the same degree of attention to methodological detail may be valued differently. In particular, courts are often called upon to decide discrimination cases in circumstances that are far less congenial to the detailed and sustained analysis of the academic researcher. Litigants often press for expert testimony based on something far short of state-of-the-art statistical practices that academic researchers might employ. In some instances, a straightforward analysis of the available data may appear to make a compelling case, but many outside the courts would argue for more details and alternative analyses to buttress the arguments.
In a paper commissioned for this panel, Nelson and Bennett (2003) investigate the courts’ use of statistics to make decisions in cases alleging racial discrimination in employment. They analyze published federal court opinions on racial discrimination in employment that refer to “statistics” in some form. They compare practices in cases published in 2000–2002 (178 opinions) and in cases published in 1980–1982 (124 opinions) to evaluate changes over time. For cases published in 2000–2002, a preliminary analysis revealed that courts treat statistical data on racial discrimination conservatively; in other words, “they are reluctant to reject the null hypotheses of nondiscrimination and they are reluctant to hold that plaintiffs have met their burden of proof” (Nelson and Bennett, 2003:2). Similar results were found for the 1980–1982 period.
Most courts expect plaintiffs in employment discrimination cases to present statistical evidence; Nelson and Bennett conclude, however, that in most cases courts are skeptical of statistics used to prove discrimination. Courts also rarely appear to base their opinions on statistical evidence, as opposed to Supreme Court precedent or other judicial rulings. In fact, courts rely on statistics less frequently now than they did in the 1980s, even though statistical techniques have improved. Moreover, the interpretation of statistical evidence varies across courts and cases.
Overall, the lack of credence given by courts to statistical evidence and the complexities of drawing inferences about racial discrimination from such data appear to be detrimental to plaintiffs. In both periods examined by Nelson and Bennett, plaintiffs lost to defendants by a ratio of more than three to one, and it is becoming increasingly difficult for plaintiffs to convince courts that their claims are valid.
The main reasons cited for not relying on statistical data in judicial opinions are (1) relatively small sample sizes, (2) difficulty in defining the comparison groups, (3) lack of relevant controls for nondiscriminatory explanations for disparities, and (4) the use of aggregated data across multiple job levels in a class action suit. These statistical issues (particularly the first three) were prominent in the cases examined within each time period.
Moreover, although there have been many sophisticated advances in statistical analyses, the analyses used in court cases typically involve simple comparisons between the racial composition of an applicant pool or a potential promotion pool and a set of selection outcomes (such as hiring or promotion). Few cases involve rigorous assessment of the use of multiple regression and other multivariate analyses. Courts discount small samples without considering the probabilities of outcomes, displaying a lack of statistical knowledge and reasoning. Courts also have no consistent approach to dealing with these problems. A recent Supreme Court decision, however (Desert Palace v. Costa [No. 02-679]), appears to open the door to expanded use of statistical methods to support inferences of discrimination in legal proceedings.
STATISTICAL DECOMPOSITIONS OF RACIAL DIFFERENCES
Two types of regression models have been used to decompose racial differences in outcomes. They are (1) regression models with race-specific intercepts, which assume that the effects of other variables (e.g., education) are the same for all racial groups, and (2) race-specific regression models, which allow for interactions between race and the other variables. Both types of models pose problems for interpretation.
Regression Models with Race-Specific Intercepts
A standard way to explore the difference in an outcome between groups is to decompose the difference into “explained” and “unexplained” components. To illustrate, suppose the researcher is comparing outcomes of white and black men. The simplest formulation is captured by the regression model
$$Y_i = \beta_0 + R_i\gamma + X_i\beta + u_i \tag{7.1}$$
where Y_{i} is the outcome of interest, such as a wage rate, with i indexing the individuals in the sample; R is an indicator variable that takes on the value 1 for blacks and 0 for whites; X_{i} is a set of variables that are believed to be relevant to the determination of Y_{i}; β_{0} is the intercept; β is a vector of coefficients on the variables in X_{i} at a point in time; the coefficient γ captures the difference between groups in the average value of Y that is not accounted for by differences in X; and the error term u_{i} captures the effect of other factors that influence Y_{i}. The coefficients γ and β and u_{i} are defined so that the mean of u_{i} is unrelated to R_{i} and X_{i}. As we explain in detail below, unless the researcher is confident that he or she has measured all of the variables that are both correlated with R and relevant to the determination of Y, the model should be interpreted as descriptive rather than causal.
Let Y_{b} and X_{b} be the mean of Y and X for black men, and let Y_{w} and X_{w} be the mean of Y and X for white men. For concreteness, let Y be the wage rate. The average difference between the groups is
$$Y_w - Y_b = (\beta_0 + X_w\beta) - (\beta_0 + \gamma + X_b\beta) \tag{7.2}$$
This difference can be decomposed as
$$Y_w - Y_b = (X_w - X_b)\beta - \gamma \tag{7.3}$$
The term (X_{w} – X_{b})β is the contribution of group differences in the observed characteristics X to the race gap in Y. For example, studies of the wage gap almost always include a measure of education among the variables in X. The product of the difference between whites and blacks in the education measure and the coefficient relating education to Y is the contribution of the education difference to the gap. The parameter γ (which enters with a negative sign because R equals 1 for blacks) is the portion of the group difference in the means of Y that is not accounted for by the difference between X_{w} and X_{b}, given the weights β that the X variables have in a given time period. This parameter is a “catchall” that includes the effects of group differences in omitted factors that would influence Y in the absence of discrimination, as well as the effect of discrimination.
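To make equations (7.1) through (7.3) concrete, here is a minimal Python sketch on a tiny, entirely hypothetical wage sample with a single X variable (years of education). It fits the race-specific-intercept model by ordinary least squares, relying on the standard result that including a group dummy makes the OLS slope equal to the pooled within-group slope, and then splits the mean gap into (X_{w} – X_{b})β and –γ. The data and the function name `fit_common_slope` are illustrative assumptions, not taken from any study cited here.

```python
from statistics import fmean


def fit_common_slope(x_w, y_w, x_b, y_b):
    """OLS of Y on a constant, a black indicator R, and one X variable (eq. 7.1).

    With a group dummy in the model, the OLS slope equals the pooled
    within-group slope, and each group's fitted line passes through its means.
    Returns (beta0, gamma, beta).
    """
    num = den = 0.0
    for xs, ys in ((x_w, y_w), (x_b, y_b)):
        mx, my = fmean(xs), fmean(ys)
        num += sum((x - mx) * (y - my) for x, y in zip(xs, ys))
        den += sum((x - mx) ** 2 for x in xs)
    beta = num / den
    beta0 = fmean(y_w) - beta * fmean(x_w)          # intercept for whites (R = 0)
    gamma = fmean(y_b) - beta * fmean(x_b) - beta0  # shift for blacks (R = 1)
    return beta0, gamma, beta


# Hypothetical data: years of education and hourly wages.
edu_w, wage_w = [12, 14, 16, 12, 18], [15.0, 19.5, 24.0, 16.0, 30.0]
edu_b, wage_b = [10, 12, 12, 14, 16], [11.0, 13.5, 14.0, 17.0, 21.0]

beta0, gamma, beta = fit_common_slope(edu_w, wage_w, edu_b, wage_b)
gap = fmean(wage_w) - fmean(wage_b)                # Y_w - Y_b
explained = (fmean(edu_w) - fmean(edu_b)) * beta   # (X_w - X_b) * beta
unexplained = -gamma                               # -gamma in (7.3)
print(f"gap={gap:.3f} explained={explained:.3f} unexplained={unexplained:.3f}")
# prints: gap=5.600 explained=3.300 unexplained=2.300
```

Because each group's fitted line passes through its own means, the two pieces always add up exactly to the observed gap; what the split cannot tell us is how much of –γ reflects discrimination rather than omitted factors.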
Everett and Wojtkiewicz (2002) provide a good example of this technique from the criminal justice literature and illustrate the fact that the technique is not restricted to linear regression models. They examine the racial and ethnic disparities in federal sentencing following implementation of guidelines that were intended to ameliorate past disparities in sentencing. They estimate ordinal logistic regressions to assess the relative odds of receiving a sentence within each successive quartile of the sentencing range for the committed offense. Adding various legal and extralegal factors to a baseline model including only indicators of race and ethnicity (black, Hispanic, Native American, and Asian, with white as the comparison group), they examine changes in the log-odds coefficient for each indicator. They find that significant racial differences in sentencing remain after accounting for legal factors (offense-related traits such as severity of the crime, as well as recidivism) and other, extralegal factors (such as age and education).
Race-Specific Regression Models
The above description covers a great deal of early research on discrimination that served as the basis for further work on measuring discrimination and explaining racial differences in a variety of social, political, and economic outcomes.^{1} For the past 30 years, however, researchers have used a more general statistical model of such differences (or gaps) that allows for the possibility that the slope coefficients β differ between groups (e.g., an interaction between race and education; see Blinder, 1973; Duncan, 1968; Oaxaca, 1973). Suppose that Y_{wi} and Y_{bi} are determined by the equations
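$$Y_{wi} = \beta_{0w} + X_{wi}\beta_w + u_{wi} \tag{7.4}$$
$$Y_{bi} = \beta_{0b} + X_{bi}\beta_b + u_{bi} \tag{7.5}$$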
where β_{w} and β_{b} are defined so that E(u_{wi} | X_{wi}) = 0 and E(u_{bi} | X_{bi}) = 0. Consequently, the means of Y_{i} for whites and blacks are
$$Y_w = \beta_{0w} + X_w\beta_w$$
and
$$Y_b = \beta_{0b} + X_b\beta_b,$$
respectively.
The difference in the mean of the outcome can be written as
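$$Y_w - Y_b = (X_w - X_b)\beta_w + \left[(\beta_{0w} - \beta_{0b}) + X_b(\beta_w - \beta_b)\right] \tag{7.6}$$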
The first term in this decomposition is the portion of the total gap that is explained by average differences in the characteristics of whites and blacks, using the coefficients for whites (β_{w}) as the weights. In other words, it is the portion of the gap in Y that would be eliminated if the gap in X were closed and if the dependence of Y on X were the same for blacks and whites. The second term is the “unexplained” part of the gap in Y; that is, the difference that arises because the relationship between characteristics and outcomes, as summarized by the regression parameters (including the difference (β_{0w} – β_{0b}) in the intercepts), differs between groups. Blau and Beller (1992) offer a good example of such a decomposition (see Box 7-1).
Alternatively, the average outcome difference can be decomposed as
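$$Y_w - Y_b = (X_w - X_b)\beta_b + \left[(\beta_{0w} - \beta_{0b}) + X_w(\beta_w - \beta_b)\right] \tag{7.6′}$$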
This alternative decomposition uses the coefficients from the model for blacks to determine the consequences for Y_{w} –Y_{b} of the group differences in X and uses the mean of X for whites to determine the consequences of the difference (β_{w} –β_{b}) in the slopes. The first term is the portion of the total gap that would be eliminated if the gap in X were closed and if the dependence of Y on X were the same for whites and blacks. The second term is the “unexplained” portion of the gap in Y. This second decomposition sometimes produces quite different results from those produced by the first. Many authors report both results or (occasionally) the average of the two (see Oaxaca and Ransom, 1999, for references).
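The two decompositions can be compared directly. The sketch below, again with hypothetical data and one X variable, fits a separate OLS line for each group and applies equation (7.6) (white coefficients as weights) and equation (7.6′) (black coefficients as weights). Both versions add up to the same total gap but divide it differently; the function names are illustrative.

```python
from statistics import fmean


def fit_line(xs, ys):
    """Simple OLS of y on a constant and one x; returns (intercept, slope)."""
    mx, my = fmean(xs), fmean(ys)
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return my - slope * mx, slope


def oaxaca(x_w, y_w, x_b, y_b, weights="white"):
    """Two-fold decomposition of mean(y_w) - mean(y_b).

    weights="white" follows equation (7.6); weights="black" follows (7.6').
    Returns (explained, unexplained).
    """
    b0w, bw = fit_line(x_w, y_w)
    b0b, bb = fit_line(x_b, y_b)
    mxw, mxb = fmean(x_w), fmean(x_b)
    if weights == "white":
        return (mxw - mxb) * bw, (b0w - b0b) + mxb * (bw - bb)
    if weights == "black":
        return (mxw - mxb) * bb, (b0w - b0b) + mxw * (bw - bb)
    raise ValueError("weights must be 'white' or 'black'")


# Hypothetical data: years of education and hourly wages.
edu_w, wage_w = [12, 14, 16, 12, 18], [15.0, 19.5, 24.0, 16.0, 30.0]
edu_b, wage_b = [10, 12, 12, 14, 16], [11.0, 13.5, 14.0, 17.0, 21.0]

for w in ("white", "black"):
    explained, unexplained = oaxaca(edu_w, wage_w, edu_b, wage_b, weights=w)
    print(f"{w} weights: explained={explained:.3f} unexplained={unexplained:.3f}")
# white weights: explained=3.776 unexplained=1.824
# black weights: explained=2.677 unexplained=2.923
```

Here the two sets of weights move more than a third of the gap between the “explained” and “unexplained” columns, a small-scale version of the sensitivity that leads many authors to report both.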
Interpreting the Decomposition
BOX 7-1 A study by Blau and Beller (1992) is a good example of the use of regression models to decompose differences between groups. They estimate forms of equations (7.4) and (7.5) for black and white men and women in various experience categories, where experience refers to the number of years since leaving school. They estimate two sets of regressions. The first uses the logarithm of earnings as the dependent variable and includes measures of education, potential labor market experience and its square, the natural log of annual weeks worked, dummy variables for part-time work status, veteran status (in the case of males), marital status, and dummy variables for three regions and urban residence. The regressions for women include controls for the number of children. The authors also report decompositions based on a second set of regressions that adds dummy variables for major occupation category and for employment in the government sector. Blau and Beller (1992:Table 3) report results of their decompositions using the coefficients from the regression model estimated on the white sample. They present separate results for 1971, 1981, and 1988 and use them to investigate changes between 1971 and 1981 and between 1981 and 1988. When occupation dummies are excluded, the 1971 results for males with 10 to 19 years of potential experience indicate that 0.209 of the total gap of 0.452 in the log of earnings is due to differences in the means of the observed characteristics that determine earnings. Blau and Beller report that, of this explained part, 0.096 is due to differences in the means of education and 0.061 is due to differences in the means of the variables that measure work hours. The remaining 0.243 of the total gap of 0.452 is “unexplained” and reflects differences in the intercepts and slope coefficients of the earnings model.
The results for 1988 show an increase in the total gap to 0.505. The explained gap rises to 0.331, primarily because of an increase in the portion of the gap associated with hours worked during the year, while the unexplained gap actually falls (to 0.505 – 0.331 = 0.174).
The share of the total difference due to the second component in equation (7.6) is sometimes referred to as the “share due to discrimination.” This terminology is misleading, however, because if any important control variables are omitted, one or more of the β coefficients, including the intercept, will be affected. The second component therefore captures both the effects of discrimination and the unobserved group differences in factors that would be expected to determine Y in the absence of discrimination. If there is a gap in favor of whites (blacks) in most of the omitted variables that boost Y, the second component will tend to overestimate (underestimate) the effects of discrimination. Moreover, omitted variables that are correlated with X will influence the coefficients β, potentially causing the “unexplained” portion of the gap to be either an overestimate or an underestimate of discrimination. Finally, the inclusion in X of variables that are themselves an outcome in a particular domain, such as occupation or position within a firm in a study of earnings differences, may cause the second component to underestimate or, less often, overestimate discrimination.
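A deliberately extreme, hypothetical example makes the omitted-variables point concrete. In the sketch below, wages depend on education and on an unmeasured factor q (think of something like school quality) whose level differs by race, and there is no discrimination at all by construction. Fitting equation (7.1) with education alone still yields a nonzero “unexplained” gap. The data, the factor q, and the function name are all assumptions made for illustration.

```python
from statistics import fmean


def unexplained_gap(x_w, y_w, x_b, y_b):
    """-gamma from OLS of y on a constant, a black indicator, and one x (eq. 7.1)."""
    num = den = 0.0
    for xs, ys in ((x_w, y_w), (x_b, y_b)):
        mx, my = fmean(xs), fmean(ys)
        num += sum((x - mx) * (y - my) for x, y in zip(xs, ys))
        den += sum((x - mx) ** 2 for x in xs)
    beta = num / den  # pooled within-group slope = OLS slope with a race dummy
    return (fmean(y_w) - beta * fmean(x_w)) - (fmean(y_b) - beta * fmean(x_b))


# Hypothetical world with NO discrimination: wage = 2*edu + 3*q, where q is an
# unmeasured factor equal to 1 for every white worker and 0 for every black worker.
edu_w = [12, 13, 14, 15, 16]
edu_b = [11, 12, 13, 14, 15]
wage_w = [2 * e + 3 * 1 for e in edu_w]
wage_b = [2 * e + 3 * 0 for e in edu_b]

unexplained = unexplained_gap(edu_w, wage_w, edu_b, wage_b)
print(f"unexplained gap = {unexplained:.2f}")
# prints: unexplained gap = 3.00  (all of it due to the omitted q)
```

Because q here is perfectly collinear with the race indicator, no regression on these data could separate its effect from discrimination; real cases are less stark, but the direction of the bias follows the same logic.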
It is also misleading to attribute only the second component to discrimination, because discriminatory barriers in the labor market and elsewhere in the economy can affect the X variables, that is, the characteristics of individuals that matter in the labor market. We discuss this point further below and in Chapter 11.
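The converse problem arises when an X variable is itself shaped by discrimination. In the hypothetical sketch below, black and white workers have identical skill distributions, employers place black workers one rung lower on an occupational ladder, and wages depend only on occupation. Controlling for occupation makes the “unexplained” gap vanish even though, by construction, the entire gap is produced by discrimination in occupational assignment. All numbers are made up for illustration.

```python
from statistics import fmean


def unexplained_gap(x_w, y_w, x_b, y_b):
    """-gamma from OLS of y on a constant, a black indicator, and one x (eq. 7.1)."""
    num = den = 0.0
    for xs, ys in ((x_w, y_w), (x_b, y_b)):
        mx, my = fmean(xs), fmean(ys)
        num += sum((x - mx) * (y - my) for x, y in zip(xs, ys))
        den += sum((x - mx) ** 2 for x in xs)
    beta = num / den  # pooled within-group slope = OLS slope with a race dummy
    return (fmean(y_w) - beta * fmean(x_w)) - (fmean(y_b) - beta * fmean(x_b))


# Hypothetical: same skills, but black workers are assigned one occupational
# rung lower (discrimination); wages depend only on occupation.
skill = [1, 2, 3, 4]
occ_w = list(skill)
occ_b = [s - 1 for s in skill]
wage_w = [10 + 5 * o for o in occ_w]
wage_b = [10 + 5 * o for o in occ_b]

gap = fmean(wage_w) - fmean(wage_b)
unexplained = unexplained_gap(occ_w, wage_w, occ_b, wage_b)
print(f"total gap = {gap:.2f}; unexplained given occupation = {unexplained:.2f}")
# prints: total gap = 5.00; unexplained given occupation = 0.00
```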
Although one must be mindful of the limits of what can be learned from equation (7.6), it is nevertheless a simple and powerful way to summarize information on some of the factors that underlie group differences. The decomposition analysis can be extended to study change in group differences over time (see Box 7-2).
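The detailed over-time decomposition described in Box 7-2 rests on an algebraic identity: the change in the mean gap between periods t and t′ splits into (1) the change in the characteristics gap weighted by group 1’s period-t′ coefficients, (2) the change in group 1’s coefficients weighted by the period-t characteristics gap, (3) the change in the coefficient gap weighted by group 2’s period-t′ characteristics, and (4) the change in group 2’s characteristics weighted by the period-t coefficient gap. The sketch below checks that identity with made-up means and coefficients for one X variable plus a constant, so that the intercept is treated as just another coefficient; all values and names are hypothetical.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def sub(a, b):
    return [x - y for x, y in zip(a, b)]


def mean_gap(X1, X2, b1, b2, t):
    """Y_1t - Y_2t from mean characteristics and coefficients in period t."""
    return dot(X1[t], b1[t]) - dot(X2[t], b2[t])


def change_decomposition(X1, X2, b1, b2, t, t_prime):
    """Four terms that sum to mean_gap(t_prime) - mean_gap(t).

    X1, X2: period -> mean characteristic vector (first element 1.0 = constant).
    b1, b2: period -> coefficient vector (first element = intercept).
    """
    dX_t, dX_tp = sub(X1[t], X2[t]), sub(X1[t_prime], X2[t_prime])
    db_t, db_tp = sub(b1[t], b2[t]), sub(b1[t_prime], b2[t_prime])
    return [
        dot(sub(dX_tp, dX_t), b1[t_prime]),  # 1: change in characteristics gap
        dot(dX_t, sub(b1[t_prime], b1[t])),  # 2: change in group-1 coefficients
        dot(X2[t_prime], sub(db_tp, db_t)),  # 3: change in coefficient gap
        dot(sub(X2[t_prime], X2[t]), db_t),  # 4: change in group-2 characteristics
    ]


# Hypothetical means (constant, years of education) and log-wage coefficients.
X1 = {"t": [1.0, 13.0], "t'": [1.0, 14.0]}
X2 = {"t": [1.0, 12.0], "t'": [1.0, 13.5]}
b1 = {"t": [1.0, 0.10], "t'": [0.9, 0.12]}
b2 = {"t": [0.8, 0.09], "t'": [0.75, 0.10]}

terms = change_decomposition(X1, X2, b1, b2, "t", "t'")
total_change = mean_gap(X1, X2, b1, b2, "t'") - mean_gap(X1, X2, b1, b2, "t")
print([round(v, 3) for v in terms], round(total_change, 3))
```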
Two Pitfalls in Statistical Decomposition
Many researchers further decompose the “explained” gap into the contribution of subgroups of variables. For example, suppose that X contains sets of indicators for region of the country, for city size, and for educational attainment (less than high school, some high school, high school, some college, college, and some graduate school). One would like to know the contribution of each of these sets of indicators to the explained and unexplained portions of the gap. The contributions of each set of variables to the explained gap are identified and can be estimated separately. The problem, however, is that the contributions of the individual variables to the unexplained gap are not identified separately and depend on the choice of the reference category for each variable. That is, one cannot distinguish the contribution to the overall unexplained gap of racial differences in the coefficients on region of the country from the contribution of racial differences in the coefficients on city size. See Jones (1983) and especially Oaxaca and Ransom (1999) for a more extended discussion of this issue and citations of a number of studies that have included such detailed decompositions.
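A small hypothetical example shows why the detailed unexplained decomposition is not identified. Suppose X consists only of dummies for three regions. OLS with dummies then reproduces the cell means: the intercept is the mean of the omitted reference region, and each dummy coefficient is that region’s mean minus the reference mean. The sketch below splits the unexplained term of equation (7.6) into an intercept part and a region part for each possible reference category; the regional means and shares are invented for illustration.

```python
# Hypothetical mean wages by region for each group, and black regional shares.
mean_w = {"NE": 20.0, "S": 16.0, "W": 18.0}
mean_b = {"NE": 17.0, "S": 12.0, "W": 16.0}
share_b = {"NE": 0.2, "S": 0.6, "W": 0.2}


def unexplained_parts(ref):
    """Split the unexplained gap in (7.6) into (intercept part, region part).

    With only region dummies in X, the intercept is the reference-region mean
    and each coefficient is (region mean - reference mean); the black mean of
    each dummy is simply the black share in that region.
    """
    b0w, b0b = mean_w[ref], mean_b[ref]
    region_part = sum(
        share_b[k] * ((mean_w[k] - b0w) - (mean_b[k] - b0b))
        for k in mean_w if k != ref
    )
    return b0w - b0b, region_part


for ref in ("NE", "S", "W"):
    intercept_part, region_part = unexplained_parts(ref)
    total = intercept_part + region_part
    print(f"ref={ref}: intercept={intercept_part:.2f} "
          f"region={region_part:.2f} total={total:.2f}")
# ref=NE: intercept=3.00 region=0.40 total=3.40
# ref=S: intercept=4.00 region=-0.60 total=3.40
# ref=W: intercept=2.00 region=1.40 total=3.40
```

The total unexplained gap is the same in every case, but the “contribution” of regional coefficient differences changes size and even sign depending on an arbitrary coding choice.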
BOX 7-2 One way to analyze the sources of change over time in the outcomes of various groups is to differentiate between periods. The simplest way to proceed is to perform decompositions in two different years and compute the change in the “explained” component [(X_{wt} – X_{bt})β_{wt}] and the change in the “unexplained” component [(β_{0wt} – β_{0bt}) + (β_{wt} – β_{bt})X_{bt}], where we introduce t as a time subscript to make explicit the fact that the equations refer to a particular year. Blau and Beller (1992) and many other studies do this. However, the change in each of the two components combines the effects of changes in the race gap in characteristics and in the race gap in coefficients.
A more detailed decomposition of change over time can be obtained as follows. Let the operator Δ represent the average difference between members of group 1 (whites) and group 2 (blacks) in a particular year, so that ΔY_{t} = Y_{1t} – Y_{2t}, ΔX_{t} = X_{1t} – X_{2t}, and Δβ_{t} = β_{1t} – β_{2t}, where X includes a constant so that β includes the intercept. For concreteness, let the outcome Y denote the wage rate. The change in wage differentials between time periods t and t′ can be expressed as
$$\Delta Y_{t'} - \Delta Y_{t} = (\Delta X_{t'} - \Delta X_{t})\beta_{1t'} + \Delta X_{t}(\beta_{1t'} - \beta_{1t}) + X_{2t'}(\Delta\beta_{t'} - \Delta\beta_{t}) + (X_{2t'} - X_{2t})\Delta\beta_{t}$$
The first term on the right-hand side of the equation represents the contribution of the relative changes over time in the observed characteristics of the two groups to the change between t and t′ in the wage gap. The second term is the effect of changes over time in the coefficients for group 1, holding differences in observed characteristics fixed. Together, the first and second terms capture the change over time in the explained portion of the wage gap that would be expected given changes in the characteristics of the two groups and the coefficients on those characteristics for whites in periods t and t′. The third and fourth terms capture the change in the unexplained component of the gap, (β_{0wt} – β_{0bt}) + (β_{wt} – β_{bt})X_{bt} in equation (7.6). The third term is the effect of changes over time in the gap in the coefficients between the two groups. The fourth term accounts for the fact that changes over time in the characteristics of group 2 alter the consequences of differences in group coefficients (β_{wt} – β_{bt}). Researchers typically compute each of these terms, as well as the subcomponents corresponding to individual elements of X and β.
A limitation of this decomposition is that it does not provide much insight into how the wage gap is affected by changes in the overall wage distribution, such as occurred over the 1980s when the returns to skill rose rapidly. Increases in the dispersion of wages will increase the gap between the mean wages of whites and blacks (given that whites are above the mean and blacks below), even if there is no change in the skill distributions of whites relative to blacks or in the level of discrimination. Juhn et al. (1991) and Card and Lemieux (1994, 1996) suggest ways to isolate the effect of a change in the dispersion of the unobservable wage components affecting both groups from a change in the location of the skill distribution of group 2 relative to group 1. Altonji and Blank (1999) provide a detailed discussion of the methods used in these papers.
A brief summary of Juhn et al.’s (1991) basic results indicates what one can learn from their type of analysis. Using data from the Current Population Survey, they find that, between 1979 and 1987, changes in levels of education and experience reduced the black–white wage gap (in logs) for men by 0.34 (black characteristics moved closer to white characteristics), whereas increases in the returns to education and experience increased the gap by 0.27. They find that 0.33 of the 0.34 unexplained widening in the wage gap can be attributed to changing wage inequality affecting both whites and blacks. In short, they find that relative wages of blacks declined because black men were disproportionately located at the lower end of an increasingly unequal wage distribution. An alternative approach, used by Murnane et al. (1995), is to examine the sensitivity of estimates for 1978 and 1986 of the unexplained race gap in earnings to adding more detailed cognitive measures, such as test scores, to earnings regressions. They find a smaller race gap in the 1970s that is less sensitive to inclusion of test scores, particularly for males. This result is broadly consistent with the analysis in Juhn et al. (1991).
When the relationship between Y and the X variables is highly nonlinear and the racial difference in the distribution of X is large, a lack of overlap between the black and white distributions of the X variables may make it difficult or impossible to estimate the decompositions of equations (7.6) or (7.6′) reliably. This problem may not be obvious if researchers use functional form specifications of equations (7.4) and (7.5) that are not sensitive to potential nonlinearities. Barsky et al. (2002) and Altonji and Doraszelski (2002) investigate the problem posed by lack of overlap in black and white income distributions in the context of studies of the black–white wealth gap, with differing conclusions. Barsky et al. (2002) standardize for the effects of income by reweighting the sample of whites to have the same income distribution as the sample of blacks.^{2}
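The reweighting idea attributed above to Barsky et al. (2002) can be sketched in its simplest form: group income into a few discrete bins and give each white observation a weight equal to the black share of its bin divided by the white share. This is a hypothetical illustration with made-up data, not a reproduction of either study’s procedure; with continuous income, the weights would typically be estimated (for example, from a propensity score). The sketch also shows where lack of overlap bites: if some blacks fall in a bin with no whites, no reweighting of whites can reproduce the black distribution.

```python
from collections import Counter
from statistics import fmean


def reweighted_mean(white, black):
    """Mean white outcome after reweighting whites to the black income mix.

    white, black: lists of (income_bin, outcome) pairs. Each white observation
    in bin k receives weight share_black(k) / share_white(k).
    """
    count_w = Counter(k for k, _ in white)
    count_b = Counter(k for k, _ in black)
    missing = sorted(k for k in count_b if k not in count_w)
    if missing:
        # Lack of overlap: no white observations can stand in for these blacks.
        raise ValueError(f"no white observations in income bins {missing}")
    n_w, n_b = len(white), len(black)
    weights = [(count_b[k] / n_b) / (count_w[k] / n_w) for k, _ in white]
    return sum(w * y for w, (_, y) in zip(weights, white)) / sum(weights)


# Hypothetical (income bin, wealth) records.
white = [("low", 10.0), ("low", 20.0), ("mid", 50.0), ("mid", 70.0),
         ("high", 200.0), ("high", 300.0)]
black = [("low", 5.0), ("low", 10.0), ("low", 15.0), ("mid", 30.0),
         ("mid", 50.0), ("high", 150.0)]

black_mean = fmean(y for _, y in black)
raw_gap = fmean(y for _, y in white) - black_mean
adjusted_gap = reweighted_mean(white, black) - black_mean
print(f"raw gap = {raw_gap:.2f}; gap at the black income mix = {adjusted_gap:.2f}")
# prints: raw gap = 65.00; gap at the black income mix = 25.83

try:
    reweighted_mean(white, black + [("top", 400.0)])
    overlap_error = False
except ValueError as err:
    overlap_error = True
    print(err)  # no white observations in income bins ['top']
```

Because the comparison is made bin by bin, no functional form is imposed on the relationship between income and wealth; the price is that the method simply fails where the two distributions do not overlap.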
Summary: Decomposition and Residual “Effects” as Racial Discrimination
The use of multivariate regression and related techniques to decompose racial differences in some outcome of interest into a portion due to differences in the distribution of observed characteristics and a portion not explained by those characteristics is an essential tool for describing racial differences. The most informative studies use explanatory variables that both measure the most important determinants of the outcome under study and are likely to have different distributions by race. But the residual race differential may include not only any effect of discrimination but also the effect of other omitted factors that would generate different outcomes by race even in the absence of discrimination. Hence the unexplained gap may overestimate or underestimate the effects of discrimination.
INFERRING DISCRIMINATION FROM STATISTICAL ANALYSIS OF OBSERVATIONAL DATA
In this section, we discuss some of the more frequently encountered obstacles to causal inference in statistical studies of racial discrimination. As discussed in Chapters 5 and 6, it would be relatively easy to estimate the degree of discrimination if only it were possible to manipulate a person’s race. Except in very limited or special circumstances (e.g., an audit framework), race cannot be randomly assigned.^{3} Statistical models are widely used in observational studies in an attempt to replace the experimental control that could ensure an “all-else-equal” comparison. Again, the crucial problem that must be addressed to draw a causal inference from observational data is that the researcher has no control over which subjects have which attributes. Essentially, the inference that race has a causal effect on an outcome (because of racial discrimination) is drawn by shaping a set of statistical correlations using other information and assumptions formalized in a model of how the process under study is determined. This approach is typical of statistical analysis of observational data and is not unique to the problem of discrimination. Sometimes, the most we can claim is that the evidence is consistent with a certain explanation, with the caveat that other plausible explanations cannot be excluded.
^{2} See also DiNardo et al. (1996). One can allow for arbitrary nonlinearity in the relationship between the outcome and X by first estimating a statistical model of the probability that a person is black as a function of X and then matching whites and blacks on the basis of this probability, which is called the “propensity score.” Nopo (2002) uses a nonparametric matching technique to decompose the gender gap in wages in Peru. Black et al. (2002) use a nonparametric matching technique to estimate the fraction of the earnings differences among college-educated white, black, Hispanic, and gay men that is explained by age, specific college major and degrees, English-language proficiency, family background, and region of birth.
^{3} Even in an audit framework, random assignment is not simple. One of the key controversies in audit studies is the extent to which the designs can approach the classic random assignment paradigm (Heckman, 1998). See the discussion in Chapter 6.
Below we identify and discuss common obstacles to causal inference and some of the solutions proposed in the literature. We begin with a brief introduction to the essential role of theoretically informed models and adequate data in drawing causal inferences from observational data. We then illustrate this point with an extended example involving hiring decisions in the labor market. Finally, we discuss two of the most important sources of bias in observational studies of discrimination: omitted variables bias and sample selection bias.
Developing Statistical Models
As Cochran (1965:252) recounts, “When asked in a meeting what can be done in observational studies to clarify the step from association to causation, Sir Ronald Fisher replied: ‘Make your theories elaborate.’”^{4} To justify causal inference from observational data, we need a theoretically informed model that depicts, as accurately as possible, the specific process in which we are attempting to assess the presence and magnitude of racial discrimination. Depending on the particular process and context, one may have more or less information on which to base a theoretical model and then translate it into a statistical model. Laboratory experiments (see Chapter 6) are designed precisely to test the plausibility of various detailed theoretical frameworks.
As discussed in detail in Chapter 5, there is a growing literature that formalizes the assumptions and the deductive process needed to draw cause-and-effect inferences from statistical data. The key idea underlying this literature is the hypothetical counterfactual introduced in Chapter 5: What would have happened if the applicant for a job or rental housing had been white rather than nonwhite but nothing else had changed? Obviously, the counterfactual situation cannot be observed and compared with what actually occurred. Therefore, to draw a causal inference from experimental or observational data, it is necessary to specify assumptions and conditions under which the counterfactual logic can be applied. Assumptions from the causal inference literature are particularly important for justifying the use of regression methods to draw causal inferences. To draw causal inferences from regressions estimated on observational data, researchers must use substantial prior knowledge about the mechanisms that generated the data to support the necessary assumptions. Studies vary substantially in the degree to which the necessary assumptions are adequately justified. Below we discuss some of the specific issues that such models and their assumptions must address if causal inferences are to be drawn.
Example: Hiring Decisions in the Labor Market
In this section, we lay out a generic framework that underlies many statistical approaches to measuring discrimination. The example is from the labor market domain—in particular, hiring. However, the principles described here are quite general, and the issues raised apply across various domains. Our main purpose is to identify what a researcher must know about how an outcome is determined and what data must be available if discrimination is to be measured. We also discuss some of the most important limitations and issues of interpretation surrounding statistical studies of race differences based on observational data.
To set the stage, suppose the researcher is interested in understanding an outcome variable, labeled y. In the labor market context, y might be the probability of getting hired by a particular firm for a particular job. In other contexts, such as housing or education, y might be the probability that a housing loan application will be approved or that a person will be admitted to a university. To develop an adequate model of the phenomenon, the analyst needs to have a good understanding of the process that would determine y in both the absence and the presence of discrimination. In this example, the researcher would want information about the legitimate criteria (e.g., education or experience) used by the firm’s recruiters to screen applicants for the position.
In the case of hiring, a rational, profit-maximizing, nondiscriminating firm would prefer to hire people who are the best suited to perform well in the jobs for which they are screening applicants. Let the variable P denote productivity in a particular position. To make the model of the decision process as realistic as possible, we distinguish among the variables that determine productivity on the basis of whether they are known to the researcher, the employer, both, or neither. Let X_{1} be the factors that are known to both researcher and employer. Examples of X_{1} factors might be years of education or labor force experience, or other criteria that are easily observable from an application form. Let X_{2} be a set of factors known to the employer but not to the researcher. Examples of X_{2} might be such factors as the performance of the applicant in an interview, which is likely to affect screeners’ hiring decisions but unlikely to be observed by the researcher. Let Z be a set of variables known to the researcher but not to the employer. Such variables might comprise information collected as part of a survey of the applicants by the researcher. For example, the researcher but not the employer might know that the applicant’s spouse works at the company. Finally, even the most diligent employers and researchers might not have access to all the factors likely to affect a person’s productivity in a given job. Let Q be a set of factors that affect productivity but are observed by neither the employer nor the researcher.
For simplicity, we assume throughout that all relationships among the variables are linear,^{5} so that X_{1}, X_{2}, Z, and Q determine productivity according to
$$P = X_{1}B_{1} + X_{2}B_{2} + ZG_{1} + QG_{2} \tag{7.7}$$
where B_{1}, B_{2}, G_{1}, and G_{2} are weights capturing the importance of each set of factors in determining productivity in the job. For example, if the factors that are unknown to both the employer and the researcher (i.e., Q) are not very important, G_{2} will approach zero, and QG_{2} will have little effect on productivity P. Similarly, if the Z factors are not very important as determinants of productivity, the weight G_{1} will approach zero.
Two important points must be made about the framework summarized in equation (7.7). First, because this equation is a model of hiring in the absence of discrimination, we exclude the race (labeled R) of the individual from the list of X_{1} factors, despite the fact that R might easily be observed by both the firm and the researcher. In this model, therefore, R has no effect on productivity, which is fully determined by X_{1}, X_{2}, Z, and Q. We relax this assumption below in discussing a case in which race does influence productivity as viewed by the firm because its customers are prejudiced.
Second, equation (7.7) has the virtue of specifying precisely what the decision criteria would be for a rational firm seeking to hire the most productive candidates. In particular, a rational firm will base its hiring decisions on the expected productivity of the applicants, given the information it has, X_{1} and X_{2}. If the firm uses only X_{1} and X_{2} to make its productivity assessment, ignoring race, the firm’s hiring decision will be a function of the expected value of P given X_{1} and X_{2}. The firm’s expectation of productivity conditional on the information it has can be denoted E(P | X_{1},X_{2}); it is a function of X_{1} and X_{2} that takes into account the expected value of the information the firm does not have (Z and Q). The expected values of Z and Q, conditional on X_{1} and X_{2}, are E(Z | X_{1},X_{2}) and E(Q | X_{1},X_{2}), respectively. After weighting X_{1}, X_{2}, and these two conditional expectations by their importance as determinants of productivity (B_{1}, B_{2}, G_{1}, and G_{2}, respectively), the rational firm’s expectation of productivity will be
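$$E(P \mid X_{1},X_{2}) = X_{1}B_{1} + X_{2}B_{2} + E(Z \mid X_{1},X_{2})G_{1} + E(Q \mid X_{1},X_{2})G_{2} \tag{7.8}$$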
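We assume throughout that all conditional expectations are linear functions, in which case equation (7.8) can be rewritten as
$$E(P \mid X_{1},X_{2}) = X_{1}(B_{1} + \pi_{1}) + X_{2}(B_{2} + \pi_{2}) \tag{7.9}$$
where the equation
$$E(Z \mid X_{1},X_{2})G_{1} + E(Q \mid X_{1},X_{2})G_{2} = X_{1}\pi_{1} + X_{2}\pi_{2}$$
defines π_{1} and π_{2}.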
The intuition behind π_{1} and π_{2} can be seen by rewriting equation (7.9) as
$$E(P \mid X_{1},X_{2}) = X_{1}B_{1} + X_{2}B_{2} + X_{1}\pi_{1} + X_{2}\pi_{2}.$$
The π_{1} and π_{2} terms, respectively, capture the ways in which X_{1} and X_{2} affect the firm’s estimate of productivity indirectly via their associations with the unobserved factors Q and Z.
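To see what π_{1} and π_{2} do, the sketch below simulates equation (7.7) under an assumed correlation structure (Z related to X_{1} only, Q related to X_{2} only) with made-up weights, then fits the least-squares prediction of P from X_{1} and X_{2}, which in a large sample approximates the linear E(P | X_{1},X_{2}) of equation (7.9). The helpers `ols` and `firm_weights` are hypothetical. With these values π_{1} = 0.6 × G_{1} = 0.48 and π_{2} = 0.3 × G_{2} = 0.3, so the fitted weights should land near 1.48 and 0.8 rather than near B_{1} = 1.0 and B_{2} = 0.5.

```python
import random


def ols(columns, y):
    """Least-squares coefficients of y on the regressor columns (solves X'X b = X'y)."""
    k = len(columns)
    m = [[sum(a * b for a, b in zip(columns[i], columns[j])) for j in range(k)]
         + [sum(a * b for a, b in zip(columns[i], y))] for i in range(k)]
    for c in range(k):  # Gauss-Jordan elimination with partial pivoting
        p = max(range(c, k), key=lambda row: abs(m[row][c]))
        m[c], m[p] = m[p], m[c]
        for row in range(k):
            if row != c:
                f = m[row][c] / m[c][c]
                m[row] = [a - f * b for a, b in zip(m[row], m[c])]
    return [m[i][k] / m[i][i] for i in range(k)]


def firm_weights(n=20000, B1=1.0, B2=0.5, G1=0.8, G2=1.0, seed=1):
    """Simulate equation (7.7); return the least-squares weights of P on X1 and X2."""
    rng = random.Random(seed)
    x1 = [rng.gauss(0, 1) for _ in range(n)]
    x2 = [0.5 * a + rng.gauss(0, 1) for a in x1]   # X2 correlated with X1
    z = [0.6 * a + rng.gauss(0, 1) for a in x1]    # assumed: E(Z | X1, X2) = 0.6 X1
    q = [0.3 * b + rng.gauss(0, 1) for b in x2]    # assumed: E(Q | X1, X2) = 0.3 X2
    p = [B1 * a + B2 * b + G1 * c + G2 * d for a, b, c, d in zip(x1, x2, z, q)]
    _, w1, w2 = ols([[1.0] * n, x1, x2], p)
    return w1, w2


if __name__ == "__main__":
    w1, w2 = firm_weights()
    print(f"weight on X1 = {w1:.3f}  vs  B1 + pi1 = {1.0 + 0.6 * 0.8:.2f}")
    print(f"weight on X2 = {w2:.3f}  vs  B2 + pi2 = {0.5 + 0.3 * 1.0:.2f}")
```

The firm never sees Z or Q, yet the weights it should place on X_{1} and X_{2} absorb their predictive value for those unobserved factors.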
We now turn to the hiring decision itself. For a rational, nondiscriminating firm, the hire probability is an increasing function of E(P | X_{1},X_{2}). For simplicity, we assume that the relationship is linear, in which case
$$y = a_{1}E(P \mid X_{1},X_{2}) + u, \qquad a_{1} > 0 \tag{7.10}$$
which, using equation (7.9) to substitute for E(P | X_{1},X_{2}), can be written as
$$y = a_{1}X_{1}(B_{1} + \pi_{1}) + a_{1}X_{2}(B_{2} + \pi_{2}) + u \tag{7.11}$$
The error term u captures random noise in hiring, as well as the fact that whether an individual with a particular set of characteristics will be hired at a given point in time will be affected by random variation in the quality of the other applicants at that time. We assume that both of these sources of variation are unrelated to race (R).
For a nondiscriminating firm, note that there is no role for R in the hiring equation even if R happens to be correlated with Z or Q and thus correlated with P. The reason is that the hiring decision y depends on E(P | X_{1},X_{2}), not on E(P | X_{1},X_{2},Z,Q) or P itself. (The firm does not observe Z, Q, or P.) If R is added to the model of equation (7.11), it will enter with a coefficient of zero, and the researcher will typically find that for a nondiscriminating firm there is no evidence of a difference in the hiring rates of members of different racial groups who have the same values of X_{1} and X_{2}. (To focus on the key ideas, we assume throughout this section that samples are large enough that we can ignore sampling error in estimates.)
Figure 7-1 depicts the model of the nondiscriminating firm. The arrows from the box containing the firm’s information (X_{1} and X_{2}) and other factors (Z, Q) to productivity capture the fact that all four variables determine productivity. However, there is no arrow from the box containing (Z, Q) to the firm’s judgment about productivity or to the hiring outcome because the firm does not observe Z or Q. For the same reason, there is no arrow linking actual productivity P to the firm’s judgment about P or to the outcome. Race can be correlated with X_{1}, X_{2}, Z, and Q, but the nondiscriminating firm makes no use of race in making a judgment about productivity or in deciding the outcome. Consequently, there is no arrow from race to the outcome.
Now we allow for the possibility that the firm discriminates and bases its decisions on both expected productivity and race. A simple way to capture this possibility is with the hiring rule
$$y = \alpha R + (1 - \alpha)a_{1}E(P \mid X_{1},X_{2}) + u \tag{7.12}$$
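where α is the relative weight placed on race by the firm’s screeners, and 1 − α is the relative weight on productivity. This can be rewritten as
$$y = \alpha R + \alpha' a_{1}X_{1}(B_{1} + \pi_{1}) + \alpha' a_{1}X_{2}(B_{2} + \pi_{2}) + u \tag{7.13}$$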
where α′ = 1 − α. The parameter α is the difference in the hiring probability that is due to discrimination on the part of the firm. In contrast to the situation for a nondiscriminating firm, where adding R to equation (7.11) yields a coefficient of zero, estimating equation (7.13) for a discriminating firm yields a nonzero α. Figure 7-2 shows the model of a discriminating firm. In contrast with Figure 7-1, Figure 7-2 introduces an arrow from race to the outcome y that represents the influence of discrimination. In this model, the strength of the link is α.
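The contrast between the two firms can be checked on simulated data in which the researcher sees everything the firm sees (no hidden X_{2}). In this sketch a single variable `z` stands in for both Z and Q and is deliberately related to R, the firm’s belief E(P | X_{1},X_{2}) is approximated by an in-sample least-squares fit, and y is treated as a continuous hiring index. All coefficients and names (`ols`, `race_coefficients`) are hypothetical. Under these assumptions the coefficient on R in the hiring regression should come out near zero for the nondiscriminating rule (α = 0) and near α for the discriminating rule, even though R does predict productivity.

```python
import random


def ols(columns, y):
    """Least-squares coefficients of y on the regressor columns (solves X'X b = X'y)."""
    k = len(columns)
    m = [[sum(a * b for a, b in zip(columns[i], columns[j])) for j in range(k)]
         + [sum(a * b for a, b in zip(columns[i], y))] for i in range(k)]
    for c in range(k):  # Gauss-Jordan elimination with partial pivoting
        p = max(range(c, k), key=lambda row: abs(m[row][c]))
        m[c], m[p] = m[p], m[c]
        for row in range(k):
            if row != c:
                f = m[row][c] / m[c][c]
                m[row] = [a - f * b for a, b in zip(m[row], m[c])]
    return [m[i][k] / m[i][i] for i in range(k)]


def race_coefficients(alpha, a1=0.2, n=20000, seed=3):
    """Coefficient on R in (i) y on (1, X1, X2, R) and (ii) P on (1, X1, X2, R)."""
    rng = random.Random(seed)
    r = [1.0 if rng.random() < 0.5 else 0.0 for _ in range(n)]
    x1 = [rng.gauss(0, 1) + 0.4 * ri for ri in r]
    x2 = [rng.gauss(0, 1) + 0.3 * ri for ri in r]
    z = [rng.gauss(0, 1) + 0.5 * ri for ri in r]   # unseen by firm; related to R given X1, X2
    p = [a + 0.5 * b + 0.8 * c for a, b, c in zip(x1, x2, z)]
    ones = [1.0] * n
    c0, c1, c2 = ols([ones, x1, x2], p)            # firm's linear belief about P
    belief = [c0 + c1 * a + c2 * b for a, b in zip(x1, x2)]
    # Hiring rule (7.12); alpha = 0 reduces it to the nondiscriminating rule (7.10).
    y = [alpha * ri + (1 - alpha) * a1 * e + rng.gauss(0, 0.05)
         for ri, e in zip(r, belief)]
    hire_r = ols([ones, x1, x2, r], y)[3]
    prod_r = ols([ones, x1, x2, r], p)[3]
    return hire_r, prod_r


if __name__ == "__main__":
    for alpha in (0.0, 0.1):
        hire_r, prod_r = race_coefficients(alpha)
        print(f"alpha={alpha}: R in hiring eq. {hire_r:.3f}, R in productivity eq. {prod_r:.3f}")
```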
What should be obvious from this discussion is that to correctly estimate α, the researcher must have quite a bit of knowledge about how the firm behaves. First, the researcher must have a solid understanding of how the firm would behave in the absence of discrimination. The above model assumes that a nondiscriminating firm would hire on the basis of expected productivity in the firm. This presumption should guide the search for control variables (X_{1}, X_{2}). Second, the researcher must know how the firm predicts productivity and must have the data used by the firm. In the notation of the model, there must not be any X_{2} variables, that is, variables used by the firm of which the researcher is ignorant. Third, one must know that the firm does not use Z in its hiring decision. Otherwise, if Z were correlated with race and data on productivity were available, it would be easy to draw the wrong inference about α from a joint analysis of productivity and hiring decisions even if there were no omitted X_{2} variables. (For example, the researcher might have data on performance on tests that were administered as part of a survey and would not be observed by firms.) A similar point applies to other variables the researcher happens to observe that have no relationship with P conditional on X_{1}, X_{2}, Z, and Q. Fourth, one must know enough about the relationship between P and X_{1} and between y and E(P | X_{1}) to be able to specify a functional form relating y to X_{1} that is both a good approximation and estimable given the data at hand. In the above discussion, we have specified the relationships to be linear in the variables (which may include nonlinear transformations of a set of underlying variables). Even if they are in fact linear, the researcher will typically not know this to be the case and may have to use flexible functional forms, matching techniques, or both.
What would the sources of such knowledge be? To continue with the hiring example, in some relatively rare situations the researcher may have deep knowledge of how hiring decisions are made and have access to nearly the same information as the firm (see the example in Box 7-3). In other cases, the researcher may have general knowledge of the most important determinants of P, based on, for example, case studies of similar jobs or interviews with employers or employees, or other knowledge about how the labor market works that is relevant to the problem. When data on P as well as y are available, the researcher can estimate the relationship between P and X_{1} and so can draw inferences with weaker assumptions.^{6}
Munnell et al.’s (1996) study of discrimination in mortgage lending is an analysis in which the quality of the data used and the level of understanding of how outcomes should be determined in the absence of discrimination are sufficiently high that the results are informative about discrimination in mortgage markets. Munnell et al. previously interviewed lenders to identify the factors important to them in determining the suitability of an applicant for a loan, and, thus, their choice of variables to include in equations (7.9) and (7.13) was well motivated. They make a good case that they have measured the most important factors considered by lenders. They show that differences between blacks and whites and Hispanics and whites in such characteristics as income explain much of the higher rejection rates for minorities, but they also find that a substantial unexplained gap remains. Although some have raised questions about the study (see Harrison, 1998), the authors provide a strong foundation for their conclusion that “a serious [racial discrimination] problem may exist in the market for mortgage loans” (Munnell et al., 1996:51).
BOX 7-3 A study of gender segregation of jobs by Fernandez and Sosa (2003) is one of very few analyses of how an organization selects individuals from a pool of applicants (other examples include Fernandez and Weinberg, 1997; Fernandez et al., 2000; and Peterson et al., 2000). The Fernandez and Sosa study is unusual for the level of detailed information gathered on the hiring and screening practices in the setting being examined. The authors interviewed company personnel about the specific criteria they used for screening applicants for a specific entry-level job: customer service representative. They then coded nearly 4,300 original paper applications received for that job over a 2-year period to reflect the concerns of the personnel screening applicants. Screeners said they looked for evidence of past experience in financial services or customer service settings. They also preferred people who were employed at the time of application, although they said they shunned “overpaid” people who they feared would be likely to leave prematurely. Screeners also said that they placed relatively little weight on formal education when screening individuals. However, they avoided “overeducated” people because they feared such people would leave quickly. Fernandez and Sosa found that the application pool was two-thirds female, a ratio that roughly matched the gender composition of customer service representatives at the start of the study. The percentage of women then increased with each successive step in the screening process, to 69 percent of interviewees, 77 percent of offers, and 78 percent of hires. According to the criteria recruiters said they found desirable, female applicants were better qualified than males at the time of application. Using these criteria, Fernandez and Sosa then developed predictive models of who would be interviewed and who would be offered a job, conditional on being interviewed. Controlling for the applicant characteristics available from the application material did not fully explain screeners’ apparent preference for hiring females. Although the degree of knowledge available about the hiring process in this study is considerable, it is not complete. For example, the authors did not have access to data reflecting candidates’ performance during interpersonal interactions with screeners, either on the telephone or in person. However, the interviews with the screening personnel and the detailed data collected from the application forms allow for a much closer match between the statistical models used and the process under study than is typical in observational studies.
PROBLEMS WITH MEASURING DISCRIMINATION BY FITTING STATISTICAL MODELS TO OBSERVATIONAL DATA
In addition to the concept of manipulability discussed in Chapter 5, any causal analysis that fits multiple regression models to observational data must address several issues. One such issue is whether the structure of the regression model—the variables and their functional form—captures the causal mechanism with sufficient accuracy. If, as is frequently the case, the variables or their functional form are not specified in advance of the analysis, there is a danger of overfitting. Also, a purely exploratory “fishing expedition” analysis may not be replicable. To address these issues, variables not appearing in a model but highly correlated with model variables could be substituted to evaluate whether alternative models based on these variables would fit as well. Also, cross-validation by examining the performance of the model on a subset of the data or on another data set could improve confidence in the robustness of the inference. A regression model would be suspect if the coefficients changed drastically when the model was fit to a subset of the data. How to address these generic issues is described in many texts on multiple regression (e.g., Weisberg, 1985).
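As a rough illustration of the subset check mentioned above, the sketch below refits a least-squares model on random halves of a simulated data set and records both the coefficients and the mean squared error on the held-out half. The helper names (`ols`, `split_half_check`), the split scheme, and the simulated data are our own assumptions, not a procedure taken from Weisberg (1985). Large swings in the coefficients across splits, or a holdout error far above what the model fits in sample, would be the warning signs.

```python
import random


def ols(columns, y):
    """Least-squares coefficients of y on the regressor columns (solves X'X b = X'y)."""
    k = len(columns)
    m = [[sum(a * b for a, b in zip(columns[i], columns[j])) for j in range(k)]
         + [sum(a * b for a, b in zip(columns[i], y))] for i in range(k)]
    for c in range(k):  # Gauss-Jordan elimination with partial pivoting
        p = max(range(c, k), key=lambda row: abs(m[row][c]))
        m[c], m[p] = m[p], m[c]
        for row in range(k):
            if row != c:
                f = m[row][c] / m[c][c]
                m[row] = [a - f * b for a, b in zip(m[row], m[c])]
    return [m[i][k] / m[i][i] for i in range(k)]


def split_half_check(columns, y, n_splits=20, seed=0):
    """Refit on random halves; return (coefficients, holdout MSE) for each split.

    Coefficients come from the training half; the error is measured on the other half.
    The caller supplies any intercept column explicitly.
    """
    rng = random.Random(seed)
    n = len(y)
    results = []
    for _ in range(n_splits):
        idx = list(range(n))
        rng.shuffle(idx)
        train, test = idx[: n // 2], idx[n // 2:]
        beta = ols([[col[i] for i in train] for col in columns], [y[i] for i in train])
        pred = [sum(b * col[i] for b, col in zip(beta, columns)) for i in test]
        mse = sum((y[i] - f) ** 2 for i, f in zip(test, pred)) / len(test)
        results.append((beta, mse))
    return results


if __name__ == "__main__":
    rng = random.Random(42)
    x = [rng.gauss(0, 1) for _ in range(2000)]
    y = [1.0 + 0.5 * xi + rng.gauss(0, 1) for xi in x]
    for beta, mse in split_half_check([[1.0] * 2000, x], y, n_splits=5):
        print([round(b, 3) for b in beta], round(mse, 3))
```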
There are two key limitations of statistical analysis of racial differences based on observational data that we discuss in more depth using the above framework. The first and most important of these is omitted variables bias; the second is sample selection bias.
Omitted Variables Bias
Before we discuss omitted variables bias in the context of the above model, we provide in Box 7-4 a simple illustration of this type of bias involving a study of graduate school admissions (Bickel and O’Connell, 1975), which is an example of what has come to be called “Simpson’s paradox.” The hypothetical data in the example involve a case in which the disparity in admissions between groups reverses sign when an additional relevant variable is introduced; because the aggregate comparison leaves that variable out, the reversal is a form of omitted variables bias. Precisely the same phenomenon can occur in regression studies of racial disparities.
Omitted variables bias poses a serious problem for the large share of studies of racial differences that rely on surveys (e.g., the Current Population Survey or decennial census long-form sample) containing only a limited set of the characteristics that may reasonably factor into the processes under study. In such circumstances, it is possible to measure the extent of the difference in the outcome that is associated with race, but it is not possible to decompose that difference into a portion that reflects discrimination and a portion that reflects the association between race and omitted variables that also affect the outcome. Because researchers generally do not know whether critical variables have been omitted, or which ones, they must be very careful in making the leap from a statistical accounting exercise, such as the decompositions discussed above, to conclusions about the role of discrimination.
BOX 7-4 … group II, reversing the pattern uniformly exhibited at the department level. This apparent paradox is referred to as Simpson’s paradox and is a simple illustration of the problems that can be faced when examining aggregate data on discrimination. The explanation for this seeming paradox is that the aggregate analysis implicitly assumes that the racial groups are relatively homogeneous with respect to the propensity to apply to the various departments at the college and that the departments are relatively homogeneous with respect to admission rates, but these assumptions do not hold. A good experimental design, which would not be possible to carry out, might have avoided this problem by requiring the racial groups to apply equally to the four departments. This balancing across departments has not occurred, however. Instead, the aggregate analysis for racial group I is weighted by department as follows: 0.41, 0.41, 0.03, and 0.14. In comparison, the aggregate analysis for racial group II is weighted as follows: 0.07, 0.10, 0.40, and 0.43. This imbalance, in conjunction with the varying admission rates of the departments, has caused the aggregate analysis to be misleading. In that analysis, department is functioning as a hidden variable that is correlated both with admission and with racial group. More informally, racial group I has applied much more often to departments that have lower admission rates, and the reverse has occurred for racial group II. So the aggregate analysis is, roughly speaking, comparing the admission rates for racial group I in departments A and B with the admission rates for racial group II in departments C and D, which is an unfair comparison. An obvious follow-up question is how one knows whether the same mistake is being made in the department-level analyses with respect to improper aggregation of disaggregated analyses. For instance, what if we make the four 2 × 2 tables into eight 2 × 2 tables by splitting according to whether applicants are in state or not in state? Could the findings switch back again? The answer is that we cannot say without further analysis. This is why the search for relevant hidden variables is so crucial. More generally, this example demonstrates the complexity and subtlety of analyses of the presence of discrimination and the need to carefully scrutinize statistical models used for this purpose.
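The reversal described in Box 7-4 can be reproduced with a few lines of exact arithmetic. In this sketch the application counts are chosen so that each group’s department shares roughly match the weights quoted in the box, while the admission counts are hypothetical; department labels A through D follow the box.

```python
from fractions import Fraction

# Hypothetical counts. Application shares approximate the Box 7-4 weights
# (0.41, 0.41, 0.03, 0.14 for group I; 0.07, 0.10, 0.40, 0.43 for group II);
# the admission counts are invented for illustration.
APPLICANTS = {"I": {"A": 410, "B": 410, "C": 30, "D": 140},
              "II": {"A": 70, "B": 100, "C": 400, "D": 430}}
ADMITTED = {"I": {"A": 123, "B": 82, "C": 21, "D": 84},
            "II": {"A": 14, "B": 10, "C": 240, "D": 215}}


def department_rate(group, dept):
    """Admission rate of a group within one department, as an exact fraction."""
    return Fraction(ADMITTED[group][dept], APPLICANTS[group][dept])


def aggregate_rate(group):
    """Admission rate pooled over departments (implicitly weighted by where the group applies)."""
    return Fraction(sum(ADMITTED[group].values()), sum(APPLICANTS[group].values()))


if __name__ == "__main__":
    for dept in "ABCD":
        print(dept, float(department_rate("I", dept)), float(department_rate("II", dept)))
    print("pooled", round(float(aggregate_rate("I")), 3), round(float(aggregate_rate("II")), 3))
```

Run as a script, this prints department-level rates of 0.3 vs. 0.2, 0.2 vs. 0.1, 0.7 vs. 0.6, and 0.6 vs. 0.5 in favor of group I, yet pooled rates of 0.313 vs. 0.479 in favor of group II.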
Turning back to the specific regression model of hiring outlined above, if elements of X_{2} are correlated with R and are important for productivity, failure to control for them when estimating the discrimination parameter α will lead to bias in estimates of the extent to which race has an independent effect on hiring probabilities. The model laid out above in equations (7.7) through (7.13), however, allows us to be more precise about the ways in which omitted variables threaten an inference about discrimination. The discussion below provides a foundation for possible solutions to the problem of omitted variables bias.
Omitted variables affect the estimation of α as follows.^{7} The researcher attempts to estimate α by regressing y on X_{1} and R. Given equation (7.12), the coefficients of the regression are estimates of the regression model
$$y = \alpha R + \alpha' a_{1}X_{1}(B_{1} + \pi_{1}) + \alpha' a_{1}E\left[X_{2}(B_{2} + \pi_{2}) \mid X_{1},R\right] + u^{*}$$
where u* is uncorrelated with X_{1} and R by definition of E[X_{2}(B_{2} + π_{2}) | X_{1},R] and because of the properties of the error term u in equations (7.11), (7.12), and (7.13).
Because we are assuming that expectations are linear, we can write
$$E\left[X_{2}(B_{2} + \pi_{2}) \mid X_{1},R\right] = X_{1}\gamma_{1} + R\gamma_{2}$$
where γ_{1} and γ_{2} are the coefficients relating the mean of X_{2}(B_{2} + π_{2}) to X_{1} and R, respectively. Then the regression of y on X_{1} and R is
$$y = X_{1}\alpha' a_{1}(B_{1} + \pi_{1} + \gamma_{1}) + R(\alpha + \gamma_{2}a_{1}\alpha') + u^{*} \tag{7.14}$$
where u* is uncorrelated with R and X_{1}. The bias in the coefficient on R as an estimator of α is γ_{2}a_{1}α′. The bias is due to correlation between R and the omitted variables X_{2}. The bias will be positive if R is positively related to the index of omitted factors that raise productivity, controlling for X_{1}; it will be negative if the opposite is true.
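The bias formula can be checked numerically. The sketch below, under assumed parameter values and with B_{1} + π_{1} = B_{2} + π_{2} = 1 so that X_{2} itself is the omitted index, generates data from equation (7.13), runs the short regression (7.14) of y on X_{1} and R, and compares the coefficient on R with α + γ_{2}a_{1}α′, where γ_{2} comes from the auxiliary regression of the omitted index on X_{1} and R. Setting `race_x2 = 0` breaks the link between R and X_{2}, which is what random assignment of race would do, and the bias should vanish. Function names and numbers are illustrative only.

```python
import random


def ols(columns, y):
    """Least-squares coefficients of y on the regressor columns (solves X'X b = X'y)."""
    k = len(columns)
    m = [[sum(a * b for a, b in zip(columns[i], columns[j])) for j in range(k)]
         + [sum(a * b for a, b in zip(columns[i], y))] for i in range(k)]
    for c in range(k):  # Gauss-Jordan elimination with partial pivoting
        p = max(range(c, k), key=lambda row: abs(m[row][c]))
        m[c], m[p] = m[p], m[c]
        for row in range(k):
            if row != c:
                f = m[row][c] / m[c][c]
                m[row] = [a - f * b for a, b in zip(m[row], m[c])]
    return [m[i][k] / m[i][i] for i in range(k)]


def omitted_variable_check(race_x2=0.5, alpha=0.10, a1=0.5, n=20000, seed=7):
    """Return (short-regression coefficient on R, alpha + gamma2 * a1 * alpha')."""
    rng = random.Random(seed)
    alpha_p = 1.0 - alpha                           # alpha' in the text
    r = [1.0 if rng.random() < 0.5 else 0.0 for _ in range(n)]
    x1 = [rng.gauss(0, 1) + 0.3 * ri for ri in r]
    # X2 is used by the firm but omitted by the researcher; race_x2 sets its link to R.
    x2 = [rng.gauss(0, 1) + race_x2 * ri + 0.4 * a for ri, a in zip(r, x1)]
    y = [alpha * ri + alpha_p * a1 * (a + b) + rng.gauss(0, 0.1)
         for ri, a, b in zip(r, x1, x2)]
    ones = [1.0] * n
    coef_r = ols([ones, x1, r], y)[2]               # short regression, equation (7.14)
    gamma2 = ols([ones, x1, r], x2)[2]              # auxiliary regression of the omitted index
    return coef_r, alpha + gamma2 * a1 * alpha_p


if __name__ == "__main__":
    for link in (0.5, 0.0):
        coef_r, predicted = omitted_variable_check(race_x2=link)
        print(f"race_x2={link}: coefficient on R {coef_r:.3f}, formula {predicted:.3f}")
```

With `race_x2 = 0.5` the true γ_{2} is 0.5, so the coefficient on R should sit near 0.10 + 0.5 × 0.5 × 0.9 = 0.325 rather than at α = 0.10.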
Sample Selection Bias
Sample selection bias exists when, beyond simply missing information on characteristics important to the process under study, the researcher is also systematically (i.e., nonrandomly) missing subjects whose characteristics vary from those of the individuals represented in the data. The classic example pertains to the study of the wages people could expect to get in the labor market (see Heckman, 1979). In that context, the selection problem arises because wages are observed only for those who enter the labor market and find employment. People who have sought and managed to find work are likely to have better labor market opportunities than those who are observed not to be working.
^{7} For simplicity and to keep the emphasis on the most likely case, we ignore the fact that the researcher may have some additional information in Z that is unknown to the firm but possibly correlated with X_{2}. (In most circumstances, the researcher will not know much that is relevant for P that the firm does not also know.)
In the context of our example, discrimination in hiring lowers the incentives of minorities with given values of X_{1} and X_{2} to apply to a firm. Suppose that data are available only on applicants rather than on the entire pool of potential applicants. If the researcher observes all the information the firm uses in making its hiring decisions (X_{2} has no elements), and if applicants know that this is the information the firm uses, in addition to R, to make its decisions, then selection in who applies will not lead to bias when one uses equation (7.14) to estimate α. Intuitively, discrimination may lower the probability that a black individual with a given value of X_{1} will apply, but estimation of equation (7.14) will still uncover the race difference in the hire probability between blacks and whites who are otherwise identical in the relevant characteristics. In practice, however, this situation is extremely rare.
Unfortunately, researchers seldom have as much information on applicant qualifications as the firm, in which case X_{2} has elements, such as performance on an application test or in an interview. Then, in the presence of discrimination, minorities with more favorable values of X_{2} will be more likely to apply, conditional on X_{1}. Consequently, selection will tend to induce a positive relationship between R and the productivity-relevant X_{2} factors that influence hiring but are omitted from the regression. (Again, they are omitted because the researcher does not observe X_{2}.) Estimates of the amount of discrimination based on regressions of y on X_{1} and R will be understated as a result.
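A small simulation illustrates the direction of this selection bias. The assumptions are ours: R = 1 marks the disfavored group (so α < 0), B_{1} + π_{1} = B_{2} + π_{2} = 1, X_{2} is independent of R among potential applicants, and people apply only when their own hiring index plus a random taste shock is positive, so applicants in effect know how the firm screens. The researcher never sees X_{2}. With these made-up values the regression on the full pool should recover α, while the regression restricted to applicants should give a coefficient on R closer to zero, that is, understated discrimination. `selection_check` is a hypothetical helper.

```python
import random


def ols(columns, y):
    """Least-squares coefficients of y on the regressor columns (solves X'X b = X'y)."""
    k = len(columns)
    m = [[sum(a * b for a, b in zip(columns[i], columns[j])) for j in range(k)]
         + [sum(a * b for a, b in zip(columns[i], y))] for i in range(k)]
    for c in range(k):  # Gauss-Jordan elimination with partial pivoting
        p = max(range(c, k), key=lambda row: abs(m[row][c]))
        m[c], m[p] = m[p], m[c]
        for row in range(k):
            if row != c:
                f = m[row][c] / m[c][c]
                m[row] = [a - f * b for a, b in zip(m[row], m[c])]
    return [m[i][k] / m[i][i] for i in range(k)]


def selection_check(alpha=-0.10, a1=0.2, n=20000, seed=11):
    """Coefficient on R from y on (1, X1, R): (full pool, applicants only)."""
    rng = random.Random(seed)
    pool = []
    for _ in range(n):
        r = 1.0 if rng.random() < 0.3 else 0.0            # R = 1: disfavored group
        x1, x2 = rng.gauss(0, 1), rng.gauss(0, 1)          # X2 independent of R in the pool
        index = alpha * r + (1 - alpha) * a1 * (x1 + x2)   # hiring rule (7.13)
        y = index + rng.gauss(0, 0.05)
        applies = index + rng.gauss(0, 0.1) > 0            # apply only if prospects look good
        pool.append((r, x1, y, applies))

    def coef_on_r(sample):
        r, x1, y = ([row[i] for row in sample] for i in range(3))
        return ols([[1.0] * len(y), x1, r], y)[2]          # X2 is never used

    return coef_on_r(pool), coef_on_r([row for row in pool if row[3]])


if __name__ == "__main__":
    everyone, applicants = selection_check()
    print(f"full pool: {everyone:.3f}   applicants only: {applicants:.3f}   true alpha: -0.100")
```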
It is also important to note that, if discrimination in hiring influences the applicant pool, one cannot infer the consequences of discrimination in hiring from knowledge of α alone. One would need to know the effect of discrimination at the hiring stage on the racial mix of who applies.
POSSIBLE SOLUTIONS TO PROBLEMS OF USING STATISTICAL MODELS TO INFER DISCRIMINATION
There are many situations in the social sciences in which the researcher is confronted with an omitted variables problem that is parallel to that discussed above. The most common approach for dealing with omitted variables bias is to use an instrumental variables estimator, which amounts to isolating the relationship between the outcome and a particular source of variation in the explanatory variable of interest that is unrelated to the omitted factors. This strategy is not likely to be available in observational studies in the case of race because the sources of variation in race (race of parents and perhaps the social definition of race at a particular time and place) are likely to be related to the omitted variables.
Absent such so-called instrumental variables strategies, the best we are likely to be able to do with observational studies of racial discrimination is to specify the model as completely as possible. Consequently, it is critically important that the researcher understand how expected productivity is determined and obtain as rich a data set as possible. The most important variables to control for are ones that are likely to have strong effects on P and to be related to R. Because it is difficult to argue that any finite set of productivity-related factors is complete, this strategy will always yield findings vulnerable to the criticism that variables have been omitted.
Classic experimental designs employ random assignment to treatment as a way to ensure that the treatment is uncorrelated with other factors, omitted or not. If it were feasible to randomly assign people to race, then, in the context of equation (7.14), γ_{2}, and hence the bias term γ_{2}a_{1}α′, would be zero, and estimating α would be a way of detecting whether a firm is discriminating on the basis of race. Similarly, random assignment of race to a pool of applicants could resolve the bias associated with sample selection. Indeed, despite their other complications, the big advantage of audit studies is their ability to manipulate race experimentally and thereby get around these problems (see Chapter 6).
Given our inability to manipulate race in observational studies, what can be done about omitted variables and sample selection bias? In the following sections, we discuss some of the strategies that have been used to address these problems.
Using an Indicator of Productivity to Address the Omitted Variables Problem
In many situations in which people are screened, such as hiring, college admission, and mortgage approval, rational nondiscriminating screeners should base their decision on how well they expect a candidate to perform if hired, admitted, or approved. In this section, we use the hiring example to show that data on actual performance can be very useful in attacking the omitted variables problem when studying discrimination.
To illustrate how such data can be employed, we consider the problem of using equation (7.14) as a basis for assessing discrimination in our hiring example. We will consider two situations: one in which the firm knows enough about the worker to make statistical discrimination irrelevant and another in which race yields nonredundant information about productivity.
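Suppose that the indicator or proxy for productivity P* is equal to
$$P^{*} = gP + \text{noise}$$
where the noise is unrelated to actual productivity P, X_{1}, X_{2}, and race indicator R. The researcher can then estimate gE(P | X_{1},R) from a regression of P* on X_{1} and R. In general,
$$E(P^{*} \mid X_{1},R) = gE(P \mid X_{1},R) = g\left[X_{1}B_{1} + E(X_{2} \mid X_{1},R)B_{2} + E(ZG_{1} + QG_{2} \mid X_{1},R)\right].$$
In the special case in which ZG_{1} and QG_{2} are unrelated to R conditional on X_{1} and X_{2}, E(P* | X_{1},R) can be written as
$$E(P^{*} \mid X_{1},R) = g\left\{X_{1}(B_{1} + \pi_{1}) + E\left[X_{2}(B_{2} + \pi_{2}) \mid X_{1},R\right]\right\} \tag{7.15}$$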
This is the special case in which the firm knows enough about the worker to make the information in R redundant. Consequently, the firm has nothing to gain from resorting to statistical discrimination on the basis of R. (See Chapter 4 for a discussion of statistical discrimination.) If, after conditioning on X_{1} and X_{2}, R is still correlated with the productivity determinants Z and Q that are not observed by the firm, then equation (7.15) will not hold. Furthermore, the firm would have an incentive to resort to statistical discrimination. To see this point, note that in general
$$E(P \mid X_{1},X_{2},R) = X_{1}B_{1} + X_{2}B_{2} + E(ZG_{1} + QG_{2} \mid X_{1},X_{2},R)$$
and in this situation R, as well as X_{1} and X_{2}, is useful for predicting Z and Q and thus P.
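Now recall that if the firm uses only X_{1} and X_{2} to judge productivity, then from equation (7.14),
$$E(y \mid X_{1},R) = \alpha R + \alpha' a_{1}\left\{X_{1}(B_{1} + \pi_{1}) + E\left[X_{2}(B_{2} + \pi_{2}) \mid X_{1},R\right]\right\}.$$
When equation (7.15) holds, the term in braces equals E(P* | X_{1},R)/g. We can therefore identify the discrimination coefficient α by estimating equation (7.15) and then regressing y on the resulting estimate of gE(P | X_{1},R) and on R:
$$y = \alpha R + \frac{\alpha' a_{1}}{g}\,\hat{E}(P^{*} \mid X_{1},R) + u^{**} \tag{7.16}$$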
If equation (7.15) holds, that is, in the case in which the firm has no incentive to statistically discriminate, the coefficient on R is an unbiased estimate of α. Thus, under these circumstances, expanding the model to incorporate actual productivity permits one to solve the omitted variables problem.
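The two-step procedure of equations (7.15) and (7.16) can be sketched on simulated data for the special case in which Z is unrelated to R given X_{1} and X_{2}. Assumptions for illustration: B_{1} + π_{1} = B_{2} + π_{2} = 1 (so the firm’s belief is X_{1} + X_{2}), g = 2, X_{2} is related to R and unobserved by the researcher, and the indicator P* adds independent noise. `proxy_adjusted_estimates` and its parameters are hypothetical. The naive regression (7.14) should overstate α here, while the second-step coefficient on R should land near α.

```python
import random


def ols(columns, y):
    """Least-squares coefficients of y on the regressor columns (solves X'X b = X'y)."""
    k = len(columns)
    m = [[sum(a * b for a, b in zip(columns[i], columns[j])) for j in range(k)]
         + [sum(a * b for a, b in zip(columns[i], y))] for i in range(k)]
    for c in range(k):  # Gauss-Jordan elimination with partial pivoting
        p = max(range(c, k), key=lambda row: abs(m[row][c]))
        m[c], m[p] = m[p], m[c]
        for row in range(k):
            if row != c:
                f = m[row][c] / m[c][c]
                m[row] = [a - f * b for a, b in zip(m[row], m[c])]
    return [m[i][k] / m[i][i] for i in range(k)]


def proxy_adjusted_estimates(alpha=0.10, a1=0.2, g=2.0, n=20000, seed=5):
    """Coefficient on R from the naive regression (7.14) and from the two-step (7.16)."""
    rng = random.Random(seed)
    r = [1.0 if rng.random() < 0.5 else 0.0 for _ in range(n)]
    x1 = [rng.gauss(0, 1) for _ in range(n)]
    # X2: seen by the firm, not by the researcher, and related to R.
    x2 = [rng.gauss(0, 1) + 0.5 * ri + 0.3 * a for ri, a in zip(r, x1)]
    z = [rng.gauss(0, 1) for _ in range(n)]        # unrelated to R given X1, X2
    p = [a + b + c for a, b, c in zip(x1, x2, z)]  # so the firm's E(P | X1, X2) = X1 + X2
    p_star = [g * v + rng.gauss(0, 1) for v in p]  # noisy productivity indicator P*
    y = [alpha * ri + (1 - alpha) * a1 * (a + b) + rng.gauss(0, 0.05)
         for ri, a, b in zip(r, x1, x2)]
    ones = [1.0] * n
    naive = ols([ones, x1, r], y)[2]
    h = ols([ones, x1, r], p_star)                 # step 1: estimate E(P* | X1, R)
    p_hat = [h[0] + h[1] * a + h[2] * ri for a, ri in zip(x1, r)]
    adjusted = ols([ones, p_hat, r], y)[2]         # step 2: equation (7.16)
    return naive, adjusted


if __name__ == "__main__":
    naive, adjusted = proxy_adjusted_estimates()
    print(f"naive: {naive:.3f}   two-step: {adjusted:.3f}   true alpha: 0.100")
```

Because the fitted index in step 2 already carries the part of R’s association with productivity that runs through X_{2}, the remaining coefficient on R isolates α, but only while equation (7.15) holds.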
If race yields nonredundant information about productivity, equation (7.15) fails. If the firm chooses to use race as a predictor and statistically discriminate, the estimate of α will be an unbiased estimate of the extent to which the hiring rate depends on R independently of the firm’s belief about productivity. However, the analysis will not reveal whether the firm statistically discriminates when forming its belief about productivity.
In other circumstances, productivity data do not solve the omitted variables problem. If the firm does not statistically discriminate and the special condition of equation (7.15) fails, an estimate of α based on equation (7.16) will be biased. The reason is that the relationship between P and R and X_{1} will be different from the relationship between E(P | X_{1},X_{2}) and R and X_{1}, and hiring is based on E(P | X_{1},X_{2}).
If the firm statistically discriminates on the basis of incorrect stereotypes about how R is related to performance conditional on X_{1} and X_{2}, it is easy to show that the estimation of equation (7.16) will also produce biased estimates of α. The bias in α stems from the fact that actual hiring will reflect the incorrect weights placed by the firm on X_{1}, X_{2}, and R in forming its expectation of P, rather than the relationships the researcher will uncover when estimating E(P* | X_{1},R). That is, the estimate of α will mix two forms of discrimination—the overt or subtle preference for whites given beliefs about productivity and the use of incorrect stereotypes to judge the productivity of blacks or other minority groups.
Sample selection will also influence estimates of the effect of R on productivity P if the researcher leaves X_{2} out of the model. The bias in α based on the estimation of equations (7.15) and (7.16) will depend on the same considerations raised in the discussion of these two equations above.
Unfortunately, few studies have enough information about productivity to attempt to estimate an equation such as (7.16). Altonji and Blank (1999) survey a few papers examining wage differentials using a methodology that fits into the above framework, including Hellerstein and Neumark (1998, 1999), and a series of papers looking at compensation for professional athletes. (See Box 7-5 for a description of a study on racial differentials in compensation.)
A much simpler strategy is available if the researcher actually observes the firm’s belief about P. In this case, one may estimate α by regressing y on R and the firm’s belief. Identifying whether the firm is statistically discriminating is more difficult, particularly in the absence of data on the firm’s beliefs. Progress can be made using variants of the approach of Altonji and Pierret (2001), although strong assumptions are required.
BOX 7-5 A number of studies of compensation of professional athletes closely follow the line of analysis outlined here. Kahn and Sherer’s (1988) study of professional basketball is a good example. They hypothesize that the value of a player depends on his marginal revenue product. They assume that marginal revenue product (the effect of a player on the team’s revenues) depends on the player’s actual performance and on the racial preferences of the team’s fans. They hypothesize that performance is a function of a set of characteristics of the player, including total seasons played, average minutes per game, career free throw percentage, career per-game steals, and so on. They estimate a model relating compensation to these characteristics and race, finding that whites are paid about 20 percent more than blacks with the same characteristics. The study goes on to attempt to determine whether the race gap is due to customer discrimination. Kahn and Sherer find that replacing one black player with a white player with the same performance statistics raises home attendance by 8,000 to 13,000 fans per season. They suggest that part of the salary differential may reflect the response of owners to customer discrimination.
The joint analysis of productivity and hiring is a step further down the road toward working with a complete model of the process through which hiring and discrimination operate. We have demonstrated how a more complete model can be used to draw inferences when omitted variables are present, which is in the spirit of Fisher’s suggestion that theories be made more elaborate. Furthermore, an important advantage of working with a joint model of productivity and hiring is that some of the basic assumptions underlying the estimation of α become testable. For example, the model implies that within a group, hiring decisions are based on expected productivity. This assumption is testable. We want to emphasize that, even here, strong assumptions about how the hiring process operates must be made to infer the effect of race on hiring decisions.
In Annex 7-1, we consider how productivity data can be used to detect adverse impact discrimination, which we define as adopting hiring criteria in ways that are not justified by productivity considerations and that are harmful to a minority group.
Matching and Propensity Score Methods
Matching methods provide an alternative to multivariate linear regression as a way to control for variables that are likely to matter for an outcome in observational studies. Matching consists of comparing outcomes of two paired individuals (or groups) who are comparable on relevant observed attributes except for race. Matching attempts to mimic the experimental setting in the same way as paired testing. To the extent that (1) the observed factors capture the relevant variables affecting the outcome and (2) the comparability is close, racial differences in the outcome variable in a matching study can be attributed to discrimination. Matching has been the subject of considerable research, and relatively sophisticated methods, such as propensity score matching, have been developed.
The objective in matching is to construct matched sets or strata using relevant nonracial covariates that are available. Analogous to overfitting in specifying a multiple regression, the analyst doing the matching must make the trade-off between matching on too few variables with the result of poor comparability within matched sets and matching on too many variables with the result of poor statistical power and problems with interpretation (i.e., overmatching). A common way to manage this trade-off is to combine matching on a small number of variables known to have large effects with matching on propensity scores (described below) derived from a larger set of additional variables thought to be relevant.
Propensity score matching addresses the problem that, as the number of covariates increases, it becomes increasingly difficult to find matched sets with similar values of the covariates. Even if each of p covariates is binary, there will be 2^{p} possible combinations of covariate values, leading to very fine-grained strata for large p. The propensity score, the probability that an individual belongs to the minority group given the observed covariates, is a device for constructing matched sets or strata when there are many covariates. It is typically estimated by fitting a logistic regression of minority versus nonminority group membership on the covariates. Subjects with similar propensity scores are grouped into the same strata to create matched sets.
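The procedure just described can be sketched in Python using only the standard library. The covariates and counts are illustrative assumptions, as are the gradient-ascent fitting routine and the choice of five quantile strata. None of these is a procedure taken from any study cited in this chapter.

```python
import bisect
import itertools
import math
import statistics


def score(b, x):
    """Estimated P(minority | x) under a logistic model with intercept b[0]."""
    eta = b[0] + sum(bj * xj for bj, xj in zip(b[1:], x))
    return 1.0 / (1.0 + math.exp(-eta))


def fit_logistic(xs, zs, lr=0.5, iters=2000):
    """Fit the logistic model by batch gradient ascent on the mean log-likelihood."""
    b = [0.0] * (len(xs[0]) + 1)
    n = len(xs)
    for _ in range(iters):
        grad = [0.0] * len(b)
        for x, z in zip(xs, zs):
            err = z - score(b, x)  # residual on the probability scale
            grad[0] += err
            for j, xj in enumerate(x):
                grad[j + 1] += err * xj
        b = [bj + lr * g / n for bj, g in zip(b, grad)]
    return b


def stratify(scores, n_strata=5):
    """Assign strata by cutting the score distribution at its quantiles.
    Subjects with identical scores always land in the same stratum."""
    cuts = statistics.quantiles(scores, n=n_strata)
    return [bisect.bisect_right(cuts, s) for s in scores]


# Hypothetical data: three binary covariates; z = 1 marks minority-group membership.
xs, zs = [], []
for combo in itertools.product([0, 1], repeat=3):
    n_min = 1 + combo[0] + 2 * combo[1]
    n_non = 4 - combo[1]
    xs += [list(combo)] * (n_min + n_non)
    zs += [1] * n_min + [0] * n_non

b = fit_logistic(xs, zs)
scores = [score(b, x) for x in xs]
strata = stratify(scores)
```

Outcomes of minority and nonminority subjects would then be compared within each stratum, and the within-stratum gaps averaged. In practice, a statistics package's logistic regression would normally replace the hand-rolled fitter used here.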
As compared with multiple regression, matching methods reduce the risk of imposing an inappropriate functional form on the relationship between the outcome y and the observed covariates. Multiple regression models use all the data. In matching, on the other hand, each minority-race individual is typically matched to one or more nonminority individuals, and the pool of unmatched nonminority individuals is not used in the analysis. Matching is therefore most effective when the minority group is very small as compared with the nonminority group, because the loss in precision from discarding the unmatched members of the nonminority group is then low. Choosing between matching and regression methods often involves weighing the reduced sample size from matching against the functional-form assumptions needed for regression. See Rosenbaum (2002) for an excellent review of these methods and a discussion of the advantages and disadvantages of matching versus multiple regression in various situations. As noted above, matching techniques are beginning to be used as an alternative to multiple regression in statistical decompositions of racial differences. However, these methods do not help with the key problems of omitted variables bias or sample selection bias because matching is performed on the basis of observed variables only.^{8}
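Nearest-neighbor matching on the propensity score is one way to carry out the matching step itself. This sketch uses one common variant, matching with replacement and an optional caliper, chosen here for illustration. The scores and the 0/1 hiring outcomes are hypothetical.

```python
def nearest_neighbor_gap(minority, pool, caliper=None):
    """Match each minority subject to the nonminority subject whose propensity
    score is closest (with replacement), then average the outcome differences.

    minority, pool: lists of (propensity_score, outcome) pairs.
    caliper: if given, drop matches whose score distance exceeds it.
    Returns (mean matched gap, number of pool members never used as a match).
    """
    pairs = []
    for i, (s_i, _) in enumerate(minority):
        j = min(range(len(pool)), key=lambda k: abs(pool[k][0] - s_i))
        if caliper is None or abs(pool[j][0] - s_i) <= caliper:
            pairs.append((i, j))
    if not pairs:
        raise ValueError("no matches within the caliper")
    gaps = [minority[i][1] - pool[j][1] for i, j in pairs]
    unused = len(pool) - len({j for _, j in pairs})
    return sum(gaps) / len(gaps), unused


# Hypothetical (propensity score, hired?) data.
minority = [(0.30, 0), (0.56, 1), (0.80, 0)]
pool = [(0.10, 1), (0.28, 1), (0.50, 1), (0.60, 1), (0.79, 1), (0.95, 0)]
gap, unused = nearest_neighbor_gap(minority, pool)
```

Here the matched gap is -2/3, and three of the six nonminority subjects are never used. This illustrates the loss of sample size discussed above. A caliper of 0.03 would also drop the second minority subject, whose nearest match is 0.04 away.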
In the same spirit as matching, stratification on relevant variables can also be used to achieve some measure of control over nonracial factors. Stratification approaches include (1) pre- and poststratification of the data for analysis purposes and (2) adjustment of strata to a standard population. Because stratification methods are widely used in the epidemiology literature but not in the literature on statistical measurement of discrimination, we simply refer the reader to Särndal et al. (1992) for further information.
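Adjustment of strata to a standard population (direct standardization) can be sketched in a few lines. Each group's stratum-specific rates are weighted by a common standard population rather than by the group's own composition. The strata, rates, and population counts below are hypothetical.

```python
def weighted_rate(rates, population):
    """Average stratum-specific rates using the given stratum population counts."""
    total = sum(population.values())
    return sum(rates[s] * n / total for s, n in population.items())


# Hypothetical stratum-specific rates and stratum sizes for two groups.
rates_a = {"young": 0.10, "old": 0.30}
rates_b = {"young": 0.12, "old": 0.33}
pop_a = {"young": 200, "old": 800}
pop_b = {"young": 800, "old": 200}
standard = {"young": 600, "old": 400}

# Crude rates weight by each group's own composition; standardized rates use a common one.
crude_a, crude_b = weighted_rate(rates_a, pop_a), weighted_rate(rates_b, pop_b)
std_a, std_b = weighted_rate(rates_a, standard), weighted_rate(rates_b, standard)
```

The crude rates (0.26 versus 0.162) suggest that group A fares worse, but only because group A is concentrated in the high-rate stratum. Once both groups are weighted to the same standard population, group B's rate (0.204) exceeds group A's (0.18).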
Panel Data Methods
Another strand of research using observational data has exploited features that become available with longitudinal data. People do not change race, but one can learn about changes in the consequences of race by following individuals over time. If unobserved characteristics are relatively stable, but outcomes and factors that are related to discrimination do change over time, then the unobserved factors can be partialed out of analyses comparing racial groups over time. (For an overview of the analysis of panel data, see Hsiao, 1986.) In the labor market context, following individuals over time also enables one to examine how racial differences in outcomes vary across regions, industries, sectors (private versus government), and occupations. The assumption that unobserved factors do not change over time is a strong one, however, and it is typically plausible only over relatively short time frames. Furthermore, people do not switch regions, change industries, and so on at random, which suggests that longitudinal designs comparing the consequences of such changes for whites and minorities are subject to selection bias.
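A minimal sketch shows how panel data let time-invariant unobserved factors be partialed out. After person-specific means are subtracted (the within, or fixed-effects, transformation), an unobserved ability term drops out. The racial difference in the wage effect of a sector change can then be estimated from people who switch sectors. The wage equation, its coefficients (0.10 and 0.05), and the panel are hypothetical. The data contain no noise, so the estimates are exact.

```python
def within_estimates(panel):
    """panel: list of (wages, gov, black) per person, where wages and gov are
    per-period lists and black is 0/1. Returns (beta, gamma) from least squares
    of demeaned wages on demeaned gov and demeaned gov*black (no intercept)."""
    sgg = sgh = shh = sgy = shy = 0.0
    for wages, gov, black in panel:
        T = len(wages)
        inter = [g * black for g in gov]
        my, mg, mh = sum(wages) / T, sum(gov) / T, sum(inter) / T
        for y, g, h in zip(wages, gov, inter):
            yd, gd, hd = y - my, g - mg, h - mh  # person means removed
            sgg += gd * gd
            sgh += gd * hd
            shh += hd * hd
            sgy += gd * yd
            shy += hd * yd
    det = sgg * shh - sgh * sgh
    beta = (shh * sgy - sgh * shy) / det   # government-sector effect for whites
    gamma = (sgg * shy - sgh * sgy) / det  # additional effect for blacks
    return beta, gamma


# Hypothetical people: (unobserved ability, black, government-sector path).
people = [
    (1.0, 0, [0, 0, 1, 1]),
    (1.5, 0, [0, 1, 1, 1]),
    (0.8, 1, [0, 0, 0, 1]),
    (1.2, 1, [0, 1, 1, 0]),
    (2.0, 0, [1, 1, 1, 1]),  # never switches, so contributes nothing after demeaning
]
panel = [
    ([a + 0.10 * g + 0.05 * g * black for g in gov], gov, black)
    for a, black, gov in people
]
beta, gamma = within_estimates(panel)
```

The estimator never sees ability directly, yet it recovers 0.10 and 0.05. The sketch does nothing about the selection problem noted above: if the people who switch sectors differ in unobserved ways that change over time, the within estimates are biased.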
^{8} Rosenbaum (2002) discusses methods of examining the sensitivity of results to quantitative assumptions about the correlation between the variable of interest (race in the present case) and omitted factors; these methods may be useful for producing bounded estimates of the discrimination coefficient α (see also Manski, 1995). Manski (2003) discusses other approaches to the construction of bounds under very weak assumptions about omitted variables.
Altonji and Pierret (2001) provide an example of how, at the cost of some very strong assumptions, longitudinal data can be used to draw inferences about discrimination (see Box 7-6).
Natural Experiments
Another approach to addressing the problem of omitted variables and limited understanding of how a nondiscriminating firm would make decisions is to exploit so-called natural experiments. As discussed above, using the experimental approach in a controlled setting makes it easy for an experimenter to directly ascertain the effects of explanatory variables on some outcome of interest. If the experimenter could force some firms to be nondiscriminators, their behavior could be compared with that of a control group of firms, and discrimination could be measured as the difference. When an experimental design is not practical, researchers can use natural experiments, comparing outcomes before and after an intervention is introduced. The researcher observes some exogenous intervention, such as a policy change that affects procedures governing hiring or college admission. Instead of random assignment, treatment and comparison groups are defined, and naturally occurring events are used for comparisons.
BOX 7-6
Altonji and Pierret (2001) explore the implications of a hypothesis they refer to as “Employer learning with statistical discrimination,” using the National Longitudinal Survey of Youth, 1979. If profit-maximizing firms have limited information about the general productivity of new workers, they may choose to use easily observable characteristics, such as years of education or race, in judging workers, even if doing so means violating the law in the case of race. The theory put forth by Altonji and Pierret implies that, as firms acquire more information about a worker, pay may become more dependent on productivity and less dependent on easily observable characteristics or credentials. One implication of their model is that if (1) blacks and whites differ in labor market productivity because of difficult-to-observe premarket factors such as school quality and (2) employers statistically discriminate against blacks, there will be a race gap in initial wages; this scenario, however, cannot easily explain race differences in wage growth. On the other hand, if firms do not statistically discriminate on the basis of race, a wage gap will open up as firms directly observe productivity. Altonji and Pierret’s empirical results call into question the statistical discrimination explanation of wage differences because they observe that the race gap in wages is small at labor market entry and grows with experience in the labor market. The authors point out many caveats to their study, including the possibility that differences between blacks and whites in access to training and promotion opportunities, perhaps because of other, more overt, forms of discrimination, may explain some of their findings. With data on productivity and training as well as wages, the analysis would be strengthened.
Social scientists have used a “differences-in-differences” approach to test the effects of changes occurring at some specified time period that affect some firms or other actors but not others (see, e.g., Card and Krueger, 1994; Tyler et al., 1998). The change in an outcome of interest (in the present context, a racial difference in outcomes) from before to after the intervention among affected actors is compared with the corresponding change among unaffected actors. In the language of causal modeling, the policy change is a formal manipulation, which is applied to some actors (e.g., firms in a particular industry or state) but not others. (In some studies, the policy change affects all actors and the comparison is done before and after the change.) The pre-policy-change data are used to estimate the counterfactual condition of what would have happened had the policy change not occurred. Such designs are also sometimes termed quasi-experiments. Because there is some degree of control, the assumptions made for natural experiments to support a causal inference need not be as strong as those required for uncontrolled observational studies; however, natural experiments fall short of randomized controlled experiments. (For more detail on natural experimental designs, see Campbell and Stanley, 1963; Meyer, 1995; and Shadish et al., 2002.)
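The differences-in-differences logic can be sketched with the racial gap itself as the outcome. The estimate is the change in the black-white gap among actors affected by a policy change, minus the change among unaffected actors. The hiring rates and the "covered"/"uncovered" labels are hypothetical. The sketch assumes that the two groups' gaps would have followed parallel paths in the absence of the change.

```python
def racial_gap(rates):
    """Black-minus-white difference in an outcome rate."""
    return rates["black"] - rates["white"]


def diff_in_diff(rates):
    """Change in the racial gap for covered actors minus the change for uncovered actors."""
    change = {
        group: racial_gap(rates[(group, "post")]) - racial_gap(rates[(group, "pre")])
        for group in ("covered", "uncovered")
    }
    return change["covered"] - change["uncovered"]


# Hypothetical hiring rates by (group of firms, period) and race.
rates = {
    ("covered", "pre"): {"black": 0.20, "white": 0.32},
    ("covered", "post"): {"black": 0.27, "white": 0.32},
    ("uncovered", "pre"): {"black": 0.22, "white": 0.32},
    ("uncovered", "post"): {"black": 0.24, "white": 0.32},
}
estimate = diff_in_diff(rates)
```

The gap at covered firms narrows by 0.07 and at uncovered firms by 0.02. The estimate of 0.05 is the narrowing attributed to the change. The uncovered firms stand in for what would have happened at covered firms without it.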
The key idea in the context of discrimination is to find settings in which there is an exogenous source of variation in the weight that firms, schools, lenders, and other actors could place on R that can be plausibly thought of as being independent of the unobservable factors. In the hiring case, the idea is to contrast hiring policy before and after a change in procedure that restricts or eliminates the extent to which the firm can use race in hiring decisions.
Examples of Natural Experiments to Study Discrimination
Below we provide several examples of natural experiments that were used to study racial discrimination in the labor market, education, and health care domains.
Labor market. Holzer and Ludwig (2003) discuss the empirical research using natural experiments in the labor market domain to test the effectiveness of antidiscrimination laws (e.g., Freeman, 1973; Heckman and Payner, 1989) and the effects of a policy change on different racial and ethnic groups (e.g., Chay, 1998; Neumark and Stock, 2001). The basic approach is to determine whether the law has had the result of forcing α to be zero, or at least closer to zero than was the case before the law. By comparing employment rates before and after the change in the law, one can draw inferences about the extent of reduction in discrimination after the change.
Another source of variation in employment policy to examine could be changes at the firm level within an industry that are not necessarily mandated by changes in the law. Goldin and Rouse (2000), for example, focused on an interesting setting for an attempt to measure discrimination in hiring—a professional orchestra. Even though they considered discrimination against women, the methodology they used applies to other groups as well. In the 1970s and 1980s, many orchestras began to hide the auditioning musician from the jury by using a screen or other device. The representation of women in orchestras, especially among new hires, increased dramatically over this period, and the question investigated was the extent to which this increase was attributable to the change in audition practices. Goldin and Rouse obtained data for a number of years on the auditions for a set of nine orchestras; some of these auditions took place with the screen, while others did not.
Basically, Goldin and Rouse used regression models to estimate whether an individual advanced from one round of auditions to the next and whether an individual was hired in the final round as a function of three things: (1) type of audition (blind versus not blind), (2) the interaction between gender and type of audition, and (3) controls for characteristics of the individual and the audition. By construction the weight on gender in the blind audition was 0, because the gender was unknown to those judging the musicians. Because the researchers had multiple observations of the orchestras, they were able to include orchestra-specific constants to control for unobserved differences among orchestras that might have been associated with adoption of a screen. They were also able to include person-specific constants to control for unobserved differences in the quality of the musicians, guarding against the possibility that the use of a screen influences the relative quality of the men and women who audition. Including the person-specific constants made the results weaker.
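A sketch makes the interaction term concrete. It fits a linear probability model of advancing to the next round on an intercept, a female indicator, a blind-audition indicator, and their interaction. The audition counts are hypothetical. The female main effect is included, as is usual, although the list above does not mention it separately. The orchestra- and person-specific constants that Goldin and Rouse used are omitted. Without controls the model is saturated, so the interaction coefficient equals the difference-in-differences of the four cell means.

```python
def ols(rows, y):
    """Least-squares coefficients via the normal equations, solved by
    Gauss-Jordan elimination with partial pivoting (fine for a few columns)."""
    k = len(rows[0])
    a = [
        [sum(r[i] * r[j] for r in rows) for j in range(k)]
        + [sum(r[i] * yi for r, yi in zip(rows, y))]
        for i in range(k)
    ]
    for col in range(k):
        piv = max(range(col, k), key=lambda r: abs(a[r][col]))
        a[col], a[piv] = a[piv], a[col]
        for r in range(k):
            if r != col:
                f = a[r][col] / a[col][col]
                a[r] = [x - f * p for x, p in zip(a[r], a[col])]
    return [a[i][k] / a[i][i] for i in range(k)]


# Hypothetical cells: (female, blind) -> (number advanced, number auditioning).
cells = {(0, 0): (30, 100), (0, 1): (28, 100), (1, 0): (20, 100), (1, 1): (27, 100)}
rows, y = [], []
for (female, blind), (advanced, total) in cells.items():
    for m in range(total):
        rows.append([1.0, female, blind, female * blind])
        y.append(1.0 if m < advanced else 0.0)

const, b_female, b_blind, b_inter = ols(rows, y)
rate = {cell: adv / tot for cell, (adv, tot) in cells.items()}
did = (rate[(1, 1)] - rate[(1, 0)]) - (rate[(0, 1)] - rate[(0, 0)])
```

With these invented counts, the interaction coefficient is 0.09. In this sketch, blind auditions raise women's advancement rate by 9 percentage points more than men's.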
Overall, Goldin and Rouse found that women were much more likely to advance and be hired when auditions were blind and concluded that the introduction of a screen led to reduced discrimination against women. Their work provides an example of a case in which a change in policy made discrimination at a key stage of the hiring process much more difficult than it had previously been. Moreover, the increase in the rate of hiring of women after the change demonstrated that discrimination existed prior to the change. The obvious limitation of this work is that it is dangerous to draw broad conclusions about discrimination in hiring from the orchestra case.
Yet another source of variation in employment policy to examine could be changes in wages in an industry in response to competitive pressure. The idea is that prejudiced firms may indulge biases when economic rents (excess profits) are available. Competitive pressure may reduce the rents, forcing firms to either reduce discrimination (hire the best people) or go out of business. This situation would lead to a reduction in α, the weight placed on race.
Black and Strahan (2001) provide one of the cleanest of these studies, and although they focused on gender discrimination, the idea can be applied elsewhere. They exploited the fact that regulations constraining entry by banks into new markets were relaxed beginning in the mid-1970s. Using data from the mid-1970s through 1997, they found that following deregulation the average wages of bank employees declined relative to the wages of nonbank employees. The authors used multivariate regression models to implement a triple-differencing strategy to distinguish the effect of deregulation from fixed characteristics of states and wage and employment trends at the state level that happen to be correlated with deregulation. The strategy amounts to taking the difference between the growth in wages of bank and nonbank employees in states that undergo deregulation at a certain point in time and comparing it with the corresponding difference in wage growth rates for bank and nonbank employees in states that did not undergo deregulation at that point in time. Black and Strahan show that deregulation led to a decline in the gap between the wages of men and women for two reasons: First, women moved into higher-skill occupations; second, the wages of men fell more than the wages of women in a given occupation. This evidence is consistent with some models of gender discrimination.
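The triple-differencing strategy can be written out from cell means. The mean log wages below are hypothetical. The actual study implemented the comparison in multivariate regressions with additional controls rather than from raw cell means.

```python
def triple_difference(mean_wage):
    """mean_wage[(state_group, sector, period)] -> mean log wage.
    Returns the bank-minus-nonbank difference in wage growth in deregulating
    states, minus the same difference in non-deregulating states."""
    def bank_minus_nonbank_growth(states):
        growth = {
            sector: mean_wage[(states, sector, "post")] - mean_wage[(states, sector, "pre")]
            for sector in ("bank", "nonbank")
        }
        return growth["bank"] - growth["nonbank"]

    return bank_minus_nonbank_growth("dereg") - bank_minus_nonbank_growth("no_dereg")


# Hypothetical mean log wages by state group, sector, and period.
mean_wage = {
    ("dereg", "bank", "pre"): 3.00, ("dereg", "bank", "post"): 2.98,
    ("dereg", "nonbank", "pre"): 2.90, ("dereg", "nonbank", "post"): 2.94,
    ("no_dereg", "bank", "pre"): 3.00, ("no_dereg", "bank", "post"): 3.03,
    ("no_dereg", "nonbank", "pre"): 2.90, ("no_dereg", "nonbank", "post"): 2.93,
}
estimate = triple_difference(mean_wage)
```

The within-state bank-nonbank comparison removes state-level trends shared by both sectors. Subtracting the non-deregulating states then removes bank-specific trends common to all states. What remains (-0.06 here) is attributed to deregulation.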
Health care. Chay and Greenstone (2000) examined trends in black–white infant health outcomes between 1955 and 1975. The authors fit simple trend-break regression models to vital statistics data for blacks and whites in rural and urban areas of different states. They used a time trend variable to measure the average trend in infant mortality rates across states (1955–1965) and an indicator variable to measure the change in the infant mortality trend after 1965. They controlled for differences across states (by race and rural versus urban area) that might be correlated with infant mortality.
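A trend-break regression of this kind can be sketched as least squares on a linear time trend plus a term that lets the slope change after 1965. The slope-change parameterization and the noise-free series below are assumptions made for illustration. They are not Chay and Greenstone's data, and not necessarily their exact specification.

```python
def ols(rows, y):
    """Least-squares coefficients via the normal equations, solved by
    Gauss-Jordan elimination with partial pivoting (fine for a few columns)."""
    k = len(rows[0])
    a = [
        [sum(r[i] * r[j] for r in rows) for j in range(k)]
        + [sum(r[i] * yi for r, yi in zip(rows, y))]
        for i in range(k)
    ]
    for col in range(k):
        piv = max(range(col, k), key=lambda r: abs(a[r][col]))
        a[col], a[piv] = a[piv], a[col]
        for r in range(k):
            if r != col:
                f = a[r][col] / a[col][col]
                a[r] = [x - f * p for x, p in zip(a[r], a[col])]
    return [a[i][k] / a[i][i] for i in range(k)]


years = list(range(1955, 1976))
# Hypothetical infant deaths per 1,000 births: slope -0.5 before 1965, -2.0 after.
y = [40.0 - 0.5 * (t - 1955) - 1.5 * max(0, t - 1965) for t in years]
# Columns: intercept, trend since 1955, extra trend after 1965.
rows = [[1.0, t - 1955, max(0, t - 1965)] for t in years]
intercept, pre_slope, slope_change = ols(rows, y)
```

In an analysis like the one described, the slope-change coefficient would be estimated separately by race (or interacted with race). The question of interest is whether the break is larger for black infants.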
The regression results showed a significant trend break in health outcomes for black and white infants after 1965, although improvements were more pronounced for blacks. The authors note that before 1965 black infant mortality rates were high relative to those of whites. Between 1965 and 1975, however, there was evidence of a sharp decline in black infant mortality rates and of convergence between black and white rates (particularly in the rural South). Chay and Greenstone suggest that two federal interventions could explain the convergence of black–white infant mortality rates after 1965: Title VI of the 1964 Civil Rights Act (prohibiting discrimination and segregation in access to care) and the Maternal and Child Health Services Program under Title V of the 1935 Social Security Act.^{9}
Because the trend-break patterns showed similar improvements for whites across all regions after 1965, it is possible that causal factors other than the race-targeted interventions also contributed to the post-1965 changes. The authors also report a strong correlation between “differential convergence in infant mortality rates” and “differential convergence in black–white hospitalization rates across states” (2000:330). Thus, the federal interventions, and possibly other factors, played an important role in the changes in relative infant mortality rates.
Education. Holzer and Ludwig (2003) provide some examples of natural experiments in the education domain that can be used in research to examine the effect of racial differences in educational inputs on relative outcomes. One type of natural experiment in the education domain looks at discriminatory educational policies and practices and assesses their effects on education outcomes. Examples are studies looking at the adverse effects of “separate but equal” laws on the educational attainment of blacks prior to the ruling in Brown v. Board of Education (Boozer et al., 1992; Donohue et al., 2002; Margo, 1990) and studies of the effects of school desegregation orders implemented after the Brown ruling (Guryan, 2001). Such experiments can be useful for measuring discriminatory practices in education but are difficult to apply in this domain. Holzer and Ludwig (2003:1167) offer this perspective:
Evaluating how these natural experiments change the allocation of educational inputs across or within schools may help highlight the degree to which racial discrimination affected educational decisions in the past. One limitation with this approach is that social scientists are limited to either detecting discrimination within a given jurisdiction retrospectively rather than prospectively, or must extrapolate from evidence of past discrimination in one jurisdiction to other areas where policy makers seek guidance on future enforcement or policy actions.
Another type of natural experiment focuses on the general relationship between educational inputs and outputs. For instance, one such experiment might examine the effects of a policy change in tracking or ability grouping on student outcomes. Differences in student outcomes within one school before and after the policy change could be compared with outcomes in another school that did not experience a policy change to determine whether discrimination played a role (see Holzer and Ludwig, 2003, for further details). Holzer and Ludwig conclude that natural experiments are valuable tools for determining whether observed racial differences in inputs constitute racial discrimination and for measuring the effects of such differences.
Limitations of Natural Experiments
In the context of the study of discrimination, as well as in other arenas, natural experiments have limitations. First, the change under study may be endogenous. That is, it may be a reaction to particular circumstances that warranted a policy change or intervention. As a result, one may not be able to generalize from the results of a study to estimate the average amount of discrimination prior to the change. For example, suppose one is trying to measure discrimination by comparing hiring rates in a particular firm before and after an intervention by the Equal Employment Opportunity Commission (EEOC) with those of firms in the same industry around the same time period. Assuming that the EEOC responds to the most serious cases, the estimated effect would tend to overstate the amount of discrimination in the industry at large prior to the intervention.
Second, the effects of policy interventions may spill over into the control groups used in the study. For example, the effects of heightened EEOC activity involving a particular set of employers in a given industry might influence the behavior of other firms and industries even though they have not been targeted. This phenomenon would reduce estimates of the effect of EEOC enforcement based on a “differences-in-differences” design.
Third, differences in trends in other factors that affect outcomes cannot always be addressed adequately even in differences-in-differences designs, particularly when the policy intervention takes place over a period of time, as is the case with civil rights policy.
Fourth, a change in one domain, such as school desegregation orders, may be accompanied by changes in another domain, such as housing, or by a change in attitudes. Consequently, it may be difficult to use a change in policy in one domain to identify the amount of discrimination in that domain prior to the change.
Fifth, only in rare circumstances (such as the Goldin and Rouse orchestra study) can one be sure that the change in policy under study has eliminated a role for discrimination in the decision under study. In most cases, the best one can hope for is that a comparison of groups affected by the change in policy will identify the reduction in discrimination induced by the policy (the change in α), rather than the level of discrimination that existed prior to the change.
Sixth, in some cases, changes in policy that lead to positive effects in one dimension may induce negative effects in others. A major concern in the literature on the effects of antidiscrimination policy in the labor market, for example, is that positive effects on wage rates for blacks have been offset in part by negative effects on employment (see Altonji and Blank, 1999, for discussion and references).
Finally, natural variation in the data may be insufficient to identify the effects of interest or may be correlated with other, unmeasured factors that may bias the results. (See Holzer and Ludwig, 2003, on the use of natural experiments to study discrimination; see Shadish et al., 2002, and Meyer, 1995, for a general discussion of the strengths and weaknesses of these designs.)
Summary of Possible Solutions to Problems of Using Statistical Models to Infer Discrimination
It should be obvious that more accurate and complete data collection efforts are critical to reducing the key problem of omitted variables bias. Of course, the data needed must pertain to the particular domain of analysis. Data on performance (e.g., productivity in the hiring context, default rates in the lending context) and detailed knowledge of how an outcome depends on performance can solve the problem of omitted variables bias in some cases. However, situations in which the researcher will possess the data and detailed knowledge needed to support specification of an appropriate model are relatively scarce, at least in the labor market setting.
Matching and propensity score methods are useful as a means of relaxing assumptions about the functional form relating the variables X_{1} and X_{2} to productivity and to hiring decisions. However, they do not solve the omitted variables bias problem.
Panel data are useful as a way of identifying differences in the amount of discrimination across types of institutions, regions, or time. However, this approach requires the assumption that time-varying unobserved characteristics of the individual are not related to mobility, which is a strong assumption.
Natural experiments in which a legal change or some other change forces a reduction in or the complete elimination of discrimination for some groups provide leverage in assessing the importance of discrimination prior to the change and for groups not affected by the change.
ADDITIONAL ISSUES
Thus far we have discussed prospects and problems for measuring discriminatory treatment of persons who are identical except for race. In the context of our hiring example, the parameter α measures discriminatory treatment of blacks and whites with the same values of X_{1} and X_{2}, the variables known to a firm to determine productivity. The model developed above, however, also sheds light on other discriminatory processes.
Effects of Past Labor Market Discrimination on Factors in Hiring
Discrimination by a firm or elsewhere in the labor market may influence some of the elements of X_{1} and X_{2}, the (nonracial) characteristics used by the firm to make hiring decisions. For example, some studies of hiring are based on whether the person was referred by an existing employee (Fernandez and Weinberg, 1997; Fernandez et al., 2000; Petersen et al., 2000). Current labor market discrimination against minorities by the firm will lower the probability that minority applicants know people who work in the firm, because personal networks tend to run along racial lines. Even if the use of referrals in hiring is justified by productivity considerations, the total effect of current discrimination will be understated if one holds constant whether a person was referred to the firm. In particular, the parameter α will understate current discrimination.
Alternatively, suppose that in the past a firm discriminated against disadvantaged racial groups but no longer does. Continuing with the referral example, again suppose that the firm makes use of referrals in hiring decisions. If the researcher is simply interested in whether the firm treats applicants with a given set of characteristics differently at the present time, and if the researcher observes all the variables the firm uses to assess productivity (there are no X_{2} variables that the researcher does not observe), the researcher will draw the correct conclusion of no such differential treatment. In this case, α will be zero. However, if the researcher wants to know the total effect of both past and current discrimination on the part of the firm on the racial composition of current hires, it is incorrect to take as given whether employees were referred. The reason is that past discrimination led disadvantaged racial groups to be underrepresented among the pool of potential referrers, thus reducing the chances of attracting disadvantaged racial groups through referrals. To measure the effect of both past and current discrimination on current outcomes in this dynamic context, the researcher must model the effect of past discrimination on current X variables.
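A small hypothetical calculation checks the referral argument. Suppose past discrimination left black applicants less likely to have a referral (10 percent versus 40 percent). The firm's current hiring rule depends only on referral status (50 percent hired if referred, 20 percent if not) and not on race. All numbers are invented for illustration.

```python
# Hypothetical current hiring rule: depends on referral status only, not race.
p_hire = {
    ("black", True): 0.5, ("white", True): 0.5,
    ("black", False): 0.2, ("white", False): 0.2,
}
# Hypothetical legacy of past discrimination: fewer referrals for black applicants.
share_referred = {"black": 0.10, "white": 0.40}


def overall_rate(race):
    """Hiring rate for a race, averaging over referred and nonreferred applicants."""
    s = share_referred[race]
    return s * p_hire[(race, True)] + (1 - s) * p_hire[(race, False)]


# Gap holding referral status fixed: the analogue of alpha.
conditional_gaps = [p_hire[("black", r)] - p_hire[("white", r)] for r in (True, False)]
# Gap that also reflects the effect of past discrimination on referrals.
total_gap = overall_rate("black") - overall_rate("white")
```

Conditional on referral status the gaps are zero, so α = 0. Yet black applicants are hired at a rate of 0.23 versus 0.32 for white applicants, a gap of -0.09 that arises entirely through referrals.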
To give another example, it is standard practice for many types of jobs (and in many situations defensible from a productivity standpoint) to consider past work experience when trying to predict productivity. However, past experience will be influenced by discrimination in the labor market. Consequently, the coefficient on R will provide an estimate of the effect of discrimination on the firm’s behavior, given X_{1} and X_{2}. But because discrimination in the labor market leads to a racial gap in the experience-related components of X_{1} and X_{2}, the coefficient on R will understate the total effect of all discrimination that has taken place in the labor market.
Furthermore, discrimination in the labor market may influence the choices of X_{1} and X_{2} that people make before they enter the labor market. For example, if African Americans know they are discriminated against for white-collar positions and college has little value in the blue-collar world, they will have less incentive to pursue a college education. If one uses a model such as equation (7.13) to measure labor market discrimination against blacks in white-collar positions holding education (one of the X_{1} variables) constant, one will underestimate the total effect of discrimination on the racial composition of such jobs. Similarly, if firms develop a reputation of having a hostile work environment for racially disadvantaged groups and if such applicants avoid seeking employment at those firms, a model such as equation (7.13) will underestimate the total effect of discrimination in hiring decisions. This will be the case even if the researcher observes all of the variables used by firms to choose employees. Developing measures of the discrimination that results from a process such as that described above is extremely challenging because of the much longer timeline and more complex environment that must be accounted for to reach statistically valid “all else being equal” conclusions. We address these issues in more detail in Chapter 11.
Effects of Discrimination in Other Domains
To measure the total effect of discrimination in society on a particular outcome, such as the odds of getting hired, one needs to measure the effects of discrimination in other domains on elements of X_{1} and X_{2} that are determined outside of the labor market (see Chapter 11). In our example of hiring, if there is racial discrimination in “pre-market factors” (Neal and Johnson, 1996), such as education, that are related to labor market success, discrimination in the educational sphere will also affect labor market success indirectly. Thus controlling for education in a hiring equation is reasonable in assessing whether a particular employer is discriminating.
However, if there is racial discrimination in the educational domain, controlling for education will understate the total effect of all racial discrimination in analyses of labor market discrimination alone. Developing and validating statistical models of these broader processes is one way to gain insight into the presence or absence of discrimination in these other areas.
One’s choice of control variables is influenced by whether one is trying to measure discrimination in a specific domain or the cumulative impact of discrimination.
Other Discriminatory Effects on the Productivity Equation
Another issue concerns whether discrimination on the part of customers, coworkers, or suppliers leads characteristics of the worker, including race, to enter the productivity equation (7.9).^{10} Consider Becker’s (1957) theory of customer discrimination, and consider sales positions. Suppose that white customers prefer to buy from white salespeople, and black or Latino customers are indifferent. In such a world, P is influenced by the match between the race (or ethnicity) of the job candidate and the racial composition of the customers. R will not appear directly in the equation for productivity, but the interaction between R and the racial composition of the customer base will. If the firm obeys the law, it will not apply the interaction variable in making decisions about hiring, and the interaction variable will not enter significantly into hiring decisions. (The interaction will show up in a productivity regression.) If the firm disobeys the law, the interaction term will influence hiring and show up in a hiring regression. One will then conclude correctly that firms discriminate for or against black or Latino salespeople as a function of the customer base. If one excludes the interaction term but adds R to the hiring equation, one will likely find evidence that the firm discriminates against minorities if most of the markets for which the firm is hiring happen to be heavily white. But one will not detect the fact that the nature of the discrimination is related to the match between customers and the sales agent. If there are data that can be used to estimate the effect of the interaction between race and customer composition on productivity, one can see whether hiring decisions appear to reflect such considerations. A number of studies of professional sports take this approach (see Altonji and Blank, 1999, and Kahn, 1991, for examples).
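A hypothetical calculation illustrates the last point. Suppose a law-breaking firm's hiring probability is 0.5 for every applicant, except that for minority applicants it is lowered by 0.3 times the white share of the customer base. Suppose also that most openings are in heavily white markets. The outcomes are expected hiring probabilities rather than simulated 0/1 draws, so the regressions recover the model exactly. All numbers are invented for illustration.

```python
def ols(rows, y):
    """Least-squares coefficients via the normal equations, solved by
    Gauss-Jordan elimination with partial pivoting (fine for a few columns)."""
    k = len(rows[0])
    a = [
        [sum(r[i] * r[j] for r in rows) for j in range(k)]
        + [sum(r[i] * yi for r, yi in zip(rows, y))]
        for i in range(k)
    ]
    for col in range(k):
        piv = max(range(col, k), key=lambda r: abs(a[r][col]))
        a[col], a[piv] = a[piv], a[col]
        for r in range(k):
            if r != col:
                f = a[r][col] / a[col][col]
                a[r] = [x - f * p for x, p in zip(a[r], a[col])]
    return [a[i][k] / a[i][i] for i in range(k)]


# Hypothetical markets: (white share of customers, applicants of each race).
markets = [(0.9, 70), (0.5, 20), (0.2, 10)]
rows_short, rows_full, y = [], [], []
for w, n in markets:
    for r in (0, 1):  # r = 1 for a minority applicant
        for _ in range(n):
            rows_short.append([1.0, r])
            rows_full.append([1.0, r, w, r * w])
            y.append(0.5 - 0.3 * r * w)

short = ols(rows_short, y)  # [intercept, coefficient on R]
full = ols(rows_full, y)    # [intercept, R, white share, R x white share]
```

The short model reports a uniform penalty of -0.225 for minority applicants, which is 0.3 times the average white share of 0.75. The full model shows that the penalty actually ranges from 0.06 in the least white markets to 0.27 in the whitest.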
A somewhat different example involves the possibility that discrimination in social institutions that are extraneous to the firm or the labor market influences the form of the productivity equation. Consider again a marketing position. Suppose that social connections play a critical role in marketing. In such a world, sales productivity may well depend on club memberships, where one lives, the schools one attended, and the like. Variables measuring such social connections belong in X_{1} and X_{2}. R may have no relationship to productivity or to hiring decisions if one conditions on these variables. Now suppose that societal discrimination (including housing discrimination) influences social connections. In this case, discrimination outside the labor market will lead to a race gap in some elements of X_{1} or X_{2} or both, as well as in hiring, even though the variable R will not have an independent effect on hiring conditional on X_{1} and X_{2}. Finally, note that the recruiting strategies chosen by the firm are likely to influence the importance of social networks. Strategies that place more emphasis on personal contacts and less on advertising may not be race neutral. A discriminating firm may consciously choose a recruiting strategy in which social networks are important and then exclude minorities who lack them.^{11} It will be difficult to determine whether the firm’s recruitment strategy is really the profit-maximizing one or in fact is shaped at least in part by the goal of discrimination.
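The point that discrimination outside the labor market can produce a raw hiring gap even when R has no independent effect conditional on the relevant controls can be sketched as follows. The setup is hypothetical: a single "connections" variable stands in for the social-connection elements of X_{1} and X_{2}, minorities are assumed to have fewer connections on average, and the firm is assumed to hire with a probability that is linear in connections and does not depend on R.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000

R = rng.integers(0, 2, n)                      # 1 = minority (hypothetical coding)
# Assumption: discrimination outside the labor market (e.g., in housing)
# lowers minority members' social connections on average.
connections = rng.normal(size=n) - 0.8 * R
# Assumption: the firm hires on connections alone; R plays no direct role.
p_hire = np.clip(0.5 + 0.12 * connections, 0.0, 1.0)
y = (rng.uniform(size=n) < p_hire).astype(float)

# Unconditional minority-minus-white difference in hiring rates.
raw_gap = y[R == 1].mean() - y[R == 0].mean()

# Linear probability model: hiring on an intercept, R, and connections.
X = np.column_stack([np.ones(n), R, connections])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)

print(f"raw hiring gap (minority - white): {raw_gap:.3f}")
print(f"coef on R holding connections fixed: {beta[1]:.3f}")
```

The raw gap is clearly negative while the conditional race coefficient is close to zero, so a hiring regression that controls for connections would show no discrimination by the firm even though race shapes hiring outcomes through discrimination elsewhere.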
SUMMARY, CONCLUSIONS, AND RECOMMENDATIONS
The main purpose of this chapter has been to review the strengths and limitations of various approaches to dealing with the challenges of measuring discrimination with statistical methods using observational data. This review leads to several conclusions.
Our first conclusion relates to the uses and limitations of statistical decomposition of gaps in outcomes among racial groups:
Conclusion: The statistical decomposition of racial gaps in social outcomes using multivariate regression and related techniques is a valuable tool for understanding the sources of racial differences. However, such decompositions using data sets with limited numbers of explanatory variables, such as the Current Population Survey or the decennial census, do not accurately measure the portion of those differences that is due to current discrimination. Matching and related techniques provide a useful alternative to race gap decompositions based on multivariate regression in some circumstances.
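One common form of regression-based decomposition is the Oaxaca-Blinder twofold decomposition, assumed here for illustration. The sketch below splits a mean gap into a part "explained" by differences in measured characteristics and an "unexplained" part due to differing coefficients. The function name oaxaca_blinder, the choice of white coefficients as the reference, and the simulated wage data are all hypothetical.

```python
import numpy as np


def oaxaca_blinder(X_ref, y_ref, X_oth, y_oth):
    """Twofold decomposition of mean(y_ref) - mean(y_oth).

    Characteristics are priced with the reference group's OLS coefficients.
    Returns (explained, unexplained); the two parts sum to the raw gap.
    """
    def with_const(X):
        return np.column_stack([np.ones(len(X)), X])

    Xr, Xo = with_const(X_ref), with_const(X_oth)
    b_ref, *_ = np.linalg.lstsq(Xr, y_ref, rcond=None)
    b_oth, *_ = np.linalg.lstsq(Xo, y_oth, rcond=None)
    xbar_r, xbar_o = Xr.mean(axis=0), Xo.mean(axis=0)
    explained = (xbar_r - xbar_o) @ b_ref     # differences in measured characteristics
    unexplained = xbar_o @ (b_ref - b_oth)    # differences in coefficients; absorbs omitted factors
    return explained, unexplained


# Hypothetical data: log wages rise with schooling in both groups, but the
# minority intercept is 0.1 lower for reasons the model does not observe.
rng = np.random.default_rng(2)
n = 20_000
educ_w, educ_m = rng.normal(13, 2, n), rng.normal(12, 2, n)
y_w = 1.0 + 0.10 * educ_w + rng.normal(0, 0.4, n)
y_m = 0.9 + 0.10 * educ_m + rng.normal(0, 0.4, n)

explained, unexplained = oaxaca_blinder(educ_w, y_w, educ_m, y_m)
gap = y_w.mean() - y_m.mean()
print(f"raw gap {gap:.3f} = explained {explained:.3f} + unexplained {unexplained:.3f}")
```

Here the unexplained portion is, by construction, an unmeasured intercept difference. In real data it would equally absorb any omitted variables that differ by race, which is why it cannot simply be read as the portion of the gap due to current discrimination.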
More generally, we will often be hampered in our ability to infer discriminatory behavior on the basis of regression decompositions because we can never be sure we have included all the relevant controls in the model. We must be able to control for the relevant variables well enough to approximate closely the hypothetical counterfactual in which only race has been changed.
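A small simulation makes the omitted-variable problem concrete. In this hypothetical setup there is no discrimination at all: the outcome depends on two characteristics, one of which (x2) is lower on average for minorities but is missing from the data set. The variable names and coefficients are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 50_000

R = rng.integers(0, 2, n)                 # 1 = minority (hypothetical coding)
x1 = rng.normal(size=n)                   # control available in the data set
x2 = rng.normal(size=n) - 0.5 * R         # relevant control missing from the data set
# By construction there is no discrimination: R has no direct effect on y.
y = 1.0 * x1 + 0.6 * x2 + rng.normal(size=n)


def race_coef(*controls):
    """Coefficient on R from an OLS regression of y on an intercept, R, and controls."""
    X = np.column_stack([np.ones(n), R, *controls])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta[1]


short = race_coef(x1)        # x2 omitted
full = race_coef(x1, x2)     # all relevant controls included
print(f"coef on R, x2 omitted:  {short:.3f}")
print(f"coef on R, x2 included: {full:.3f}")
```

With x2 omitted, the regression attributes roughly 0.6 × 0.5 = 0.3 of the outcome gap to race; once x2 is included, the race coefficient falls to about zero.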
Our second conclusion follows naturally from the first:
Conclusion: Nationally representative data sets containing rich measures of the variables that are the most important determinants of social and economic outcomes—such as education, labor market success, and health status—can help in estimating and understanding the sources of racial differences in outcomes. Panel data may be particularly important and useful (see Chapter 11).
Not only must statistical models for estimating discrimination use appropriate data and methods, but they must also be based on as thorough an understanding as possible of the processes that underlie the behavior being studied. Otherwise, the models are likely to require strong assumptions that cannot be justified. More generally, the properties of the model used for analysis are crucial in assessing claims of statistical “proof” of discrimination. Researchers must provide sufficient information on their model to enable others to understand and make a judgment about whether the assumptions underlying the model have been met.
Conclusion: The use of statistical models, such as multiple regressions, to draw valid inferences about discriminatory behavior requires appropriate data and methods, coupled with a sufficient understanding of the process being studied to justify the necessary assumptions.
The specific model we developed in the context of the decision to hire a worker illustrates the role played by assumptions and theory in drawing causal inferences based on observational data. It also sheds light on how omitted variables and sample selection biases affect our ability to draw conclusions about discrimination and helps make clear what forms of discrimination are measured and what forms are not.
Data on performance relevant to a particular domain, such as productivity in the labor market context or academic success in the educational arena, are extremely valuable in dealing with the problem of omitted variables bias, in permitting the testing of key assumptions of a statistical model, and in studying adverse impact discrimination (see Annex 7-1 below). Natural experiments, although they have limitations, provide another way to address the problems of omitted variables bias and limited knowledge of the decision processes of particular actors.
Conclusion: We see an important role for focused studies that target particular settings (e.g., a firm or a school), whereby it is possible to learn a great deal about how decisions at each stage in a process are made and to collect most of the information on which decisions are based. With such knowledge and data, it becomes much easier to specify an appropriate statistical model with which to estimate racial discrimination.
Conclusion: Despite limitations, natural experiments—in which a legal change or some other change forces a reduction in or the complete elimination of discrimination against some groups—can provide useful data for measuring discrimination prior to the change and for groups not affected by the change.
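One common way to exploit such a change is a difference-in-differences comparison of the minority-white gap before and after the change, for groups covered by it and for groups not covered. The sketch below is a hypothetical illustration, not a method prescribed here: it assumes a common time trend across sectors (parallel trends), a non-discriminatory gap that is the same everywhere, and a discrimination penalty that the change removes completely in the covered sector only. Under those assumptions, the estimate recovers the pre-change discrimination in the covered sector.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 200_000
covered = rng.integers(0, 2, n).astype(bool)    # sector or group covered by the change
after = rng.integers(0, 2, n).astype(bool)      # observed after the change
minority = rng.integers(0, 2, n).astype(bool)

d_true = 0.15   # hypothetical pre-change discrimination penalty
# Assumptions: a common time trend, a non-discriminatory gap of 0.2 that is the
# same everywhere, and a penalty that the change removes only where it applies.
penalty = d_true * (minority & ~(covered & after))
y = (2.0 + 0.1 * after + 0.05 * covered - 0.2 * minority - penalty
     + rng.normal(0.0, 0.5, n))


def gap(cell):
    """Minority-minus-white mean outcome within a cell (boolean mask)."""
    return y[cell & minority].mean() - y[cell & ~minority].mean()


change_covered = gap(covered & after) - gap(covered & ~after)
change_uncovered = gap(~covered & after) - gap(~covered & ~after)
did = change_covered - change_uncovered
print(f"difference-in-differences estimate: {did:.3f}")
```

The uncovered group absorbs changes over time that are unrelated to the policy; if its trend differs from the covered group's for other reasons, the estimate is biased, which is one of the limitations the conclusion acknowledges.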
Recommendation 7.1. Public and private funding agencies should support focused studies of decision processes, such as the behavior of firms in hiring, training, and promoting employees. The results of such studies can guide the development of improved models and data for statistical analysis of differential outcomes for racial and ethnic groups in employment and other areas.
Recommendation 7.2. Public agencies should assist in the evaluation of natural experiments by collecting data that can be used to evaluate the effect of antidiscrimination policy changes on groups covered by the changes, as well as groups not covered.
ANNEX 7-1: DETECTING ADVERSE IMPACT DISCRIMINATION
We discuss here ways to detect adverse impact discrimination, that is, discrimination through the use of factors that are correlated with race. A firm may not use race directly, but it may weight variables in hiring decisions in a way that is not proportionate to their influence on productivity. For example, suppose the firm uses an index P^{f} as its productivity rating rather than the correct index E(P | X_{1},X_{2}) and hires accordingly. In this case, y will be determined by
(A7.1)
where α′ is (1 – α) as before. It is quite possible for α to be 0 even though the firm’s hiring rule has an adverse impact on R that is not justified by productivity considerations. That is, α can be zero even though R is systematically related to the difference between the index P^{f} and the unbiased productivity index E(P | X_{1},X_{2}). We define this as adverse impact discrimination. The legal requirement that firms validate hiring criteria having an adverse impact on protected classes of workers is designed to prevent this form of discrimination.
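The following sketch illustrates how α can be zero while the firm's rule still has an adverse impact. All forms are hypothetical assumptions: E(P | X_{1},X_{2}) is taken to be an equally weighted linear index, the firm's rating P^{f} overweights X_{2}, minorities score lower on X_{2} on average, and the hiring probability is linear in P^{f} and never uses R.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 200_000

R = rng.integers(0, 2, n)                  # 1 = minority (hypothetical coding)
X1 = rng.normal(size=n)
X2 = rng.normal(size=n) - 1.0 * R          # minorities score lower on X2 on average

expected_P = 0.5 * X1 + 0.5 * X2           # hypothetical unbiased index E(P | X1, X2)
P_f = 0.1 * X1 + 0.9 * X2                  # hypothetical firm rating P^f that overweights X2

# The firm never uses R (alpha = 0): the hiring probability depends only on P^f.
y = (rng.uniform(size=n) < np.clip(0.5 + 0.1 * P_f, 0.0, 1.0)).astype(float)


def coef_on_R(index):
    """Coefficient on R from regressing hiring on an intercept, R, and an index."""
    X = np.column_stack([np.ones(n), R, index])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta[1]


gap_given_firm_index = coef_on_R(P_f)          # about zero: R is not used directly
gap_given_true_index = coef_on_R(expected_P)   # negative: adverse impact
print(f"race gap holding P^f fixed:              {gap_given_firm_index:.3f}")
print(f"race gap holding E(P | X1, X2) fixed:    {gap_given_true_index:.3f}")
```

Among applicants with the same expected productivity, minorities are hired less often, even though race has no effect once the firm's own rating is held fixed. In practice the researcher does not observe E(P | X_{1},X_{2}), which is why the test described next relies on an indicator of productivity.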
In general, it will be difficult to detect that the firm is behaving in accordance with equation (A7.1) without information on P. Suppose, however, that the researcher has an unbiased indicator P* of P as well as data on X_{1} but not X_{2}. Then the researcher can estimate the coefficients θ_{1} of the conditional expectation E(P* | X_{1}) = X_{1}θ_{1}.
If firms are hiring on the basis of expected productivity given X_{1} and X_{2}, then E(y | X_{1}) = E(y | X_{1}θ_{1}). Consequently, one can test the null hypothesis that firms are hiring on the basis of expected productivity given X_{1} and X_{2} by testing the restriction that y depends on X_{1} only through the index X_{1}θ_{1}.
One can test this restriction by regressing y on X_{1}θ_{1} and X_{1} (with one element of X_{1} excluded because of collinearity) and testing the null hypothesis that the elements of X_{1} have no effect on y, holding X_{1}θ_{1} constant. From a regression of E(y | X_{1}) on R and X_{1}θ_{1}, one can estimate the race gap for workers with a given value of X_{1} that is due to the firm’s policy. Without special assumptions, however, one cannot estimate the effect on group R of the firm’s misuse of X_{1} and X_{2} unless one has data on both variables. Unfortunately, even a noisy indicator of productivity is unavailable in most of the data sets used to study racial differences.
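The two-step test can be sketched as follows. First, θ_{1} is estimated by regressing the indicator P* on X_{1}; then y is regressed on the index X_{1}θ_{1} plus X_{1} with its first column dropped, and an F-statistic tests whether those X_{1} columns matter. This is a minimal illustration under hypothetical assumptions: three elements of X_{1}, one unobserved X_{2} correlated with X_{1}, linear conditional expectations, a hiring probability linear in the firm's rating, and made-up weights for a misweighted rating. The helper names index_test and simulate are illustrative, and the F-statistic ignores sampling error in the estimated θ_{1} (a fuller treatment might bootstrap it).

```python
import numpy as np


def ols_rss(X, y):
    """OLS coefficients and residual sum of squares."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return beta, resid @ resid


def index_test(X1, P_star, y):
    """F-statistic for H0: y depends on X1 only through the index X1 @ theta_1."""
    n, k = X1.shape
    ones = np.ones((n, 1))
    # Step 1: estimate theta_1 from the unbiased productivity indicator P*.
    theta, _ = ols_rss(np.hstack([ones, X1]), P_star)
    index = X1 @ theta[1:]                  # intercept is absorbed by the regression constant
    # Step 2: restricted (constant + index) vs. unrestricted (adds X1 minus its first column).
    _, rss_r = ols_rss(np.column_stack([ones, index]), y)
    X_u = np.column_stack([ones, index, X1[:, 1:]])
    _, rss_u = ols_rss(X_u, y)
    q, df = k - 1, n - X_u.shape[1]
    return ((rss_r - rss_u) / q) / (rss_u / df)


def simulate(n, misweighted, rng):
    """Hypothetical data; misweighted=True makes the firm hire on a biased rating."""
    X1 = rng.normal(size=(n, 3))
    X2 = 0.5 * X1[:, 0] + rng.normal(size=n)                  # unobserved by the researcher
    expected_P = X1 @ np.array([0.5, 0.3, 0.2]) + 0.4 * X2    # E(P | X1, X2)
    P_star = expected_P + rng.normal(size=n) + rng.normal(size=n)  # P, then noisy indicator P*
    if misweighted:
        rating = X1 @ np.array([0.1, 0.1, 0.6]) + 0.4 * X2    # hypothetical P^f
    else:
        rating = expected_P
    y = (rng.uniform(size=n) < np.clip(0.5 + 0.1 * rating, 0.0, 1.0)).astype(float)
    return X1, P_star, y


rng = np.random.default_rng(7)
F_null = index_test(*simulate(50_000, False, rng))
F_alt = index_test(*simulate(50_000, True, rng))
print(f"F when hiring on E(P | X1, X2): {F_null:.2f}")
print(f"F when hiring on a misweighted rating: {F_alt:.2f}")
```

When the firm hires on expected productivity, the F-statistic is small; when it hires on a misweighted rating, the X_{1} columns carry information beyond the index and the statistic is large. The linear forms matter: with a nonlinear hiring rule, E(y | X_{1}) need not be a function of X_{1}θ_{1} alone even under the null.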