Measuring Racial Discrimination
7
Statistical Analysis of Observational Data
Thus far we have made the case that randomized controlled experiments are the best approach available to researchers for drawing causal inferences. In the absence of experimental design, causal inference is more difficult. However, applying statistical models to observational data can be useful for understanding causal processes as well as for identifying basic facts about racial differences. Indeed, observational studies are the primary tool through which researchers have explored racial disparities and discrimination. The main goals of this chapter are to delineate the strengths and problems associated with measuring discrimination using observational studies and to identify methodological tools that are particularly promising for application in certain areas of research on discrimination.
We begin by discussing statistical decompositions of racial differences in outcomes using multivariate regressions. These decompositions are basically descriptive but are nevertheless an important tool for understanding what factors are related to observed differences as well as for measuring the magnitude of racial differences. In the next section, we continue with an outline of the fundamental issues that must be addressed to draw causal inferences about racial discrimination from statistical analyses of observational data. We illustrate the main issues by laying out a statistical model that can be used to measure discrimination in hiring decisions. As we see it, the hiring example is robust in the sense of surfacing all of the conceptual issues that hamper research on discrimination across domains, including the five domains on which we focus in this report (labor markets, education, housing, criminal justice, and health care). We discuss the strengths and limitations of existing approaches to measuring discrimination across these domains and suggest how approaches prevalent in one domain might usefully be employed in others.
Even a cursory review of the literature on labor markets, education, housing, criminal justice, and health care reveals that it is quite common for researchers to employ statistical models when addressing questions of racial discrimination (see Table 4-1). Given the range of domains we examine, we do not attempt to be exhaustive in our presentation. Instead, we provide examples from individual studies in particular domains to illustrate particular methodological issues. Our intent is to summarize what we see as the most important challenges that arise in using statistical models to study racial differences in outcomes. And although we make frequent use of labor market concepts as concrete examples throughout this chapter, the fundamental statistical issues underlie the measurement of discrimination in all domains.
It should be noted that the style of exposition in this chapter is more mathematical than that in the rest of the report. This mathematical presentation is necessary to make clear what statistical decompositions of racial differences measure. It is also needed for precision regarding the role of models as descriptions of the ways in which outcomes are determined in the presence of discrimination, the role of models and assumptions in drawing causal inferences regarding discrimination from observational data, the nature of the biases that arise when those assumptions are violated, and the ways in which alternative study designs can reduce those biases.
STATISTICAL ANALYSIS FOR RESEARCH AND LITIGATION
Before we proceed, a caveat is in order. This chapter attempts to illuminate state-of-the-art statistical methods that should be used by academic researchers attempting to detect the existence and magnitude of racial discrimination in a wide variety of domains. Statistical proof of racial discrimination may often be sought in other contexts in which the same degree of attention to methodological detail may be valued differently. In particular, courts are often called upon to decide discrimination cases in circumstances that are far less congenial to the detailed and sustained analysis of the academic researcher. Litigants often press for expert testimony based on something far short of state-of-the-art statistical practices that academic researchers might employ. In some instances, a straightforward analysis of the available data may appear to make a compelling case, but many outside the courts would argue for more details and alternative analyses to buttress the arguments.
In a paper commissioned for this panel, Nelson and Bennett (2003) investigate the courts’ use of statistics to make decisions in cases alleging racial discrimination in employment. They analyze published federal court opinions on racial discrimination in employment that refer to “statistics” in some form. They compare practices in cases published in 2000–2002 (178 opinions) and in cases published in 1980–1982 (124 opinions) to evaluate changes over time. For cases published in 2000–2002, a preliminary analysis revealed that courts treat statistical data on racial discrimination conservatively; in other words, “they are reluctant to reject the null hypotheses of nondiscrimination and they are reluctant to hold that plaintiffs have met their burden of proof” (Nelson and Bennett, 2003:2). Similar results were found for the 1980–1982 period.
Most courts expect to see statistical evidence presented by plaintiffs in employment discrimination cases; Nelson and Bennett conclude, however, that in most cases courts are skeptical of statistics used to prove discrimination. Moreover, they do not appear very often to base opinions on statistical evidence in contrast to Supreme Court precedent or other judicial rulings. In fact, courts are relying on statistics less frequently now than they did in the 1980s, even though statistical techniques have improved. Moreover, the interpretation of statistical evidence varies across courts and cases.
Overall, the lack of credence given by courts to statistical evidence and the complexities of drawing inferences about racial discrimination from such data appear to be detrimental to plaintiffs. In both periods examined by Nelson and Bennett, plaintiffs lost to defendants by a ratio of more than three to one, and it is becoming increasingly difficult for plaintiffs to convince courts that their claims are valid.
The main reasons cited for not relying on statistical data in judicial opinions are (1) relatively small sample sizes, (2) difficulty in defining the comparison groups, (3) lack of relevant controls for nondiscriminatory explanations for disparities, and (4) the use of aggregated data across multiple job levels in a class action suit. These statistical issues (particularly the first three) were prominent in the cases examined within each time period.
Moreover, although there have been many sophisticated advances in statistical analyses, the analyses used in court cases typically involve simple comparisons between the racial composition of an applicant pool or a potential promotion pool and a set of selection outcomes (such as hiring or promotion). Few cases involve rigorous assessment of the use of multiple regression and other multivariate analyses. Courts discount small samples without considering the probabilities of outcomes, displaying a lack of statistical knowledge and reasoning. Courts also have no consistent approach to dealing with these problems. A recent Supreme Court decision, however (Desert Palace v. Costa [No. 02-679]), appears to open the door to expanded use of statistical methods to support inferences of discrimination in legal proceedings.

STATISTICAL DECOMPOSITIONS OF RACIAL DIFFERENCES
Two types of regression models have been used to decompose racial differences in outcomes. They are (1) regression models with race-specific intercepts, which assume that the effects of other variables (e.g., education) are similar for race groups, and (2) race-specific regression models that allow for interaction effects between race and other variables. All such models pose problems for interpretation.
Regression Models with Race-Specific Intercepts
A standard way to explore the difference in an outcome between groups is to decompose the difference into “explained” and “unexplained” components. To illustrate, suppose the researcher is comparing outcomes of white and black men. The simplest formulation is captured by the regression model
$$Y_i = \beta_0 + \gamma R_i + X_i\beta + u_i \tag{7.1}$$
where Yi is the outcome of interest, such as a wage rate, with i indexing the individuals in the sample; R is an indicator variable that takes on the value 1 for blacks and 0 for whites; Xi is a set of variables that are believed to be relevant to the determination of Yi; β0 is the intercept; β is a vector of coefficients on the variables in Xi at a point in time; the coefficient γ captures the difference between groups in the average value of Y that is not accounted for by differences in X; and the error term ui captures the effect of other factors that influence Yi. The coefficients γ and β and ui are defined so that the mean of ui is unrelated to Ri and Xi. As we explain in detail below, unless the researcher is confident that he or she has measured all of the variables that are both correlated with R and relevant to the determination of Y, the model should be interpreted as descriptive rather than causal.
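Let Yb and Xb be the means of Y and X for black men, and let Yw and Xw be the means of Y and X for white men. For concreteness, let Y be the wage rate. The average difference between the groups is
$$Y_w - Y_b \tag{7.2}$$
This difference can be decomposed as
$$Y_w - Y_b = (X_w - X_b)\beta - \gamma \tag{7.3}$$
The minus sign on γ arises because R equals 1 for blacks, so γ measures the black-minus-white difference in Y at given X.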
The term (Xw – Xb)β is the contribution of group differences in the observed characteristics X to the race gap in Y. For example, studies of the wage gap almost always include a measure of education among the variables in X. The product of the difference between whites and blacks in the education measure and the coefficient relating education to Y is the contribution of the education difference to the gap. The parameter γ is the portion of the group difference in the means of Y that is not accounted for by the difference in Xw and Xb given the weights β that the X variables have for a given time period. This parameter is a “catchall” that includes the effects of group differences in omitted factors that would influence Y in the absence of discrimination, as well as the effect of discrimination.
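As a minimal sketch of the decomposition in equation (7.3), the following Python code fits the race-specific-intercept model by ordinary least squares on simulated data and checks that the raw gap in means equals the explained part plus the negative of γ. The data-generating values (the schooling gap, the return to schooling, the coefficient on R) and all variable names are hypothetical and chosen only for illustration; they are not estimates from any study cited here.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000

# R = 1 for blacks and 0 for whites, following the convention in the text.
R = rng.integers(0, 2, size=n)
# Hypothetical single X variable (years of schooling) with a one-year gap in means.
educ = 13.0 - 1.0 * R + rng.normal(0, 2, size=n)
# Hypothetical outcome (log wage) with a race-specific intercept shift of -0.10.
log_wage = 1.0 + 0.08 * educ - 0.10 * R + rng.normal(0, 0.3, size=n)

# Fit Y = beta0 + gamma * R + X * beta + u by least squares.
Z = np.column_stack([np.ones(n), R, educ])
coef, *_ = np.linalg.lstsq(Z, log_wage, rcond=None)
beta0, gamma, beta = coef

# Decomposition of the white-minus-black gap in mean outcomes.
gap = log_wage[R == 0].mean() - log_wage[R == 1].mean()
explained = (educ[R == 0].mean() - educ[R == 1].mean()) * beta
unexplained = -gamma

print(f"gap={gap:.4f} explained={explained:.4f} unexplained={unexplained:.4f}")
```

Because the regression includes both a constant and R, the fitted residuals average zero within each group, so the two components sum to the raw gap exactly rather than approximately.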
Everett and Wojtkiewicz (2002) provide a good example of this technique from the criminal justice literature and illustrate the fact that the technique is not restricted to linear regression models. They examine the racial and ethnic disparities in federal sentencing following implementation of guidelines that were intended to ameliorate past disparities in sentencing. They estimate ordinal logistic regressions to assess the relative odds of receiving a sentence within each successive quartile of the sentencing range for the committed offense. Adding various legal and extralegal factors to a baseline model including only indicators of race and ethnicity (black, Hispanic, Native American, and Asian, with white as the comparison group), they examine changes in the log-odds coefficient for each indicator. They find that significant racial differences in sentencing remain after accounting for legal factors (offense-related traits such as severity of the crime, as well as recidivism) and other, extralegal factors (such as age and education).
Race-Specific Regression Models
The above description covers a great deal of early research on discrimination that served as the basis for further work on measuring discrimination and explaining racial differences in a variety of social, political, and economic outcomes.1 For the past 30 years, however, researchers have used a more general statistical model of such differences (or gaps) that allows for the possibility that the slope coefficients β differ between groups (e.g., an interaction between race and education; see Blinder, 1973; Duncan, 1968; Oaxaca, 1973). Suppose that Ywi and Ybi are determined by the equations
$$Y_{wi} = \beta_{0w} + X_{wi}\beta_w + u_{wi} \tag{7.4}$$
$$Y_{bi} = \beta_{0b} + X_{bi}\beta_b + u_{bi} \tag{7.5}$$
where βw and βb are defined so that E(uwi | Xwi) = 0 and E(ubi | Xbi) = 0. Consequently, the means of Yi for whites and blacks are
$$Y_w = \beta_{0w} + X_w\beta_w$$
and
$$Y_b = \beta_{0b} + X_b\beta_b,$$
respectively.
1
The exposition in this section is based on Altonji and Blank (1999).
The difference in the mean of the outcome can be written as
$$Y_w - Y_b = (X_w - X_b)\beta_w + [(\beta_{0w} - \beta_{0b}) + X_b(\beta_w - \beta_b)] \tag{7.6}$$
The first term in this decomposition is the portion of the total gap that is explained by average differences in characteristics of whites and blacks using the coefficients for whites (βw) as the weights. In other words, it is the portion of the gap in Y that would be eliminated if the gap in X were closed and if the dependence of Y on X were the same for blacks and whites. The second term is the “unexplained” part of the gap in Y; that is, the difference that arises because the relationship between characteristics and outcomes, as summarized by the regression parameters, including the difference (β0w – β0b) in the intercept terms, differs between groups. Blau and Beller (1992) offer a good example of such a decomposition (see Box 7-1).
Alternatively, the average outcome difference can be decomposed as
$$Y_w - Y_b = (X_w - X_b)\beta_b + [(\beta_{0w} - \beta_{0b}) + X_w(\beta_w - \beta_b)] \tag{7.6′}$$
This alternative decomposition uses the coefficients from the model for blacks to determine the consequences for Yw –Yb of the group differences in X and uses the mean of X for whites to determine the consequences of the difference (βw –βb) in the slopes. The first term is the portion of the total gap that would be eliminated if the gap in X were closed and if the dependence of Y on X were the same for whites and blacks. The second term is the “unexplained” portion of the gap in Y. This second decomposition sometimes produces quite different results from those produced by the first. Many authors report both results or (occasionally) the average of the two (see Oaxaca and Ransom, 1999, for references).
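The two weightings can be compared side by side in code. The sketch below, written under the assumption that each group's equation is fit by ordinary least squares, computes both the decomposition that weights the X gap by the white coefficients and the alternative that weights it by the black coefficients. The function names, the two illustrative regressors, and every simulated coefficient are hypothetical.

```python
import numpy as np


def fit_ols(X, y):
    """Return (intercept, slopes) from an OLS fit of y on a constant and X."""
    Z = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(Z, y, rcond=None)
    return coef[0], coef[1:]


def two_way_decomposition(Xw, yw, Xb, yb):
    """Decompose mean(yw) - mean(yb) using white weights and black weights."""
    b0w, bw = fit_ols(Xw, yw)
    b0b, bb = fit_ols(Xb, yb)
    xw, xb = Xw.mean(axis=0), Xb.mean(axis=0)
    return {
        "gap": yw.mean() - yb.mean(),
        # White coefficients weight the X gap; black means weight the slope gap.
        "explained_white_weights": (xw - xb) @ bw,
        "unexplained_white_weights": (b0w - b0b) + xb @ (bw - bb),
        # Black coefficients weight the X gap; white means weight the slope gap.
        "explained_black_weights": (xw - xb) @ bb,
        "unexplained_black_weights": (b0w - b0b) + xw @ (bw - bb),
    }


rng = np.random.default_rng(1)
n = 1500
# Hypothetical data: columns are schooling and potential experience.
Xw = np.column_stack([rng.normal(13.5, 2.0, n), rng.normal(15.0, 5.0, n)])
Xb = np.column_stack([rng.normal(12.5, 2.0, n), rng.normal(15.0, 5.0, n)])
yw = 1.0 + Xw @ np.array([0.09, 0.02]) + rng.normal(0, 0.3, n)
yb = 0.9 + Xb @ np.array([0.07, 0.02]) + rng.normal(0, 0.3, n)

result = two_way_decomposition(Xw, yw, Xb, yb)
for name, value in result.items():
    print(f"{name}: {value:.4f}")
```

Both pairs of components sum to the same raw gap, but the split between explained and unexplained generally differs whenever the slope coefficients differ between groups, which is why many authors report both.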
Interpreting the Decomposition
The share of the total difference due to the second component in equation (7.6) is sometimes referred to as the “share due to discrimination.” This is misleading terminology, however, because if any important control variables are omitted, one or more of the β coefficients, including the intercept, will be affected. The second component therefore captures both the effects of discrimination and the unobserved group differences in factors that would be expected to determine Y in the absence of discrimination. If there is a gap in favor of whites (blacks) in most of the omitted variables that boost Y, the second component will tend to overestimate (underestimate) the effects of discrimination. On the other hand, omitted variables that are correlated with X will influence the coefficients β, potentially causing the “unexplained” portion of the gap to be either an overestimate or an underestimate of discrimination. Finally, the inclusion in X of variables that are themselves an outcome in a particular domain, such as occupation or position within a firm in a study of earnings differences, may cause the second component to underestimate or, less often, overestimate discrimination.

BOX 7-1
Use of Regression Models to Decompose Racial Differences
A study by Blau and Beller (1992) is a good example of the use of regression models to decompose differences between groups. They estimate forms of equations (7.4) and (7.5) for black and white men and women in various experience categories, where experience refers to number of years since leaving school. They estimate two sets of regressions. The first uses the logarithm of earnings as the dependent variable and includes measures of education, potential labor market experience and its square, the natural log of annual weeks worked, dummy variables for part-time work status, veteran status (in the case of males), marital status, and dummy variables for three regions and urban residence. The regressions for women include controls for the number of children. The authors also report decompositions based on a second set of regressions that adds a list of dummy variables for major occupation category and for employment in the government sector to the regressions.
Blau and Beller (1992:Table 3) report results of their decompositions using the coefficients from the regression model estimated on the white sample. They present separate results for 1971, 1981, and 1988 and use the results for these three periods to investigate changes between 1971 and 1981 and 1981 and 1988. When occupation dummies are excluded, the 1971 results for males with 10 to 19 years of potential experience estimate that 0.209 of the total gap of 0.452 in the log of earnings is due to differences in the means of the observed characteristics that determine earnings. Blau and Beller report that, of the part of the gap that is due to differences in observed characteristics, 0.096 is due to differences in the means of education and 0.061 is due to differences in the means of the variables that measure work hours. They find that 0.243 of the total gap of 0.452 is “unexplained” and reflects differences in the intercepts and slope coefficients on the variables in the earnings model. The results for 1988 show an increase in the total gap to 0.505. The explained gap rises to 0.331, primarily because of an increase in the portion of the gap associated with hours worked during the year, while the unexplained gap actually falls.
It is also misleading to label only the second component as the result of discrimination. This is the case because discriminatory barriers in the labor market and elsewhere in the economy can affect the X variables, which are the characteristics of individuals that matter in the labor market. We discuss this point more below and in Chapter 11.
Although one must be mindful of the limits of what can be learned from equation (7.6), it is nevertheless a simple and powerful way to summarize information on some of the factors that underlie group differences. The decomposition analysis can be extended to study change in group differences over time (see Box 7-2).
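The warning about omitted variables can be made concrete with a small simulation. In the sketch below the outcome is generated with no direct race term, so there is no discrimination by construction, yet a regression that leaves out one determinant of Y whose mean differs between groups still produces a sizable race coefficient. The variable `omitted`, the size of its group difference, and all coefficients are purely hypothetical devices for illustrating the statistical point; they make no empirical claim about any real characteristic.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 20000

R = rng.integers(0, 2, size=n)                 # 1 = black, 0 = white
educ = rng.normal(13.0, 2.0, size=n)
# Hypothetical unmeasured determinant of Y whose mean is 0.4 lower when R = 1.
omitted = rng.normal(0.0, 1.0, size=n) - 0.4 * R
# Outcome generated with NO direct race term: no discrimination by construction.
y = 1.0 + 0.08 * educ + 0.5 * omitted + rng.normal(0, 0.3, size=n)


def race_coefficient(columns):
    """OLS coefficient on R when y is regressed on a constant, R, and columns."""
    Z = np.column_stack([np.ones(n), R] + columns)
    coef, *_ = np.linalg.lstsq(Z, y, rcond=None)
    return coef[1]


gamma_short = race_coefficient([educ])            # leaves out the third factor
gamma_full = race_coefficient([educ, omitted])    # includes it

print(f"race coefficient, factor omitted:  {gamma_short:.3f}")
print(f"race coefficient, factor included: {gamma_full:.3f}")
```

Reversing the sign of the simulated group difference would push the short-regression coefficient in the opposite direction, which mirrors the statement that the unexplained component can either overestimate or underestimate discrimination.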
Two Pitfalls in Statistical Decomposition
Many researchers further decompose the “explained” gap into the contribution of subgroups of variables. For example, suppose that X contains sets of indicators for region of the country, for city size, and for educational attainment (less than high school, some high school, high school, some college, college, and some graduate school). One would like to know the contribution of each of these sets of indicators to the explained and unexplained portions of the gap. The contributions of each set of variables to the explained gap are identified and can be estimated separately. The problem, however, is that the contributions of the individual variables to the unexplained gap are not identified separately and depend on the choice of the reference category for each variable. That is, one cannot distinguish the contribution to the overall unexplained gap of racial differences in the coefficients on region of the country from the contribution of racial differences in the coefficients on city size. See Jones (1983) and especially Oaxaca and Ransom (1999) for a more extended discussion of this issue and citations of a number of studies that have included such detailed decompositions.
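The dependence on the reference category can be checked numerically. The following sketch fits separate regressions for two groups on a single set of region indicators and splits the unexplained gap into an intercept part and a region-coefficient part, once with region 0 omitted and once with region 2 omitted. The three-region setup, the function names, and the simulated group means are all hypothetical.

```python
import numpy as np


def fit_ols(X, y):
    """Return (intercept, slopes) from an OLS fit of y on a constant and X."""
    Z = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(Z, y, rcond=None)
    return coef[0], coef[1:]


def region_dummies(region, omit, levels=(0, 1, 2)):
    """Indicator columns for every level except the omitted reference level."""
    return np.column_stack(
        [(region == lv).astype(float) for lv in levels if lv != omit]
    )


def unexplained_split(region_w, yw, region_b, yb, omit):
    """Split the unexplained gap into intercept and region-coefficient parts."""
    Dw, Db = region_dummies(region_w, omit), region_dummies(region_b, omit)
    b0w, bw = fit_ols(Dw, yw)
    b0b, bb = fit_ols(Db, yb)
    intercept_part = b0w - b0b
    region_part = Db.mean(axis=0) @ (bw - bb)
    return intercept_part, region_part


rng = np.random.default_rng(3)
n = 3000
region_w = rng.integers(0, 3, size=n)
region_b = rng.integers(0, 3, size=n)
# Hypothetical region-specific mean outcomes for each group.
yw = np.array([2.0, 2.2, 2.5])[region_w] + rng.normal(0, 0.3, n)
yb = np.array([1.9, 2.0, 2.1])[region_b] + rng.normal(0, 0.3, n)

int0, reg0 = unexplained_split(region_w, yw, region_b, yb, omit=0)
int2, reg2 = unexplained_split(region_w, yw, region_b, yb, omit=2)

print(f"omit region 0: intercept part={int0:.3f}, region part={reg0:.3f}")
print(f"omit region 2: intercept part={int2:.3f}, region part={reg2:.3f}")
```

The total unexplained gap is identical under both choices, but the amount attributed to the region coefficients changes with the omitted category, so that attribution has no meaning independent of an arbitrary normalization.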
When the relationship between Y and the X variables is highly nonlinear and the racial difference in the distribution of X is large, a lack of overlap between the black and white distributions in the X variables may make it difficult or impossible to estimate the decompositions of equations (7.6) or (7.6′) reliably. This problem may not be obvious if researchers use functional form specifications of equations (7.4) and (7.5) that are not sensitive to potential nonlinearities. Barsky et al. (2002) and Altonji and Doraszelski (2002) investigate the problem posed by lack of overlap in black and white income distributions in the context of studies of the black–white wealth gap, with differing conclusions. Barsky et al. (2002) standardize for the effects of income by reweighting the sample of whites to have the same income distribution as the sample of blacks.2

BOX 7-2
Statistical Decompositions over Time
One way to analyze the sources of change over time in the outcomes of various groups is to differentiate between periods. The simplest way to proceed is to perform decompositions in two different years and compute the change in the “explained” component [(Xwt – Xbt) βwt] and the change in the “unexplained” component [(β0wt – β0bt) + (βwt – βbt)Xbt], where we introduce t as a time subscript to make explicit the fact that the equations refer to a particular year. Blau and Beller (1992) and many other studies do this. However, the change in each of the two components combines the effects of changes in the race gap in characteristics and in the race gap in coefficients.
A more detailed decomposition of change over time can be obtained as follows. Let the operator Δ represent the average difference between members of group 1 and group 2 in a particular year. For concreteness, let the outcome Y denote the wage rate. The change in wage differentials between time periods t′ and t can be expressed as the sum of four terms.
The first term represents the contribution of the relative changes over time in the observed characteristics of the two groups to the change between t and t′ in the wage gap. The second term is the effect of changes over time in the coefficients for group 1, holding differences in observed characteristics fixed. Together, these first two terms capture the change over time in the explained portion of the wage gap that would be expected given changes in the characteristics of the two groups and the coefficients on those characteristics for whites in periods t and t′.
The third and fourth terms capture the change in the unexplained component of the gap, (βwt – βbt) Xbt in equation (7.6). The third term is the effect of changes over time in the gap in the coefficients between the two groups. The fourth term accounts for the fact that changes over time in the characteristics of group 2 alter the consequences of differences in group coefficients (βwt – βbt). Researchers typically compute each of these terms, as well as the subcomponents corresponding to individual elements of X and β.
A limitation of this decomposition is that it does not provide much insight into how the wage gap is affected by changes in the overall wage distribution, such as occurred over the 1980s when the returns to skill rose rapidly. Increases in the dispersion of wages will increase the gap between the mean wages of whites and blacks (given that whites are above the mean and blacks below), even if there is no change in the skill distributions of whites relative to blacks or in the level of discrimination. Juhn et al. (1991) and Card and Lemieux (1994, 1996) suggest ways to isolate the effect of a change in the dispersion of the unobservable wage components affecting both groups from a change in the location of the skill distribution of group 2 relative to group 1.
Altonji and Blank (1999) provide a detailed discussion of the methods used in these papers. A brief summary of Juhn et al.’s (1991) basic results indicates what one can learn from their type of analysis. Using data from the Current Population Survey, they find that, between 1979 and 1987, changes in levels of education and experience reduced the black–white wage gap (in logs) for men by 0.34 (black characteristics moved closer to white characteristics), whereas increases in the returns to education and experience increased the gap by 0.27. They find that 0.33 of the 0.34 unexplained widening in the wage gap can be attributed to changing wage inequality affecting both whites and blacks. In short, they find that relative wages of blacks declined because black men were disproportionately located at the lower end of an increasingly unequal wage distribution.
An alternative approach, used by Murnane et al. (1995), is to examine the sensitivity of estimates for 1978 and 1986 of the unexplained race gap in earnings to adding more detailed cognitive measures, such as test scores, to earnings regressions. They find a smaller race gap in the 1970s that is less sensitive to inclusion of test scores, particularly for males. This result is broadly consistent with the analysis in Juhn et al. (1991).*
*
For an alternative view of this evidence, see Darity and Mason (1998).
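One identity consistent with the four-term description of change over time in Box 7-2 can be written as follows. It is offered as a sketch under our own notational assumptions: group 1 is whites and group 2 is blacks, ΔX_t = X_{1t} − X_{2t} and Δβ_t = β_{1t} − β_{2t}, and the intercept is absorbed into X. The choice of which period's coefficients and characteristics serve as weights in each term is also an assumption, and other equally valid weightings exist.

```latex
\begin{aligned}
\Delta Y_{t'} - \Delta Y_t
  &= (\Delta X_{t'} - \Delta X_t)\,\beta_{1t'}
   + \Delta X_t\,(\beta_{1t'} - \beta_{1t}) \\
  &\quad + (\Delta\beta_{t'} - \Delta\beta_t)\,X_{2t'}
   + (X_{2t'} - X_{2t})\,\Delta\beta_t ,
\end{aligned}
\qquad\text{where}\quad
\Delta Y_t = \Delta X_t\,\beta_{1t} + X_{2t}\,\Delta\beta_t .
```

Expanding the right-hand side and cancelling terms leaves ΔX_{t′}β_{1t′} + X_{2t′}Δβ_{t′} − ΔX_tβ_{1t} − X_{2t}Δβ_t, which is the difference between the two single-period decompositions, so the four terms account for the whole change in the gap.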
Summary: Decomposition and Residual “Effects” as Racial Discrimination
The use of multivariate regression and related techniques to decompose racial differences in some outcome of interest into a portion due to differences in the distribution of observed characteristics and a portion not explained by those characteristics is an essential tool for describing racial differences. The most informative studies use explanatory variables that both measure the most important determinants of the outcome under study and are likely to have different distributions by race. But the residual race differential may include not only any effect of discrimination but also the effect of other omitted factors that would generate different outcomes by race even in the absence of discrimination. Hence the unexplained gap may overestimate or underestimate the effects of discrimination.
INFERRING DISCRIMINATION FROM STATISTICAL ANALYSIS OF OBSERVATIONAL DATA
In this section, we discuss some of the more frequently encountered obstacles to causal inference in statistical studies of racial discrimination. As discussed in Chapters 5 and 6, it could be relatively easy to estimate the degree of discrimination if only it were possible to manipulate a person’s race. Except in very limited or special circumstances (e.g., an audit framework), race cannot be randomly assigned.3 Statistical models are widely used in observational studies in an attempt to replace the experimental control that could ensure an “all-else-equal” comparison. Again, the crucial problem that must be addressed to draw a causal inference from observa-
2
See also DiNardo et al. (1996). One can allow for arbitrary nonlinearity in the relationship between the outcome and X by first estimating a statistical model of the probability that a person is black as a function of X and then matching whites and blacks on the basis of this probability, which is called the “propensity score.” Nopo (2002) uses a nonparametric matching technique to decompose the gender gap in wages in Peru. Black et al. (2002) use a nonparametric matching technique to estimate the fraction of the earnings differences among college-educated white, black, Hispanic, and gay men that is explained by age, specific college major and degrees, English-language proficiency, family background, and region of birth.
3
Even in an audit framework, random assignment is not simple. One of the key controversies in audit studies is the extent to which the designs can approach the classic random assignment paradigm (Heckman, 1998). See the discussion in Chapter 6.

change. The obvious limitation of this work is that it is dangerous to draw broad conclusions about discrimination in hiring from the orchestra case.
Yet another source of variation in employment policy to examine could be changes in wages in an industry in response to competitive pressure. The idea is that prejudiced firms may indulge biases when economic rents (excess profits) are available. Competitive pressure may reduce the rents, forcing firms to either reduce discrimination (hire the best people) or go out of business. This situation would lead to a reduction in α, the weight placed on race.
Black and Strahan (2001) provide one of the cleanest of these studies, and although they focused on gender discrimination, the idea can be applied elsewhere. They exploited the fact that regulations constraining entry by banks into new markets were relaxed beginning in the mid-1970s. Using data from the mid-1970s through 1997, they found that following deregulation the average wages of bank employees declined relative to the wages of nonbank employees. The authors used multivariate regression models to implement a triple-differencing strategy to distinguish the effect of deregulation from fixed characteristics of states and wage and employment trends at the state level that happen to be correlated with deregulation. The strategy amounts to taking the difference between the growth in wages of bank and nonbank employees in states that undergo deregulation at a certain point in time and comparing it with the corresponding difference in wage growth rates for bank and nonbank employees in states that did not undergo deregulation at that point in time. Black and Strahan show that deregulation led to a decline in the gap between the wages of men and women for two reasons: First, women moved into higher-skill occupations; second, the wages of men fell more than the wages of women in a given occupation. This evidence is consistent with some models of gender discrimination.
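The logic of the triple-differencing comparison described above can be sketched with cell means. The example below is a simplification: Black and Strahan use multivariate regression models, whereas this sketch works directly with eight hypothetical mean log wages indexed by state group, sector, and period. The array layout, the function name, and every number are assumptions made for illustration.

```python
import numpy as np


def triple_difference(mean_log_wage):
    """Triple difference from cell means indexed as mean_log_wage[d, s, t].

    d = 1 for states that deregulate, s = 1 for bank employees,
    t = 1 for the period after deregulation.
    """
    growth = mean_log_wage[:, :, 1] - mean_log_wage[:, :, 0]  # growth by (d, s)
    bank_minus_nonbank = growth[:, 1] - growth[:, 0]          # relative growth by d
    return bank_minus_nonbank[1] - bank_minus_nonbank[0]


# Hypothetical cell means built from additive components plus a true effect.
true_effect = -0.05
m = np.zeros((2, 2, 2))
for d in range(2):
    for s in range(2):
        for t in range(2):
            m[d, s, t] = (
                2.0 + 0.10 * d + 0.30 * s + 0.05 * t
                + 0.07 * d * s      # fixed bank premium specific to these states
                + 0.02 * d * t      # state-level wage trend
                + 0.04 * s * t      # sector-wide wage trend
                + true_effect * d * s * t
            )

estimate = triple_difference(m)
print(f"triple-difference estimate: {estimate:.4f}")
```

The fixed state characteristics, the state-level trend, and the sector-wide trend all cancel in the differencing, leaving only the component specific to bank employees in deregulating states after deregulation.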
Health care. Chay and Greenstone (2000) examined trends in black–white infant health outcomes between 1955 and 1975. The authors fit simple trend-break regression models to vital statistics data for blacks and whites in rural and urban areas of different states. They used a time trend variable to measure the average trend in infant mortality rates across states (1955–1965) and an indicator variable to measure the change in the infant mortality trend after 1965. They controlled for differences across states (by race and rural versus urban area) that might be correlated with infant mortality.
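A trend-break regression of this general kind can be sketched as follows. The sketch assumes one common form of the model, in which the post-1965 indicator is interacted with the number of years elapsed since 1965; the source does not spell out the exact specification. The mortality series is invented and noise free, and the state, race, and rural/urban controls are omitted.

```python
import numpy as np

# Years covered by the study period described in the text.
years = np.arange(1955, 1976)
trend = (years - 1955).astype(float)            # common linear time trend
post_1965 = (years > 1965).astype(float)        # indicator for the post-1965 period
post_trend = post_1965 * (years - 1965)         # years elapsed since the break

# Invented, noise-free infant mortality series (deaths per 1,000 births):
# it falls by 0.4 per year before 1965 and by an additional 1.2 per year afterward.
rate = 45.0 - 0.4 * trend - 1.2 * post_trend

# Ordinary least squares on [intercept, trend, post-break trend].
X = np.column_stack([np.ones_like(trend), trend, post_trend])
coef, *_ = np.linalg.lstsq(X, rate, rcond=None)
intercept, pre_slope, slope_change = coef

if __name__ == "__main__":
    print(f"pre-1965 slope: {pre_slope:.2f}, change in slope after 1965: {slope_change:.2f}")
```

The coefficient on `post_trend` is the quantity of interest: it measures how much the trend changes after the break. Fitting the model separately by race and comparing the two slope changes would give the differential trend break.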
The regression results showed a significant trend break in health outcomes for black and white infants after 1965, although improvements were more pronounced for blacks. The authors note that before 1965 black infant mortality rates were high relative to whites. Between 1965 and 1975, however, there was evidence of a sharp decline in black infant mortality rates and convergence of these rates after 1965 (particularly in the rural South). Chay and Greenstone suggest that the implementation of two federal interventions could explain the convergence of black–white infant mortality rates after 1965: Title VI of the 1964 Civil Rights Act (prohibiting discrimination and segregation in access to care) and the Maternal and Child Health Services Program under Title V of the 1935 Social Security Act.⁹
Because the trend-break patterns showed similar improvements for whites across all regions after 1965, it is possible that other causal factors along with race might explain the post-1965 changes. The authors also report a strong correlation between “differential convergence in infant mortality rates” and “differential convergence in black–white hospitalization rates across states” (2000:330). Thus, the federal interventions, and possibly other factors, played an important role in the changes in relative infant mortality rates.
Education. Holzer and Ludwig (2003) provide some examples of natural experiments in the education domain that can be used in research to examine the effect of racial differences in educational inputs on relative outcomes. One type of natural experiment in the education domain looks at discriminatory educational policies and practices and assesses their effects on education outcomes. Examples are studies looking at the adverse effects of “separate but equal” laws on the educational attainment of blacks prior to the ruling in Brown v. Board of Education (Boozer et al., 1992; Donohue et al., 2002; Margo, 1990) and studies of the effects of school desegregation orders implemented after the Brown ruling (Guryan, 2001). Such experiments can be useful for measuring discriminatory practices in education but are difficult to apply in this domain. Holzer and Ludwig (2003:1167) offer this perspective:
Evaluating how these natural experiments change the allocation of educational inputs across or within schools may help highlight the degree to which racial discrimination affected educational decisions in the past. One limitation with this approach is that social scientists are limited to either detecting discrimination within a given jurisdiction retrospectively rather than prospectively, or must extrapolate from evidence of past discrimination in one jurisdiction to other areas where policy makers seek guidance on future enforcement or policy actions.
9. The Maternal and Infant Care component, expanded in 1963 and 1965, was established to improve the health of mothers and infants in low-income and rural families.

Another type of natural experiment focuses on the general relationship between educational inputs and outputs. For instance, one such experiment might examine the effects of a policy change in tracking or ability grouping on student outcomes. Differences in student outcomes within one school before and after the policy change could be compared with outcomes in another school that did not experience a policy change to determine whether discrimination played a role (see Holzer and Ludwig, 2003, for further details). Holzer and Ludwig conclude that natural experiments are valuable tools for determining whether observed racial differences in inputs constitute racial discrimination and for measuring the effects of such differences.
Limitations of Natural Experiments
In the context of the study of discrimination, as well as in other arenas, natural experiments have limitations. First, the change under study may be endogenous. That is, it may be a reaction to particular circumstances that warranted a policy change or intervention. As a result, one may not be able to generalize from the results of a study to estimate the average amount of discrimination prior to the change. For example, suppose one is trying to measure discrimination by comparing hiring rates in a particular firm before and after an intervention by the Equal Employment Opportunity Commission (EEOC) with those of firms in the same industry around the same time period. Assuming that the EEOC responds to the most serious cases, the estimated effect would tend to overstate the amount of discrimination in the industry at large prior to the intervention.
Second, the effects of policy interventions may spill over into the control groups used in the study. For example, the effects of heightened EEOC activity involving a particular set of employers in a given industry might influence the behavior of other firms and industries even though they have not been targeted. This phenomenon would reduce estimates of the effect of EEOC enforcement based on a “differences-in-differences” design.
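The direction of this bias can be made explicit with a short worked equation. The notation is introduced here for illustration and is not from the chapter: let τ be the effect of enforcement on targeted firms (superscript T), let δ be the change that all firms would have experienced without enforcement, and suppose a fraction s (between 0 and 1) of the effect spills over to the comparison firms (superscript C).

```latex
\begin{align*}
\widehat{\Delta}_{\text{DiD}}
  &= \bigl(\bar{y}^{\,T}_{\text{post}} - \bar{y}^{\,T}_{\text{pre}}\bigr)
   - \bigl(\bar{y}^{\,C}_{\text{post}} - \bar{y}^{\,C}_{\text{pre}}\bigr) \\
  &= (\delta + \tau) - (\delta + s\,\tau) \\
  &= (1 - s)\,\tau .
\end{align*}
```

Under these assumptions the differences-in-differences estimate recovers the full effect τ only when s = 0; any positive spillover shrinks the estimate toward zero.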
Third, differences in trends in other factors that affect outcomes cannot always be addressed adequately even in differences-in-differences designs, particularly when the policy intervention takes place over a period of time, as is the case with civil rights policy.
Fourth, a change in one domain, such as school desegregation orders, may be accompanied by changes in another domain, such as housing, or by a change in attitudes. Consequently, it may be difficult to use a change in policy in one domain to identify the amount of discrimination in that domain prior to the change.
Fifth, only in rare circumstances (such as the Goldin and Rouse orchestra study) can one be sure that the change in policy under study has eliminated a role for discrimination in the decision under study. In most cases, the best one can hope for is that a comparison of groups affected by the change in policy will identify the reduction in discrimination induced by the policy (the change in α), rather than the level of discrimination that existed prior to the change.
Sixth, in some cases, changes in policy that lead to positive effects in one dimension may induce negative effects in others. A major concern in the literature on the effects of antidiscrimination policy in the labor market, for example, is that positive effects on wage rates for blacks have been offset in part by negative effects on employment (see Altonji and Blank, 1999, for discussion and references).
Finally, natural variation in the data may be insufficient to identify the effects of interest or may be correlated with other, unmeasured factors that may bias the results. (See Holzer and Ludwig, 2003, on the use of natural experiments to study discrimination; see Shadish et al., 2002, and Meyer, 1995, for a general discussion of the strengths and weaknesses of these designs.)
Summary of Possible Solutions to Problems of Using Statistical Models to Infer Discrimination
It should be obvious that more accurate and complete data collection efforts are critical to reducing the key problem of omitted variables bias. Of course, the data needed must pertain to the particular domain of analysis. Data on performance (e.g., productivity in the hiring context, default rates in the lending context) and detailed knowledge of how an outcome depends on performance can solve the problem of omitted variables bias in some cases. However, situations in which the researcher will possess the data and detailed knowledge needed to support specification of an appropriate model are relatively scarce, at least in the labor market setting.
Matching and propensity score methods are useful as a means of relaxing assumptions about the functional form relating the variables X1 and X2 to productivity and to hiring decisions. However, they do not solve the omitted variables bias problem.
Panel data are useful as a way of identifying differences in the amount of discrimination across types of institutions, regions, or time. However, this approach requires the assumption that time-varying unobserved characteristics of the individual are not related to mobility, which is a strong assumption.
Natural experiments in which a legal change or some other change forces a reduction in or the complete elimination of discrimination for some groups provide leverage in assessing the importance of discrimination prior to the change and for groups not affected by the change.

ADDITIONAL ISSUES
Thus far we have discussed prospects and problems for measuring discriminatory treatment of persons who are identical except for race. In the context of our hiring example, the parameter α measures discriminatory treatment of blacks and whites with the same values of X1 and X2, the variables known to a firm to determine productivity. The model developed above, however, also sheds light on other discriminatory processes.
Effects of Past Labor Market Discrimination on Factors in Hiring
Discrimination by a firm or elsewhere in the labor market may influence some of the elements of X1 and X2, the (nonracial) characteristics used by the firm to make hiring decisions. For example, some studies of hiring are based on whether the person was referred by an existing employee (Fernandez and Weinberg, 1997; Fernandez et al., 2000; Petersen et al., 2000). Current labor market discrimination against minorities by the firm will lead to a discrepancy in the probability that minority applicants will know people who work in the firm because personal networks tend to run along racial lines. Even if the use of referrals in hiring is justified by productivity considerations, the total effect of current discrimination will be understated if one holds constant whether a person was referred to the firm. In particular, the parameter α will understate current discrimination.
Alternatively, suppose that in the past a firm discriminated against disadvantaged racial groups but no longer does. Continuing with the referral example, again suppose that the firm makes use of referrals in hiring decisions. If the researcher is simply interested in whether the firm treats applicants with a given set of characteristics differently at the present time, and if the researcher observes all the variables the firm uses to assess productivity (there are no X2 variables that the researcher does not observe), the researcher will draw the correct conclusion of no such differential treatment. In this case, α will be zero. However, if the researcher wants to know the total effect of both past and current discrimination on the part of the firm on the racial composition of current hires, it is incorrect to take as given whether employees were referred. The reason is that past discrimination led disadvantaged racial groups to be underrepresented among the pool of potential referrers, thus reducing the chances of attracting disadvantaged racial groups through referrals. To measure the effect of both past and current discrimination on current outcomes in this dynamic context, the researcher must model the effect of past discrimination on current X variables.
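The referral example can be illustrated numerically. The sketch below rests on invented assumptions: a hypothetical pool of 2,000 applicants, referral rates of 40 percent for majority applicants and 20 percent for minority applicants (a gap assumed to be the product of discrimination), and a noise-free linear hiring index in which a referral adds 0.3 and minority status (R = 1) subtracts 0.1. None of these numbers comes from the chapter or the cited studies.

```python
import numpy as np


def ols(columns, y):
    """Least-squares coefficients, with the intercept in position 0."""
    X = np.column_stack([np.ones(len(y))] + list(columns))
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


# Hypothetical applicant pool: 1,000 majority (R = 0) and 1,000 minority (R = 1).
R = np.repeat([0.0, 1.0], 1000)

# Assumed referral rates: 400 of 1,000 majority applicants, 200 of 1,000 minority.
referred = np.concatenate([
    np.repeat([1.0, 0.0], [400, 600]),
    np.repeat([1.0, 0.0], [200, 800]),
])

# Assumed hiring index: direct weight on race of -0.1, referral bonus of 0.3.
y = 0.5 + 0.3 * referred - 0.1 * R

# Holding referral status constant recovers only the direct weight on race.
gap_given_referral = ols([R, referred], y)[1]

# Omitting referral status gives the total gap, including the referral channel.
total_gap = ols([R], y)[1]

if __name__ == "__main__":
    print(f"gap holding referrals constant: {gap_given_referral:+.3f}")
    print(f"total gap:                      {total_gap:+.3f}")
```

With these numbers the gap conditional on referral status is -0.10, while the total gap is -0.16; the additional -0.06 operates through the difference in referral rates. Which figure is the right one depends on whether the question concerns differential treatment of otherwise identical applicants or the cumulative effect of discrimination.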
To give another example, it is standard practice for many types of jobs (and in many situations defensible from a productivity standpoint) to consider past work experience when trying to predict productivity. However, past experience will be influenced by discrimination in the labor market. Consequently, the coefficient on R will provide an estimate of the effect of discrimination on the firm’s behavior, given X1 and X2. But because discrimination in the labor market leads to a racial gap in the experience-related components of X1 and X2, the coefficient on R will understate the total effect of all discrimination that has taken place in the labor market.
Furthermore, discrimination in the labor market may influence the choices of X1 and X2 that people make before they enter the labor market. For example, if African Americans know they are discriminated against for white-collar positions and college has little value in the blue-collar world, they will have less incentive to pursue a college education. If one uses a model such as equation (7.13) to measure labor market discrimination against blacks in white-collar positions holding education (one of the X1 variables) constant, one will underestimate the total effect of discrimination on the racial composition of such jobs. Similarly, if firms develop a reputation of having a hostile work environment for racially disadvantaged groups and if such applicants avoid seeking employment at those firms, a model such as equation (7.13) will underestimate the total effect of discrimination in hiring decisions. This will be the case even if the researcher observes all of the variables used by firms to choose employees. Developing measures of the discrimination that results from a process such as that described above is extremely challenging because of the much longer timeline and more complex environment that must be accounted for to reach statistically valid “all else being equal” conclusions. We address these issues in more detail in Chapter 11.
Effects of Discrimination in Other Domains
To measure the total effect of discrimination in society on a particular outcome, such as the odds of getting hired, one needs to measure the effects of discrimination in other domains on elements of X1 and X2 that are determined outside of the labor market (see Chapter 11). In our example of hiring, if there is racial discrimination in “pre-market factors” (Neal and Johnson, 1996), such as education, that are related to labor market success, discrimination in the educational sphere will also affect labor market success indirectly. Thus controlling for education in a hiring equation is reasonable in assessing whether a particular employer is discriminating.
However, if there is racial discrimination in the educational domain, controlling education will understate the total effect of all racial discrimination in analyses of labor market discrimination alone. Developing and validating statistical models of these broader processes is one way to gain insight into the presence or absence of discrimination in these other areas.

One’s choice of control variables is influenced by whether one is trying to measure discrimination in a specific domain or the cumulative impact of discrimination.
Other Discriminatory Effects on the Productivity Equation
Another issue concerns whether discrimination on the part of customers, coworkers, or suppliers leads characteristics of the worker, including race, to enter the productivity equation (7.9).¹⁰ Consider Becker’s (1957) theory of customer discrimination, and consider sales positions. Suppose that white customers prefer to buy from white salespeople, and black or Latino customers are indifferent. In such a world, P is influenced by the match between the race (or ethnicity) of the job candidate and the racial composition of the customers. R will not appear directly in the equation for productivity, but the interaction between R and the racial composition of the customer base will. If the firm obeys the law, it will not apply the interaction variable in making decisions about hiring, and the interaction variable will not enter significantly into hiring decisions. (The interaction will show up in a productivity regression.) If the firm disobeys the law, the interaction term will influence hiring and show up in a hiring regression. One will then conclude correctly that firms discriminate for or against black or Latino salespeople as a function of the customer base. If one excludes the interaction term but adds R to the hiring equation, one will likely find evidence that the firm discriminates against minorities if most of the markets for which the firm is hiring happen to be heavily white. But one will not detect the fact that the nature of the discrimination is related to the match between customers and the sales agent. If there are data that can be used to estimate the effect of the interaction between race and customer composition on productivity, one can see whether hiring decisions appear to reflect such considerations. A number of studies of professional sports take this approach (see Altonji and Blank, 1999, and Kahn, 1991, for examples).
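The difference between a hiring regression that includes the interaction term and one that includes only R can be seen in a small numerical sketch. Everything in it is an assumption made for illustration: the pool of 1,000 applicants, the two market types (90 percent and 10 percent white customers), and a noise-free hiring index in which the firm penalizes minority applicants (R = 1) by 0.3 times the white share of the customer base.

```python
import numpy as np


def ols(columns, y):
    """Least-squares coefficients, with the intercept in position 0."""
    X = np.column_stack([np.ones(len(y))] + list(columns))
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


# Hypothetical pool: 800 applicants for heavily white markets, 200 for others.
white_share = np.repeat([0.9, 0.1], [800, 200])

# Alternate majority (0) and minority (1) applicants so each market is half minority.
R = np.tile([0.0, 1.0], 500)

# Assumed (unlawful) hiring index: the race penalty depends on the customer base.
y = 0.6 - 0.3 * R * white_share

# Hiring regression with the interaction term: columns are R, share, R x share.
with_interaction = ols([R, white_share, R * white_share], y)

# Hiring regression with R alone.
race_only = ols([R], y)

if __name__ == "__main__":
    print(f"coefficient on R x white share: {with_interaction[3]:+.3f}")
    print(f"coefficient on R alone:         {race_only[1]:+.3f}")
```

The regression with R alone yields a single negative coefficient (-0.222 with these numbers), an average penalty weighted toward the heavily white markets that dominate the sample. Only the specification with the interaction term shows that the penalty varies with the customer base.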
A somewhat different example involves the possibility that discrimination in social institutions that are extraneous to the firm or the labor market influences the form of the productivity equation. Consider again a marketing position. Suppose that social connections play a critical role in marketing. In such a world, sales productivity may well depend on club memberships, where one lives, the schools one attended, and the like. Variables measuring such social connections belong in X1 and X2. R may have no relationship to productivity or to hiring decisions if one conditions on these variables. Now suppose that societal discrimination (including housing discrimination) influences social connections. In this case, discrimination outside the labor market will lead to a race gap in some elements of X1 or X2 or both, as well as in hiring, even though the variable R will not have an independent effect on hiring conditional on X1 and X2. Finally, note that the recruiting strategies chosen by the firm are likely to influence the importance of social networks. Strategies that place more emphasis on personal contacts and less on advertising may not be race neutral. A discriminating firm may consciously choose a recruiting strategy in which social networks are important and then exclude minorities who lack them.¹¹ It will be difficult to determine whether the firm’s recruitment strategy is really the profit-maximizing one or in fact is shaped at least in part by the goal of discrimination.
10. Throughout this section we are defining productivity to be the effect of an employee on the profitability of a firm; we are excluding societal objectives.
SUMMARY, CONCLUSIONS, AND RECOMMENDATIONS
The main purpose of this chapter has been to review the strengths and limitations of various approaches to dealing with the challenges of measuring discrimination with statistical methods using observational data. This review leads to several conclusions.
Our first conclusion relates to the uses and limitations of statistical decomposition of gaps in outcomes among racial groups:
Conclusion: The statistical decomposition of racial gaps in social outcomes using multivariate regression and related techniques is a valuable tool for understanding the sources of racial differences. However, such decompositions using data sets with limited numbers of explanatory variables, such as the Current Population Survey or the decennial census, do not accurately measure the portion of those differences that is due to current discrimination. Matching and related techniques provide a useful alternative to race gap decompositions based on multivariate regression in some circumstances.
More generally, we will often be hampered in our ability to infer discriminatory behavior on the basis of regression decompositions because we can never be sure we have included all the relevant controls in the model. We must be able to control for the relevant variables well enough to approximate closely the hypothetical counterfactual in which only race has been changed.
11. Rees and Shultz (1970) give an excellent example of a firm that used its recruitment methods to discriminate. They report that a Chicago-area steel mill advertised only in Polish-language newspapers in an effort to avoid African Americans.

Our second conclusion follows naturally from the first:
Conclusion: Nationally representative data sets containing rich measures of the variables that are the most important determinants of social and economic outcomes—such as education, labor market success, and health status—can help in estimating and understanding the sources of racial differences in outcomes. Panel data may be particularly important and useful (see Chapter 11).
Not only must statistical models for estimating discrimination use appropriate data and methods, but they must also be based on as thorough as possible an understanding of the processes that underlie the behavior being studied. Otherwise, the models are likely to require strong assumptions that cannot be justified. More generally, the properties of the model used for analysis are crucial in assessing claims of statistical “proof” of discrimination. Researchers must provide sufficient information on their model to enable others to understand and make a judgment about whether the assumptions underlying the model have been met.
Conclusion: The use of statistical models, such as multiple regressions, to draw valid inferences about discriminatory behavior requires appropriate data and methods, coupled with a sufficient understanding of the process being studied to justify the necessary assumptions.
The specific model we developed in the context of the decision to hire a worker illustrates the role played by assumptions and theory in drawing causal inferences based on observational data. It also sheds light on how omitted variables and sample selection biases affect our ability to draw conclusions about discrimination and helps make clear what forms of discrimination are measured and what forms are not.
Data on performance relevant to a particular domain, such as productivity in the labor market context or academic success in the educational arena, are extremely valuable in dealing with the problem of omitted variables bias, in permitting the testing of key assumptions of a statistical model, and in studying adverse impact discrimination (see Annex 7-1 below). Natural experiments, although they have limitations, provide another way to address the problems of omitted variables bias and limited knowledge of the decision processes of particular actors.
Conclusion: We see an important role for focused studies that target particular settings (e.g., a firm or a school), whereby it is possible to learn a great deal about how decisions at each stage in a process are made and to collect most of the information on which decisions are based. With such knowledge and data, it becomes much easier to specify an appropriate statistical model with which to estimate racial discrimination.
Conclusion: Despite limitations, natural experiments—in which a legal change or some other change forces a reduction in or the complete elimination of discrimination against some groups—can provide useful data for measuring discrimination prior to the change and for groups not affected by the change.
Recommendation 7.1. Public and private funding agencies should support focused studies of decision processes, such as the behavior of firms in hiring, training, and promoting employees. The results of such studies can guide the development of improved models and data for statistical analysis of differential outcomes for racial and ethnic groups in employment and other areas.
Recommendation 7.2. Public agencies should assist in the evaluation of natural experiments by collecting data that can be used to evaluate the effect of antidiscrimination policy changes on groups covered by the changes, as well as groups not covered.
ANNEX 7-1: DETECTING ADVERSE IMPACT DISCRIMINATION
We discuss here ways to detect adverse impact discrimination; that is, discrimination by using factors that correlate with race. A firm may not use race directly, but it may weight variables in hiring decisions in a way that is not proportionate to their influence on productivity. For example, suppose the firm uses the index Pf as its productivity rating rather than the correct index E(P | X1,X2) and hires accordingly. In this case, y will be determined by
(A7.1)
where α′ is (1 – α) as before. It is quite possible for α to be 0 even though the firm’s hiring rule has an adverse impact on R that is not justified by productivity considerations. That is, α can be zero even though R is systematically related to the difference between the index Pf and the unbiased productivity index E(P | X1,X2). We define this as adverse impact discrimination. The legal requirement that firms validate hiring criteria having an adverse impact on protected classes of workers is designed to prevent this form of discrimination.
In general, it will be difficult to detect that the firm is behaving in accordance with equation (A7.1) without information on P. Suppose, however, that the researcher has an unbiased indicator P* of P as well as data on X1 but not X2. Then the researcher can estimate the coefficients θ1 of the conditional expectation E(P* | X1) = X1θ1.
If firms are hiring on the basis of expected productivity given X1 and X2, then E(y | X1) = E(y | X1θ1). Consequently, one can test the null hypothesis that firms are hiring on the basis of expected productivity given X1 and X2 by testing the restriction that E(y | X1) = E(y | X1θ1), that is, that X1 affects y only through the index X1θ1.
One can test this restriction by regressing y on X1θ1 and X1 (with one element of X1 excluded because of collinearity) and testing the null hypothesis that the elements of X1 have no effect on y, holding X1θ1 constant. From a regression of E(y | X1) on R and X1θ1, one can estimate the race gap for workers with a given value of X1 that is due to the firm’s policy. Without special assumptions, however, one cannot estimate the effect on group R of the firm’s misuse of X1 and X2 without having data on both variables. Unfortunately, even a noisy indicator of productivity is unavailable in most of the data sets used to study racial differences.
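The two-step test just described can be sketched on simulated data. All of what follows is an illustrative assumption rather than part of the chapter: the function name `index_restriction_f`, the sample size, the coefficient values, the linear specification, and the treatment of y as a continuous hiring index. The first step regresses the productivity indicator P* on X1 to estimate θ1; the second computes a standard F statistic for the hypothesis that the elements of X1 (with one excluded) have no effect on y once the index X1θ1 is held constant.

```python
import numpy as np


def rss(X, y):
    """Residual sum of squares from a least-squares fit of y on X."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    return float(resid @ resid)


def index_restriction_f(y, X1, p_star):
    """F statistic for the hypothesis that X1 affects y only through X1 @ theta1."""
    n, k1 = X1.shape
    const = np.ones((n, 1))

    # Step 1: estimate theta1 from a regression of the indicator P* on X1.
    theta1, *_ = np.linalg.lstsq(np.hstack([const, X1]), p_star, rcond=None)
    index = (X1 @ theta1[1:]).reshape(-1, 1)

    # Step 2: compare y on the index alone with y on the index plus X1,
    # dropping the first column of X1 because of collinearity with the index.
    restricted = np.hstack([const, index])
    unrestricted = np.hstack([const, index, X1[:, 1:]])
    q = k1 - 1                              # number of restrictions tested
    dof = n - unrestricted.shape[1]         # residual degrees of freedom
    rss_r, rss_u = rss(restricted, y), rss(unrestricted, y)
    return ((rss_r - rss_u) / q) / (rss_u / dof)


# Simulated data with invented coefficients.
rng = np.random.default_rng(0)
n = 2000
X1 = rng.normal(size=(n, 3))
true_theta1 = np.array([1.0, 0.5, 0.25])
p_star = X1 @ true_theta1 + rng.normal(size=n)       # noisy but unbiased indicator

# Firm hiring on expected productivity versus a firm misweighting the third variable.
y_null = 0.5 * (X1 @ true_theta1) + rng.normal(size=n)
y_misuse = 0.5 * (X1 @ np.array([1.0, 0.5, -1.0])) + rng.normal(size=n)

f_null = index_restriction_f(y_null, X1, p_star)
f_misuse = index_restriction_f(y_misuse, X1, p_star)

if __name__ == "__main__":
    print(f"F statistic, firm hires on expected productivity: {f_null:.2f}")
    print(f"F statistic, firm misweights X1:                  {f_misuse:.2f}")
```

A large F statistic is evidence against the hypothesis that hiring follows expected productivity. The statistic here treats the estimated index as if it were known; a careful application would account for the sampling error in the first-step estimate of θ1, for example by bootstrapping both steps together.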