Discrimination in the Criminal Justice System: A Critical Appraisal of the Literature

Steven Klepper, Daniel Nagin, and Luke-Jon Tierney

INTRODUCTION

Discrimination in the criminal justice system is an issue of substantial social concern. The discretionary powers of the principal actors (the police, prosecutors, and judges) are considerable and allow ample latitude for unfair treatment of persons of a specific race or social background. A large empirical literature has emerged concerning the extent of discrimination in the criminal justice process. These studies examine, separately or in combination, the effect of race or social class on the likelihood of arrest, prosecution, bail, and conviction and on the type and severity of sentence. The findings of the studies are by no means consistent: some find evidence of discrimination while others do not.

In this paper we argue that the literature we have reviewed has major flaws that limit its usefulness for making inferences about the extent of discrimination in the criminal justice system. We also suggest research strategies to remedy these weaknesses. Our critique and suggestions are prompted by a review of 10 papers, chosen by the panel on the basis of their salience in the literature and their quality, as well as a number of additional papers. Although our paper is based on a review of a small sample of studies, we are confident that our conclusions apply generally to the larger literature. First, to some degree our criticisms apply to all of the studies
reviewed, which makes it unlikely that they do not apply to the larger literature. Second, implementation of several of our recommendations requires statistical methods that have only recently been developed and are not yet widely employed. Third, implementation of these statistical procedures requires modeling approaches that have not been widely adopted in the criminological and sociological literature.

Our review suggests three major remediable flaws in the literature:

(1) The Absence of Formal Models of Processing Decisions in the Criminal Justice System

Case disposition (whether dismissal, acquittal, conviction, or sentencing) is the consequence of the interplay of a diverse set of actors, each with individual objectives. Even if a disposition does not directly involve one of these actors, expectations about their actions if they were to become involved may affect decisions. For example, a prosecutor may choose to dismiss a case in the expectation that a judge would do the same if the case were prosecuted. Similarly, a defendant may choose to accept a plea bargain on the basis of the expected likelihood of conviction at a jury trial and the expected sentence if convicted. To model decisions at each stage of the criminal justice system, a theory of the important decision criteria of each of the major actors and of their interaction is required. Without such a theory, estimating equations are likely to be misspecified, which in turn is likely to result in serious biases in the estimated effects of included variables and an inability to discern the effects of more subtle influences. The latter is particularly pertinent to measuring the effects of social class and race, because their influence, while possibly large enough to warrant concern, is probably smaller than that of clearly legally relevant variables such as case quality and the seriousness of the crime.

Perhaps even more important, without a well-structured theory, inferences about the role of social status and other factors at each processing stage may be extremely misleading. For example, an observation that social status affects sentences in negotiated pleas may reflect not prosecutorial bias but rather the biases of judges or juries. We regard this point as crucial because implementation of policies to rectify any undesirable effects of race and social status on
disposition requires knowledge of the stage(s) of the criminal justice system at which these factors are important.

(2) Sample Selection Biases Resulting from Screening and Processing Decisions

The criminal justice system has been likened to a leaky sieve. In Washington, D.C., for example, of every 100 felony arrests only 13 result in felony convictions. Of the remaining 87, 16 result in misdemeanor convictions. Nearly all the rest are rejected for further processing at an initial screening or subsequently dismissed by a prosecutor, judge, or grand jury. Of those convicted, only about 32 percent are incarcerated (Forst et al., 1977). Thus, cases that reach the sentencing stage are a very select group that typically represents only a small proportion of the population of "similar" cases (e.g., cases with the same arrest charges) that originally entered the system. Moreover, even the cases entering the system via an arrest are themselves a selected sample of crimes. In most major metropolitan areas, clearance rates (the fraction of crimes solved by the police, typically by the arrest of a suspect) hover around 20 percent. This low clearance rate principally reflects the absence of any suspect, but it is also affected by the exercise of arrest discretion by the police.

By the very nature of the system, analyses of the determinants of sentence must be executed on a selected sample of cases, namely those that have resulted in conviction. Because the selection process is by no means random, it may induce serious biases in the parameter estimates of included variables. Such biases may, for example, lead to the inappropriate conclusion that racial considerations influence sentencing decisions when in fact they do not. Recently developed econometric procedures can be employed in some circumstances to cope with the biases induced by sample selection.

(3) Use of Arbitrary Scales to Measure Qualitatively Different Dispositions

A case may be disposed by dismissal or acquittal. For convictions, possible sentences include fines, probation, and prison, or some combination of these, at a specified amount or duration. Many of the papers we reviewed employ arbitrary rules for measuring these qualitatively different outcomes. The resulting index serves as the basis for an analysis (e.g., as the dependent variable in a regression model) of the correlates of "severity of outcome." While the scales that are applied are not patently unreasonable, serious questions remain about the degree to which
findings are simply an artifact of an artificial scale. We are particularly concerned that the use of these arbitrary scales may conceal the importance of subtle influences that could be measured if such scales were not used.

ORGANIZATION OF THE PAPER

While the approaches we suggest for coping with these problems will improve the quality of statistical inference about discrimination, we are under no illusion that their adoption will yield definitive results. The combination of our relative ignorance about the factors determining case-processing decisions and the problems of using nonexperimental data ensures that definitive findings will not be forthcoming soon. In response to the inherent limitations of studies based on nonexperimental data, we have included a section on the use of experiments to measure discrimination in the criminal justice system. That section discusses the limitations of experiments, approaches for minimizing these limitations, and strategies for combining experimental and nonexperimental data.

The paper is organized as follows. We begin with a review of statistical issues that arise in the analysis of binary data. Next we discuss the so-called sample selection phenomenon and elaborate on its effects. We then develop a model of the criminal justice system. Next we review selected studies in the context of the sample selection phenomenon and the model developed. We then discuss alternative models of the sentencing decision that do not require the use of arbitrary severity indices. Next we discuss experimental approaches to measuring discrimination. We conclude with a summary of our major points.

THE ANALYSIS OF BINARY VARIABLES

Many decisions in the criminal justice system involve binary outcomes, such as the prosecutor's choice to dismiss or prosecute a case or the jury's decision to find the defendant guilty or innocent. It is common practice to define a binary variable $y$ such that $y = 1$ if one outcome (say, a verdict of guilty) occurs and $y = 0$ otherwise. In a number of the studies we reviewed, the relationship between the likelihood of the event $y = 1$
and a vector of variables $x$ is examined by regressing $y$ on $x$. The purpose of this section is to point out some hazards of this approach and to describe an alternative approach that we employ in subsequent sections.

We begin with a discussion of the classical regression model. The model assumes that a random variable $y$ can be related to a vector of variables $x$ by

$$y_i = x_i'\beta + \varepsilon_i, \qquad i = 1, \ldots, N, \qquad (1)$$

where $x_i$ is the $K \times 1$ vector of regressors for the $i$th observation in a sample of size $N$, $\beta$ is a $K \times 1$ vector of unknown parameters, and $\varepsilon_i$ is the disturbance or error associated with the $i$th observation. The errors $\varepsilon_1, \ldots, \varepsilon_N$ are assumed to be independent with zero mean and common variance $\sigma^2$. The regressors $x_1, \ldots, x_N$ are often assumed to be nonstochastic, although they may also be assumed to be random variables that are independent of the errors $\varepsilon_1, \ldots, \varepsilon_N$. These assumptions imply that the conditional distribution of $y$ given $x$ is such that

$$E(y_i \mid x_i) = x_i'\beta \qquad (2)$$

and

$$V(y_i \mid x_i) = \sigma^2. \qquad (3)$$

Equations (2) and (3) state that for each $x_i$, the distribution of $y_i$ given $x_i$ is such that $E(y_i \mid x_i)$ is linear in $x_i$ and $V(y_i \mid x_i)$ is constant for all $i$. Under these assumptions, it is well known that ordinary least squares provides consistent and unbiased estimates of the coefficient vector $\beta$.

The assumptions of the classical regression model are appropriate when the dependent variable has a large, approximately continuous range of possible values. In the case of a binary variable, however, many of the assumptions are no longer tenable. Suppose $y$ is a binary variable that takes on only the values zero and one, and let $p(x)$ equal the probability that $y = 1$ given $x$. Then it is easy to demonstrate that

$$E(y_i \mid x_i) = p(x_i) \qquad (4)$$

and

$$V(y_i \mid x_i) = p(x_i)\,[1 - p(x_i)]. \qquad (5)$$

Equation (4) indicates that the conditional expectation of $y$ given $x$ is equal to the conditional probability that $y$ equals one given $x$. If the range of the observed $x$ values is very small, then it may be appropriate to approximate $p(x)$ by $x'\beta$. In this case, ordinary least squares will consistently estimate $\beta$. However, equation (5) indicates that $V(y_i \mid x_i)$ is not the same for all $i$. This implies that ordinary least-squares estimates are inefficient and that the standard hypothesis tests are invalid. These problems can be corrected with standard techniques for adjusting for heteroskedasticity.

If, however, the range of the $x$ values is large, then the fact that $p(x)$ can take on only values between zero and one (it is a probability) implies that it cannot be approximated by a linear function of $x$.¹ In this case ordinary least-squares estimates are inconsistent, and the inconsistency may be very severe. To illustrate this point, consider the linear probability model:

$$p(x) = \begin{cases} 0 & \text{if } x'\beta < 0 \\ x'\beta & \text{if } 0 \le x'\beta \le 1 \\ 1 & \text{if } x'\beta > 1 \end{cases} \qquad (6)$$

Figure 2-1 displays the form of $p(x)$ for the case in which $x$ is a scalar. Also shown is a set of hypothetical observations in which a number of the $x$ values fall outside the range where $p(x)$ is linearly increasing. The dashed line depicts the model that would be estimated by ordinary least squares. By choosing enough observations with very high or very low $x$ values, the slope of the model estimated by ordinary least squares can be made arbitrarily small. This difficulty can be avoided by fitting the linear probability model specified in equation (6) by nonlinear least squares, but doing so introduces severe computational problems. It is therefore useful to consider alternative models.
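The attenuation just described can be checked numerically. The Python sketch below is our own illustration, not code from any of the studies reviewed; it assumes a scalar $x$ with $\beta = 1$ in equation (6) and two arbitrary fixed designs. Because the ordinary least-squares slope is linear in $y$, its expected value for fixed $x$ values equals the least-squares slope of $p(x)$ on $x$, which is what the hypothetical helper `expected_ols_slope` computes.

```python
def linear_probability(z):
    """Equation (6): p = z truncated to the interval [0, 1]."""
    return min(max(z, 0.0), 1.0)


def ols_slope(xs, ys):
    """Least-squares slope of ys on xs (regression with an intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


def expected_ols_slope(xs, beta=1.0):
    """Expected OLS slope for fixed xs when y is binary with P(y = 1 | x) = p(x).

    OLS is linear in y, so its expectation is the regression of p(x) on x.
    """
    return ols_slope(xs, [linear_probability(beta * x) for x in xs])


# Design 1: every x'beta lies in [0, 1], where p(x) is exactly linear.
inside = [i / 100 for i in range(101)]

# Design 2: the same points plus many observations far below 0 and far above 1.
low = [-20.0 + i / 10 for i in range(100)]
high = [21.0 + i / 10 for i in range(100)]
spread = inside + low + high

print(round(expected_ols_slope(inside), 3))  # 1.0: OLS recovers the true slope
print(round(expected_ols_slope(spread), 3))  # roughly 0.02: far below the true slope of 1
```

Adding still more observations far outside the interval where $p(x)$ is linear pushes the second slope further toward zero, which is the behavior pictured in Figure 2-1.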
Most models for binary variables that have been proposed in the statistical literature can be written as

$$p(x) = F(x'\beta), \qquad (7)$$

where $F$ is a continuous, nondecreasing function with $F(-\infty) = 0$ and $F(\infty) = 1$; that is, $F$ is a continuous distribution function. The choice of a particular form
for $F$ is usually somewhat arbitrary and should be taken in the same spirit as the assumptions of linearity and normally distributed errors in simple regression models.

FIGURE 2-1 The Bias in Ordinary Least-Squares Estimation (legend: the linear probability model; OLS regression line, dashed; observed data points; horizontal axis $x'\beta$).

The most popular models of this form are the linear probability model discussed above, which is obtained by setting

$$F(z) = \begin{cases} 0 & \text{if } z < 0 \\ z & \text{if } 0 \le z \le 1 \\ 1 & \text{if } z > 1, \end{cases}$$

the PROBIT model, in which $F(z)$ is set equal to the cumulative standard normal distribution function, and the LOGIT model, with $F(z) = e^z/(1 + e^z)$.

Models of this form can be motivated in a number of ways. We employ one such motivation repeatedly in the following sections. Let $y^*$ represent a latent, unobserved variable that can vary between minus and plus infinity. The latent variable $y^*$ is assumed to be related to $x$ by the standard regression function

$$y_i^* = x_i'\beta + \varepsilon_i, \qquad (8)$$

where $\varepsilon_i$ is an unobserved disturbance with mean zero and constant variance $\sigma^2$. The dichotomous variable $y$ is then assumed to be related to $y^*$ by

$$y_i = \begin{cases} 1 & \text{if } y_i^* > b \\ 0 & \text{if } y_i^* \le b, \end{cases} \qquad (9)$$
where $b$ is an unknown cutoff level. Using equation (8), this implies

$$P(y_i = 1 \mid x_i) = P(x_i'\beta + \varepsilon_i > b) = P(\varepsilon_i > b - x_i'\beta),$$
$$P(y_i = 0 \mid x_i) = 1 - P(\varepsilon_i > b - x_i'\beta).$$

If $F(\cdot)$ is the distribution function of $\varepsilon$, then this can be stated alternatively as

$$P(y_i = 1 \mid x_i) = p(x_i) = 1 - F(b - x_i'\beta), \qquad (10a)$$
$$P(y_i = 0 \mid x_i) = 1 - p(x_i) = F(b - x_i'\beta). \qquad (10b)$$

Note that equation (10) is in essentially the same form as equation (7): when the distribution of $\varepsilon$ is symmetric about zero, as with the normal and logistic distributions, $1 - F(b - x_i'\beta) = F(x_i'\beta - b)$. The unknown parameters of the model are the coefficient vector $\beta$, the variance $\sigma^2$, and the cutoff level $b$. It can be shown that neither $\sigma^2$ nor $b$ can be estimated, because they are not uniquely defined.² The coefficient vector $\beta$, measured relative to the scale of $\varepsilon$, can nevertheless be estimated directly from a sample of observations on the binary variable $y$ and on $x$.

One widely employed estimation procedure is maximum likelihood estimation, which has a number of desirable properties when a relatively large sample of observations on $(y, x)$ is available. For the LOGIT and PROBIT models, specially designed computer algorithms are available for calculating the maximum likelihood estimator and its estimated large-sample standard errors. For a more complete discussion of estimation and other issues in binary variable models, see Goldberger (1964:248-251) and Cox (1970).

To get a better idea of how this approach can be used to model events in the criminal justice system, consider the example of a jury determining whether a defendant is guilty. The jury hears the evidence, which can be summarized in terms of various attributes of the case, such as the number of eyewitnesses and whether a weapon belonging to the defendant was recovered. Suppose that the investigator can observe some of these attributes, perhaps from court records, and can quantify them in terms of a numerical vector $x = (x_1, x_2, \ldots, x_K)'$. Other attributes of the case, such as the credibility of the witnesses, are not recorded in court records and hence cannot be observed by the investigator. Let their composite influence be represented by $\varepsilon$. The jury then might be viewed as computing an index $y^* = x'\beta + \varepsilon$ measuring the
strength of the case against the defendant. The observable factors $x_1, x_2, \ldots, x_K$ are given weights $\beta_1, \beta_2, \ldots, \beta_K$, respectively, relative to the weight assigned to $\varepsilon$. The jury then decides whether to convict the defendant on the strength of the evidence, as measured by $y^*$, by comparing $y^*$ to a level $b$ and declaring the defendant guilty if $y^* > b$ and not guilty otherwise. The critical level $b$ is presumed to be determined according to the interpretation of the notion "beyond a reasonable doubt."

The statistical problem is to determine which factors the jury takes into account and their relative importance. Among other matters, the investigator might be interested in testing whether juries discriminate against certain types of defendants, in which case the personal characteristics of the defendant might be included in $x$. The problem facing the investigator is that he or she observes the vector of case attributes $x$ and whether the defendant is convicted, but not $y^*$ or $\varepsilon$. To estimate $\beta$ using this information, the jury's decision process can be modeled as

$$I_i = \begin{cases} 1 & \text{if } y_i^* = x_i'\beta + \varepsilon_i > b \\ 0 & \text{if } y_i^* = x_i'\beta + \varepsilon_i \le b, \end{cases}$$

where $I$ is a binary variable that equals one for conviction and zero otherwise. The coefficients in $\beta$ are the parameters of interest. This is precisely the same form as equations (8) and (9). Hence the weights $\beta_1, \beta_2, \ldots, \beta_K$ can be estimated directly using the approach discussed above. A similar approach is taken in subsequent sections to model other decisions in the criminal justice system that involve binary outcomes.

SAMPLE SELECTION

Selection Bias

The criminal justice process can be thought of as a series of stages, each involving a different set of actors. The first stage involves the detection of a crime, followed by communication of the crime to the police, arrest, prosecution, trial, and sentencing. The literature indicates that the various actors involved at each stage make calculated decisions about the types of
crimes that are processed to the next stage. For example, studies of prosecutors indicate that less serious crimes and those with weak evidence are more likely to be dismissed following arrest. These same two characteristics appear to influence the decision by the police to make an arrest, while the quality of the evidence certainly affects the likelihood that a jury will render a verdict of guilty and so pass a case on to the sentencing stage. Other factors, such as the prior record and socioeconomic status of the criminal, also appear to play a role at some of the stages.

As a result of the deliberate actions of the various actors in the system, the crimes that reach each successive stage after the first are not representative of the broader population of crimes. Samples used to study the various stages in the system are thus selected according to certain characteristics. This does not in itself pose a problem for the investigator. A potential problem does arise, however, from the combination of the sample selection process and the fact that some of the features of a case that affect the way it is processed cannot be observed by the investigator. For example, prosecutors and judges may possess a great deal of qualitative evidence about a case that the investigator cannot observe from court records. In other instances, the investigator may not observe less qualitative types of evidence, such as whether the criminal used a weapon. The combination of screening and incomplete measurement implies that the criminals reaching the later processing stages are not representative of the population of cases entering the system with respect to the features that are unobservable to the investigator, as well as those that are observable. This introduces the possibility of sample selection bias.

The type of biases that may arise can best be illustrated with an example. Consider the sentencing of convicted criminals. Suppose that the various actors in the system discriminate against individuals with low socioeconomic status (SES) as well as against individuals committing more serious crimes. (The latter form of "discrimination" may be socially desirable.) Then consider high-SES individuals who are convicted of a crime. Holding constant the effect of the factors that are observable to the investigator, such individuals would ordinarily have a lower probability of reaching the sentencing stage (given the hypothetical assumption of
discrimination). If they have been convicted, then, holding constant the effect of the factors observable to the investigator, they must be unrepresentative with respect to the factors unobservable to the investigator that contribute to reaching the sentencing stage. For example, they may have exhibited a greater degree of premeditation than convicted low-SES individuals, or a greater fraction of them than of low-SES convicted criminals may have used a weapon. (The degree of premeditation and weapon use are assumed to be unobservable to the investigator.)

This may cause problems when the investigator tries to determine the factors influencing the sentencing decision. Suppose that SES does not affect sentencing but the seriousness of the crime does. By the above argument, if discrimination against low-SES individuals exists at the earlier stages of the criminal justice system, then, ceteris paribus, high-SES convicted criminals will be above average on both the observable and unobservable dimensions of the seriousness of a crime. Judges are assumed to observe both sets of factors and to take both into account when deciding on a sentence. The investigator, however, can observe only one set. Even after taking account of the observable differences in the cases of high- and low-SES criminals, the investigator will still find that high-SES criminals receive longer sentences. This will suggest that judges discriminate against high-SES individuals, even though there is no discrimination at the sentencing stage and there is discrimination against low-SES individuals at the stages preceding sentencing. More generally, this example shows that if discrimination against low-SES individuals does exist at the sentencing stage, the biases induced by sample selection might mask its true extent. It is conceivable that the biases might even create the illusion of reverse discrimination at the sentencing stage when in reality discrimination against low-SES individuals is present at all stages.

The biases induced by sample selection are of course more general than the example suggests: they might occur at any stage in the system following the first screening stage, and they might distort the effect of any of the observable features of a crime. It is therefore essential to try to account for the effects of sample selection in order to make reliable inferences about the various processing stages in the criminal justice system.
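A small simulation can make this heuristic argument concrete. The Python sketch below is our own construction, and every number in it is an assumption chosen for illustration: a 0/1 indicator of high SES, an observed and a hidden standard-normal component of seriousness, a single screening step standing in for all the stages before sentencing (it favors serious cases and disfavors low-SES defendants), and a sentence that depends on seriousness alone. Regressing the sentence on observed seriousness and SES among the cases that reach sentencing should then produce a positive SES coefficient, the spurious "reverse discrimination" described above, even though SES plays no role at sentencing. The function names are hypothetical.

```python
import random


def solve(a, b):
    """Solve the small dense system a @ coef = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= factor * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def ols(rows, ys):
    """OLS coefficients from the normal equations X'X b = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(k)]
    return solve(xtx, xty)


def sentencing_regression(n=100_000, ses_penalty=1.0, seed=1):
    """Return OLS coefficients [constant, observed seriousness, high SES] among the sentenced."""
    rng = random.Random(seed)
    rows, sentences = [], []
    for _ in range(n):
        high_ses = 1.0 if rng.random() < 0.5 else 0.0
        observed = rng.gauss(0.0, 1.0)  # seriousness visible to the investigator
        hidden = rng.gauss(0.0, 1.0)    # seriousness visible only to officials (e.g., premeditation)
        # Earlier stages: serious cases and low-SES defendants are more likely to be convicted.
        if observed + hidden - ses_penalty * high_ses + rng.gauss(0.0, 1.0) <= 0.0:
            continue
        # Sentencing stage: depends on seriousness only; SES has no effect here.
        sentence = observed + hidden + rng.gauss(0.0, 1.0)
        rows.append([1.0, observed, high_ses])
        sentences.append(sentence)
    return ols(rows, sentences)


coef = sentencing_regression()
print("estimated SES coefficient at sentencing:", round(coef[2], 3))
```

Setting `ses_penalty=0.0` removes the earlier-stage discrimination; the SES coefficient then hovers near zero, which isolates selection as the source of the distortion.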

Heckman (1979) has proposed a model of the selection process that provides some direction in controlling for selection bias. Heckman's model is composed of two components:

$$y_i = x_i'\beta + \varepsilon_i \qquad (11)$$

and

$$I_i = \begin{cases} 1 & \text{if } I_i^* = z_i'\gamma + u_i > 0 \\ 0 & \text{if } I_i^* = z_i'\gamma + u_i \le 0, \end{cases} \qquad (12)$$

where $E[\varepsilon_i] = E[u_i] = 0$ for all $i$, and

$$E[\varepsilon_i \varepsilon_j] = \begin{cases} \sigma_\varepsilon^2 & \text{if } i = j \\ 0 & \text{if } i \ne j, \end{cases}$$
$$E[u_i u_j] = \begin{cases} \sigma_u^2 & \text{if } i = j \\ 0 & \text{if } i \ne j, \end{cases}$$
$$E[u_i \varepsilon_j] = \begin{cases} \mathrm{Cov}(u, \varepsilon) & \text{if } i = j \\ 0 & \text{if } i \ne j. \end{cases}$$

Equation (11) is the regression equation of interest: $x$ is a $K \times 1$ vector of nonstochastic regressors, $\beta$ is a $K \times 1$ vector of coefficients, and $\varepsilon$ is an unobserved disturbance, where the subscript $i$ denotes the $i$th observation in the sample. The problem is to estimate $\beta$ from a sample of observations that has been selected according to equation (12). The binary variable $I$ indicates whether an observation is included in the sample, with $I = 1$ representing inclusion. The indicator $I$ is generated by an unobserved latent variable $I^*$, which is the sum of $z'\gamma$ and $u$, where $z$ is an $L \times 1$ vector of nonstochastic variables, $\gamma$ is an $L \times 1$ vector of unknown parameters, and $u$ is an unobserved disturbance with zero mean and variance $\sigma_u^2$. If $I^* > 0$, then $I = 1$ and the observation is included in the sample; otherwise the observation is excluded. The key specification of the model is that $\varepsilon_i$ and $u_i$ need not be uncorrelated, which reflects the idea that the same unobservable factors that affect the process of interest may also affect the probability that an observation will be included in the sample.

To see how this model applies to the criminal justice literature, consider again the sentencing example. If the regression equation relates the sentence received to a set of observed variables, then it can be estimated only for a sample of criminals who have been convicted. As noted earlier, this sample is likely to be selected according to certain characteristics. The specification of $I$ in the example would then relate the probability of conviction to a set of observable characteristics $z$ and a set of characteristics unobservable to the investigator, represented by the disturbance $u$. The literature suggests that some of the factors influencing whether a criminal act ultimately results in a conviction, such as the seriousness of the crime, will also affect the sentence length. This suggests that $z$ and $x$ will contain common variables. It also suggests that, to the extent that some of the dimensions of the seriousness of a case cannot be observed by the investigator, $\varepsilon$ and $u$ will be composed of similar factors. Hence it might be expected that $\mathrm{Cov}(\varepsilon, u)$ would be positive. Heckman's characterization of the selection problem thus lends itself naturally to modeling the selection that occurs in the criminal justice system.

Heckman's model can be used to probe the biases that result from selection. Note, first, that if selection were not operative and a random sample of observations on $y$ were available, then $E(y \mid x, z) = x'\beta$ and the regression of $y$ on $x$ would consistently estimate $\beta$. However, since $y$ is observed only when $I_i = 1$, the expected value of $y$ given $x$, $z$, and $I_i = 1$ is

$$E(y_i \mid x_i, z_i, I_i = 1) = x_i'\beta + E(\varepsilon_i \mid x_i, z_i, I_i = 1) = x_i'\beta + E(\varepsilon_i \mid u_i > -z_i'\gamma), \qquad (13)$$

where the last equality follows from the definition of $I_i$. Although the unconditional mean of $\varepsilon$ is zero, it follows immediately that if $\varepsilon$ and $u$ are correlated, the last term in equation (13) will not equal zero. Defining $G(r) = E[\varepsilon \mid u > r]$, we can rewrite equation (13) as

$$E[y_i \mid x_i, z_i, I_i = 1] = x_i'\beta + G(-z_i'\gamma). \qquad (14)$$
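When $\varepsilon$ and $u$ are jointly normal, the function $G(r)$ in equation (14) has a standard closed form, which we add here as background rather than as something stated in this chapter: $G(r) = \rho\sigma_\varepsilon\,\phi(r/\sigma_u)/[1 - \Phi(r/\sigma_u)]$, where $\rho$ is the correlation between $\varepsilon$ and $u$, $\phi$ and $\Phi$ are the standard normal density and distribution function, and the ratio is the inverse Mills ratio. The Python sketch below (function names and parameter values are our own hypothetical choices) evaluates this formula and checks it against a brute-force simulation of $E[\varepsilon \mid u > r]$ that builds $\varepsilon$ as $(\rho\sigma_\varepsilon/\sigma_u)u$ plus independent noise.

```python
import math
import random


def std_normal_pdf(t):
    return math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)


def std_normal_sf(t):
    """Upper tail 1 - Phi(t), computed with erfc for accuracy in the tail."""
    return 0.5 * math.erfc(t / math.sqrt(2.0))


def g_normal(r, rho, sigma_eps, sigma_u):
    """G(r) = E[eps | u > r] for a mean-zero bivariate normal (eps, u).

    Equals rho * sigma_eps times the inverse Mills ratio evaluated at r / sigma_u.
    """
    t = r / sigma_u
    inverse_mills = std_normal_pdf(t) / std_normal_sf(t)
    return rho * sigma_eps * inverse_mills


def g_monte_carlo(r, rho, sigma_eps, sigma_u, n=200_000, seed=0):
    """Simulation estimate of E[eps | u > r], with eps = (rho*sigma_eps/sigma_u)*u + w."""
    rng = random.Random(seed)
    slope = rho * sigma_eps / sigma_u
    sd_w = sigma_eps * math.sqrt(1.0 - rho ** 2)  # keeps Var(eps) = sigma_eps**2
    total, count = 0.0, 0
    for _ in range(n):
        u = rng.gauss(0.0, sigma_u)
        if u > r:  # keep only "selected" draws
            total += slope * u + rng.gauss(0.0, sd_w)
            count += 1
    return total / count


for r in (-1.0, 0.0, 1.0):
    print(r, round(g_normal(r, 0.5, 1.0, 1.0), 3), round(g_monte_carlo(r, 0.5, 1.0, 1.0), 3))
```

With $\rho > 0$ the printed values increase with $r$, and flipping the sign of `rho` reverses the direction, in line with the monotonicity property of $G(r)$.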
For most reasonable joint distributions of $\varepsilon$ and $u$, including the normal, $G(r)$ will be monotonic in $r$, with the sign of $dG(r)/dr$ depending on the sign of $\mathrm{Cov}(\varepsilon, u)$. This follows from the observation that for most joint
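distributions, increasing $r$ will increase the expected value of $\varepsilon_i$ given $u_i > r$ if $\mathrm{Cov}(\varepsilon, u) > 0$ and decrease it if $\mathrm{Cov}(\varepsilon, u) < 0$.

Equation (14) thus implies that the regression of $y$ on $x$ will not consistently estimate $\beta$ unless $x$ and $G(-z'\gamma)$ are uncorrelated. In many applications in the criminal justice area, it is untenable to expect $x$ and $G(-z'\gamma)$ to be uncorrelated. For example, we noted earlier in the sentencing example that $x$ and $z$ are likely to contain common variables, which makes it extremely unlikely that $x$ and $G(-z'\gamma)$ are uncorrelated. In other applications the presence of discrimination at multiple stages of the criminal justice system will also cause $x$ and $z$ to overlap.

A few simplifying assumptions will help clarify the nature of the biases when $x$ and $z$ do overlap. First, suppose that $\varepsilon_i$ can be written as a linear function of $u_i$ plus an independent disturbance:

$$\varepsilon_i = \frac{\mathrm{Cov}(\varepsilon, u)}{\mathrm{Var}(u)}\, u_i + w_i = \frac{\rho\sigma_\varepsilon}{\sigma_u}\, u_i + w_i, \qquad (15)$$

where $w_i$ is a random disturbance with zero mean and variance $\sigma_w^2$ that is independent of $u_i$ and all other disturbance terms, and $\rho$ is the correlation coefficient between $\varepsilon$ and $u$.³ This representation follows immediately if $\varepsilon$ and $u$ are assumed to be jointly normally distributed. Second, assume that $u$ is uniformly distributed over $[-1/2, 1/2]$ (so that $\sigma_u = 1/\sqrt{12}$) and that $z_i'\gamma$ falls in this interval for all observations in our sample. Conditional on $u_i > -z_i'\gamma$, $u_i$ is then uniform on $[-z_i'\gamma, 1/2]$, with mean $.25 - .5\,z_i'\gamma$ and variance $(.5 + z_i'\gamma)^2/12$; combining these with equation (15), in which $\rho\sigma_\varepsilon/\sigma_u = 2\sqrt{3}\,\rho\sigma_\varepsilon$ and $\sigma_w^2 = \sigma_\varepsilon^2(1 - \rho^2)$, gives

$$E(\varepsilon_i \mid x_i, z_i, I_i = 1) = .5\sqrt{3}\,\rho\sigma_\varepsilon - \sqrt{3}\,\rho\sigma_\varepsilon\, z_i'\gamma,$$
$$E(y_i \mid x_i, z_i, I_i = 1) = x_i'\beta + .5\sqrt{3}\,\rho\sigma_\varepsilon - \sqrt{3}\,\rho\sigma_\varepsilon\, z_i'\gamma,$$
$$V(y_i \mid x_i, z_i, I_i = 1) = \sigma_\varepsilon^2(1 - \rho^2) + \rho^2\sigma_\varepsilon^2(.5 + z_i'\gamma)^2,$$

which implies that, for observations for which $I_i = 1$, $y_i$ can be written as

$$y_i = x_i'\beta + .5\sqrt{3}\,\rho\sigma_\varepsilon - \sqrt{3}\,\rho\sigma_\varepsilon\, z_i'\gamma + \eta_i, \qquad (16)$$

where $E[\eta_i] = 0$ and

$$\mathrm{Var}(\eta_i) = \sigma_\varepsilon^2(1 - \rho^2) + \rho^2\sigma_\varepsilon^2(.5 + z_i'\gamma)^2. \qquad (17)$$

Consider the important special case in which $z$ is a subset of the regressors in $x$. Let $x = (z' \mid x^{*\prime})'$ and $\beta = (\beta^{\circ\prime} \mid \beta^{*\prime})'$. Then equation (16) can be rewritten as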
$$y_i = .5\sqrt{3}\,\rho\sigma_\varepsilon + z_i'(\beta^\circ - \sqrt{3}\,\rho\sigma_\varepsilon\gamma) + x_i^{*\prime}\beta^* + \eta_i. \qquad (18)$$

This implies that the regression of $y$ on $x$ will consistently estimate $(\beta^\circ - \sqrt{3}\,\rho\sigma_\varepsilon\gamma,\ \beta^*)$ for the coefficients of the nonconstant regressors. Thus if $x_j$ is one of the common regressors (so that $x_j = z_j$), the regression of $y$ on $x$ will consistently estimate $\beta_j - \sqrt{3}\,\rho\sigma_\varepsilon\gamma_j$ for the coefficient of $x_j$. This coefficient is a composite of $\beta_j$, the coefficient of $x_j$ in the regression equation, and $\gamma_j$, the coefficient of $x_j$ in the selection equation.

Returning to the sentencing example, if discrimination on the basis of SES is present at all stages of the criminal justice system, including the sentencing stage, then SES will appear as a variable in both the regression and the selection equations. The regression of $y$ on $x$ will then consistently estimate a coefficient for SES that equals $\beta_j$, its true effect in the sentencing equation, minus $\sqrt{3}\,\rho\sigma_\varepsilon$ times $\gamma_j$, the coefficient of SES in the selection equation. If discrimination is operative at all stages, then $\beta_j < 0$ and $\gamma_j \le 0$. It was assumed that $\rho > 0$, and $\sigma_\varepsilon > 0$ by definition. Hence the coefficient of SES that is consistently estimated by the regression of $y$ on $x$ will be greater than or equal to $\beta_j$ and will be positive if $-\sqrt{3}\,\rho\sigma_\varepsilon\gamma_j > -\beta_j$. Thus this regression is biased toward understating the true extent of discrimination at the sentencing stage, and it may even suggest reverse discrimination at the sentencing stage if discrimination exists at the earlier stages. The magnitude of the bias depends on the amount of discrimination occurring at the earlier stages, as measured by $\gamma_j$, and on the variation of the common, unmeasured forces in the regression and selection equations, as measured by $\mathrm{Cov}(\varepsilon, u)/\mathrm{Var}(u)$. These results accord with the conclusions drawn earlier from the heuristic discussion of selection bias at the sentencing stage.

These results carry over to the case in which $z$ is not a subset of $x$ but the variables in $z$ that do not coincide with those in $x$ are uncorrelated with $x$. If the nonoverlapping variables are correlated with $x$, then the ordinary least-squares regression of $y$ on $x$ will be further biased by the omission of these relevant regressors. In conclusion, selection bias may distort the role that some variables play at, say, the sentencing stage, or it may suggest a role for some variables at the sentencing stage that is largely a reflection of the
70 opposite role they play at an earlier stage. To the extent that selection bias is operative, it clearly makes inferences about where and how a factor plays a role in the criminal justice system more difficult. It is important to stress that selection bias does not arise from a correlation in the population of all crimes between the unobserved features of a crime that influence the way it is processed through the criminal justice system and those features that are observed. In the terminology of the Heckman model, we have assumed that x and ~ are uncorrelated. It is only in the selected population of crimes from which we sample that a correlation is induced between the observed and unobserved factors affecting the processing of a crime. It is this induced correlation that gives rise to sample selection bias. If ~ does contain factors that are correlated with x, then this may introduce additional problems; these are discussed in Garber et al. (in this volume). Selection Bias at Different Processing Stages of the Criminal Justice System Discrimination in sentencing can be manifested in many different ways. Some researchers have examined whether a greater fraction of arresters with certain character- istics are sent to prison. Others have focused on the severity of punishment received by convictees with various characteristics. Still others have examined whether individuals sentenced to prison receive longer sentences according to certain characteristics. Each of these studies involves a different population of crimes. Some sample from the population of crimes that lead to arrest, some sample from the crimes that lead to conviction, and others sample from the crimes that lead to imprisonment. Clearly, the sample of crimes that lead to imprisonment is the most selected sample, while the sample of crimes leading to arrest is the least selected. 
It is natural to ask whether the distortions induced by sample selection are greater for those studies using the more selected samples. This question can be analyzed in the context of the following simplified model:

$$y_{1i} = \alpha + \beta S_i + \gamma_1 x_i + w_{1i} \tag{19}$$

$$y_{2i} = \alpha + \beta S_i + \gamma_2 x_i + w_{2i} \tag{20}$$

$$y_{3i} = \alpha + \beta S_i + \gamma_3 x_i + w_{3i} \tag{21}$$

where it is assumed that $y_{2i}$ is observed only if $y_{1i} > 0$, and $y_{3i}$ is observed only if $y_{1i} > 0$ and $y_{2i} > 0$. To make the model concrete, let $y_{1i}$ be an index measuring the likelihood of conviction for each crime $i$ (that is, the probability that a crime will be reported and lead to arrest, prosecution, and conviction), $y_{2i}$ an index measuring the likelihood of imprisonment given conviction, and $y_{3i}$ an index of the severity of punishment given imprisonment. Selection arises in that $y_{2i}$ is observed only if case $i$ results in a conviction, while $y_{3i}$ is observed only if case $i$ leads to both conviction and imprisonment.4 Following the conventions used in the previous section, $y_{1i} > 0$ corresponds to the case in which crime $i$ leads to a conviction, and $y_{2i} > 0$ corresponds to the case in which a conviction leads to imprisonment. It is assumed that the sign of $y_{1i}$ is observed but its value is not. For simplicity of exposition, it is assumed that $y_{2i}$ is observed for purposes of estimating equation (20) but that only the sign, rather than the value, of $y_{2i}$ is observed for purposes of estimating equation (21). This corresponds to the conventions used in the previous section. The three equations describe, respectively, the determinants of conviction, imprisonment given conviction, and punishment given conviction and imprisonment. To isolate the biases induced by selection, each equation is assumed to be composed of the same four factors: a constant regressor; a regressor $S$, which measures the characteristic on which discrimination occurs; an unobservable variable $x$, which enters every equation; and a disturbance, which represents the effects of all other unobservable factors specific to each of the three stages. The presence of the unobserved factor $x$ in each equation is what gives rise to a nonzero covariance between the unobserved variables entering each of the three equations.
The disturbances $w_j$, $j = 1, 2, 3$, are assumed to be independent of each other and of all other variables in the model. In order to isolate the effects of selection, two assumptions are made. First, it is assumed that the constant and the coefficient of $S$, denoted by $\alpha$ and $\beta$ respectively, are the same in equations (19), (20), and (21). Second, it is assumed that $\operatorname{Var}(\gamma_j x + w_j)$ is the same for all $j = 1, 2, 3$. The assumption that the coefficient of $S$ and $\operatorname{Var}(\gamma_j x + w_j)$ are the same in each of the three equations ensures that the fraction of variation in $y_j$ attributable to $S$ is the same in all equations. In turn, this ensures that the effects of discrimination, which can be measured by $\beta$ if the $y_j$ ($j = 1, 2, 3$) are observed, and by $\beta / [\operatorname{Var}(\gamma_j x + w_j)]^{1/2}$ if only the binary outcomes indicating whether $y_j > 0$ ($j = 1, 2, 3$) are observed, are comparable across equations. The assumption that $\alpha$ is the same in each equation, coupled with the other assumptions, ensures that the standard for selection is the same in all three equations. The only things that can vary across equations are the $\gamma_j$ and the $\operatorname{Var}(w_j)$. They determine the size of the covariance between the common unobserved components of the equations, as well as the fraction of variation in $\gamma_j x + w_j$ that is attributable to $x$.

Selection bias occurs because $y_{2i}$ and $y_{3i}$ are observed only in selected instances and the same unobservable factor affects each of $y_1$, $y_2$, and $y_3$. We are interested in comparing the magnitude of the inconsistency in the estimate of the coefficient of $S$ from the regression of $y_2$ on $S$ with that from the regression of $y_3$ on $S$. The regressions of $y_2$ on $S$ and of $y_3$ on $S$ will not consistently estimate $\beta$ because, in the populations from which observations on $y_2$ and $y_3$ are drawn, $E(\gamma_2 x + w_2 \mid S, y_1 > 0)$ and $E(\gamma_3 x + w_3 \mid S, y_1 > 0, y_2 > 0)$ both depend on $S$. In particular, if we assume that $S$ is binary, for example $S_i = 1$ if the $i$th defendant is black and $S_i = 0$ otherwise, then it is easy to show that these conditional expectations can be written, apart from constants that are absorbed into the intercept, as

$$E[\gamma_2 x + w_2 \mid S, y_1 > 0] = \delta_2 S \quad \text{and} \quad E[\gamma_3 x + w_3 \mid S, y_1 > 0, y_2 > 0] = \delta_3 S,$$

where

$$\delta_2 = E(\gamma_2 x + w_2 \mid S = 1, y_1 > 0) - E(\gamma_2 x + w_2 \mid S = 0, y_1 > 0) \tag{22}$$

and

$$\delta_3 = E(\gamma_3 x + w_3 \mid S = 1, y_1 > 0, y_2 > 0) - E(\gamma_3 x + w_3 \mid S = 0, y_1 > 0, y_2 > 0). \tag{23}$$

Following the logic in the previous subsection, it is straightforward to demonstrate that $\delta_2$ and $\delta_3$ are the inconsistencies in the estimates of the coefficient of $S$ in the regressions of $y_2$ on $S$ and $y_3$ on $S$, respectively (in the selected populations). Thus we are interested in the relative values of $\delta_2$ and $\delta_3$. Substituting for the conditions $y_1 > 0$ and $y_2 > 0$ in equations (22) and (23) and exploiting the fact that the $w_j$, $j = 1, 2, 3$, are independent, we find that

$$\delta_2 = \gamma_2 \{ E[x \mid \gamma_1 x + w_1 > -(\alpha + \beta)] - E[x \mid \gamma_1 x + w_1 > -\alpha] \}$$

and

$$\delta_3 = \gamma_3 \{ E[x \mid \gamma_1 x + w_1 > -(\alpha + \beta),\ \gamma_2 x + w_2 > -(\alpha + \beta)] - E[x \mid \gamma_1 x + w_1 > -\alpha,\ \gamma_2 x + w_2 > -\alpha] \}.$$

These two expressions can be interpreted as follows. If $\beta > 0$, then equation (19) indicates that $y_{1i}$ is larger for blacks (i.e., $S = 1$ for blacks) than for whites, ceteris paribus. Since $y_{1i} > 0$ is a necessary and sufficient condition to pass to the imprisonment stage, this implies that it is harder for whites to pass to the next stage than for blacks. The whites who do pass to the next stage must therefore, on average (and assuming $\gamma_1 > 0$), have a larger value of $x$ than the blacks who pass to the next stage. The difference between the expected value of $x$ for the blacks who reach the imprisonment stage and that for the whites who reach the imprisonment stage, multiplied by $\gamma_2$, is equal to $\delta_2$, the inconsistency in the regression of $y_2$ on $S$ using the selected sample. Similarly, $\delta_3$ equals $\gamma_3$ times the difference between the expected value of $x$ for the blacks who reach the punishment stage and that for the whites who reach the same stage. The question we wish to address is whether $\delta_3$ is more negative than $\delta_2$, i.e., whether the bias due to selection rises as the sample is repeatedly selected according to the same standard. It turns out that the answer to this question depends on the magnitudes of the coefficients $\gamma_1$, $\gamma_2$, and $\gamma_3$. Consider the extreme case in which $\gamma_1 = 0$ and $\gamma_2, \gamma_3 > 0$. Then it is easy to demonstrate that the regression of $y_2$ on $S$ will consistently estimate $\beta$, while the regression of $y_3$ on $S$ will underestimate $\beta$ even in large samples.
Thus in this extreme case, $|\delta_3| > |\delta_2| = 0$. Since the biases are continuous functions of the $\gamma_j$'s, this implies, more generally, that $|\delta_3| > |\delta_2|$ whenever $\gamma_1$ is small compared with $\gamma_2$ and $\gamma_3$.
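
Under an additional assumption of our own, that $x \sim N(0, 1)$ and $w_1 \sim N(0, \sigma_{w_1}^2)$ independently, the truncated means in the expression for $\delta_2$ have a closed form: with $v = \gamma_1 x + w_1$ and $s_v = (\gamma_1^2 + \sigma_{w_1}^2)^{1/2}$, $E[x \mid v > c] = (\gamma_1 / s_v)\,\lambda(c / s_v)$, where $\lambda(z) = \phi(z) / [1 - \Phi(z)]$ is the inverse Mills ratio. The Python sketch below uses this to check the extreme case; the function and parameter names are hypothetical.

```python
import math
from statistics import NormalDist

STD = NormalDist()  # standard normal


def upper_mills(z):
    """E[Z | Z > z] for Z ~ N(0, 1), i.e. phi(z) / (1 - Phi(z))."""
    return STD.pdf(z) / (1.0 - STD.cdf(z))


def delta2(alpha, beta, gamma1, gamma2, sd_w1):
    """delta_2 = gamma2 * (E[x | v > -(alpha + beta)] - E[x | v > -alpha]), v = gamma1*x + w1.

    Assumes x ~ N(0, 1) and w1 ~ N(0, sd_w1**2), independent, so that
    E[x | v > c] = (gamma1 / sd_v) * upper_mills(c / sd_v).
    """
    sd_v = math.sqrt(gamma1 ** 2 + sd_w1 ** 2)

    def truncated_mean_x(c):
        return (gamma1 / sd_v) * upper_mills(c / sd_v)

    return gamma2 * (truncated_mean_x(-(alpha + beta)) - truncated_mean_x(-alpha))


print(delta2(0.0, 0.5, gamma1=0.0, gamma2=1.0, sd_w1=math.sqrt(2.0)))  # 0.0
print(delta2(0.0, 0.5, gamma1=1.0, gamma2=1.0, sd_w1=1.0))  # negative, roughly -0.15
```

With $\gamma_1 = 0$ the conviction screen carries no information about $x$, so $\delta_2$ is exactly zero; with $\gamma_1 > 0$ and $\beta > 0$ it is negative, in line with the argument that the whites who pass the first screen have larger $x$ on average.
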

The coefficient $\gamma_1$ might be expected to be relatively small compared with $\gamma_2$ and $\gamma_3$ if, for example, equation (19) represents the likelihood of a crime leading to conviction and equations (20) and (21) represent imprisonment and sentence length, respectively. The likelihood that a crime leads to a conviction depends principally on the seriousness of the offense and the quality of the evidence. The imprisonment and punishment decisions depend primarily on the seriousness of the offense. The variable $x$ in equations (19) through (21) might then be interpreted as the unobservable components of the seriousness of the offense. The fact that the role of seriousness at the conviction stage depends on the quality of the evidence (variations in seriousness are likely to have little effect on the likelihood of conviction if the quality of the evidence is either very bad or very good), whereas for the most part it operates unconditionally at the imprisonment and punishment stages, suggests that $\gamma_1$ is less than $\gamma_2$ and $\gamma_3$. Thus if $\gamma_1$ is small relative to $\gamma_2$ and $\gamma_3$, we expect that the regression of sentence length on race, SES, income, etc., as well as other factors, will indicate less discrimination than the regression of the likelihood of imprisonment on the same or similar factors. Depending on the coefficients of $S$ in these equations, this suggests that (if $\gamma_1$ is small relative to $\gamma_2$ and $\gamma_3$) regressions using samples of individuals sentenced to prison might have the greatest chance of (spuriously) finding reverse discrimination relative to regressions using less selected populations, when in reality the same amount of discrimination occurs at each stage. More generally, it might be expected that we would estimate a greater amount of discrimination at the imprisonment stage than at the punishment stage given imprisonment. We shall return to this prediction when we review various studies of both the imprisonment decision and the punishment decision given imprisonment.
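
The prediction can be illustrated by simulating equations (19) through (21) directly. The Python sketch below is a Monte Carlo under assumptions of our own choosing: $x \sim N(0, 1)$, normal $w_j$, a binary $S$ assigned with probability one-half, $\alpha = 0$, and $\beta = 0.5$. The disturbance standard deviations are set so that $\operatorname{Var}(\gamma_j x + w_j) = 2$ in every equation, as the model requires. Because $S$ is binary, the least-squares slope of $y_j$ on $S$ in the selected sample is a difference in group means, so subtracting $\beta$ estimates $\delta_j$.

```python
import math
import random


def stage_biases(gammas, sd_w, alpha=0.0, beta=0.5, n=200_000, seed=3):
    """Monte Carlo estimates of delta_2 and delta_3 for equations (19)-(21).

    Assumptions (ours): x ~ N(0, 1), w_j ~ N(0, sd_w[j]**2), S ~ Bernoulli(0.5).
    With binary S, the OLS slope of y_j on S is a difference in group means,
    so (slope - beta) estimates the selection-induced inconsistency delta_j.
    """
    rng = random.Random(seed)
    g1, g2, g3 = gammas
    # running [sum, count] of y2 (stage 2) and y3 (stage 3) for S = 0 and S = 1
    acc = {(stage, s): [0.0, 0] for stage in (2, 3) for s in (0, 1)}
    for _ in range(n):
        s = 1 if rng.random() < 0.5 else 0
        x = rng.gauss(0.0, 1.0)
        y1 = alpha + beta * s + g1 * x + rng.gauss(0.0, sd_w[0])
        if y1 <= 0.0:
            continue  # no conviction: y2 and y3 are never observed
        y2 = alpha + beta * s + g2 * x + rng.gauss(0.0, sd_w[1])
        acc[(2, s)][0] += y2
        acc[(2, s)][1] += 1
        if y2 <= 0.0:
            continue  # no imprisonment: y3 is never observed
        y3 = alpha + beta * s + g3 * x + rng.gauss(0.0, sd_w[2])
        acc[(3, s)][0] += y3
        acc[(3, s)][1] += 1

    def slope(stage):
        means = [acc[(stage, s)][0] / acc[(stage, s)][1] for s in (0, 1)]
        return means[1] - means[0]

    return slope(2) - beta, slope(3) - beta


# Extreme case from the text: gamma_1 = 0, with Var(gamma_j x + w_j) = 2 in every equation.
extreme = stage_biases(gammas=(0.0, 1.0, 1.0), sd_w=(math.sqrt(2.0), 1.0, 1.0))
# Equal gammas, for comparison only; the ordering of the biases is not guaranteed here.
equal = stage_biases(gammas=(1.0, 1.0, 1.0), sd_w=(1.0, 1.0, 1.0))
print("gamma_1 = 0:  delta_2 = %+.3f, delta_3 = %+.3f" % extreme)
print("equal gammas: delta_2 = %+.3f, delta_3 = %+.3f" % equal)
```

In the extreme case the imprisonment regression should show essentially no bias, while the sentence-length regression understates $\beta$. The equal-$\gamma$ run is printed only for comparison, since the ordering of the two biases is not guaranteed in general.
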
It must be pointed out, however, that these conclusions do not hold for all possible values of $\gamma_1$, $\gamma_2$, and $\gamma_3$. For example, if $\gamma_1 = \gamma_2 = \gamma_3$, $x$ is uniformly distributed, and the ranges of $w_1$ and $w_2$ are suitably restricted, then $\delta_2 = \delta_3$; this is shown in the appendix to this paper. If $x$ is normally distributed, then $\delta_3 - \delta_2$ might be either positive or negative, depending on the values of $\alpha$, $\beta$, $\operatorname{Var}(x)$, and $\operatorname{Var}(w_j)$. The intuitive rationale for this result is that the second round of selection, which occurs at equation (20), operates on two different populations: the blacks who have reached equation (20) have not passed through as stiff a screen as the whites who have reached equation (20). Putting the blacks through an additional screen, albeit not as stiff a screen as the one applied to the whites, may still have a greater effect on the blacks than on the whites, because the population of blacks to which the screen is applied is not as selected as the corresponding population of whites. Whether this is true depends critically on the distribution of $x$ and the various parameters of the model.

Procedures to Deal with Selection Bias

If sample selection bias is operative, it is essential to consider how its effects might be isolated in order to estimate the true extent of discrimination in the criminal justice system. This section outlines some approaches to dealing with the selection problem. The first approach involves placing bounds on the parameters of the regression equation of interest to take account of the effects of selection bias. Consider the most difficult estimation problem: the case in which the vector of variables $z$, the regressors in the selection equation, is a subset of the variables in the regression equation. Suppose that $\rho$, the correlation coefficient of $\varepsilon$ and $u$, is known a priori. Then equation (17) indicates that the disturbance in the regression of $y$ on $x$ (using the selected sample) has a variance that is a function of only two sets of unknowns, $\sigma$ and $\gamma$. If data are available to estimate the selection equation, then $\gamma$ can be consistently estimated using the estimation procedure described above. This information is sufficient to generate a consistent estimate of $\sigma$ (see Olsen, 1980). Coupled with a consistent estimate of $\gamma$, this is sufficient to purge the coefficient estimates from the regression of $y$ on $x$ of the inconsistencies induced by selection.
This can be done using the relationship, described in equation (18), between the coefficients consistently estimated by the regression of $y$ on $x$ and the true coefficient vector $\beta$. Consistent estimates of $\beta$ can then be computed for each maintained value of $\rho$. Allowing $\rho$ to vary from $-1$ to $+1$ will then trace a bounded set of estimates that bracket the estimate of $\beta$ that would be generated if $\rho$ were known a priori. In addition, confidence intervals for the extreme points of the bracketed set of estimates could be used to measure the uncertainty in the estimates. The bounds on the estimated value of $\beta$ (given $\rho$) could be further narrowed if the investigator possessed a priori knowledge concerning the values of $\rho$, $\sigma$, $\gamma$, and/or $\beta$. The most likely candidate for such a priori knowledge that would be widely shared by the readership of the prospective data summary would be $\rho$. In many instances, the researcher may be able to perform experiments to learn about the value of $\rho$ (this topic is discussed in more detail below). A formal Bayesian analysis could be helpful in deriving a posterior distribution on $\beta$ that takes account of the investigator's uncertainty concerning the value of $\rho$. While these approaches clearly do not provide a single point estimate of $\beta$, they may be quite helpful in narrowing the uncertainty about $\beta$ created by the presence of sample selection.

A second approach involves the use of exclusion restrictions to yield a consistent point estimate of $\beta$. The bounding approach does not require the investigator to specify which of the variables included in the estimated regression equation are included because they are important in the selection of the sample, as distinct from the regression equation of interest. Equation (16) indicates that variables that enter the selection equation but not the regression equation of interest will have a nonzero coefficient in the regression equation if a selected sample is used to estimate it. But if the investigator is willing to specify that at least one of the included variables is included only because of its role in the selection of the sample, then it is possible to get a consistent estimate of $\beta$ in the presence of sample selection.
Examining equation (18), it is clear that if a regressor, say $z_j$, is included in the estimated regression only because of its role in the selection equation, then the regression of $y$ on $x$ will consistently estimate $-\rho\sigma\gamma_j$ for the coefficient of $z_j$. If data are available to estimate the parameters of the selection equation, this will be sufficient to get a consistent estimate of $\rho\sigma$. Equation (18) indicates that this is sufficient to get a consistent estimate of the entire coefficient vector $\beta$. This procedure can be generalized straightforwardly to handle multiple exclusion restrictions (see Heckman, 1979, and Olsen, 1980).
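
Both the bounding approach and the exclusion-restriction approach can be illustrated with Heckman's (1979) two-step form of the correction, in which the selected-sample regression picks up an extra term $\rho\sigma\lambda(c)$, where $c$ is the selection index and $\lambda(c) = \phi(c)/\Phi(c)$ is the inverse Mills ratio. The Python sketch below is a simplified stand-in for, not a reproduction of, the relationship in equation (18). The data are simulated; the selection coefficients (called `theta` here, a hypothetical name) are treated as known, whereas in practice they would come from a probit fitted to data that include the unselected cases; and $\sigma$ is treated as known in the bounding step, whereas the text notes it can be estimated for each maintained $\rho$ (Olsen, 1980). The variable $z$ enters the selection index but not the outcome equation, which is the exclusion restriction.

```python
import math
import random
from statistics import NormalDist

STD = NormalDist()


def mills(c):
    """E[u | u > -c] for u ~ N(0, 1): phi(c) / Phi(c)."""
    return STD.pdf(c) / STD.cdf(c)


def ols(rows, y):
    """Least squares through the normal equations (adequate for a few columns)."""
    k = len(rows[0])
    aug = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
           + [sum(r[i] * t for r, t in zip(rows, y))] for i in range(k)]
    for col in range(k):  # Gauss-Jordan elimination with partial pivoting
        piv = max(range(col, k), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(k):
            if r != col:
                f = aug[r][col] / aug[col][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [aug[i][k] / aug[i][i] for i in range(k)]


def selected_sample(n=40_000, rho=0.6, sigma=1.0, coef=(0.0, 1.0),
                    theta=(0.0, 1.0, 1.0), seed=11):
    """Simulated data: y = b0 + b1*x + eps is observed only if c + u > 0, where
    c = t0 + t1*x + t2*z is the selection index and corr(eps, u) = rho.
    z enters selection but not the outcome equation (the exclusion restriction)."""
    rng = random.Random(seed)
    kept = []
    for _ in range(n):
        x, z, u = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        eps = sigma * (rho * u + math.sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 1.0))
        c = theta[0] + theta[1] * x + theta[2] * z
        if c + u > 0.0:
            kept.append((x, c, coef[0] + coef[1] * x + eps))
    return kept


def slope_given(rho, X, y, lam, sigma=1.0):
    """Bounding step: remove rho * sigma * lambda(c) for a maintained rho, refit y on x."""
    adjusted = [t - rho * sigma * lm for t, lm in zip(y, lam)]
    return ols(X, adjusted)[1]


data = selected_sample()
X = [(1.0, x) for x, _, _ in data]
y = [t for _, _, t in data]
lam = [mills(c) for _, c, _ in data]  # selection coefficients treated as known

# Naive regression of y on x in the selected sample (ignores selection).
naive_slope = ols(X, y)[1]

# Exclusion-restriction (two-step) estimate: lambda(c) enters as an extra regressor,
# and its coefficient estimates rho * sigma.
b0_hat, b1_hat, rho_sigma_hat = ols([(1.0, x, lm) for (_, x), lm in zip(X, lam)], y)

# Bounding: maintain rho on a grid from -1 to +1 and record the implied slopes.
slopes = [slope_given(r / 10, X, y, lam) for r in range(-10, 11)]
bounds = (min(slopes), max(slopes))
print(f"naive slope {naive_slope:.3f}, two-step slope {b1_hat:.3f}, "
      f"rho*sigma {rho_sigma_hat:.3f}, bounds [{bounds[0]:.3f}, {bounds[1]:.3f}]")
```

With the positive $\rho$ assumed here, the naive slope should be biased downward, the two-step estimate should be close to the true slope of 1.0, and the bounding sweep should produce an interval containing the true slope. Because the adjusted outcome is linear in the maintained $\rho$, the extreme estimates occur at $\rho = \pm 1$; the grid only mirrors the verbal description.
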

It is important to stress that the critical assumption that identifies $\beta$ is the assumption that one of the $z_j$'s affects the observed $y$'s only through the selection process. The imposition of such a restriction should not be taken lightly; it should be done only if the investigator is convinced that it is essentially correct, i.e., that $z_j$ has very little influence on the nonselected $y$ values. If the investigator has considerable uncertainty about the value of the coefficient of $z_j$ in the nonselected regression, then this specification uncertainty should be incorporated into the analysis and should not be ignored by imposing an arbitrary exclusion restriction. Failure to do so may yield extremely unreliable estimates of $\beta$.

The third approach involves dealing directly with the factors giving rise to the selection bias. As we noted earlier, selection bias arises only if some of the factors affecting both the selection of the sample and the regression process of interest cannot be observed. This suggests that an attempt to measure some of these factors, for example, by interviewing the relevant individuals involved in each of the cases in the sample, would reduce the magnitude of the selection bias. Experiments might be helpful in identifying the most prominent factors giving rise to a nonzero covariance between the unobserved factors affecting both selection and the regression process of interest. A related strategy involves specifying other variables, besides the dependent variables in the selection and regression equations, that are also affected by the unobserved factors that give rise to selection bias. This approach is pursued in Garber et al. (in this volume). Finally, another approach, proposed by Heckman (1979), can be used to get a consistent estimate of $\beta$ in all cases, including the case in which the investigator is not willing to impose any exclusion restrictions on the selection variables.
For the case in which the regressors in the selection and regression equations are the same, identification of $\beta$ effectively rests on specifying a different functional form for the relationship between the probability that an observation is included in the sample and the vector of regressors $x$ than for the relationship between $x$ and the process of interest (as represented by $y$) (Olsen, 1980). This approach is quite sensitive, however, to the specific functional forms chosen (see Goldberger, 1980). When one or more exclusion restrictions are exploited, Olsen's findings suggest that Heckman's approach yields coefficient estimates very similar to those from an alternative approach suggested by Olsen (1980).

In practice, then, three basic approaches can be adopted to control for selection bias. The ideal approach is to measure the relevant factors well enough to eliminate the nonzero covariance between the disturbances in the selection and the regression equations. This will eliminate the sample selection bias entirely. If this is not possible, then the investigator can attempt to find an additional variable that is also affected by the unobserved factors giving rise to the nonzero covariance between the selection and the regression disturbances. If this approach cannot be implemented, then the investigator can consider the imposition of an exclusion restriction on the model. If none of these approaches can be implemented satisfactorily, then the investigator can always resort to the bounding approach. While this approach does not yield a consistent estimate of the regression coefficient vector, it will indicate the potential magnitude of the selection bias. It should be noted that both the exclusion and the bounding approaches require the investigator to estimate the parameters of the selection equation. This equation cannot be estimated unless data are available on the cases not reaching the stage of interest (as well as on the cases that do reach it). In the case of the sentencing stage of the criminal justice system, this means that data must be available, at the very least, on the crimes that are not reported and the crimes that are reported but do not lead to an arrest. These data are rarely if ever available. Unless the investigator is willing to introduce considerable a priori information, the best the researcher can do under these circumstances is to control for the selection that takes place at the stages at which data are available.
If there is a considerable amount of discrimination at the reporting and arrest stages, the available procedures may still yield unreliable estimates of the extent of discrimination. In particular, they will yield downwardly biased estimates of the extent of discrimination at the sentencing stage even if procedures are introduced to take account of selection bias occurring after the arrest stage. Thus the selection problem cannot be controlled for completely unless data are available for a random sample of all crimes committed. In the absence of such data, selection bias will occur unless there is no discrimination at the reporting and arrest stages.

A MODEL OF THE CRIMINAL JUSTICE SYSTEM

The processing of cases through the criminal justice system is a complex process, involving a number of different actors, each with individual objectives. In order to make inferences about the role of extralegal factors in the various stages of processing, it is essential to have a model of the criminal justice system. This is not to say that there is no room for an inductive, empirically based analysis, but rather that the analyst cannot remain wholly agnostic about the objectives of the different actors, the factors affecting their decisions, and the role of institutional constraints. Our primary purpose in this section is to construct a model of the criminal justice system that will help us judge the plausibility of the inferences made by previous researchers concerning the role of extralegal factors in case disposition. We begin with a background discussion of several of the key aspects of case processing. The discussion serves as the basis for the formal specifications that follow.

Background Discussion of the Model

The model is a formal characterization of the interrelationship of four significant aspects of case processing in the criminal justice system: (1) the decision to prosecute; (2) plea bargaining; (3) trial;5 and (4) sentencing. We abstract considerably from the full complexity of the system, ignoring such elements as the bail decision and the choice of attorney.

Trial and Sentencing

While trial by jury (or judge) is a hallmark of the American criminal justice system, it is not the predominant mode for reaching disposition. Table 2-1 shows the percentage of indictments that were adjudicated by trial in the district court of Washington, D.C., in 1971. Only about a quarter of the indictments resulted in trial, and about half of these resulted in convictions. The other three-quarters of the indictments were either dismissed by the prosecutor or settled by a guilty plea. The most common form of disposition was a guilty plea.

TABLE 2-1 Mode of Disposition by Crime Type, Washington, D.C., District Court, 1971

Crime Type          Disposition by     Conviction Rate      Disposition by
                    Trial (Percent)    at Trial (Percent)   Guilty Plea (Percent)
Homicide            50.3               50.0                 31.6
Robbery             28.2               66.0                 58.7
Assault             34.6               71.4                 45.0
Burglary            20.3               68.6                 52.3
Larceny             13.0               80.0                 69.8
Auto Theft          17.1               75.0                 74.3
All Dispositions    24.1               56.5                 57.1

SOURCE: Administrative Office of the United States Courts (1971:98).

The statistics in Table 2-1 overstate the frequency of trials (and guilty pleas) because the figures are based on indicted offenses. Not included are cases dismissed at initial screening or prior to indictment. Rejection rates at these preindictment phases can be very high. For example, in the Washington, D.C., superior court (which is different from the district court), only 16 percent of arrests were indicted and prosecuted as felonies in 1974 (Forst et al., 1977).

Table 2-1 also reveals that trials are more frequent in cases involving more serious offenses. For example, over 50 percent of the homicides are disposed of by trial, compared with 13 percent of larcenies. The figures in Table 2-1 also suggest that conviction rates at trial are inversely correlated with seriousness. While the statistics in Table 2-1 are for only one, perhaps atypical, court, similar patterns are observed in the 89 federal district courts and in a sample of nonfederal courts examined by Landes (1971).

The statistics in Table 2-1 on conviction rates suggest that conviction at trial is by no means certain. Various studies of the conviction process for different offenses indicate that, not surprisingly, the quality of the evidence is the most important factor affecting the likelihood of conviction. LaFree (1980) finds that in rape cases, the likelihood of conviction is directly related to the strength of the prosecution's case and inversely related to the strength of the defense's case. The findings of Forst et al. (1977) and Forst and Brosi (1977), while not directly applicable because their estimation data set includes guilty pleas, also illustrate the important influence of case quality on the likelihood of conviction.

Probably the single most important factor affecting the sentence of those convicted is the seriousness of the offense. This is clearly revealed in statutory specification of penalties and in statistics on the likelihood of incarceration and time served by offense type. Factors such as weapon use, victim provocation, and the victim-offender relationship, all of which may be interpreted as components of seriousness, also appear to influence sentence (see, for example, LaFree, 1980; Cook and Nagin, 1979; and Zimring et al., 1976). Another well-documented factor affecting sentence is the offender's prior record. Many statutes formally specify prior record as a criterion for imposing harsher punishment (e.g., in California), and numerous statistical studies strongly suggest that offenders with more extensive criminal histories are more harshly punished. An important question that is not addressed in this paper is the appropriate way to measure prior record. This issue is analyzed in Garber et al. (in this volume).
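
The summary figures drawn from Table 2-1 can be rechecked with a few lines of arithmetic. The Python sketch below reads any indictments disposed of neither by trial nor by guilty plea as dismissals, following the text's description of the remaining dispositions; that three-way split is an assumption about how the table's categories exhaust the cases.

```python
# Table 2-1, Washington, D.C., District Court, 1971:
# (percent disposed of by trial, conviction rate at trial, percent disposed of by guilty plea)
TABLE_2_1 = {
    "Homicide": (50.3, 50.0, 31.6),
    "Robbery": (28.2, 66.0, 58.7),
    "Assault": (34.6, 71.4, 45.0),
    "Burglary": (20.3, 68.6, 52.3),
    "Larceny": (13.0, 80.0, 69.8),
    "Auto Theft": (17.1, 75.0, 74.3),
    "All Dispositions": (24.1, 56.5, 57.1),
}


def summarize(trial, conviction_rate, plea):
    """Return (% of indictments ending in conviction at trial, % neither tried nor pleaded)."""
    convicted_at_trial = trial * conviction_rate / 100.0
    remainder = 100.0 - trial - plea  # read here as dismissals by the prosecutor (assumption)
    return convicted_at_trial, remainder


for crime, row in TABLE_2_1.items():
    convicted, remainder = summarize(*row)
    print(f"{crime:<17} convicted at trial {convicted:5.1f}%   remainder {remainder:5.1f}%")
```

For all dispositions this gives about 13.6 percent of indictments convicted at trial and about 18.8 percent in the remainder, against 57.1 percent disposed of by guilty plea.
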
Finally, it appears that persons convicted at trial are sentenced more harshly than those who plead guilty. Table 2-2 shows, for the district court of Washington, D.C., the percentage of defendants sentenced to prison according to whether the defendant is convicted at trial or through a plea of guilty. The statistics reveal that the likelihood of a prison sentence is substantially greater for those who are convicted at trial. This difference can be interpreted as an indication of the leniency afforded to defendants who are willing to acknowledge their guilt. We will suggest and develop an alternative explanation of this result that is based on the nature of the plea bargaining process.

TABLE 2-2 Percentage of Convictees Sentenced to Prison by Mode of Conviction: District Court of Washington, D.C., 1971

                        Percent Given Prison Sentence
Crime                   Convicted by Guilty Plea   Convicted at Trial
Homicide                68.2                       38.6
Robbery                 80.5                       94.1
Assault                 46.5                       55.0
Burglary and Larceny    68.3                       80.6
All Cases               65.3                       82.8

SOURCE: Administrative Office of the United States Courts (1971).

Plea Bargaining

Plea bargaining involves a prosecutor's offering the defendant special consideration in return for a plea of guilty. The consideration may be a promise to recommend to the judge a sentence that is acceptable to the defendant or, alternatively, a promise to drop or reduce charges, or both. The former is called sentence bargaining, and the latter charge bargaining. While clearly different, the two types of bargaining are essentially equivalent: the defendant pleads guilty with the expectation of receiving a lighter sentence than if convicted at trial. While judges typically are not directly involved in these negotiations, and in some jurisdictions are legally prohibited from being involved, the defendant's expectation about the sentence if he pleads must typically be borne out. Otherwise, the plea bargaining process would not persist. The high rate of guilty pleas shown in Table 2-1 suggests that plea bargaining is a common practice in the Washington, D.C., district court. Of course, not all these guilty pleas are necessarily the result of plea bargains. Some proportion are undoubtedly guilty pleas that were tendered without negotiation. There are no official statistics on the proportion of convictions resulting from guilty pleas. But most observers believe plea bargaining is common. For example, Farrel and Swigert (1978:46) state: "In fact, more than 90% of all convictions involve negotiations of a guilty plea between defense and prosecution."

Plea bargaining is an established and much used institution in U.S. criminal courts for several reasons. First, prosecutorial and judicial resources are insufficient to adjudicate all cases by trial. Plea bargaining enables the prosecutor to conserve scarce resources. Second, according to some, the practice of plea bargaining is related to the limited resources of public defenders. Overburdened public defenders may encourage their clients to plea bargain to avoid the time demands of preparing and presenting a case for trial. Third, plea bargaining may be motivated by the desire of the defendant to avoid the risk of severe punishment if convicted at trial and, perhaps secondarily, to economize on defense fees and to avoid the time cost of going to trial.6

The essential aspect of the plea bargaining process is the negotiation of a "deal" that is acceptable to both the prosecutor and the defendant (and, indirectly, to the judge). In our model a key parameter in the negotiations is the expected sentence at trial, that is, the probability of conviction at trial times the expected sentence if convicted. This suggests that if the defendant's perceptions of the likelihood of conviction and/or the expected sentence if convicted are less than those of the prosecutor, then a bargain may not be struck and the case will be adjudicated at trial.

The Decision to Charge

The decision to charge is a prosecutorial decision that is most commonly made in response to the arrest of a suspect or the filing of a complaint.
Case dropouts at this initial juncture are extremely high. Forst et al. (1977) report that 21 percent of the arrests brought to Washington, D.C., superior court in 1974 were rejected at an initial screening. Another 29 percent were subsequently dismissed by the prosecutor. Dropout rates at this stage appear to vary considerably by crime type. For example, Reiss (1975) reports that in the Washington, D.C., superior court, the dropout rate for homicide arrests was only 4 percent, whereas for aggravated assault it was 23 percent. The discretionary power of the prosecutor at this stage in the criminal process is considerable and, of course, of substantial importance to the potential defendant. If extralegal factors play a major role in determining case disposition, their most important influence may be at this highly discretionary and low-visibility decision point. While extralegal considerations may affect the charging decision, clearly relevant factors also have a major influence. The decision to prosecute appears to be primarily influenced by case quality, seriousness of the alleged offense, and the defendant's prior record. In an excellent study of federal prosecutors in the northern district of Illinois, Frase (1978) analyzed the reasons given for rejecting complaints filed for prosecution. In 22 percent of the rejections, insufficiency of evidence and/or witnesses was cited. In another 42 percent of the rejections, the alleged offense was characterized as too trivial for prosecution (e.g., a small amount of contraband was involved). Finally, 16 percent of the rejections cited the accused's lack of a prior criminal record as the rationale for not accepting the case. These findings are consistent with those of Greenwood et al. (1973), Reiss (1975), and Forst et al. (1977).

The Model of the Decision to Prosecute, Plea Bargaining, and Conviction and Sentencing

The model is composed of two basic equations. The first equation relates the probability of conviction at trial, $P$, to a set of observable and unobservable factors:

$$P = f(q, x) + \varepsilon_1 + u \tag{24}$$

The vector $q$ is composed of an observable set of attributes of the case that define the quality of the evidence.
Factors such as the number of eyewitnesses, the amount of tangible evidence, and verification of the defendant's alibi are components of $q$. The vector $x$ is composed of a set of observable extralegal case characteristics, such as the race of the defendant and the race of the victim. It is included to account for any discrimination that may be present in the conviction process. The last two terms in equation (24), $\varepsilon_1$ and $u$, are two independent classical disturbances. The disturbance $\varepsilon_1$ represents the influence of factors that are not observed by the investigator but are assumed to be known to both the defendant and the prosecutor. The disturbance $u$ summarizes the influence of factors that, prior to the trial, are known only to the defendant. Such factors might include the defendant's knowledge of the vulnerability of his or her alibi, agreements made with codefendants on the specifics of their testimony, the nature of the testimony that will be provided by defense witnesses, etc. The presence of $u$ in equation (24) implies that this information will become known at the trial and hence influence $P$. The asymmetry in the knowledge of $u$ on the part of the defendant and the prosecutor plays an important role in the plea bargaining model that follows.

The second equation relates the sentence given conviction, $S$, to a set of observable and unobservable factors:

$$S = h(e, x) + \varepsilon_2 \tag{25}$$

We assume that there is only one sentencing option available, which the judge specifies at a level $S$. (In a later section we pose a model that more fully captures the range of options actually available.) The vector $e$ represents various observable aspects of event seriousness, such as charge, weapon use, victim provocation, etc. To avoid unnecessary notation, we also include legally relevant characteristics of the defendant, such as prior record, in $e$. The vector $x$ is included in equation (25) to account for any discrimination that is present in sentencing. The final term, $\varepsilon_2$, is a classical disturbance that represents the influence of factors not observed by the investigator that are assumed to be known to both the defendant and the prosecutor.
These two equations form the basis of our model of prosecutorial decision making and the plea bargaining process. Our model of prosecutorial decision making borrows from Landes (1971). We assume that the objective of the prosecutor is to maximize the expected punishment received in cases available for prosecution subject to a constraint on the availability of prosecutorial
resources. Our model generalizes the Landes model, first, by being more explicit about the sources of randomness and, second, by including the decision to charge within the model framework. We begin by defining a variable D, where D = 0 if the case is dismissed, D = 1 if the case leads to a pleaded settlement, and D = 2 if the case goes to trial. Prosecutorial decision making is assumed to be determined by an index $S_P$, where

$$S_P = [h(e,x) + \varepsilon_2][f(q,x) + \varepsilon_1][1 + m(x)] - \lambda R_P = S(P - u)[1 + m(x)] - \lambda R_P. \qquad (26)$$

In cases that are not dismissed, $S_P$ is the sentence that the prosecutor will offer in plea negotiations. The term $S(P - u)$ is the prosecutor's perception of the expected sentence. Its presence in $S_P$ reflects the assumed prosecutorial objective of maximizing expected punishment. Note that since the term u is not observed by the prosecutor, it is not included in the prosecutor's estimate of the expected sentence at trial. Ignore for the moment the remaining terms in equation (26). Then, given the assumed objective, the prosecutor will be indifferent between resolving the case by a negotiated plea with a sentence of $S(P - u)$ and disposing of the case at trial, because from the prosecutor's perspective both modes of disposition result in an equivalent expected sentence for the defendant. The other two terms in equation (26) are included to account for any bias on the part of the prosecutor and for any resource savings that will accrue to the prosecutor if the case is settled without a trial. The function m(x) represents the bias of the prosecutor. If a defendant has characteristics that prompt leniency, then m(x) will be less than zero, whereas a value of m(x) greater than zero corresponds to prosecutorial discrimination against certain types of defendants. If the prosecutor does not discriminate at all, then m(x) is equal to zero for all x. Discrimination is assumed to affect the expected sentence multiplicatively.
This captures the idea that the prosecutor does not simply add or subtract a constant amount to the sentence offered in plea negotiations. Instead, he or she adjusts the offer in proportion to the sentence he or
she would offer in the absence of discrimination. The final term in equation (26), $\lambda R_P$, is a "leniency bonus" the prosecutor is willing to offer in exchange for a plea of guilty. The term $R_P$ represents the prosecutorial resources that would be required if the case went to trial, and $\lambda$ is the shadow price of the prosecutor's resources. The presence of this term reflects the constraint on prosecutorial resources. We assume that a case will be dismissed if $S_P < 0$. Intuitively, $S_P < 0$ implies that the expected sentence at trial less the shadow price of the resources expended in prosecution is negative. Assuming that the prosecutor's objective is to maximize expected punishment, it is rational for the prosecutor to dismiss the case and not proceed to plea negotiations when $S_P < 0$. This implies that the probability a case is dismissed is

$$\Pr(D = 0) = \Pr(S_P < 0) = \Pr\{S(P - u)[1 + m(x)] < \lambda R_P\}. \qquad (27)$$

We assume that all cases that are not dismissed enter into plea negotiations. Whether the negotiations result in a negotiated plea depends on the decision rule of the defendant. The defendant is assumed to weigh the prosecutor's offer, $S_P$, against his own perception of the expected sentence at trial and, perhaps secondarily, his estimate of any savings in legal fees and time required to prepare for trial if he accepts the prosecutor's offer. Accordingly, we define a variable $S_D$, the maximum negotiated offer acceptable to the defendant, where

$$S_D = S \cdot P + \delta R_D = [h(e,x) + \varepsilon_2][f(q,x) + \varepsilon_1 + u] + \delta R_D. \qquad (28)$$

The term $S \cdot P$ is the defendant's perception of the expected sentence if he goes to trial. Note that it differs from the prosecutor's perception of the expected sentence because the defendant, unlike the prosecutor, is aware of the factors included in the term u that will affect the probability of conviction at trial. Ignore for the moment the final term in equation (28).
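
To make the decision rules in equations (26) through (28) concrete, here is a minimal Python sketch of a single case. It is an illustration under stated assumptions, not the authors' procedure: every name in it is hypothetical, the conviction term P is taken to be f(q, x) + ε1 + u (the form implied by the bracketed product in equation (28)), P is not constrained to lie between 0 and 1, and the shadow prices of prosecution and defense resources appear as the made-up fields lam and delta.

```python
from dataclasses import dataclass


@dataclass
class Case:
    """Hypothetical inputs for one case; fields mirror the terms of (24)-(28)."""
    f: float            # f(q, x): systematic part of the conviction term P
    h: float            # h(e, x): systematic part of the sentence given conviction
    eps1: float         # disturbance in P known to both parties
    eps2: float         # disturbance in S known to both parties
    u: float            # disturbance in P known only to the defendant
    m: float = 0.0      # m(x): prosecutorial bias; 0 means no discrimination
    lam: float = 0.0    # shadow price of prosecutorial resources (assumed name)
    r_p: float = 0.0    # prosecutorial resources a trial would consume
    delta: float = 0.0  # shadow price of defense resources (assumed name)
    r_d: float = 0.0    # defense resources a trial would consume


def sentence(c: Case) -> float:
    """S = h(e, x) + eps2, equation (25)."""
    return c.h + c.eps2


def conviction_index(c: Case) -> float:
    """P = f(q, x) + eps1 + u (assumed form of (24); not clipped to [0, 1])."""
    return c.f + c.eps1 + c.u


def prosecutor_offer(c: Case) -> float:
    """S_P = S(P - u)[1 + m(x)] - lam * R_P, equation (26); u is invisible to the prosecutor."""
    return sentence(c) * (conviction_index(c) - c.u) * (1 + c.m) - c.lam * c.r_p


def is_dismissed(c: Case) -> bool:
    """Dismissal rule behind equation (27): dismiss when S_P < 0."""
    return prosecutor_offer(c) < 0


def defendant_max_offer(c: Case) -> float:
    """S_D = S * P + delta * R_D, equation (28); the defendant does see u."""
    return sentence(c) * conviction_index(c) + c.delta * c.r_d


if __name__ == "__main__":
    case = Case(f=0.5, h=24.0, eps1=0.0, eps2=0.0, u=0.25,
                lam=2.0, r_p=3.0, delta=1.0, r_d=2.0)
    print(prosecutor_offer(case), defendant_max_offer(case), is_dismissed(case))  # 6.0 20.0 False
```

In the example the prosecutor, who cannot see u, offers 6.0, well below the defendant's ceiling of 20.0, so the case would settle. Setting m above zero scales the prosecutor's expected-sentence term up in proportion, which is the multiplicative form of bias the text describes.
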
If we assume that the defendant is risk-neutral with regard to prison sentences, then he will be indifferent between taking his chances at trial and accepting a sentence equal to his perception of the expected sentence at trial. This is because both alternatives have an equivalent expected
sentence. The term $R_D$ measures any savings in defense resources that will result from avoiding trial. The parameter $\delta$ measures the shadow price of defense resources. A value of $\delta$ strictly greater than zero implies that a defendant will be willing to accept a sentence greater than the expected sentence at trial in order to avoid expending the additional time and money required to prepare and present a case at trial. For cases that are not dismissed, we assume that a negotiated plea will result if the prosecutor's offer, $S_P$, does not exceed the defendant's maximum acceptable offer, $S_D$. Otherwise, the case will be adjudicated at trial. Accordingly, the probability of a negotiated plea given that the case is not dismissed is

$$\Pr(D = 1 \mid D \neq 0) = \Pr(S_P \le S_D) = \Pr\{m(x)[(u + \varepsilon_1)h(e,x) + P\varepsilon_2] - S u[1 + m(x)] \le \delta R_D + \lambda R_P - m(x) f(q,x) h(e,x)\} \qquad (29)$$

and the probability of going to trial is

$$\Pr(D = 2 \mid D \neq 0) = 1 - \Pr(D = 1 \mid D \neq 0). \qquad (30)$$

Several predictions of this model are of interest. If prosecutors do not discriminate (i.e., $m(x) = 0$ for all x), the probability that a plea bargain will be struck simplifies to

$$\Pr(D = 1 \mid D \neq 0) = \Pr(-S u \le \delta R_D + \lambda R_P). \qquad (31)$$

Observe that this expression does not include the probability of conviction at trial. Assuming that case quality, q, does not affect sentence following conviction, the model predicts that the likelihood of conviction by plea is unaffected by factors of case quality known to both parties. This is a surprising result. It reflects the assumption that the defendant and the prosecutor have similar perceptions of the effect of q on the probability of conviction. The implication of this observation is that if the factors affecting the likelihood of conviction are analyzed using a sample of dispositions that include negotiated pleas, the effects of case quality on conviction are likely to be understated. LaFree (1980) reports evidence for forcible sex offenses that is supportive of this prediction.
He finds that for forcible sex offenses,
case quality variables are considerably more important in explaining the probability of conviction given trial than in explaining the probability of conviction for all cases, including those that are settled by guilty plea. More generally, the model suggests that, for the purposes of estimation, dispositions resulting from trials should not be mixed with dispositions resulting from pleas because they are the outcomes of very different processes. Absent prosecutorial discrimination, another interesting prediction of the model is that the larger the sentence given conviction at trial, the greater the likelihood that plea negotiations will fail and the case will go to trial. Observe that the term $\delta R_D + \lambda R_P$ in equation (31) is greater than zero. The probability of trial, which is the probability that $-S u$ exceeds $\delta R_D + \lambda R_P$, is then an increasing function of S. The statistics in Table 2-1 are consistent with this prediction. The percentage of cases disposed by trial is greater the more serious the offense. For example, about 50 percent of all homicides are disposed by trial, whereas for larceny only 13 percent of the cases go to trial. The intuition for this result is straightforward. The difference between the defendant's and the prosecutor's perceptions of the expected sentence if the case goes to trial equals $S u$. Recall that u measures factors affecting the probability of conviction that are known only to the defendant. The magnitude of this difference grows with increases in S, thus increasing the likelihood that the prosecutor's offer will not be acceptable to the defendant. Before elaborating on some of the implications of the model, we note some caveats on its potential narrowness that help place it in perspective. The model undeniably neglects a variety of factors that motivate prosecutors and defendants. It assumes risk neutrality on the part of prosecutors and defendants alike.
Defendants in particular may be risk-averse, particularly if the sentence they might receive if convicted at trial is severe. Some defendants who believe they are innocent may also refuse to negotiate on principle. Likewise, prosecutors may flatly refuse to negotiate cases when the defendant is "notorious" or the case has received considerable public attention. In such cases the expected sentence if convicted at trial may not closely approximate the seriousness of the crime. However, the purpose of this model is not to provide a literal characterization of the motives and decision rules of the
major actors in the criminal justice system. Rather, its purpose is to provide some structure for interpreting the results in the literature and to provide a basis for making constructive suggestions for improving future research. In this regard several implications of the model are pertinent. First, the model implies that the stages of the criminal justice system cannot be neatly separated and examined in isolation. Expectations about the outcome at later stages will affect prior processing decisions. For example, the model predicts that both the probability of the prosecutor's dismissing a case and the sentence offered by the prosecutor in plea negotiations are affected by expectations about the outcome of a trial. Thus, any bias on the part of judges or juries may be reflected in the dismissal decision and the plea bargaining process. The observation that decision making in the criminal justice system is affected by expectations about the actions of parties not directly involved in the decision is not a peculiarity of this model. Other plausible characterizations of decision making in the process would similarly incorporate the effects of expectations. For example, a prosecutor may choose to dismiss a case based on the expectation that a judge would dismiss the case if prosecuted. Second, the model predicts that the sentence received in a negotiated plea will be a function of the product of the probability of conviction at trial and the sentence given conviction. This implies that the sentence in a negotiated plea will be a function of the interaction of the seriousness of the offense and case quality. Third, the model implies that an analysis of the determinants of conviction should not mix guilty pleas and trial convictions. The model suggests that the probability of the defendant's pleading guilty is not related to the attributes of case quality known to both the defendant and the prosecutor.
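
The algebra behind equations (29) through (31), and two predictions drawn from it (without prosecutorial bias, case quality known to both parties drops out of the plea decision, and trials become more likely as the sentence at stake grows), can be checked numerically. In the Python sketch below all names are hypothetical: lam_rp and delta_rd stand for the prosecution and defense resource terms, and the plea margin is S_D minus S_P, so a plea is struck when it is nonnegative. Setting m = 0 in the comparison of S_P with S_D reduces it to -S·u not exceeding the resource terms, so the trial probability is Pr(u < -(resource terms)/S). The trial-probability function adds an assumption the text does not make, namely that u is normal with mean zero.

```python
import random
from statistics import NormalDist


def plea_margin_direct(f, h, eps1, eps2, u, m, lam_rp, delta_rd):
    """S_D - S_P computed straight from (26) and (28); plea when >= 0."""
    s = h + eps2                          # sentence given conviction, (25)
    p = f + eps1 + u                      # conviction term, (24)
    s_p = s * (p - u) * (1 + m) - lam_rp  # prosecutor's offer, (26)
    s_d = s * p + delta_rd                # defendant's maximum acceptable offer, (28)
    return s_d - s_p


def plea_margin_eq29(f, h, eps1, eps2, u, m, lam_rp, delta_rd):
    """Right side minus left side of the inequality inside (29)."""
    s = h + eps2
    p = f + eps1 + u
    lhs = m * ((u + eps1) * h + p * eps2) - s * u * (1 + m)
    rhs = delta_rd + lam_rp - m * f * h
    return rhs - lhs


def trial_prob_no_bias(s, sigma_u, lam_rp, delta_rd):
    """Pr(D = 2 | D != 0) with m = 0, i.e. Pr(-S u > resource terms).

    Assumes u ~ Normal(0, sigma_u**2) and s > 0 (assumptions of this sketch).
    """
    threshold = (delta_rd + lam_rp) / s
    return NormalDist(0.0, sigma_u).cdf(-threshold)


if __name__ == "__main__":
    rng = random.Random(0)
    worst = 0.0
    for _ in range(1000):
        kw = dict(f=rng.uniform(0, 1), h=rng.uniform(1, 60), eps1=rng.gauss(0, 0.1),
                  eps2=rng.gauss(0, 5), u=rng.gauss(0, 0.1), m=rng.uniform(-0.3, 0.3),
                  lam_rp=rng.uniform(0, 5), delta_rd=rng.uniform(0, 5))
        worst = max(worst, abs(plea_margin_direct(**kw) - plea_margin_eq29(**kw)))
    print("largest disagreement between S_D - S_P and (29):", worst)
    print("trial probability for S = 5, 20, 80:",
          [round(trial_prob_no_bias(s, 0.1, 2.0, 1.0), 3) for s in (5.0, 20.0, 80.0)])
```

The first printed value should be at the level of floating-point round-off, confirming that the inequality in (29) is the comparison of S_P with S_D rearranged. The second line shows the trial probability rising with S toward one-half, the ceiling implied by a symmetric u.
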
The probability of conviction at trial, by contrast, is assumed to be directly related to the attributes of case quality known to both parties. In the next section we elaborate on these issues in the context of the studies we have reviewed.

REVIEW OF SELECTED STUDIES

Our review of the studies selected by the panel suggests that they generally suffer from three flaws: (1) the biases induced by sample selection are ignored; (2) model specifications do not adequately distinguish guilty pleas from trial convictions; and (3) acquittals and dismissals are inappropriately mixed with convictions. The consequences of these practices are discussed below.

Sample Selection Bias

Sample selection is a natural aspect of the criminal justice system. Cases are screened from the system at various stages. We have discussed the conditions under which this could lead to biased estimates of the extent of discrimination in sentencing. Three conditions were identified. First, the screening process had to be nonrandom. Second, it was necessary that there be some discrimination in at least one of the screening stages prior to sentencing. Third, some of the unobservable factors that play a role at the sentencing stage also had to play a role at an earlier screening stage. We argued that when these conditions were met, selection bias would generally contribute to an underestimate of discrimination at the sentencing stage. There are a number of stages at which these conditions may be met. Table 2-3 lists these stages and the type of unobservable factors that may play a role in the screening decision.

TABLE 2-3 Screening Points in the Criminal Justice System

Principal Type of Unobserved Factor, by Stage of Selection
Stage of Selection | Seriousness of the Offense | Quality of Evidence
Detection | X |
Arrest | X | X
Prosecution | X | X
Charge type | X |
Conviction | | X
Sanction type | X |

The stages itemized in Table 2-3 correspond to various decisions that must be made by one or more actors in the criminal justice system. Four of the decisions--detection, arrest, prosecution, and conviction--are clear instances of sample selection in that a negative decision terminates case processing. The charge type and sanction type decisions are more subtle examples of sample selection. Unlike the other screening decisions listed in the table, the charge type and sanction type decisions do not result in case dropout. Instead, cases are sorted into various charge type and sanction type categories. Because the sorting decisions are based on case-specific factors, the selection is not random. This introduces the possibility of selection bias. For reasons elaborated below, we suspect that studies that analyze one sanction type and, secondarily, one charge type may be particularly vulnerable to selection bias. The studies we reviewed employed samples that are selected to varying degrees. The nature of the sample used in each study is listed in Table 2-4. All the studies involve samples of cases that resulted in both detection and arrest. A number of the studies focus on cases that are further selected according to whether the cases resulted in prosecution, a specific charge, conviction, and/or like punishment. Two general comments can be made about these studies with regard to sample selection bias. First, if there exists discrimination at the detection and arrest stages (particularly the latter), then sample selection will cause all the studies to underestimate the magnitude of discrimination in sentencing. Second, if the extent of discrimination at each stage prior to sentencing is unknown, but discrimination is suspected at some of the stages, then the more selected the sample used to analyze the sentencing decisions, the greater the chance of selection bias.
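
The first general comment above, that discrimination at an earlier screening stage combined with shared unobservables pushes the sentencing estimate of discrimination downward, can be illustrated with a small simulation. The Python sketch below is entirely hypothetical: one screening stage (think of the imprisonment decision), a binary race indicator, a screening disturbance and a sentence disturbance that are jointly normal with correlation rho, and made-up parameter values. The estimate is the difference in mean sentence by race among screened-in cases, which is what a regression on race alone would report in the selected sample.

```python
import math
import random


def naive_race_gap(n, beta, d, rho, sigma=2.0, c=-0.5, seed=1):
    """Difference in mean sentence (race=1 minus race=0) among screened-in cases.

    Hypothetical model:
      screening index  z = c + d*race + e1, case screened in if z > 0
      sentence         y = beta*race + e2
    with e1 ~ N(0, 1), e2 ~ N(0, sigma**2) and corr(e1, e2) = rho.
    """
    rng = random.Random(seed)
    total = [0.0, 0.0]
    count = [0, 0]
    for _ in range(n):
        race = 1 if rng.random() < 0.5 else 0
        g1, g2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        e1 = g1                                                  # screening disturbance
        e2 = sigma * (rho * g1 + math.sqrt(1.0 - rho * rho) * g2)  # sentence disturbance
        if c + d * race + e1 > 0:                                # survives earlier screening
            total[race] += beta * race + e2
            count[race] += 1
    return total[1] / count[1] - total[0] / count[0]


if __name__ == "__main__":
    true_beta = 1.0
    for d, rho in [(1.0, 0.8), (0.0, 0.8), (1.0, 0.0)]:
        est = naive_race_gap(200_000, true_beta, d, rho)
        print(f"d={d}, rho={rho}: estimated {est:.2f} vs true {true_beta}")
```

With these illustrative values the first estimate should land far below the true value of 1.0 (close to zero), while removing either ingredient, discrimination at the screening stage (d = 0) or unobservables common to both stages (rho = 0), brings the estimate back close to 1.0. These correspond to the second and third conditions listed in the text.
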
Our analysis in the section on sample selection above provides specific guidelines on the factors affecting the magnitude of the bias in the estimate of discrimination in sentencing. The first factor involves the extent of discrimination at each point of selection. For example, if a sample of convictions is used, discrimination may have been a basis for selection at the detection stage, the arrest decision, the prosecution decision, the charge decision, and the conviction process. The greater the extent of discrimination at the various selection stages,
the larger will be the bias in the estimate of discrimination at the sentencing stage, ceteris paribus. The other two factors relate to the magnitude of the covariance between the disturbance in the sentence length equation and each of the disturbances in prior selection equations. The magnitude of this covariance will be affected by the investigator's success in measuring the range of factors determining sentence. The more such factors are measured, the smaller will be the role played by unobservables, hence the smaller the covariance of the unobserved influences. Consequently, studies that account for a larger portion of the variation in sentence in the selected sample are likely to be less subject to selection bias. The second factor affecting the magnitude of the covariance is the extent to which specific unobservable case or defendant characteristics commonly affect the sentence length decision and prior selection decisions. We argued above that the commonality of influences is likely to be particularly large for the unobservable factors affecting both the sentence length decision (given the choice of sanction type) and the choice of sanction type. Both decisions are primarily determined by the seriousness of the offense and the defendant's prior record. Seriousness of the offense is a particularly difficult concept to measure. It has many dimensions, few of which are typically observed by the investigator. Consequently, selection bias is likely to be particularly great for studies that use samples of cases that resulted in like punishment. Such studies might be expected to find the least evidence of discrimination. These arguments are applied below to the studies we reviewed. The studies are concerned with two types of discrimination, which we distinguish as direct and indirect discrimination.
Direct discrimination involves a finding that sentence is affected by offender characteristics such as race and/or SES after holding constant other factors affecting sentence. Indirect discrimination involves a finding that some of the factors affecting sentence, such as whether the defendant makes bail, are related to offender characteristics such as race, income, etc. We concentrate on the former type of discrimination. The studies are divided into two groups according to whether they found direct discrimination. We analyze the degree to which the differences in the findings of the two sets of studies may be attributable to selection bias. This exercise is at best suggestive: it is post hoc, and it is made more difficult because there are other problems with the studies that may account for their findings. (Some of these problems are discussed below.) Nevertheless, the discussion of the various studies is suggestive of the kind of problems sample selection can induce. It also provides an explanation of why some studies might find evidence of discrimination while others do not. The findings of the various studies concerning direct discrimination are listed in Table 2-4.

TABLE 2-4 Summary of Selected Studies of Discrimination in Sentencing

Clarke and Koch (1976)
Sample following arrest: Individuals charged with burglary, breaking and entering, and larceny. They also considered separately the cases in their sample that led to conviction.
Findings concerning discrimination: Discrimination present against low-income defendants in that they have a greater probability of being sentenced to prison, ceteris paribus. They infer, through additional tests, that discrimination arises at the sentencing rather than the conviction stage.

Lizotte (1977)
Sample following arrest: Union of 200 cases processed by 15 Chicago trial courts during a one-week period and a random sample of 596 Chicago trial cases in which a grand jury returned an indictment.
Findings concerning discrimination: No direct influence of occupation or race on prison sentence length found; indirect effects of race acting through bail and attorney choice found.

Hagan (1975)
Sample following arrest: A random sample of individuals arrested or a random sample of prosecutions.
Findings concerning discrimination: No direct influence of race on the outcome of a case. Indirect effects of race on charge seriousness found.

Chiricos and Waldo (1975)
Sample following arrest: Individuals sentenced to prison for a diverse set of felonies.
Findings concerning discrimination: No direct discrimination; if anything, reverse discrimination.

Farrell and Swigert (1978)
Sample following arrest: Individuals charged with murder.
Findings concerning discrimination: Direct discrimination on the basis of occupational prestige found. Indirect discrimination also found through the effect of occupational prestige on prior record.

Gibson (1978)
Sample following arrest: Random sample of indictments on felony charges resulting in convictions.
Findings concerning discrimination: No overall racial or SES discrimination found, but discrimination on the part of some judges found.

Swigert and Farrell (1977)
Sample following arrest: Individuals charged with murder.
Findings concerning discrimination: Discrimination on the basis of sex and occupational prestige found. Indirect discrimination on the basis of a "normal primitive" characterization also found.

Tiffany, Avichai, and Peters (1976)
Sample following arrest: Individuals convicted at trial of bank robbery, auto theft, interstate transportation of forged securities, and miscellaneous forgery.
Findings concerning discrimination: Discrimination on the basis of race found, but only for cases where the defendant had no prior record.

LaFree (1980)
Sample following arrest: Individuals charged with forcible sex offenses (but cases not leading to arrest and/or charge are also analyzed).
Findings concerning discrimination: Discrimination found against black defendants when the victim is white at a number of processing stages, including the sentencing stage.

[Columns "Mixture of Guilty Pleas and Trial Convictions" and "Mixture of Acquittals and Dismissals with Convictions": entries not recoverable.]
Four studies found no evidence of direct discrimination, four found evidence of discrimination, and one found evidence of discrimination only for cases in which the defendant does
not have a prior record. Some of the studies finding no evidence of direct discrimination did find some evidence of indirect discrimination. Overall, the results are mixed. We argued earlier that sample selection bias will generally cause an underestimate of the true extent of discrimination. This suggests that the studies that did not find evidence of discrimination may be more subject to sample selection bias than those studies that did find evidence of discrimination. This is the issue we address in the remainder of this subsection. We first examine the four studies that found no evidence of discrimination. We begin with the study by Chiricos and Waldo (1975). This analysis deserves special attention because of its prominence in the literature and the quality of the study. Chiricos and Waldo analyzed the length of prison sentence for 17 offenses and further analyzed each offense separately for three different jurisdictions. Their findings suggest that if there is any discrimination in sentencing, it is in favor of lower-SES offenders. The overwhelming nature of the evidence led them to conclude with considerable confidence that there is no discrimination in sentencing in the cases they analyzed. While their conclusion is certainly reasonable, sample selection is an alternative explanation for their findings that we feel is equally plausible. As we argued above, studies analyzing a specific type of sanction for a specific offense are particularly susceptible to selection bias. Thus it is possible that there may be considerable discrimination in sentencing in the sample of cases they analyze, even though their evidence seems to point overwhelmingly against it. There is further evidence that is supportive of the argument that the Chiricos and Waldo findings are attributable to selection bias. In their regressions, Chiricos and Waldo explain on average about 9 percent of the variation in sentence length.
This implies that their estimating equations include very few of the factors determining sentence length. This is precisely the situation in which selection bias is likely to be large. Furthermore, we argued that selection bias is larger the larger the amount of discrimination at stages preceding the sentence length decision. We emphasized that it is the amount of discrimination in the imprisonment decision that might be particularly important. Two of the studies that we reviewed examine
discrimination in the imprisonment decision. Both studies found evidence of discrimination--Clarke and Koch (1976) for the case of larcenies and burglaries and LaFree (1980) for forcible sex offenses. These findings are supportive of our general argument that Chiricos and Waldo's findings may be attributable to selection bias. Selection bias also provides an alternative interpretation of Gibson's findings. Gibson (1978) analyzed for specific offenses the sentence length for each of three sanction types. As with Chiricos and Waldo, this sort of sample specificity makes the analysis particularly vulnerable to selection bias. The specifics of Gibson's findings also suggest sample selection. Gibson examined the sentence length decision for a single jurisdiction outside Atlanta. He analyzed the decisions by all 11 judges in the district both individually and as a group. He found no discrimination for the group as a whole but did find that some judges imposed stiffer sentences on blacks than on other defendants. The differences across judges appear to be related to their racial attitudes, religious preferences, and other factors that suggest discrimination. This is a curious finding. Overall, Gibson found no discrimination, but for individual judges he found substantial discrimination. The explanation he provides is that there is sufficient reverse discrimination on the part of some judges to offset the discrimination by others. Another explanation for this finding is that selection bias causes the extent of discrimination in sentencing by each judge to be underestimated but does not alter the relative ranking of the judges concerning discrimination in sentencing. This would occur if the factors causing selection bias are the same for all judges. Of the other two studies finding no discrimination, there is no apparent reason to think that selection bias contributed to their findings.
Both studies used samples that are not especially selected and both examined a diverse set of charges (though this may tend to blur their results). Hagan (1975) analyzed racial discrimination against native Indians in the Canadian criminal justice system and Lizotte (1977) examined SES discrimination in the United States. However, both studies are plagued by other problems that obscure the effects of all factors on sentencing, including race, SES, etc. These problems are discussed below. Thus of the four studies finding no discrimination, two may be particularly subject to selection bias. The
remaining question is whether there are good reasons to suspect that the five studies that found evidence of discrimination are less subject to selection bias. We begin with Swigert and Farrell (1977) and Farrell and Swigert (1978), both of which analyze the incidence of discrimination in homicide cases. The two studies analyzed a sample of arrests for homicide. Across the nine studies we reviewed, only one other study (Hagan, 1975) used a sample of cases that is as "unselected" as the sample analyzed by Farrell and Swigert. In contrast to Hagan, however, Farrell and Swigert concentrated on one type of crime. Because they considered the range of charges available in homicide, they avoided the additional selection bias that would arise if they concentrated on only those homicides resulting in conviction for one type of charge, such as first-degree murder. This enabled them to concentrate on a single class of crimes without artificially introducing an additional round of selection. Presumably their concentration on a like set of offenses made it easier to spot discrimination. The sample they used--arrests for homicide--is selected only in that the cases were detected and resulted in arrest. Thus their estimates of discrimination are subject to selection bias only if (1) there is some discrimination at the detection and arrest stages and (2) the sample of cases leading to arrest is unrepresentative of the broader population of homicides. Farrell and Swigert argue that the latter condition is less likely to be satisfied for homicides than for most crimes. They write (1978:439): As an offense type, criminal homicide provides a valuable opportunity for the study of legal treatment. Homicide defendants are more representative of persons who commit homicide than are defendants accused of any other crime of persons who commit that crime.
The visibility of the offense and the high clearance rate of deaths due to homicide suggest that individuals charged with murder exemplify persons who actually commit murder; other offenses display a much greater disparity between crimes known to the police and arrests recorded. This suggests that the magnitude of the bias induced by selection at the detection and arrest stages is less for homicide cases than for other types of offenses. This, coupled with Farrell and Swigert's use of an
otherwise relatively unselected sample, may account for the fact that they found evidence of discrimination while other studies did not. The third study that found evidence of discrimination is Clarke and Koch (1976). They too examined a sample of arrests (for burglary and larceny) and found that individuals with lower income have a greater chance of going to prison given arrest. Further analysis suggested that the disadvantages sustained by low-income individuals occurred not at the conviction stage but at the imprisonment decision given conviction. They found that given conviction, low-income individuals had a considerably greater chance of being sent to prison than high-income individuals. Clarke and Koch effectively found evidence of discrimination among a sample of cases that resulted in conviction. This sample is more selected than a sample of arrests but is less selected than the samples used in studies like those of Gibson (1978) and Chiricos and Waldo (1975), in which the sample is composed of convictions resulting in like punishment. In principle, it is the least-selected sample that could be used to analyze the determinants of sentencing if acquittals and dismissals are not pooled with convictions. (We comment critically later on the implications of pooling nonconvictions with convictions.) For these reasons, we expect that Clarke and Koch's findings are less subject to selection bias than those of Gibson and Chiricos and Waldo. There are two other features of the Clarke and Koch study that may also restrict the magnitude of the selection bias. First, they did not analyze cases charged with a specific offense. Instead, they analyzed a sample of thefts with charges spanning the spectrum from misdemeanor larceny to felonious burglary. This has the advantage of mitigating the selection bias attributable to the specific charging decision of the prosecutor. Second, they did not find evidence of discrimination in the process from arrest to conviction.
This means that the selection of the sample that occurs between arrest and conviction is unlikely to bias the estimate of discrimination in the equation explaining imprisonment given conviction. The fourth study finding evidence of discrimination is LaFree (1980). LaFree examined the various stages from arrest to sentence in the processing of forcible sex offenses. Discrimination was examined in the context of
the combination of the race of the victim and the race of the defendant. LaFree found that cases involving black defendants and white victims are treated differently from intraracial assaults by whites and blacks (he has virtually no incidents in his sample involving white offenders and black victims). He found that, holding all other observable factors constant, cases involving a black defendant and a white victim have a greater probability of resulting in imprisonment given conviction and on average receive a longer prison sentence given imprisonment. LaFree interprets his findings as evidence of discrimination. However, there is an alternative interpretation of his findings. It has been noted by many researchers that a critical factor in rape cases is whether the victim knew the assailant. The racial composition of the victim-defendant dyad may be proxying for this factor. Whatever the interpretation of this variable, however, our arguments about selection are still applicable. We argued earlier that selection bias would reduce the estimate of discrimination at the sentencing stage among a sample of cases resulting in imprisonment if there was some type of discrimination occurring at the stages preceding sentencing. LaFree did find discrimination (i.e., a role for the race of the victim and defendant) in the decision to imprison given conviction. Following the argument we applied to Gibson (1978) and Chiricos and Waldo (1975), this should make it less likely to find discrimination in a sample of cases resulting in imprisonment. However, LaFree did find discrimination, whereas Gibson and Chiricos and Waldo did not. If our explanation of the Gibson and Chiricos and Waldo findings is correct, it is imperative that we explain why selection bias did not affect LaFree's findings to the extent that it affected those of Gibson and Chiricos and Waldo. There are two possible explanations for the difference in LaFree's findings.
First, LaFree used explanatory variables that account for a much larger fraction of the variation in sentence length given imprisonment than did those in the average regression performed by Chiricos and Waldo (Gibson does not report a comparable statistic). The $R^2$ in LaFree's regression equation for sentence length given imprisonment is .27, whereas the average $R^2$ for the comparable regressions in Chiricos and Waldo is .09. As we argued earlier, selection bias will be smaller the more the investigator is able to measure the factors affecting sentence length that also affect prior decisions such as the imprisonment decision. The second possible explanation for LaFree's findings is that the magnitude of discrimination in sentencing given imprisonment is so large that even selection bias could not mask it completely. There is independent evidence (see Wolfgang and Riedel, 1973) that there may be a great deal of discrimination in rape cases in which the victim is white and the defendant black. Both factors would help to explain the difference between the findings of Gibson and Chiricos and Waldo and those of LaFree.

The last study that does find some, albeit mixed, evidence of discrimination is Tiffany et al. (1975). For each of four crime types, they analyzed a sample of cases resulting in conviction by trial. They found evidence of racial discrimination among cases in which the defendant had no prior record. Otherwise, they did not find any general pattern of discrimination. With regard to selection, their sample is more selected than most in that they considered only cases resulting in conviction via trial, but less selected than those of Gibson and Chiricos and Waldo in that they did not restrict their sample to cases resulting in like punishment. They avoided this restriction by using an index that arbitrarily scales different types of punishment such as fines, probation, and prison sentences. Their findings may be colored by the use of this arbitrary index, an issue we discuss at greater length below. It is difficult to say more about the role that selection bias may have played in their findings. Our discussion of selection bias is at best suggestive, but it does illustrate the role that selection bias may play in the various studies. Generally, selection bias is likely to cause all the studies to underestimate the magnitude of discrimination in sentencing decisions.
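The selection argument can be illustrated with a small simulation. This is a minimal sketch under assumed, hypothetical parameters (not estimates from any study reviewed here): an unobserved case factor `u` raises both the chance of imprisonment and the length of the prison term, and race is assumed to raise both as well. Regressing sentence length on race only among imprisoned cases understates the assumed effect, while measuring `u` removes the bias, in line with the point that selection bias shrinks as the investigator measures more of the factors that drive both decisions.

```python
import numpy as np

def ols(X, y):
    """Least-squares coefficients, with an intercept column prepended."""
    X = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

def selection_bias_demo(n=200_000, true_effect=2.0, seed=0):
    rng = np.random.default_rng(seed)
    black = rng.integers(0, 2, size=n).astype(float)
    u = rng.normal(size=n)  # case factor the investigator cannot measure
    # Stage 1 (hypothetical): imprisonment given conviction; race raises the index.
    imprisoned = (-0.5 + 0.5 * black + u + rng.normal(size=n)) > 0
    # Stage 2 (hypothetical): sentence length in months, observed only if imprisoned.
    months = 24 + true_effect * black + 6 * u + rng.normal(size=n)
    b, m, s = black[imprisoned], months[imprisoned], u[imprisoned]
    naive = ols(b, m)[1]                              # u omitted: biased by selection
    controlled = ols(np.column_stack([b, s]), m)[1]   # u measured: bias removed
    return naive, controlled

naive, controlled = selection_bias_demo()
print(f"assumed effect 2.00, naive {naive:.2f}, controlling for u {controlled:.2f}")
```

With these assumed parameters the naive estimate comes out at roughly half of the true 2-month effect, because whites must have larger values of `u` to be imprisoned and therefore also draw longer terms.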
Selection bias may also be an important factor in explaining the results of studies like those of Gibson (1978) and Chiricos and Waldo (1975), which analyzed cases for a specific offense resulting in like punishment. We discussed above various approaches that might be useful in probing the sensitivity of the studies' results to selection bias. While these approaches may involve additional assumptions that might be controversial, they at least provide a means of addressing the issue. We have tried to argue in this subsection that the problem is sufficiently serious that the use of these approaches might yield considerable dividends.

Mixing Guilty Pleas and Trial Convictions

The vast majority of cases ending in conviction result from a negotiated guilty plea. Of the nine studies we reviewed, only one (Tiffany et al., 1975) explicitly excludes guilty pleas from the analysis. The other eight either do not distinguish between guilty pleas and trial convictions or merely enter an additive dummy variable in a sentencing equation to account for the type of conviction. This implicitly incorporates the assumption that the relationship between sentence given conviction (or, more broadly, case outcome) and various observable features of the case is the same for guilty pleas and trial convictions, except perhaps for an additive constant. If this is not correct, then the estimating equations used in the various studies are misspecified. More important, guilty pleas generally dominate trial convictions in samples of convictions. Since the great majority of guilty pleas are tendered as the result of plea bargaining, sentencing equations estimated on samples of all convictions primarily provide information about the factors affecting sentence in successful plea negotiations. The natural question this raises is what such exercises teach us about the incidence of discrimination in sentencing. The answer depends critically on how the plea bargaining process is perceived to operate, and different models of that process suggest very different interpretations. None of the studies we reviewed introduces a model of the plea bargaining process, nor is there any single model that is widely accepted. Therefore, we have chosen to interpret the results of the various studies in terms of the model of the plea bargaining process we presented in the previous section. While this model is stylized, we feel that it captures many of the essential features of the plea bargaining process.
Suppose we find that, in a sample of sentences composed of or dominated by negotiated guilty pleas, black defendants receive stiffer sentences, holding all other factors constant. Our model suggests three very different explanations for this finding. First, it may reflect that judges discriminate against blacks. If black defendants anticipate that they are likely to receive a stiffer sentence than white defendants if convicted at trial, then the model suggests that blacks would be willing to accept a stiffer negotiated settlement than whites, holding all other factors constant. This explanation also presumes that the prosecutor actively exploits judicial bias in sentencing. A second explanation is that the probability of conviction is higher for black defendants than for white defendants, holding all other factors constant, and that this is perceived by both the defendant and the prosecutor. This also gives both the prosecutor and the (black) defendant a higher expected sentence, thereby resulting in a stiffer negotiated sentence. Again, this requires the prosecutor to exploit the biases inherent in the system. A third explanation is that the prosecutor directly discriminates against black defendants. We specified above that a case would result in a negotiated settlement whenever the prosecutor offered the defendant a settlement that was less than the sum of the defendant's expected sentence and a positive term reflecting the cost to the defendant of waging a trial. In many instances, the settlement that is acceptable to the defendant may be greater than the minimum settlement the prosecutor would be willing to offer (assuming no discrimination). This may provide the prosecutor with sufficient discretion to offer stiffer settlements to some types of defendants. In this explanation, it is the prosecutor, and not the judge or jury, who is the direct source of the discrimination. Without separating guilty pleas from trial convictions in some way, it is not possible to distinguish among these three explanations. The only study that does make this distinction is Tiffany et al. (1975). However, even this study never addresses the issue of why some cases go to trial and others are settled by negotiation.
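The observational equivalence of the three explanations can be made concrete with the settlement condition stated above. In this sketch the defendant's ceiling is the probability of conviction times the trial sentence plus a trial cost, and we assume (hypothetically) that the prosecutor offers that ceiling minus a discretionary leniency discount. All numbers are illustrative. Three different mechanisms yield exactly the same negotiated sentences, so plea outcomes alone cannot tell them apart.

```python
def defendant_ceiling(p_convict, trial_sentence, trial_cost):
    """Settlement bound: expected trial sentence plus the defendant's cost of trial."""
    return p_convict * trial_sentence + trial_cost

def negotiated_sentence(p_convict, trial_sentence, trial_cost, leniency):
    """Hypothetical offer rule: the ceiling minus a discretionary leniency discount (> 0)."""
    return defendant_ceiling(p_convict, trial_sentence, trial_cost) - leniency

# White defendants in every scenario: p = 0.6, trial sentence 40 months, trial cost 4.
white = negotiated_sentence(0.6, 40, 4, leniency=7)

black = {
    "judge discriminates (longer trial sentence)": negotiated_sentence(0.6, 50, 4, leniency=7),
    "conviction more likely": negotiated_sentence(0.75, 40, 4, leniency=7),
    "prosecutor grants less leniency": negotiated_sentence(0.6, 40, 4, leniency=1),
}

for mechanism, months in black.items():
    print(f"{mechanism}: black {months:.1f} vs white {white:.1f} months")
```

Each scenario gives black defendants 27 months against 21 for white defendants; only information from outside the plea sample (trial outcomes, conviction rates) could separate them.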
In fact, the Tiffany et al. study raises an additional issue that we have not considered: Among those defendants who choose to go to trial, why do some opt for a jury trial while others opt for a bench trial? In light of their finding that convictions in bench trials result in lesser sentences than convictions in jury trials for what appear to be comparable types of cases, it is perplexing that any defendant would ever opt for a jury trial. Until we understand why cases are disposed of in different ways, even inferences drawn from cases disposed of in a common way must be treated with caution. As we noted in the previous section, such cases represent a selected sample, which introduces the possibility of selection bias.

Our model of the plea bargaining process also provides some guidance on how the sentence received from a negotiated settlement should be analyzed. The model suggests that the sentence in negotiated guilty pleas is a function of the seriousness of the offense and the legal and extralegal characteristics of the defendant as they interact with the probability of conviction. This implies that such sentences include information about factors affecting the probability of conviction as well as about factors affecting sentence length given conviction. The multiplicative form of the interaction provides some guidance on how this information might be extracted. This interaction also implies that for guilty pleas the effect of variations in the seriousness of the offense on sentence depends on the quality of the evidence. In contrast, we assumed in the previous section that the sentence received by those who are convicted at trial is related only to the seriousness of the offense and the characteristics of the defendant. This suggests that mixing guilty pleas and trial convictions introduces a misspecification into models of the sentencing process. It is likely to blur the relationship between sentence and both the seriousness of the offense and the legal and extralegal characteristics of the defendant, thus making it more difficult to assess the true extent of discrimination in the criminal justice system.

Mixing Outcomes

Several of the studies we reviewed test for discrimination by scaling the outcome of a case and relating this scaled metric to various explanatory variables.
The scaled metric is formed by assigning scores to the various possible outcomes, with acquittals and dismissals at one end of the scale and prison sentences at the other. The scores are then regressed on factors that proxy for the seriousness of the case, the quality of the evidence, the defendant's prior record, and various legal and extralegal factors. Our model suggests that mixing different types of outcomes in this way is likely to blur the true extent of discrimination in the criminal justice system. The model suggests that (1) the factors affecting the likelihood of acquittal are different from the factors affecting the sentence decision given conviction at trial and (2) the factors affecting the likelihood of dismissal and the sentence decision given conviction via negotiated guilty plea are a mixture of the factors affecting the acquittal decision and the factors affecting the sentence decision (given conviction at trial). Thus condensing these different outcomes into one index effectively mixes together very different processes. Moreover, it makes it considerably more difficult to determine the source of discrimination if evidence of discrimination is found. There is evidence in the various studies that supports our view of the way the criminal justice system operates. LaFree (1980) found that in forcible sex offenses the probability of conviction is primarily a function of the quality of the evidence, whereas the severity of punishment given conviction at trial is primarily related to the seriousness of the offense. In his study of the dismissal decision, Frase (1978) found that the probability of dismissal is primarily related to the seriousness of the offense, the quality of the evidence, and the prior record of the defendant. The various studies we reviewed that focus on the sentencing decision given conviction confirm that the primary determinant of the sentencing decision is the seriousness of the offense. The evidence from the other studies is also consistent with the hypothesis that the various outcomes (following arrest) are the product of different processes.
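A small simulation shows why a single outcome index makes it hard to locate the source of discrimination. This sketch assumes two hypothetical worlds: in one, race shifts only the probability of conviction (which otherwise depends on the quality of the evidence); in the other, race shifts only the sentence given conviction (which otherwise depends on seriousness). The index is set to 0 for acquittals and dismissals and to the sentence in months otherwise. All parameters are illustrative.

```python
import numpy as np

def simulate(world, n=200_000, seed=1):
    rng = np.random.default_rng(seed)
    black = rng.integers(0, 2, size=n).astype(float)
    evidence = rng.normal(size=n)        # quality of the evidence
    seriousness = rng.uniform(size=n)    # seriousness of the offense, scaled 0..1
    conv_shift = 0.3 if world == "conviction" else 0.0
    sent_shift = 4.0 if world == "sentencing" else 0.0
    convicted = evidence + conv_shift * black + rng.normal(size=n) > 0
    sentence = 12 + 24 * seriousness + sent_shift * black + rng.normal(scale=2, size=n)
    index = np.where(convicted, sentence, 0.0)   # 0 = acquittal or dismissal
    # Single-index approach: regress the scaled outcome on all factors at once.
    X = np.column_stack([np.ones(n), evidence, seriousness, black])
    index_coef = np.linalg.lstsq(X, index, rcond=None)[0][3]
    # Stage-specific comparisons (race is independent of the other factors here).
    conv_gap = convicted[black == 1].mean() - convicted[black == 0].mean()
    sent_gap = (sentence[convicted & (black == 1)].mean()
                - sentence[convicted & (black == 0)].mean())
    return index_coef, conv_gap, sent_gap

for world in ("conviction", "sentencing"):
    coef, conv_gap, sent_gap = simulate(world)
    print(f"{world}: index coef {coef:.2f}, conviction gap {conv_gap:.3f}, "
          f"sentence gap {sent_gap:.2f}")
```

In both worlds the race coefficient in the index regression is about 2, so the index alone cannot distinguish them, whereas the stage-specific comparisons locate the discrimination at once.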
The model we presented provides guidance on how the processes behind these different outcomes can be analyzed. As we noted earlier, it also provides guidance on how guilty pleas should be treated. The one issue we have not yet addressed is the implication of using an artificial index to scale the different types of sentences that are typically dispensed by judges. This practice, frequently used in the studies we reviewed, is the subject of the following section.

GENERALIZED MODELS OF SENTENCING

Criminal statutes typically empower judges to choose among a set of sentencing alternatives for a specified criminal act. Most commonly, these sentencing alternatives are prison, probation, fine, or some combination of these. For first-degree murder, execution is another sentencing option in an increasing number of states. With few exceptions, criminal statutes also permit judges broad discretionary power in determining the type and length of sentence. This discretionary power is particularly broad in jurisdictions with indeterminate sentencing statutes. Under such statutes, prison terms of a determinate length are not required and in some instances are prohibited. Instead, the judge is permitted or required to specify only a minimum and a maximum term. Within the bounds of the minimum and maximum sentence, parole boards have broad discretionary power to determine actual time served. Determinate and mandatory minimum sentencing statutes attempt, to varying degrees, to structure and limit sentencing discretion, but in most instances judges still have considerable latitude in specifying sentence.

The variety of sentencing options available to judges and the discretion they are permitted in choosing among them greatly complicate an analysis of sentencing behavior. The investigator must be attentive not only to the factors determining length of sentence but also to type of sentence. The multiple facets of the sentencing decision raise a number of difficult modeling and measurement problems, which are described below.

1. A common practice in sentencing studies is to collapse qualitatively different types of sentences into a single index of sentence severity. These sentence severity indices are in some respects uncomfortably arbitrary. Are there alternative approaches for analyzing sentencing decisions that do not require the investigator to impose a severity index ex ante?

2. In cases in which a judge does not impose a determinate sentence but instead specifies only a minimum and maximum sentence, how should such a sentence decision be characterized and modeled?

In this section we discuss two models of the sentence decision that do not rely on the use of an imposed severity index. We also address the problem raised in the second question.

Two Models of Sentencing Decisions

The model we have developed assumes that sentence severity can be represented by a single index, S. Several of the papers we have reviewed implicitly make the same assumption. Tiffany et al. (1975) and Diamond and Zeisel (1975) scaled sentences of different type and length with slightly modified versions of a severity index developed by the Federal Administrative Office of the Courts (1973). This index is, with a few exceptions, consistent with a lexicographic ordering of sanction severity. Severity scores across sanction types (e.g., prison versus probation) typically do not overlap and thus reflect an implicit ranking of sanction types from least to most severe. Within sanction type, sentences are ordered from least to most severe. The ordering of the severity of sanction types is in general intuitively reasonable. A suspended sentence is scaled lower than supervised probation, and supervised probation equal to or lower than an active prison term. In some instances these orderings are open to reasonable dispute (e.g., fines of any amount are assumed less severe than supervised probation), but in our opinion the major problems with collapsing sentences of different type and degree into a single metric are not the consequence of the assumed ordering of sanction severity. Rather, our principal reservations about the use of these scales stem from two issues. First, the assumed cardinality of the scales is uncomfortably arbitrary. For example, Tiffany et al. (1975) assign a score of 2 to supervised probation of 13-36 months, whereas prison sentences of 13-24 months and 49-60 months are assigned scores of 7 and 14, respectively. Is a prison sentence of 55 months twice as harsh as a sentence of 20 months, and is a 20-month prison sentence 3.5 times as harsh as a sentence of 18 months of supervised probation?
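The cardinality problem reduces to simple arithmetic. This sketch uses the Tiffany et al. scores quoted above, which imply ratios of 14/7 = 2 and 7/2 = 3.5. Any strictly increasing transformation of the scores (a square root is used here purely for illustration) preserves the ranking of sanctions, which is all the ordering itself justifies, yet changes those ratios.

```python
import math

# Severity scores quoted in the text for the Tiffany et al. (1975) index.
scores = {
    "probation 13-36 months": 2,
    "prison 13-24 months": 7,
    "prison 49-60 months": 14,
}

def ratios(s):
    """Implied 'times as harsh' ratios under a given cardinal scale."""
    return (s["prison 49-60 months"] / s["prison 13-24 months"],
            s["prison 13-24 months"] / s["probation 13-36 months"])

print(ratios(scores))  # (2.0, 3.5)

# An order-preserving rescaling (illustrative choice): same ranking, different ratios.
rescaled = {k: math.sqrt(v) for k, v in scores.items()}
print(sorted(scores, key=scores.get) == sorted(rescaled, key=rescaled.get))  # True
print(ratios(rescaled))
```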
The arbitrary cardinality of the scales may increase the difficulty of detecting more subtle influences on sentencing decisions by introducing unnecessary error in the measurement of sentencing outcomes. It also makes the magnitude of measured effects more difficult to interpret because the unit of measurement has no objective interpretation. Second, having collapsed sentence into a single metric, the investigator can only analyze variations in this index with a single-equation processing model. This constrains the decision on sanction type to be a function of the same factors that influence the decision on sentence length within each sanction type. It also greatly constrains the functional form of the two relationships. In this section we develop two approaches for analyzing sentencing decisions that do not require the ex ante imposition of a severity scale for reducing qualitatively different sentences to a single metric. In the first approach, both the type and the length of the sentence are determined by the value of a single latent variable reflecting characteristics of the case and the offender. The second approach allows for the possibility that decisions on the type and the length of the sentence are made with different decision rules.

Latent Variable Model of Sentencing

Figure 2-2 is a graphical depiction of our latent variable model of sentencing. The vertical axis in the figure depicts the value of a latent variable, $Z^*$. We assume that $Z^*$ is determined by

$$Z^* = \alpha' e + \beta' x + \varepsilon \qquad (32)$$

where $e$ is a vector of observed variables defining event seriousness and legally relevant defendant characteristics, $x$ is a vector of observed extralegal characteristics of the defendant and victim, $\alpha$ and $\beta$ are vectors of coefficients, and $\varepsilon$ is an unobserved classical disturbance. The disturbance is assumed to measure the influence of various factors that are relevant to the judge in determining sentence but are not observed by the investigator. The latent variable $Z^*$ can be interpreted as an index measuring the judge's perception of the sentence merited in the case, with larger values of $Z^*$ corresponding to more severe (or at least as severe) sentences.
The coefficient vectors $\alpha$ and $\beta$ can be interpreted as weights that calibrate the importance of various observable case-specific factors in determining the sentence. The sentence imposed is assumed to be related to $Z^*$ as follows. Define a series of threshold parameters, $T_j$, $j = 1, 2, \ldots, J$, corresponding to each of $J$ sentencing options presumed to be available to the judge. The $T_j$'s are ordered so that $T_1$ corresponds to the least severe sentence and $T_J$ to the most severe sentence. Sentencing option $j$ is assumed to be chosen whenever $T_j \le Z^* < T_{j+1}$, where $T_{J+1}$ is definitionally set equal to infinity. In Figure 2-2, the $T_j$'s are denoted by horizontal slashes on the vertical axis. The $T_j$'s define $J$ sentencing options, which are ordered in Figure 2-2 from the least severe to the most severe. The ordering is based on the severity scale developed by the Federal Administrative Office of the Courts. The judge is assumed to compute the value of an index $Z^*$ from case-specific information, some of which is observed by the investigator. He then compares the value of his index with the threshold parameters denoted in the figure, choosing the sentencing option $j$ with threshold parameter $T_j$ for which $T_j \le Z^* < T_{j+1}$.

Of interest are the parameters $\alpha$, $\beta$, and the $T_j$'s. The coefficient vectors $\alpha$ and $\beta$ measure the effects of various case-specific factors on the sentence imposed. The threshold parameters, coupled with $\alpha$ and $\beta$, define an implicit severity scale: the widths between the $T_j$'s, together with $\alpha$ and $\beta$, determine the range of values of the explanatory variables in $e$ and $x$ that map into each sanction type. Both the coefficient vectors $\alpha$ and $\beta$ and the threshold parameters $T_j$, $j = 1, 2, \ldots, J$, can be estimated from a sample of observations on $e$, $x$, and the sentence imposed (given a rank ordering of the alternative sentences). A fuller discussion of this model, which is called the ordered probit model when $\varepsilon$ is assumed to be normally distributed, and of the procedures available to estimate its parameters, can be found in Altman et al. (1981:Ch. 2).

[FIGURE 2-2 Latent Variable Model of Sentencing. The vertical axis shows $Z^*$; thresholds $T_1, T_2, T_3, \ldots, T_j, T_{j+1}, \ldots, T_J$ divide it into sentencing options ordered from least to most severe, from a suspended sentence and a fine ≤ $500 at the bottom, through 6 months < probation ≤ 12 months, to 60 months < prison ≤ 72 months and prison > 72 months at the top.]
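The choice probabilities and log-likelihood of the ordered probit model can be written out directly. This minimal sketch assumes $\varepsilon$ is standard normal (the usual scale normalization), treats the lowest threshold as minus infinity so that every value of $Z^*$ maps to some option, and indexes options from 0 rather than 1. The thresholds, weights, and sample in the usage lines are hypothetical; in an actual analysis $\alpha$, $\beta$, and the interior thresholds would be chosen to maximize this log-likelihood.

```python
import math

def norm_cdf(z):
    """Standard normal CDF; math.erf returns +/-1.0 for z = +/-inf."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def option_probabilities(index, cuts):
    """P(option j) = Phi(T_{j+1} - index) - Phi(T_j - index).

    index: the systematic part alpha'e + beta'x of Z*.
    cuts:  increasing interior thresholds; -inf and +inf are added at the ends,
           giving len(cuts) + 1 ordered sentencing options.
    """
    t = [-math.inf] + list(cuts) + [math.inf]
    return [norm_cdf(t[j + 1] - index) - norm_cdf(t[j] - index)
            for j in range(len(t) - 1)]

def log_likelihood(alpha, beta, cuts, data):
    """data: iterable of (e, x, j), with j the 0-based option actually imposed."""
    total = 0.0
    for e, x, j in data:
        index = (sum(a * v for a, v in zip(alpha, e))
                 + sum(b * v for b, v in zip(beta, x)))
        total += math.log(option_probabilities(index, cuts)[j])
    return total

# Hypothetical: 4 options (suspended, fine, probation, prison), one e and one x variable.
cuts = [0.0, 1.0, 2.0]
print([round(p, 3) for p in option_probabilities(1.2, cuts)])
sample = [([1.0], [0.0], 2), ([2.0], [1.0], 3), ([0.5], [1.0], 1)]
print(log_likelihood([0.8], [0.3], cuts, sample))
```

Raising the index (for example through a positive $\beta$ on an extralegal variable) shifts probability toward the more severe options without any severity scores being imposed.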
The virtue of this latent variable model of the sentencing decision is that it does not require the analyst to impose an arbitrary scale to reduce sentences of different type and degree to a single metric. It only requires that the analyst be able to rank order the sentence options chosen according to a measure of severity. The threshold parameters, in conjunction with $\alpha$ and $\beta$, implicitly define a severity scale for offenses. Because these parameters are estimated from the data rather than imposed a priori, the implied severity scale is the one that best "fits" or "rationalizes" the sentences in the sample. It can be thought of as an estimate of the scale implicitly used by judges to translate case-specific information into actual sentences.

This type of model fits quite well into the framework we have developed. We assumed that there was only one sentencing option available to the judge, which he or she could set at a level S. The prosecutor and the defendant were assumed to make decisions based on their expectations about S and the probability of conviction. Such a model can easily be estimated by interpreting S as the latent variable $Z^*$ defined above. We need only assume that the prosecutor and defendant operate on the basis of their perception of the expected sentence, the severity of the sentence (given conviction) being represented by the latent variable $Z^*$.

An Alternative Model of the Sentencing Decision

While we believe the latent variable model is an improvement over the severity scale approach, it is perhaps an excessively restrictive conception of the sentencing decision. In particular, it assumes that the choice of sanction type and the length of sentence are both determined by the same latent index $Z^*$. This implicitly assumes that the same factors that determine the sanction type also determine the length of sentence. It also considerably constrains the way the case-specific factors can affect the choice of sanction type and the choice of sentence length within the sanction type. The model can be generalized considerably if we model the sentencing decision as a two-step decision, the first step corresponding to the choice of sanction type and the second to the length of sentence given the sanction type. The first step can be modeled like the model discussed above. Define a latent variable $R^*$ as an index that determines the choice of sanction type.
We assume that $R^*$ is generated by

$$R^* = \gamma' g + \nu$$

where $g$ is a vector of case-specific factors that affect the choice of sanction type, $\gamma$ is a vector of coefficients (or weights), and $\nu$ is an unobserved disturbance. The variable $R^*$ is then mapped into a choice of sanction type by a set of threshold parameters $r_k$, $k = 1, 2, \ldots, K$, where $K$ is the number of sanction types available. As above, the choice of sanction type depends on the value of $R^*$ relative to the $r_k$. Given a sample of observations on $g$ and the choice of sanction type, we can estimate $\gamma$ and the $r_k$, $k = 1, 2, \ldots, K$. The second step can be modeled as a regression equation. Let $S_k$ represent the length of sentence for sanction type $k$. Then $S_k$ is assumed to be determined by

$$S_k = \theta_k' h_k + w_k$$

where $h_k$ is a vector of variables that affect the length of sentence for sanction type $k$, $\theta_k$ is a vector of coefficients for the $k$th sanction type equation, and $w_k$ is the disturbance for the $k$th sanction type equation. The parameter vector $\theta_k$ can be estimated given a sample of observations on $S_k$ and $h_k$.

This type of model of the sentencing decision cannot be easily adapted to the theoretical model we presented in an earlier section. The fact that sentence severity can no longer be characterized, directly or indirectly, by a single metric S greatly complicates the model. If the objectives of the prosecutor and defendant are to be characterized in terms of a single metric of sentence severity, then some mechanism for collapsing sentences of different type and degree into this metric is necessary. As previously discussed, the latent variable model of sentencing provides this mechanism. The second model provides no such mechanism. The absence of such a mechanism is a result of the second model's recursive characterization of the sentencing process. The model does not require that all case and defendant characteristics relevant to sentencing be collapsed into a single metric that maps onto an actual sentence. Indeed, the model is an explicit disavowal of this conception of the sentencing process. As a consequence, incorporating this recursive model of sentencing into the model developed earlier requires a generalization of that model that takes account of the multiplicity of sanctioning alternatives.
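The two-step structure can be illustrated with a minimal simulation under hypothetical parameters: an ordered latent index $R^*$ picks one of $K = 3$ sanction types, and sentence length is then generated by a separate linear equation for each type, with its own regressors $h_k$. Only the second-step regressions are estimated here, by ordinary least squares within each type (the fine-amount equation is omitted for brevity); the first step could be fit by ordered probit maximum likelihood. The sketch assumes $\nu$ and $w_k$ are independent; if they were correlated, the within-type regressions would face the selection problem discussed earlier in this chapter.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 50_000
seriousness = rng.normal(size=n)
prior = rng.poisson(1.0, size=n).astype(float)

# Step 1 (hypothetical weights gamma): R* = gamma'g + nu selects the sanction type.
r_star = 1.0 * seriousness + 0.5 * prior + rng.normal(size=n)
cuts = [0.0, 1.5]                       # interior thresholds r_2, r_3
sanction = np.digitize(r_star, cuts)    # 0 = fine, 1 = probation, 2 = prison

# Step 2: a separate length equation S_k = theta_k'h_k + w_k for each type.
true_theta = {1: (12.0, 6.0), 2: (24.0, 18.0)}   # (intercept, slope) in months
regressor = {1: prior, 2: seriousness}           # h_k differs by sanction type

estimates = {}
for k, theta in true_theta.items():
    chosen = sanction == k
    h = regressor[k][chosen]
    months = theta[0] + theta[1] * h + rng.normal(scale=3.0, size=h.size)
    X = np.column_stack([np.ones(h.size), h])
    estimates[k] = np.linalg.lstsq(X, months, rcond=None)[0]
    print(k, np.round(estimates[k], 2), "true", theta)
```

Because each sanction type has its own coefficient vector, prior record can drive probation length while seriousness drives prison length, which a single latent index cannot represent.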
The prosecutor's objective of punishment maximization would have to be specified in terms of a set of qualitatively different penalty alternatives. Likewise, the defendant's objective of minimizing his expected punishment would have to be similarly specified. Bargaining would have to take place over both the sanction type and the sentence length. This is a much more complicated process to model and is beyond the scope of this paper.

Modeling Indeterminate Sentences

In reviewing the literature on the effects of extralegal factors on case disposition, we have encountered an important and perplexing problem: How should indeterminate prison sentences be analyzed? As noted previously, in many jurisdictions judges are not required to impose, and are sometimes prohibited from imposing, a prison term of a specified length. Instead they impose only a minimum and maximum term or in some instances simply remand the defendant to the custody of the Department of Corrections for a completely unspecified period. When the prison term is not specified exactly by the judge, actual time served is determined by a parole board or, in some jurisdictions, by the parole board with the consent of the sentencing judge. Since the factors determining the length of sentence are an issue of particular concern, the use of appropriate models and statistical methodologies for analyzing indeterminate sentences deserves careful attention. To our knowledge, this issue has not been seriously addressed in the literature. Furthermore, in the studies reviewed, we found that the approaches used in dealing with this problem varied widely and that none was satisfactorily justified. For example, Chiricos and Waldo (1975) characterize an indeterminate sentence by the minimum term, Lizotte (1977) by the average of the minimum and maximum, and Tiffany et al. (1975) by the maximum. In no case do the authors provide a cogent justification for their measurement of a variable that is central to their analysis.

Unfortunately, we cannot provide any specific suggestions for dealing with this problem. Its resolution requires a model of the objectives judges pursue in specifying the minimum and maximum sentence. While we have not been able to specify such a model, a discussion with one judge suggests that the following considerations may be useful in specifying one:

1. The seriousness of the offense, the defendant's prior record, and his perceived threat to society may, in the judge's opinion, require at least some minimum term of incarceration.

2. If an individual on parole is convicted of another crime, he may be required to serve the remainder of his term for the crime for which he had been paroled. A high maximum may thus serve as a substantial deterrent to criminal behavior while on parole. The judge's view of the need for this deterrent may affect his specification of the maximum.

3. The judge may be very uncertain about the possibility that the defendant will reform. This uncertainty may be particularly great when the defendant has no prior record but has committed a serious crime. Uncertainty about the defendant's potential for reform may affect the spread between the minimum and the maximum.

We urge that immediate attention be given to the development of a model of indeterminate sentencing. The absence of any such model is a major gap in the literature.

EXPERIMENTS IN SENTENCING RESEARCH

Designed experiments provide an alternative source of data for sentencing research. Most experiments in the literature are designed to analyze the behavior of judges. They present a group of judges with a set of case files and ask them what sentences they would impose (see Diamond and Zeisel, 1975; Partridge and Eldridge, 1974). The cases may be actual cases or artificially constructed ones designed to probe specific issues. Most studies tend to be interested in specific questions, such as the magnitude of discrimination and disparity in sentencing. We concentrate on these two issues in the discussion that follows.

The terms discrimination and disparity are somewhat vague. A more specific operational definition is provided by Kadane (personal communication, 1980):

Suppose two trials differ only in the race, sex or religion of the defendant (I like to think of them as plays or films in which the race, sex or religion of the actor playing the defendant differs, but the script is the same). Then any difference in sentencing we could call discrimination.
Suppose two trials differ only in the identity of the judge deciding the case. Then differences in the sentence could be called disparity.

This notion of disparity can be extended by noting that identical cases tried by the same judge might result in different sentences (one might imagine a judge trying the same case repeatedly and having his recollection of the previous trial erased before each new trial). This form of disparity could be called within-judge disparity. The form of disparity described by Kadane might be called between-judge disparity. These definitions emphasize that within-judge disparity is inherently random. The severity of this type of disparity might be quantified using a measure of the amount of variation in this quantity, such as its variance. The between-judge disparity is fixed for each judge but varies randomly throughout the population of all judges. Again, the severity of the problem is reflected by the amount of variation in this quantity, which might be measured by its variance. In the presence of disparity, the amount of discrimination, as defined above, is a randomly varying quantity as well. Its severity can be captured using, say, the mean difference between sentences in cases differing only in the race, sex, or SES of the defendant.

In view of the random nature of these phenomena, the measurement of disparity and discrimination is inherently a statistical problem. Ideally we would like to observe different judges sentencing the same case (to measure disparity between judges), the same judge sentencing a case several times (to measure disparity within judges), and the same judge sentencing the same cases with only the race or sex of the defendant changed (to measure discrimination). Data from such an experiment might be analyzed using the additive mixed analysis of variance model

$$Y_{ijk} = \mu + v_i + d_j + c_k + \varepsilon_{ijk}, \qquad (33)$$

where $Y_{ijk}$ is the sentence chosen in case $(i, j, k)$; $v_i$, $i = 1, \dots, I$, are the effects of the discriminatory factors; $d_j$, $j = 1, \dots, J$, are the effects of the $J$ judges; $c_k$, $k = 1, \dots, K$, are the effects of the $K$ basic cases; $\varepsilon_{ijk}$ are the within-judge disparities for each judgment; and $\mu$ is the overall mean sentence. The factors $v_i$ and $c_k$ are fixed. The factors $d_j$ and $\varepsilon_{ijk}$ are assumed to be independent random variables with zero mean and respective variances $\sigma_d^2$ and $\sigma_\varepsilon^2$. Our interest focuses on the levels of discrimination $v_1, \dots, v_I$ and the amounts of disparity between judges, $\sigma_d^2$, and within judges, $\sigma_\varepsilon^2$.¹¹

For the case of nonexperimental data, the experimenter cannot control the cases that appear in court. Consequently, the cases studied for each level of the discriminatory factor and each judge are different. The model analogous to equation (33) that is appropriate for the analysis of nonexperimental data is

$$Y_{ijk} = \mu + v_i + d_j + c_{ijk} + \varepsilon_{ijk}, \qquad (34)$$

where $c_{ijk}$ is subscripted by $i$, $j$, and $k$ to denote that it differs for each case. The parameters of this model cannot be estimated directly without imposing identifying restrictions on the $c_{ijk}$'s. The usual approach is to use some variables, like seriousness of the offense, the defendant's prior record, etc., to approximate the case effects. If we can write $c_{ijk} = g(x_{ijk}) + \eta_{ijk}$, where $g(\cdot)$ is some unknown function that depends on parameters, $x$ is a vector of case attributes, and $\eta_{ijk}$ is a disturbance with variance $\sigma_\eta^2$, then equation (34) can be expressed as

$$Y_{ijk} = \mu + v_i + d_j + g(x_{ijk}) + \eta_{ijk} + \varepsilon_{ijk}.$$

If the errors $\eta_{ijk}$ have mean zero and are independent of $d_j$ and $\varepsilon_{ijk}$, then all of the parameters of the model are identified except $\sigma_\eta^2$ and $\sigma_\varepsilon^2$. However, their sum $\sigma_\eta^2 + \sigma_\varepsilon^2$, which provides an upper bound on both $\sigma_\eta^2$ and $\sigma_\varepsilon^2$, is identified. Unfortunately, due to selection effects and omitted variables, the assumption that $E[\eta_{ijk}] = 0$ is rarely justified; as a result, it is very difficult, if not impossible, to obtain consistent estimates of the parameters from these observations alone. The problems in dealing with observational data are due primarily to the lack of comparability in the observed cases. This requires that the investigator control for differences in cases using observed attributes of the case. Some of the problems with this approach, such as sample selection, were discussed earlier; others are discussed in Garber et al. (in this volume).
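To make the identification contrast concrete, here is a minimal Python sketch (our illustration, not the authors' procedure) that simulates both designs with hypothetical parameter values, normal errors, and a hypothetical linear case function $g(x) = \text{slope} \cdot x$. In the crossed design of equation (33), standard balanced-ANOVA moment estimators recover $v_2 - v_1$, $\sigma_d^2$, and $\sigma_\varepsilon^2$ separately. In the observational design of equation (34), a regression with judge dummies still recovers $v_2 - v_1$, but its residual variance estimates only $\sigma_\eta^2 + \sigma_\varepsilon^2$. The sketch assumes $\eta_{ijk}$ is independent of race, judge, and $x$, which is exactly the assumption the text warns is rarely justified.

```python
import numpy as np

# Hypothetical parameter values, chosen only for illustration.
V = np.array([0.0, 1.0])                    # discriminatory effects v_1, v_2
SIGMA_D, SIGMA_EPS, SIGMA_ETA = 1.0, 2.0, 1.5
MU = 24.0                                   # overall mean sentence


def crossed_design(J=200, K=50, seed=0):
    """Equation (33): every judge sentences every case at every level of the
    discriminatory factor, one sentence per cell. Returns balanced-ANOVA
    moment estimates of (v_2 - v_1, sigma_d^2, sigma_eps^2)."""
    rng = np.random.default_rng(seed)
    I = len(V)
    d = rng.normal(0.0, SIGMA_D, J)                  # random judge effects d_j
    c = rng.normal(0.0, 3.0, K)                      # case effects c_k (drawn once, then fixed)
    y = (MU + V[:, None, None] + d[None, :, None] + c[None, None, :]
         + rng.normal(0.0, SIGMA_EPS, (I, J, K)))    # within-judge disparity eps_ijk
    grand = y.mean()
    m_i = y.mean(axis=(1, 2))                        # factor-level means
    m_j = y.mean(axis=(0, 2))                        # judge means
    m_k = y.mean(axis=(0, 1))                        # case means
    fitted = m_i[:, None, None] + m_j[None, :, None] + m_k[None, None, :] - 2 * grand
    ms_err = ((y - fitted) ** 2).sum() / (I * J * K - I - J - K + 2)
    ms_judge = I * K * ((m_j - grand) ** 2).sum() / (J - 1)
    # E[ms_judge] = sigma_eps^2 + I*K*sigma_d^2 in this balanced design.
    return m_i[1] - m_i[0], (ms_judge - ms_err) / (I * K), ms_err


def observational_design(n=20000, J=50, slope=2.0, seed=1):
    """Equation (34) with a hypothetical c_ijk = slope * x_ijk + eta_ijk: each
    case is seen once, by one judge, at one level of the factor. Returns
    (estimate of v_2 - v_1, residual variance)."""
    rng = np.random.default_rng(seed)
    i = rng.integers(0, len(V), n)                   # level of discriminatory factor
    j = rng.integers(0, J, n)                        # judge who heard the case
    x = rng.normal(size=n)                           # observed case attribute
    d = rng.normal(0.0, SIGMA_D, J)
    y = (MU + V[i] + d[j] + slope * x
         + rng.normal(0.0, SIGMA_ETA, n)             # unobserved case component eta
         + rng.normal(0.0, SIGMA_EPS, n))            # within-judge disparity eps
    # Judge dummies absorb mu + d_j; one dummy for level 2; then x.
    X = np.column_stack([np.eye(J)[j], (i == 1).astype(float), x])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    return coef[J], resid @ resid / (n - X.shape[1])


v_gap, s_d2, s_eps2 = crossed_design()
print(f"crossed:       v2-v1={v_gap:.2f}  sigma_d^2={s_d2:.2f}  sigma_eps^2={s_eps2:.2f}")
v_gap_obs, s_resid = observational_design()
print(f"observational: v2-v1={v_gap_obs:.2f}  residual variance={s_resid:.2f}")
print(f"true sigma_eta^2 + sigma_eps^2 = {SIGMA_ETA**2 + SIGMA_EPS**2:.2f}")
```

With these settings the crossed design should return values near 1.0, 1.0, and 4.0, while the observational residual variance should sit near 6.25: above both $\sigma_\eta^2 = 2.25$ and $\sigma_\varepsilon^2 = 4.0$, and therefore only an upper bound on each.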
A reasonable alternative to using observational data is to create an environment that is similar to the courtroom in its essential features but that enables the investigator to control the cases that are considered. This is the idea behind sentencing experiments. If J judges consider K different cases, each at I levels of
the discriminatory variable, then the imposed sentences might be written as

$$Y_{ijk} = \mu^* + v_i^* + d_j^* + c_k^* + \varepsilon_{ijk}^*. \qquad (35)$$

This model is of the same form as equation (33), and its parameters can easily be estimated. However, since the experimental setting is not identical to the court setting, the parameters, in particular the effects of the discriminatory variable $v_1^*, \dots, v_I^*$ and the disparity variances $\sigma_d^{2*}$ and $\sigma_\varepsilon^{2*}$, may not be equal to the corresponding parameters in equation (33). This difficulty is often referred to as the problem of the external validity of the experiment.

There are three main challenges to the external validity of a sentencing experiment: cases may not be complete or real enough to simulate actual cases; judges participating in the experiment may not be representative of judges as a whole; and judges' awareness that they are participating in an experiment might cause them to respond differently than they would in a courtroom. These problems and some possible remedies are discussed in turn.

In an experimental setting it is impossible to present judges with case information that is identical to the information they would have in a court setting. This is especially true if cases are summarized in terms of a limited number of pertinent variables. If judges do not receive all the information they need, then they may be unwilling to make a decision or, if forced to make a decision, they may use the information available along with their perceptions about the unavailable information. If these perceptions are correlated with the available information, then coefficient estimates may reflect the influence of this imputed information rather than the information that is provided. In particular, $v_i^*$ and $v_i$, the effects of the discriminatory variable, may not be the same. To minimize this problem, judges should be provided with information that is as close as possible to the information that would be available in a courtroom.
Judges might attend selected court hearings or be shown videotapes of actual hearings. Videotapes have been used in a number of experimental studies of jury behavior. For artificially constructed cases (e.g., cases that are constructed from actual ones by, say, changing the race of the defendant), actors might be used
to act out the hearing. This could be done once and recorded on videotape, or it could be done repeatedly with each judge in the experiment acting as the presiding judge. The drawback of this approach is its high cost, both in time and money. Case files are less realistic but also easier and less expensive to use. It is not clear how much is actually lost if a well-designed case file is substituted for an actual or reenacted hearing. Preliminary experiments might be used to determine a reasonable format for presenting cases.

Certain control questions designed to determine whether the information presented is adequate might be composed. For example, judges might be asked whether they felt that any additional information that might be available in court would change their decision. Several different questions might be asked, such as "What decision would you make based on the information you have?" and "What decision would you most likely make if you encountered this case in court?" If the judges use the available information to construct subjective probability distributions over the possible values of the unavailable information, then these two questions address different aspects of these distributions. The answer to the second question would be the sentence associated with the mode of the subjective distribution, whereas, under quadratic loss, the first question would be answered with the sentence corresponding to the mean of that distribution. Thus, in the presence of incomplete information, the answers to these questions might differ, whereas they would be the same if the necessary information were provided. Different answers can therefore be taken as an indication that the case information was not adequate (Manski and Nagin, 1981, discuss this point in the context of consumer choice surveys).

If an experiment is based on a subset of judges, then it is imperative that the subset be representative.
In many sentencing experiments the participating judges are volunteers. Even when all judges in a particular jurisdiction participate in an experiment, nonresponse rates are often so high that participation in the experiment has to be viewed as essentially voluntary. As a result, judges who do participate are likely to be more conscious of existing problems and more interested in reducing them than the average judge. An experiment based on such a sample will tend to underestimate the
seriousness of these problems. To protect against this kind of bias, an experimental format, such as personal interviews, can be used to minimize the nonresponse rate.

Even if the cases presented to the judges are real cases presented in their natural setting, the judges will always be aware that they are participating in an experiment. Their decisions clearly will not have the impact of decisions handed down in court, for a prison sentence handed down in an experiment does not send anyone to prison. As a result, judges may treat a decision in an experiment less seriously than a decision in court. To alleviate this problem, judges must be provided with an incentive to give experimental cases the same importance they would give an actual case. For example, decisions made by a panel of judges on an actual case might be provided to the presiding judge before a decision is rendered. This is done in the sentencing council experiments discussed in Diamond and Zeisel (1975).

The most serious problem in sentencing experiments is the evaluative nature of the experiments themselves. Most experiments are designed to collect data on a specific problem, e.g., the extent of discrimination and disparity in sentencing. This can rarely be concealed from the judges participating in the experiment. As a result, individual judges may try to ensure that they do not deviate too far from perceived norms, thus leading to an underestimate of the severity of the problem under study. This individual sensitivity to evaluation can be reduced by keeping responses anonymous. However, the fact that results of the experiments may be used by critics of the judiciary to support changes in the system may cause judges who want to maintain the status quo to adjust their decisions to reduce the apparent severity of the problems under study. This bias is likely to be particularly severe if the experiment is an unusual event rather than a routine matter.
It may be reduced if making decisions on experimental cases is required of all judges in a jurisdiction on a regular basis. The reaction to the experimenter's intent may also be reduced if the experimenter can deceive the judges as to the purpose of the experiment. This requires a convincing cover story and a carefully designed questionnaire that does not reveal the true purpose of the experiment. Deceptions of this type are often used for similar reasons in psychological experiments,
although they raise serious ethical questions (see Rosenthal and Rosnow, 1969). Furthermore, in view of the narrow range of issues considered in most sentencing studies, it is not clear whether these deceptions will succeed. Thus it is unlikely that these biases can be eliminated completely.

If it is not possible to prevent the judges from adjusting their answers in an experiment, then it might be possible to control for these adjustments by modeling the process that generates them. Suppose, for example, that experimental cases have been constructed from actual cases by varying, say, the race of the defendant. In this case, using equations (33) and (35), for each $k$ there is one pair $(i, j)$ (the race of the actual defendant and the judge who heard the case) for which the actual decision is available. For the other $(i, j)$ pairs only experimental observations exist. Thus, for each $k$,

$$Y_{ijk} = \mu + v_i + d_j + c_k + \varepsilon_{ijk}$$

for one pair $(i, j)$, and

$$Y_{ijk} = \mu^* + v_i^* + d_j^* + c_k^* + \varepsilon_{ijk}^*$$

otherwise. These observations might be combined by assuming that the overall mean sentence and the case effects are the same for the experimental and nonexperimental observations, i.e., $\mu^* = \mu$ and $c_k^* = c_k$, but that the effects of the discriminatory factor and the disparities in the experimental observations have been scaled down by the factors $\alpha$, $\beta$, and $\gamma$, respectively. Thus $v_i^* = \alpha v_i$, $d_j^* = \beta d_j$, and $\varepsilon_{ijk}^* = \gamma \varepsilon_{ijk}$, where $\alpha, \beta, \gamma > 0$ (and probably less than one). The observed court cases can then be used to calibrate the experimental responses. This point illustrates one way in which experiments can be used in conjunction with nonexperimental data. Experiments can be used to validate results obtained from nonexperimental data or to provide alternative estimates with different biases. In particular, as noted above, observed court cases provide only an upper bound on the disparity within judges, whereas experiments (before adjustment) tend to underestimate this quantity.
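Combining the two sources under the scaling assumptions above can be sketched in Python. This is a minimal illustration under assumptions of our own: a binary race factor coded 0/1, normal errors, hypothetical parameter values, and an experiment in which every judge sees every case at both levels of race. It estimates only $\alpha$: the experimental race gap estimates $\alpha(v_2 - v_1)$, while the race gap in the actual decisions, after subtracting each case's experimental mean to remove $\mu + c_k$, estimates $v_2 - v_1$. The function `calibrate_alpha` and this particular estimator are hypothetical, not a procedure given in the text.

```python
import numpy as np


def calibrate_alpha(K=4000, J=20, v_gap=2.0, alpha=0.5, beta=0.7, gamma=0.6,
                    sigma_d=1.0, sigma_eps=2.0, mu=24.0, seed=2):
    """Hypothetical sketch: estimate alpha in v_i* = alpha * v_i by combining
    experimental responses with one actual court decision per case."""
    rng = np.random.default_rng(seed)
    d = rng.normal(0.0, sigma_d, J)                  # judge effects d_j
    c = rng.normal(0.0, 3.0, K)                      # case effects, assumed c_k* = c_k
    level = np.array([0.0, 1.0])                     # race coded 0/1, so v_2 - v_1 = v_gap

    # Experimental responses (mu* = mu, scaled effects), shape (2, J, K).
    y_exp = (mu + alpha * v_gap * level[:, None, None]
             + beta * d[None, :, None] + c[None, None, :]
             + gamma * rng.normal(0.0, sigma_eps, (2, J, K)))

    # One actual decision per case, from its real defendant race and judge.
    race = rng.integers(0, 2, K)
    judge = rng.integers(0, J, K)
    y_court = (mu + v_gap * race + d[judge] + c
               + rng.normal(0.0, sigma_eps, K))

    gap_exp = y_exp[1].mean() - y_exp[0].mean()      # estimates alpha * (v_2 - v_1)
    r = y_court - y_exp.mean(axis=(0, 1))            # strip mu + c_k using the experiment
    gap_court = r[race == 1].mean() - r[race == 0].mean()  # estimates v_2 - v_1
    return gap_exp / gap_court, gap_court


alpha_hat, gap_hat = calibrate_alpha()
print(f"alpha_hat = {alpha_hat:.2f}, court race gap = {gap_hat:.2f}")
```

Subtracting the experimental case mean works here only because $c_k^* = c_k$ is assumed; the constants it leaves behind are the same for both races and cancel in the gap.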
Simultaneous use of experiments and courtroom observations can thus provide bounds on the severity of disparity. Experiments might also be used to deal with the selection problem. Wilkins et al. (1973) and others use experiments to
analyze the details of the judges' decision processes, including the variables they use in making decisions and the order in which these variables are considered. Similar experiments could be performed with other members of the criminal justice system, such as the prosecutor. The results might provide information about the factors that contribute to the correlation between the unobserved variables in the different stages of the selection process. This information might help the investigator assess the magnitude of the correlation and determine which, if any, additional variables should be measured.

Experiments can be used to address a number of questions that cannot be answered using observational data. For example, judges might be asked to choose both a determinate sentence and a minimum and a maximum sentence for hypothetical cases. Their responses could be used to evaluate the implications of laws on determinate sentencing. Experiments can also provide information about cases that occur too infrequently in court for observational data to provide accurate results. Many studies, for example, have found it impossible to investigate the relationship between sentence and the defendant's sex because the number of women in their samples was negligible.

So far our discussion has been concerned with experiments for analyzing the behavior of judges. Other aspects of the criminal justice system can also be analyzed with experiments. For example, experiments could be designed to determine whether prosecutors act in a discriminatory fashion when deciding whether to prosecute a case. Experiments might also be useful aids for constructing models of the plea-bargaining process.

In addition to providing data for analysis, experiments may also have a beneficial side effect, especially if they are conducted on a regular basis. Many judges and other members of the criminal justice system are sensitive to the problems of disparity and discrimination in sentencing.
The results of regular controlled experiments might reduce disparity and discrimination by helping judges understand and calibrate their own decisions.

The major drawback to experiments is their cost. The problems associated with experimental data may seem easier to solve than the problems of observational data, but the cost of running experiments, both in money and in the demands they place on judges' time, makes it difficult to obtain samples that are large enough to provide very precise estimates of the parameters of interest. Thus it is unlikely that observational data, in which sample sizes are typically
large, can be dispensed with entirely. The simultaneous use of both approaches, in which the particular advantages of each approach can be exploited, is an avenue that deserves more attention in future work.

CONCLUSIONS

We argued that the studies of discrimination in case disposition generally suffer from at least one of three major shortcomings: (1) the absence of formal models of the processing decisions in the criminal justice system, (2) failure to consider the sample selection biases that result from the many screening decisions in the criminal justice system, and (3) the use of arbitrary scales for scaling qualitatively different dispositions. Most of our discussion of these problems focused on ways in which they can lead to underestimates of the severity of discrimination in the criminal justice system. Despite these problems, some studies do find evidence of discrimination. However, this should not be interpreted as suggesting that discrimination is actually present. There are many other problems, such as the omission of important variables possibly correlated with race or social status, that can lead to overestimates of the severity of discrimination. Some of these points are discussed in detail in Garber et al. (in this volume).

Each of the shortcomings enumerated above is, in principle, remediable. However, correcting them will require a formidable research agenda. Carefully specified models reflecting the essential motivations of the principal actors in the criminal justice system and the dynamics of their interplay are required. Furthermore, the data sets to be considered will have to be carefully chosen and perhaps combined with the results of designed experiments in order to mitigate the effects of sample selection. Novel and complex statistical techniques will be needed for the analysis. While these obstacles are formidable, we see no alternative to addressing these problems.
If they continue to be neglected, then the extent of discrimination in the criminal justice system will continue to be mired in uncertainties so great that no generally accepted resolution will ever be reached.

APPENDIX

Proposition: If $x$ is uniformly distributed, then $t_1 = t_2$, where

$$t_1 = E[x \mid x + w_1 > -(a + \delta)] - E(x \mid x + w_1 > -a) \qquad (A\text{-}1)$$

$$t_2 = E[x \mid x + w_1 > -(a + \delta),\ x + w_2 > -(a + \delta)] - E(x \mid x + w_1 > -a,\ x + w_2 > -a) \qquad (A\text{-}2)$$

Proof: Equations (A-1) and (A-2) can be rewritten as

$$t_1 = E[x \mid x > -(a + \delta) - w_1] - E(x \mid x > -a - w_1) \qquad (A\text{-}3)$$

$$t_2 = E[x \mid x > -(a + \delta) - w_1,\ x > -(a + \delta) - w_2] - E(x \mid x > -a - w_1,\ x > -a - w_2). \qquad (A\text{-}4)$$

Let $f_1(r) = P(-w_1 = r)$ and $f_2(r) = P[\max(-w_1, -w_2) = r]$. Note that, given $w_1$ and $w_2$, one of the two conditioning arguments in each of the two terms on the right-hand side of equation (A-4) is redundant. Using this and $f_1(\cdot)$ and $f_2(\cdot)$ to integrate out $w_1$ and $w_2$, equations (A-3) and (A-4) can be written as

$$t_1 = \int \{E[x \mid x > -(a + \delta) + r] - E(x \mid x > -a + r)\}\, f_1(r)\, dr,$$

$$t_2 = \int \{E[x \mid x > -(a + \delta) + r] - E(x \mid x > -a + r)\}\, f_2(r)\, dr,$$

which implies

$$t_1 - t_2 = \int \{E[x \mid x > -(a + \delta) + r] - E(x \mid x > -a + r)\}\,[f_1(r) - f_2(r)]\, dr. \qquad (A\text{-}5)$$

Using the fact that if $x$ is uniformly distributed, $E(x \mid x > \lambda) = (\bar{x} + \lambda)/2$, where $\bar{x}$ is the maximum value $x$ can assume, equation (A-5) implies

$$t_1 - t_2 = -\frac{\delta}{2} \int [f_1(r) - f_2(r)]\, dr = 0,$$

where the second equality follows from the fact that $f_1(\cdot)$ and $f_2(\cdot)$ are proper probability density functions. This result generalizes trivially if $x$ is multiplied by any scalar in the conditioning arguments in equations (A-1) and (A-2). This establishes the assertion in the text that if $x$ is uniformly distributed and $\gamma_1 = \gamma_2 = \gamma_3$, then $\delta_2 = \delta_3$.
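The key step of the proof is that, for uniformly distributed $x$, the bracketed difference in (A-5) equals $-\delta/2$ for every fixed $r$, so integrating it against $f_1 - f_2$ gives zero. A minimal Monte Carlo check of that step for fixed $r$, assuming (hypothetically) $x \sim \text{Uniform}(0, 2)$, $a = -1$, and $\delta = 0.2$, with both thresholds kept inside the support of $x$, which the closed form for $E(x \mid x > \lambda)$ requires:

```python
import numpy as np


def bracket_term(r, a=-1.0, delta=0.2, n=1_000_000, seed=0):
    """Monte Carlo estimate of E[x | x > -(a+delta) + r] - E[x | x > -a + r]
    for x ~ Uniform(0, 2) (a hypothetical support chosen so both thresholds
    stay inside [0, 2] for the r values used below)."""
    x = np.random.default_rng(seed).uniform(0.0, 2.0, n)
    lower, upper = -(a + delta) + r, -a + r
    # The same draws are used for both conditional means, which reduces noise.
    return x[x > lower].mean() - x[x > upper].mean()


# For uniform x the closed form gives -delta/2 = -0.1 for every r.
for r in (-0.3, 0.0, 0.4):
    print(f"r = {r:+.1f}: bracket term = {bracket_term(r):.4f}")
```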

NOTES

1. This is because a linear function of $x$ is not constrained to lie between zero and one.

2. In the jargon of statistics, neither $\sigma^2$ nor $b$ is identified (assuming, in the case of $b$, that $x$ contains a constant regressor). This can be seen as follows. Multiply $\sigma$, $b$, and $\beta$ by the same positive constant. Then $P(y_i = 1 \mid x)$ is unchanged. Hence it is not possible to estimate the levels of both $\beta$ and $\sigma^2$. Instead, $\sigma^2$ is typically set equal to one for estimation purposes, and $\beta$ is effectively estimated relative to the arbitrary value assigned to $\sigma^2$. As for $b$, suppose that $x$ contains a constant regressor. Then if $\beta_1$, the constant term in the regression, and $b$ are changed by the same amount, $b - x_i'\beta$, and hence $F(b - x_i'\beta)$, remains unchanged. As a result, for estimation purposes, $b$ is typically set equal to zero and the cutoff level is subsumed into the constant.

3. The coefficient of $u_i$ in this expression follows from the fact that if $E(y \mid z)$ is linear in $z$, then $y$ can be expressed as $y = \mu_y + [\mathrm{Cov}(y, z)/\mathrm{Var}(z)](z - \mu_z) + v$, where $\mu_y = E(y)$, $\mu_z = E(z)$, $V(v) = \sigma_y^2(1 - \rho^2)$, $\rho = \mathrm{Cov}(y, z)/(\sigma_y \sigma_z)$, $V(z) = \sigma_z^2$, and $V(y) = \sigma_y^2$.

4. The selection that occurs as a result of the imprisonment decision is somewhat different from the other selection mechanisms we have discussed. The imprisonment decision is made by the judge who also determines the length of the sentence. The formal distinction between the imprisonment decision and the determination of the sentence length is thus somewhat artificial. Nevertheless, if the two decisions are viewed as separable, which is implicit in studies that investigate the sentence length for individuals who have been sent to prison, then the appropriate mathematical formulation of this process is the same as the one that would be appropriate if the decisions were made by separate individuals. As a result, the same model applies.

5. We do not distinguish between jury and bench trials.
The model could easily be generalized to include this option, but such a generalization would only complicate
the discussion without further illuminating the points we wish to make.

6. Another relevant factor is time spent in pretrial detention. Conditions in jail are frequently worse than in prison. If the defendant opts for a trial, the time spent in pretrial detention is likely to be increased.

7. The decision to charge includes the choice of whether to prosecute and the choice of which charges to file given prosecution. We consider only the former choice.

8. Dismissal can occur before or after charges have been filed. We treat dismissals that occur after charges have been filed as decisions not to charge. The term dismissal is restricted to instances in which the prosecutor declines to prosecute after an arrest has been made.

9. The factors giving rise to selection bias involve the stages preceding the sentence length decision and thus are not related to the true extent of discrimination in the sentence length decision of each judge.

10. However, we argue below that this finding may actually be the result of discrimination at the prosecution and/or conviction stage rather than in sentencing.

11. The purpose of introducing this model is merely to fix ideas. The discussion could equally well be based on a more complicated ANOVA model, one in which the effects of the discriminatory factors are viewed as nested within judges, a binary model, a binary plus a conditional continuous model, or an ordered multiple response model.

REFERENCES

Administrative Office of the United States Courts 1973 Federal Offenders in United States District Court, 1971. Washington, D.C.: Administrative Office of U.S. Courts.
Altman, E. I., R. A. Avery, R. A. Eisenbeis, and J. F. Sinkey, Jr. 1981 Application of Classification Techniques in Business, Banking, and Finance. Greenwich, Conn.: JAI Press.
Chiricos, T. G., and G. P. Waldo 1975 Socioeconomic status and criminal sentencing: an empirical assessment of a conflict proposition. American Sociological Review 40(December):753-772.
Clarke, S. H., and G. G. Koch 1976 The influence of income and other factors on whether criminal defendants go to prison. Law & Society Review 11(1):59-92.
Cook, P. J., and D. S. Nagin 1979 Does the Weapon Matter? An Evaluation of a Weapon-Emphasis Policy in the Prosecution of Violent Offenders. Washington, D.C.: Institute for Law and Social Research.
Cox, D. R. 1970 Analysis of Binary Data. London: Methuen & Co.
Diamond, S. S., and H. Zeisel 1975 Sentencing councils: a study of sentence disparity and its reduction. University of Chicago Law Review 43:109-149.
Farrell, R. A., and V. L. Swigert 1978 Prior offense record as a self-fulfilling prophecy. Law and Society Review 12(Spring):437-453.
Forst, B., and K. Brosi 1977 A theoretical and empirical analysis of the prosecutor. Journal of Legal Studies 6(1):177-191.
Forst, B., J. Lucianovic, and S. J. Cox 1977 What Happens After Arrest? A Court Perspective of Police Operations in the District of Columbia. Washington, D.C.: Institute for Law and Social Research.
Frase, R. S. 1978 The decision to prosecute federal criminal charges: a quantitative study of prosecutorial discretion. University of Chicago Law Review 47:246-330.
Gibson, J. L. 1978 Race as a determinant of criminal sentences: a methodological critique and a case study. Law and Society Review 12(Spring):455-478.
Goldberger, A. S. 1964 Econometric Theory. New York: John Wiley & Sons.
1980 Abnormal Selection Bias. Unpublished manuscript. University of Wisconsin.
Greenwood, P., et al. 1973 Prosecution of Adult Felony Defendants in Los Angeles County: A Policy Perspective. Santa Monica, Calif.: Rand Corporation.
Hagan, J. 1975 Parameters of criminal prosecution: an application of path analysis to a problem of criminal justice. Journal of Criminal Law & Criminology 65(4):536-544.
Heckman, J. J. 1979 Sample selection bias as a specification error. Econometrica 47(1):153-161.
LaFree, G. D. 1980 The effect of sexual stratification by race on official reactions to rape. American Sociological Review 45(October):842-854.
Landes, W. M. 1971 An economic analysis of the courts. Journal of Law and Economics 14:61-106.
Lizotte, A. J. 1977 Extra-legal factors in Chicago's criminal courts: testing the conflict model of criminal justice. Social Problems 25(5):564-580.
Manski, C. F., and D. S. Nagin 1981 Behavioral Intentions and Revealed Preference. Unpublished manuscript. Carnegie-Mellon University.
Olsen, R. 1980 A least squares correction for selectivity bias. Econometrica 48:1815-1820.
Partridge, A., and W. G. Eldridge 1974 The second circuit sentencing study: a report to the judges of the second circuit. Federal Judicial Center No. 74-4.
Reiss, A. J. 1975 Public prosecutors and criminal prosecution in the United States of America. Juridical Review:1-21.
Rosenthal, R., and R. L. Rosnow, eds. 1969 Artifacts in Behavioral Research. New York: Academic Press.
Swigert, V. L., and R. A. Farrell 1977 Normal homicides and the law. American Sociological Review 42(February):16-32.
Tiffany, L. P., Y. Avichai, and G. W. Peters 1975 A statistical analysis of sentencing in federal courts: defendants convicted after trial, 1967-1968. The Journal of Legal Studies 4:397-417.
Wilkins, L. J., D. U. Gottfredson, J. O. Robinson, and C. A. Sadowsky 1973 Information Selection and Use in Parole Decision-Making. NCCD Research Center, National Council on Crime and Delinquency, Davis, Calif.
Wolfgang, M. E., and M. Reidel 1973 Race, judicial discretion, and the death penalty. The Annals of the American Academy of Political and Social Science 407(May):119-133.
Zimring, F. E., J. Eigen, and S. O'Malley 1976 Punishing homicide in Philadelphia: perspectives on the death penalty. University of Chicago Law Review 43(2):227-252.