

The National Academies | 500 Fifth St. N.W. | Washington, D.C. 20001
Copyright © National Academy of Sciences. All rights reserved.




OCR for page 55
Discrimination in the Criminal Justice System: A Critical Appraisal of the Literature
Steven Klepper, Daniel Nagin, and Luke-Jon Tierney
INTRODUCTION
Discrimination in the criminal justice system is an issue of substantial social concern. The discretionary powers of the principal actors (the police, prosecutors, and judges) are considerable and allow ample latitude for unfair treatment of persons of a specific race or social background. A large empirical literature has emerged concerning the extent of discrimination in the criminal justice process. These studies examine, separately or in combination, the effect of race or social class on the likelihood of arrest, prosecution, bail, and conviction, and on the type and severity of sentence. The findings of the studies are by no means consistent: some find evidence of discrimination while others do not.
In this paper we argue that the literature we have reviewed has major flaws that limit its usefulness for making inferences about the extent of discrimination in the criminal justice system. We also suggest research strategies to remedy these weaknesses. Our critique and suggestions are prompted by a review of 10 papers, chosen by the panel on the basis of their salience in the literature and their quality, as well as a number of additional papers. Although our paper is based on a review of a small sample of studies, we are confident that our conclusions apply generally to the larger literature, for three reasons. First, to some degree our criticisms apply to all of the studies reviewed, which makes it unlikely that they do not apply to the larger literature. Second, implementation of several of our recommendations requires statistical methods that have only recently been developed and are not yet widely employed. Third, implementation of these statistical procedures requires modeling approaches that have not been widely adopted in the criminological and sociological literature.
Our review suggests three major remediable flaws in the literature:
(1) The Absence of Formal Models of Processing Decisions in the Criminal Justice System. Case disposition (whether dismissal, acquittal, conviction, or sentencing) is the consequence of the interplay of a diverse set of actors, each with individual objectives. Even if a disposition does not directly involve one of these actors, expectations about their actions if they were to become involved may affect decisions. For example, a prosecutor may choose to dismiss a case based on the expectation that a judge will do the same if the case is prosecuted. Similarly, a defendant may choose to accept a plea bargain on the basis of an expectation of the likelihood of conviction at a jury trial and the sentence if convicted. To model decisions at each stage of the criminal justice system, a theory of the important decision criteria of each of the major actors and of their interaction is required. Without such a theory, estimating equations are likely to be misspecified, which in turn is likely to result in serious biases in the estimated effects of included variables and an inability to discern the effects of more subtle influences. The latter is particularly pertinent to measuring the effects of social class and race because their influence, while possibly of sufficient magnitude to warrant concern, is probably less important in affecting disposition than clearly legally relevant variables like case quality and the seriousness of the crime.
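One simple form of the misspecification described above is an omitted variable. The following Python sketch is a hypothetical illustration, not a model taken from the studies reviewed: the names `group`, `serious`, and `sentence`, the effect sizes, and the simulated data are all invented. Sentences depend only on seriousness, which is assumed to be higher on average for one group; leaving seriousness out of the estimating equation makes group membership appear to matter.

```python
import random

def centered_sum(a, b):
    """Sum of (a - mean(a)) * (b - mean(b)) over paired observations."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((u - ma) * (v - mb) for u, v in zip(a, b))

def ols_one(x, y):
    """Slope from regressing y on x (with an intercept)."""
    return centered_sum(x, y) / centered_sum(x, x)

def ols_two(x1, x2, y):
    """Slopes (b1, b2) from regressing y on x1 and x2 (with an intercept)."""
    s11, s22, s12 = centered_sum(x1, x1), centered_sum(x2, x2), centered_sum(x1, x2)
    s1y, s2y = centered_sum(x1, y), centered_sum(x2, y)
    det = s11 * s22 - s12 ** 2
    return (s22 * s1y - s12 * s2y) / det, (s11 * s2y - s12 * s1y) / det

rng = random.Random(0)
n = 20000
group = [rng.randint(0, 1) for _ in range(n)]             # hypothetical 0/1 group indicator
serious = [0.5 * g + rng.gauss(0, 1) for g in group]       # seriousness, higher on average when group = 1
sentence = [2.0 * s + rng.gauss(0, 1) for s in serious]    # true model: no direct group effect

short_slope = ols_one(group, sentence)                     # seriousness omitted (misspecified)
full_group, full_serious = ols_two(group, serious, sentence)  # correctly specified
print(f"group effect, seriousness omitted:  {short_slope:.2f}")
print(f"group effect, seriousness included: {full_group:.2f}")
```

Under these assumptions the misspecified estimate should land near 1.0 (the seriousness effect of 2.0 times the 0.5 difference in mean seriousness between groups), while the correctly specified estimate should be near zero.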
Perhaps even more important, without a well-structured theory, inferences about the role of social status and other factors at each processing stage may be extremely misleading. For example, an observation that social status affects sentences in negotiated pleas may reflect not prosecutorial bias but rather the biases of judges or juries. We regard this point as crucial because implementing policies to rectify any undesirable effects of race and social status on disposition requires knowing the stage(s) of the criminal justice system at which these factors are important.
(2) Sample Selection Biases Resulting from Screening and Processing Decisions. The criminal justice system has been likened to a leaky sieve. In Washington, D.C., for example, of every 100 felony arrests only 13 result in felony convictions. Of the remaining 87, 16 result in misdemeanor convictions. Nearly all the rest are rejected for further processing at an initial screening or subsequently dismissed by a prosecutor, judge, or grand jury. Of those convicted, only about 32 percent are incarcerated (Forst et al., 1977). Thus, cases that reach the sentencing stage are a very select group that typically represents only a small proportion of the population of "similar" cases (e.g., cases with the same arrest charges) that originally entered the system. Moreover, even the cases entering the system via an arrest are themselves a selected sample of crimes. In most major metropolitan areas, clearance rates (the fraction of crimes solved by the police, typically by the arrest of a suspect) hover around 20 percent. This low clearance rate principally reflects the absence of any suspect but is also affected by the police's exercise of arrest discretion.
By the very nature of the system, analyses of the determinants of sentence must be executed on a selected sample of cases, namely those that have resulted in conviction. Since the selection process is by no means random, it may induce serious biases in parameter estimates of included variables. Such biases may, for example, result in an inappropriate conclusion that racial considerations influence sentencing decisions when in fact they do not. Recently developed econometric procedures can be employed in some circumstances to cope with the biases induced by sample selection.
(3) Use of Arbitrary Scales to Measure Qualitatively Different Dispositions. A case may be disposed by dismissal or acquittal.
For convictions, possible sentences include fines, probation, and prison, or some combination of these, at a specified amount or duration. Many of the papers we reviewed employ arbitrary rules for measuring these qualitatively different outcomes. The resulting index serves as the basis (e.g., the dependent variable in a regression model) for an analysis of the correlates of "severity of outcome." While the scales applied are not patently unreasonable, serious questions remain about the degree to which findings are simply an artifact of an artificial scale. We are particularly concerned that use of these arbitrary scales may conceal the importance of subtle influences that could be measured if such scales were not used.
ORGANIZATION OF THE PAPER
While the approaches we suggest for coping with these problems will improve the quality of statistical inference about discrimination, we are under no illusion that their adoption will yield definitive results. The combination of our relative ignorance about the factors determining case-processing decisions and the problems of using nonexperimental data ensures that definitive findings will not be forthcoming soon. In response to the inherent limitations of studies based on nonexperimental data, we have included a section on the use of experiments to measure discrimination in the criminal justice system. This section discusses the limitations of experiments, approaches for minimizing these limitations, and strategies for combining experimental and nonexperimental data.
The paper is organized as follows. We begin with a review of statistical issues that arise in the analysis of binary data. Next we discuss the so-called sample selection phenomenon and elaborate on its effects. We then develop a model of the criminal justice system. Next we review selected studies in the context of the sample selection phenomenon and the model developed. We then discuss alternative models of the sentencing decision that do not require the use of arbitrary severity indices. Next we discuss experimental approaches to measuring discrimination. We conclude with a summary of our major points.
THE ANALYSIS OF BINARY VARIABLES
Many decisions in the criminal justice system involve binary outcomes, such as the prosecutor's choice to dismiss or prosecute a case or the jury's decision to find the defendant guilty or innocent.
It is common practice to define a binary variable $y$ such that $y = 1$ if one outcome (say, a verdict of guilty) occurs and $y = 0$ otherwise. In a number of the studies we reviewed, the relationship between the likelihood of the event $y = 1$ and a vector of variables $\mathbf{x}$ is examined by regressing $y$ on $\mathbf{x}$. The purpose of this section is to point out some hazards of this approach and to describe an alternative approach that we employ in subsequent sections.
We begin with a discussion of the classical regression model. The model assumes that a random variable $y$ can be related to a vector of variables $\mathbf{x}$ by
$$y_i = \mathbf{x}_i'\boldsymbol{\beta} + \varepsilon_i, \qquad i = 1, \ldots, N, \tag{1}$$
where column vectors are set in boldface, $\mathbf{x}_i$ is the $K \times 1$ vector of regressors for the $i$th observation in a sample of size $N$, $\boldsymbol{\beta}$ is a $K \times 1$ vector of unknown parameters, and $\varepsilon_i$ is the disturbance or error associated with the $i$th observation. The errors $\varepsilon_1, \ldots, \varepsilon_N$ are assumed to be independent with zero mean and common variance $\sigma^2$. The regressors $\mathbf{x}_1, \ldots, \mathbf{x}_N$ are often assumed to be nonstochastic, although they may also be assumed to be random variables that are independent of the errors $\varepsilon_1, \ldots, \varepsilon_N$. These assumptions imply that the conditional distribution of $y$ given $\mathbf{x}$ is such that
$$E(y_i \mid \mathbf{x}_i) = \mathbf{x}_i'\boldsymbol{\beta} \tag{2}$$
and
$$V(y_i \mid \mathbf{x}_i) = \sigma^2. \tag{3}$$
Equations (2) and (3) state that for each $\mathbf{x}_i$, the distribution of $y_i$ given $\mathbf{x}_i$ is such that $E(y_i \mid \mathbf{x}_i)$ is linear in $\mathbf{x}_i$ and $V(y_i \mid \mathbf{x}_i)$ is constant for all $i$. Under these assumptions, it is well known that ordinary least squares provides consistent and unbiased estimates of the coefficient vector $\boldsymbol{\beta}$.
The assumptions of the classical regression model are appropriate in cases in which the dependent variable has a large, approximately continuous range of possible values. In the case of a binary variable, however, many of the assumptions are no longer tenable. Suppose $y$ is a binary variable that takes on only the values zero and one, and let $p(\mathbf{x})$ equal the probability that $y = 1$ given $\mathbf{x}$. Then it is easy to demonstrate that
$$E(y_i \mid \mathbf{x}_i) = p(\mathbf{x}_i) \tag{4}$$
and
$$V(y_i \mid \mathbf{x}_i) = p(\mathbf{x}_i)\,[1 - p(\mathbf{x}_i)]. \tag{5}$$
Equation (4) indicates that the conditional expectation of $y$ given $\mathbf{x}$ is equal to the conditional probability that $y$ equals one given $\mathbf{x}$. If the range of the observed $\mathbf{x}$ values is very small, then it may be appropriate to approximate $p(\mathbf{x})$ by $\mathbf{x}'\boldsymbol{\beta}$. In this case, ordinary least squares will consistently estimate $\boldsymbol{\beta}$. However, equation (5) indicates that $V(y_i \mid \mathbf{x}_i)$ is not the same for all $i$. This implies that ordinary least-squares estimates are inefficient and the standard hypothesis tests are invalid. These problems can be corrected by using standard techniques for adjusting for heteroskedasticity.
If, however, the range of the $\mathbf{x}$ values is large, then the fact that $p(\mathbf{x})$ can take on only values between zero and one (it is a probability) implies that it cannot be approximated by a linear function of $\mathbf{x}$.$^{1}$ In this case ordinary least-squares estimates are inconsistent, and the inconsistency may be very severe. To illustrate this point, consider the linear probability model:
$$p(\mathbf{x}) = \begin{cases} 0 & \text{if } \mathbf{x}'\boldsymbol{\beta} < 0, \\ \mathbf{x}'\boldsymbol{\beta} & \text{if } 0 \le \mathbf{x}'\boldsymbol{\beta} \le 1, \\ 1 & \text{if } \mathbf{x}'\boldsymbol{\beta} > 1. \end{cases} \tag{6}$$
Figure 2-1 displays the form of $p(\mathbf{x})$ for the case in which $x$ is a scalar. Also shown is a set of hypothetical observations that might arise in which a number of the $x$-values fall outside the range where $p(x)$ is linearly increasing. The dashed line depicts the model that would be estimated by ordinary least squares. By choosing enough observations with very high or very low $x$-values, the slope of the model estimated by ordinary least squares can be made arbitrarily small. This difficulty can be avoided by fitting the linear probability model specified in equation (6) using nonlinear least squares. However, this introduces severe computational problems. Therefore, it is useful to consider alternative models.
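The difficulty illustrated in Figure 2-1 can be reproduced with a small simulation. The sketch below is a hypothetical Python example that assumes a scalar $x$, no intercept, and $\beta = 1$ in equation (6); the sample sizes, the seed, and the placement of the extreme points at $x = -5$ and $x = 6$ are arbitrary illustrative choices. It fits ordinary least squares first to data confined to the linear range $[0, 1]$ and then after adding many observations in the flat regions where $p(x)$ is 0 or 1.

```python
import random

def ols_slope(x, y):
    """Slope from an ordinary least-squares regression of y on x (with an intercept)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

def lpm_prob(z):
    """Linear probability model of equation (6) with scalar x and beta = 1."""
    return min(max(z, 0.0), 1.0)

def draw_binary(x, rng):
    """Draw y = 1 with probability p(x), else y = 0, for each x."""
    return [1 if rng.random() < lpm_prob(v) else 0 for v in x]

rng = random.Random(42)
x_inside = [i / 100 for i in range(101)] * 50     # 5,050 points, all in [0, 1]
slope_inside = ols_slope(x_inside, draw_binary(x_inside, rng))

# Add observations far outside [0, 1], where p(x) is flat at 0 or at 1.
x_wide = x_inside + [-5.0] * 2000 + [6.0] * 2000
slope_wide = ols_slope(x_wide, draw_binary(x_wide, rng))

print(f"true slope: 1.0; OLS on [0, 1]: {slope_inside:.2f}; OLS with extremes: {slope_wide:.2f}")
```

With the $x$-values confined to $[0, 1]$ the estimated slope should be close to the true value of 1; adding the 4,000 extreme observations pulls it down to roughly 0.1, and adding still more would push it further toward zero.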
[FIGURE 2-1 The Bias in Ordinary Least-Squares Estimation: the linear probability model plotted against $\mathbf{x}'\boldsymbol{\beta}$, with observed data points and the OLS regression line (dashed).]
Most models for binary variables that have been proposed in the statistical literature can be written as
$$p(\mathbf{x}) = F(\mathbf{x}'\boldsymbol{\beta}), \tag{7}$$
where $F$ is a continuous, nondecreasing function with $F(-\infty) = 0$ and $F(\infty) = 1$; i.e., $F$ is a continuous distribution function. The choice of a particular form for $F$ is usually somewhat arbitrary and should be taken in the same spirit as the assumptions of linearity and normally distributed errors in simple regression models. The most popular models of this form are the linear probability model discussed above, which is obtained by setting
$$F(z) = \begin{cases} 0 & \text{if } z < 0, \\ z & \text{if } 0 \le z \le 1, \\ 1 & \text{if } z > 1; \end{cases}$$
the PROBIT model, in which $F(z)$ is set equal to the cumulative standard normal distribution function; and the LOGIT model, with $F(z) = e^{z}/(1 + e^{z})$.
Models of this form can be motivated in a number of ways. We employ one such motivation repeatedly in the following sections. Let $y^*$ represent a latent, unobserved variable that can vary between minus and plus infinity. The latent variable $y^*$ is assumed to be related to $\mathbf{x}$ by the standard regression function
$$y_i^* = \mathbf{x}_i'\boldsymbol{\beta} + \varepsilon_i, \tag{8}$$
where $\varepsilon_i$ is an unobserved disturbance with mean zero and constant variance $\sigma^2$. The dichotomous variable $y$ is then assumed to be related to $y^*$ by
$$y_i = \begin{cases} 1 & \text{if } y_i^* > b, \\ 0 & \text{if } y_i^* \le b, \end{cases} \tag{9}$$

where $b$ is an unknown cutoff level. Using equation (8), this implies
$$P(y_i = 1 \mid \mathbf{x}_i) = P(\mathbf{x}_i'\boldsymbol{\beta} + \varepsilon_i > b) = P(\varepsilon_i > b - \mathbf{x}_i'\boldsymbol{\beta}),$$
$$P(y_i = 0 \mid \mathbf{x}_i) = 1 - P(\varepsilon_i > b - \mathbf{x}_i'\boldsymbol{\beta}).$$
If $F(\cdot)$ is the distribution function of $\varepsilon$, this can alternatively be stated as
$$P(y_i = 1 \mid \mathbf{x}_i) = p(\mathbf{x}_i) = 1 - F(b - \mathbf{x}_i'\boldsymbol{\beta}), \tag{10a}$$
$$P(y_i = 0 \mid \mathbf{x}_i) = 1 - p(\mathbf{x}_i) = F(b - \mathbf{x}_i'\boldsymbol{\beta}). \tag{10b}$$
Note that equation (10) is in essentially the same form as equation (7): for a symmetric $F$, such as the normal or logistic, $1 - F(b - \mathbf{x}'\boldsymbol{\beta}) = F(\mathbf{x}'\boldsymbol{\beta} - b)$. The unknown parameters of the model are the vector of coefficients $\boldsymbol{\beta}$, the variance $\sigma^2$, and the cutoff level $b$. It can be shown that neither $\sigma^2$ nor $b$ can be estimated because they are not uniquely defined.$^{2}$ But the coefficient vector $\boldsymbol{\beta}$ can be estimated directly from a sample of observations on the binary variable $y$ and $\mathbf{x}$. One widely employed estimation procedure is maximum likelihood estimation. It has a number of desirable features for cases in which a relatively large sample of observations on $(y, \mathbf{x})$ is available. For the LOGIT and PROBIT models, specially designed computer algorithms for calculating the maximum likelihood estimator and its estimated standard errors in large samples are available. For a more complete discussion of estimation and other issues in binary variable models, see Goldberger (1964:248-251) and Cox (1970).
To get a better idea of how this approach can be used to model events that occur in the criminal justice system, consider the example of a jury determining whether a defendant is guilty. The jury hears the evidence, which can be summarized in terms of various attributes of the case, such as the number of eyewitnesses, whether a weapon belonging to the defendant was recovered, etc. Suppose that the investigator can observe some of these attributes, perhaps from court records, and can quantify them in terms of a numerical vector $\mathbf{x} = (x_1, x_2, \ldots, x_K)'$. Other attributes of the case, such as the credibility of the witnesses, are not recorded in court records and hence cannot be observed by the investigator.
Let their composite influence be represented by $\varepsilon$. The jury then might be viewed as computing an index $y^* = \mathbf{x}'\boldsymbol{\beta} + \varepsilon$ measuring the strength of the case against the defendant. The observable factors $x_1, x_2, \ldots, x_K$ are given weights of $\beta_1, \beta_2, \ldots, \beta_K$, respectively, relative to the weight assigned to $\varepsilon$. The jury then determines whether to convict the defendant on the strength of the evidence, as measured by $y^*$, by comparing $y^*$ to a level $b$ and declaring the defendant guilty if $y^* > b$ and not guilty otherwise. The critical level $b$ is presumed to be determined according to the interpretation of the notion "beyond a reasonable doubt." The statistical problem is to determine the factors the jury takes into account and their relative importance. Among other matters, the investigator might be interested in testing whether juries discriminate against certain types of defendants, in which case the personal characteristics of the defendant might be included in $\mathbf{x}$.
The problem facing the investigator is that he or she observes the vector of case attributes $\mathbf{x}$ and whether the defendant is convicted, but not $y^*$ and $\varepsilon$. To estimate $\boldsymbol{\beta}$ using this information, the jury's decision process can be modeled as
$$I_i = \begin{cases} 1 & \text{if } y_i^* = \mathbf{x}_i'\boldsymbol{\beta} + \varepsilon_i > b, \\ 0 & \text{if } y_i^* = \mathbf{x}_i'\boldsymbol{\beta} + \varepsilon_i \le b, \end{cases}$$
where $I$ is a binary variable that equals one for conviction and zero otherwise. The coefficient vector $\boldsymbol{\beta}$ contains the parameters of interest. This is precisely the same form as equations (8) and (9). Hence the weights $\beta_1, \beta_2, \ldots, \beta_K$ can be estimated directly using the approach discussed above. A similar approach is taken in the subsequent sections to model other decisions in the criminal justice system that involve binary outcomes.
SELECTION BIAS
The criminal justice process can be thought of as a series of stages, each involving a different set of actors. The first stage involves the detection of a crime, followed by communication of the crime to the police, arrest, prosecution, trial, and sentencing. The literature indicates that the various actors involved at each stage make calculated decisions about the types of crimes that are processed to the next stage. For example, studies of the prosecutor indicate that less serious crimes and those with weak evidence are more likely to be dismissed following arrest. These same two characteristics appear to influence the decision by the police to make an arrest, while the quality of the evidence certainly affects the likelihood that a jury will render a verdict of guilty and pass a case on to the sentencing stage. Other factors, such as the prior record and socioeconomic status of the criminal, also appear to play a role in some of the stages.
As a result of deliberate actions of the various actors in the system, the crimes that reach each successive stage in the system after the first are not representative of the broader population of crimes. Samples used to study the various stages in the system are thus selected according to certain characteristics. This does not itself pose a problem for the investigator. A potential problem does arise, however, from the combination of the sample selection process and the fact that some of the features of a case that affect the way it is processed cannot be observed by the investigator. For example, prosecutors and judges may possess a great deal of qualitative evidence about a case that the investigator cannot observe from court records. In other instances, the investigator may not observe other, less qualitative types of evidence, such as whether the criminal used a weapon. The combination of screening and incomplete measurement implies that criminals reaching the later processing stages are not representative of the population of cases entering the system with respect to the unobservable (to the investigator) as well as the observable features. This introduces the possibility of sample selection bias.
The types of biases that may arise can best be illustrated with an example. Consider the sentencing of convicted criminals.
Suppose that the various actors in the system discriminate against individuals with low socioeconomic status (SES) as well as against individuals committing more serious crimes. (The latter form of "discrimination" may be socially desirable.) Then consider high-SES individuals who are convicted of a crime. Holding constant the effect of factors that are observable to the investigator, such individuals would ordinarily have a lower probability of reaching the sentencing stage (given the hypothetical assumption of discrimination). If they have been convicted, then, holding constant the effect of the factors observable to the investigator, they must be unrepresentative with respect to the factors, unobservable to the investigator, that contribute to reaching the sentencing stage. For example, they may have exhibited a greater degree of premeditation than low-SES individuals who have been convicted, or a greater fraction of them may have used a weapon than of low-SES convicted criminals. (The degree of premeditation and weapon use are assumed to be unobservable to the investigator.)
This may cause problems when the investigator tries to determine the factors influencing the sentencing decision. Suppose that SES does not affect sentencing, but the seriousness of the crime does. By the above argument, if discrimination exists against low-SES individuals at the earlier stages of the criminal justice system, then, ceteris paribus, high-SES convicted criminals will be above average on both the observable and unobservable dimensions of the seriousness of a crime. Judges are assumed to observe both sets of factors and to take both into account when deciding on a sentence. The investigator, however, can observe only one set of dimensions. Even after taking account of the observable differences in the cases of high- and low-SES criminals, the investigator will still find that high-SES criminals receive longer sentences. This will suggest that judges discriminate against high-SES individuals, even though there is no discrimination at the sentencing stage and there is discrimination against low-SES individuals at the stages preceding sentencing.
More generally, this example points out that if there does exist discrimination against low-SES individuals at the sentencing stage, then the biases induced by sample selection might mask the true extent of the discrimination. It is conceivable that the biases might even create the illusion of reverse discrimination at the sentencing stage, when in reality discrimination against low-SES individuals is present at all stages. The biases induced by sample selection are of course more general than the example: they might occur at any stage in the system following the first screening stage, and they might distort the effect of any of the observable features of a crime. This suggests that it is essential to try to account for the effects of sample selection in order to make reliable inferences about the various processing stages in the criminal justice system.
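The SES example can be checked numerically. The following Python sketch is a hypothetical simulation, not a model estimated in the literature: the screening index, the effect sizes, the cutoff of 1.0, and the normal distributions are all assumptions chosen for illustration. Earlier stages screen cases on total seriousness (an observable part `x_obs` plus an unobservable part `u`) and discriminate against low-SES defendants; sentencing depends on seriousness only. For comparison, sentences are also generated for every case, as if all cases reached sentencing, and in both samples the investigator regresses sentence on SES and `x_obs`.

```python
import random

def centered_sum(a, b):
    """Sum of (a - mean(a)) * (b - mean(b)) over paired observations."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((u - ma) * (v - mb) for u, v in zip(a, b))

def ses_coefficient(ses, x_obs, sentence):
    """OLS coefficient on ses when sentence is regressed on ses and x_obs (with an intercept)."""
    s11, s22, s12 = centered_sum(ses, ses), centered_sum(x_obs, x_obs), centered_sum(ses, x_obs)
    s1y, s2y = centered_sum(ses, sentence), centered_sum(x_obs, sentence)
    return (s22 * s1y - s12 * s2y) / (s11 * s22 - s12 ** 2)

rng = random.Random(7)
ses, x_obs, sentence, reached = [], [], [], []
for _ in range(20000):
    high = rng.randint(0, 1)       # 1 = high SES (hypothetical indicator)
    xo = rng.gauss(0, 1)           # seriousness the investigator can observe
    u = rng.gauss(0, 1)            # seriousness only the system's actors observe
    # Earlier stages screen on total seriousness and push low-SES cases forward.
    screen = xo + u + 1.0 * (1 - high) + rng.gauss(0, 1)
    ses.append(high)
    x_obs.append(xo)
    # Sentencing depends on total seriousness only: no SES effect at this stage.
    sentence.append(2.0 * (xo + u) + rng.gauss(0, 1))
    reached.append(screen > 1.0)

b_population = ses_coefficient(ses, x_obs, sentence)
keep = [k for k, r in enumerate(reached) if r]
b_selected = ses_coefficient([ses[k] for k in keep],
                             [x_obs[k] for k in keep],
                             [sentence[k] for k in keep])
print(f"SES coefficient, all cases: {b_population:.2f}; screened cases only: {b_selected:.2f}")
```

In the full population the SES coefficient should be near zero, matching the true sentencing equation. Among the cases that pass the screen it should come out clearly positive, mimicking discrimination against high-SES defendants at sentencing even though none exists at that stage.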

to act out the hearing. This could be done once and recorded on videotape, or it could be done repeatedly with each judge in the experiment acting as the presiding judge. The drawback of this approach is its high cost, both in time and money. Case files are less realistic but also easier and less expensive to use. It is not clear how much is actually lost if a well-designed case file is substituted for an actual or reenacted hearing. Preliminary experiments might be used to determine a reasonable format for presenting cases.
Certain control questions designed to determine whether the information presented is adequate might be composed. For example, judges might be asked whether they felt that any additional information that might be available in court would change their decision. Several different questions might be asked, such as "What decision would you make based on the information you have?" and "What decision would you most likely make if you encountered this case in court?" If the judges use the available information to construct subjective probability distributions over the possible values of the unavailable information, then these two questions address different aspects of these distributions. The answer to the second question would be the sentence associated with the mode of the subjective distribution, whereas, under quadratic loss, the first question would be answered with the sentence corresponding to the mean of that distribution. Thus, in the presence of incomplete information, the answers to these questions might differ, whereas they would be the same if the necessary information were provided. Different answers can therefore be taken as an indication that the case information was not adequate (Manski and Nagin, 1981, discuss this point in the context of consumer choice surveys).
If an experiment is based on a subset of judges, then it is imperative that the subset be representative.
In many sentencing experiments the participating judges are volunteers. Even when all judges in a particular jurisdiction participate in an experiment, nonresponse rates are often so high that participation in the experiment has to be viewed as essentially voluntary. As a result, judges who do participate are likely to be more conscious of existing problems and more interested in reducing them than the average judge. An experiment based on such a sample will tend to underestimate the seriousness of these problems. To protect against this kind of bias, an experimental format that minimizes the nonresponse rate, such as personal interviews, can be used.
Even if the cases presented to the judges are real cases presented in their natural setting, the judges will always be aware of the fact that they are participating in an experiment. Their decisions clearly will not have the impact of decisions handed down in court, for a prison sentence handed down in an experiment does not send anyone to prison. As a result, judges may treat a decision in an experiment less seriously than a decision in court. To alleviate this problem, judges must be provided with an incentive to give experiments the same importance they would give an actual case. For example, decisions made by a panel of judges on an actual case might be provided to the presiding judge before a decision is rendered. This is done in the sentencing council experiments discussed in Diamond and Zeisel (1975).
The most serious problem in sentencing experiments is the evaluative nature of the experiments themselves. Most experiments are designed to collect data on a specific problem, e.g., the extent of discrimination and disparity in sentencing. This can rarely be concealed from the judges participating in the experiment. As a result, individual judges may try to ensure that they do not deviate too far from perceived norms, leading to an underestimate of the severity of the problem under study. This individual sensitivity to evaluation can be reduced by keeping responses anonymous. However, the fact that results of the experiments may be used by critics of the judiciary to support changes in the system may cause judges who want to maintain the status quo to adjust their decisions to reduce the apparent severity of the problems under study. This bias is likely to be particularly severe if the experiment is an unusual event rather than a routine matter.
It may be reduced if making decisions on experimental cases is required of all judges in a jurisdiction on a regular basis. The reaction to the experimenter's intent may also be reduced if the experimenter can deceive the judges as to the purpose of the experiment. This requires a convincing cover story and a carefully designed questionnaire that does not reveal the true purpose of the experiment. Deceptions of this type are often used for similar reasons in psychological experiments, although they raise serious ethical questions (see Rosenthal and Rosnow, 1969). Furthermore, in view of the narrow range of issues considered in most sentencing studies, it is not clear whether these deceptions will succeed. Thus it is unlikely that these biases can be eliminated completely.
If it is not possible to prevent the judges from adjusting their answers in an experiment, then it might be possible to control for these adjustments by modeling the process that generates them. Suppose, for example, that experimental cases have been constructed from actual cases by varying, say, the race of the defendant. In this case, using equations (33) and (35), for each $k$ there is one pair $i, j$ (the race of the actual defendant and the judge who heard the case) for which the actual decision is available. For the other $i, j$ pairs only experimental observations exist. Thus for each $k$,
$$y_{ijk} = \mu + \nu_i + \delta_j + c_k + \varepsilon_{ijk}$$
for one pair $i, j$, and
$$y_{ijk} = \mu^* + \nu_i^* + \delta_j^* + c_k^* + \varepsilon_{ijk}^*$$
otherwise. These observations might be combined by assuming that the overall mean sentence and the case effects are the same for the experimental and nonexperimental observations, i.e., $\mu^* = \mu$ and $c_k^* = c_k$, but that the effects of the discriminatory factor and the disparities in the experimental observations have been scaled down by the factors $\alpha$, $\beta$, and $\gamma$, respectively. Thus $\nu_i^* = \alpha\nu_i$, $\delta_j^* = \beta\delta_j$, and $\varepsilon_{ijk}^* = \gamma\varepsilon_{ijk}$, where $\alpha, \beta, \gamma > 0$ (and probably less than one). The observed court cases can then be used to calibrate the experimental responses.
This point illustrates one way in which experiments can be used in conjunction with nonexperimental data. Experiments can be used to validate results obtained from nonexperimental data or to provide alternative estimates with different biases. In particular, as noted above, observed court cases provide only an upper bound on the disparity within judges, whereas experiments (before adjustment) tend to underestimate this quantity.
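One way the calibration idea might be carried out is sketched below for the factor $\alpha$ only. This is a hypothetical Python simulation under assumptions that are ours rather than the authors': two values of the discriminatory factor with $\nu_0 = 0$, judge effects that sum to zero, random assignment of actual cases to judges, normal errors, and every $(i, j)$ combination presented experimentally for every case. All constants (`MU`, `NU`, `DELTA`, `ALPHA`, `BETA`, `GAMMA`, `K`) are invented. The experimental mean for each case stands in for $\mu + c_k$ (shifted by a constant that cancels), the court gap between the two groups estimates $\nu_1 - \nu_0$, and the experimental gap estimates $\alpha(\nu_1 - \nu_0)$; their ratio estimates $\alpha$.

```python
import random

rng = random.Random(3)
MU = 5.0                              # overall mean sentence (hypothetical units)
NU = {0: 0.0, 1: 2.0}                 # discriminatory-factor effects, normalized so NU[0] = 0
DELTA = [-0.5, 0.0, 0.5]              # judge effects, normalized to sum to zero
ALPHA, BETA, GAMMA = 0.5, 0.6, 0.5    # hypothetical scale-down factors in the experiment
K = 4000                              # number of cases

actual = []       # (i, y) for the one real (i, j) pair of each case
case_mean = []    # mean experimental response over all (i, j) pairs of each case
exp_gap = 0.0     # running sum of experimental gaps between i = 1 and i = 0
for _ in range(K):
    c_k = rng.gauss(0, 1)                                   # case effect
    i0, j0 = rng.randint(0, 1), rng.randrange(len(DELTA))   # actual defendant group and judge
    actual.append((i0, MU + NU[i0] + DELTA[j0] + c_k + rng.gauss(0, 1)))
    ys = {}
    for i in NU:
        for j, d in enumerate(DELTA):
            # Experimental response: same mu and c_k, scaled-down nu, delta, and epsilon.
            ys[i, j] = MU + ALPHA * NU[i] + BETA * d + c_k + GAMMA * rng.gauss(0, 1)
    exp_gap += sum(ys[1, j] - ys[0, j] for j in range(len(DELTA))) / len(DELTA)
    case_mean.append(sum(ys.values()) / len(ys))
exp_gap /= K                          # estimates ALPHA * (NU[1] - NU[0])

# Subtracting the experimental case mean removes mu + c_k (up to a constant that
# cancels below); the group difference in what remains estimates NU[1] - NU[0].
resid = {0: [], 1: []}
for (i0, y), m in zip(actual, case_mean):
    resid[i0].append(y - m)
court_gap = sum(resid[1]) / len(resid[1]) - sum(resid[0]) / len(resid[0])

alpha_hat = exp_gap / court_gap
print(f"experimental gap {exp_gap:.2f}, court gap {court_gap:.2f}, alpha estimate {alpha_hat:.2f}")
```

With the assumed values the estimate should be close to the true $\alpha = 0.5$. Recovering $\beta$ and $\gamma$ would need further comparisons (of judge effects and of residual variation across the two data sources), which this sketch does not attempt.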
Simultaneous use of experiments and courtroom observations can thus provide bounds on the severity of disparity.
Experiments might also be used to deal with the selection problem. Wilkins et al. (1973) and others use experiments to analyze the details of the judges' decision processes, including the variables they use in making decisions and the order in which these variables are considered. Similar experiments could be performed with other members of the criminal justice system, such as the prosecutor. The results might provide information about the factors that contribute to the correlation between the unobserved variables in the different stages of the selection process. This information might help the investigator assess the magnitude of the correlation and determine which, if any, additional variables should be measured.
Experiments can be used to address a number of questions that cannot be answered using observational data. For example, judges might be asked to choose both a determinate sentence and a minimum and a maximum sentence for hypothetical cases. Their responses could be used to evaluate the implications of laws on determinate sentencing. Experiments can also provide information about cases that occur too infrequently in court for observational data to provide accurate results. Many studies, for example, have found it impossible to investigate the relationship between sentence and the defendant's sex because the number of women in their samples was negligible.
So far our discussion has been concerned with experiments for analyzing the behavior of judges. Other aspects of the criminal justice system can also be analyzed with experiments. For example, experiments could be designed to determine whether prosecutors act in a discriminatory fashion when deciding whether to prosecute a case. Experiments might also be useful aids for constructing models of the plea bargaining process.
In addition to providing data for analysis, experiments may also have a beneficial side effect, especially if they are conducted on a regular basis. Many judges and other members of the criminal justice system are sensitive to the problems of disparity and discrimination in sentencing.
The results of regular controlled experiments might reduce disparity and discrimination by helping judges understand and calibrate their own decisions. The major drawback to experiments is their cost. The problems associated with experimental data may seem easier to solve than the problems of observational data, but the cost of running experiments, both in money and in the demands they place on the judge's time, make it difficult to obtain samples that are large enough to provide very precise estimates of the parameters of interest. Thus it is unlikely that observational data, in which sample sizes are typically

OCR for page 55
122 large, can be dispensed with entirely. The simultaneous use of both approaches, in which the particular advantages of each approach can be exploited, is an avenue that deserves more attention in future work. C ONCLUSI ONS We argued that the studies of discrimination in case disposition generally suffer from at least one of three major shortcomings: (1) the absence of formal models of the processing decisions in the criminal justice system, (2) failure to consider the sample selection biases that result from the many screening decisions in the criminal justice system, and (3) the use of arbitrary scales for scaling qualitatively different dispositions. Most of our discussion of these problems focused on ways in which they can lead to underestimates of the severity of discrimination in the criminal justice system. Despite these problems, some studies do find evidence of discrimination. However, this should not be interpreted as suggesting that discrimination is actually present. There are many other problems, such as the omission of important variables possibly correlated with race or social status, that can lead to overestimates of the severity of discrimination. Some of these points are discussed in detail in Garber et al. (in this volume). Each of the shortcomings enumerated above is, in principle, remediable. However, correcting them will require a formidable research agenda. Carefully specified models reflecting the essential motivations of the principal actors in the criminal justice system and the dynamics of their interplay are required. Furthermore, the data sets to be considered will have to be carefully chosen and perhaps combined with the results of designed experiments in order to mitigate the effects of sample selection. Novel and complex statistical techniques will be needed for the analysis. While these obstacles are formidable, we see no alternative to addressing these problems. 
If they continue to be neglected, then the extent of discrimination in the criminal justice system will continue to be mired in uncertainties so great that no generally accepted resolution will ever be reached.

APPENDIX

Proposition: If $x$ is uniformly distributed, then $t_1 = t_2$, where

$$t_1 = E[x \mid x + w_1 > -(a + \delta)] - E(x \mid x + w_1 > -a) \tag{A-1}$$

$$t_2 = E[x \mid x + w_1 > -(a + \delta),\ x + w_2 > -(a + \delta)] - E(x \mid x + w_1 > -a,\ x + w_2 > -a) \tag{A-2}$$

Proof: Equations (A-1) and (A-2) can be rewritten as

$$t_1 = E[x \mid x > -(a + \delta) - w_1] - E(x \mid x > -a - w_1) \tag{A-3}$$

$$t_2 = E[x \mid x > -(a + \delta) - w_1,\ x > -(a + \delta) - w_2] - E(x \mid x > -a - w_1,\ x > -a - w_2) \tag{A-4}$$

Let $f_1(r) = P(-w_1 = r)$ and $f_2(r) = P[\max(-w_1, -w_2) = r]$. Note that, given $w_1$ and $w_2$, one of the two conditioning arguments in each of the two terms on the right-hand side of equation (A-4) is redundant. Using this, and $f_1(r)$ and $f_2(r)$ to integrate out $w_1$ and $w_2$, equations (A-3) and (A-4) can be written as

$$t_1 = \int \{E[x \mid x > -(a + \delta) + r] - E(x \mid x > -a + r)\}\, f_1(r)\, dr$$

$$t_2 = \int \{E[x \mid x > -(a + \delta) + r] - E(x \mid x > -a + r)\}\, f_2(r)\, dr,$$

which implies

$$t_1 - t_2 = \int \{E[x \mid x > -(a + \delta) + r] - E(x \mid x > -a + r)\}\,[f_1(r) - f_2(r)]\, dr. \tag{A-5}$$

Using the fact that if $x$ is uniformly distributed, $E(x \mid x > \lambda) = (\bar{x} + \lambda)/2$, where $\bar{x}$ is the maximum value $x$ can assume, equation (A-5) implies

$$t_1 - t_2 = \frac{1}{2}\int -\delta\,[f_1(r) - f_2(r)]\, dr = 0,$$

where the second equality follows from the fact that $f_1(r)$ and $f_2(r)$ are proper probability density functions. This result generalizes trivially if $x$ is multiplied by any scalar in the conditioning arguments in equations (A-1) and (A-2). This establishes the assertion in the text that if $x$ is uniformly distributed and $\gamma_1 = \gamma_2 = \gamma_3$, then $\delta_2 = \delta_3$.
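The key step in the proof is that, for uniform $x$, the bracketed term in (A-5) equals $-\delta/2$ for every $r$, so any two proper densities yield the same weighted average. The Python sketch below evaluates the two integrals of the proof exactly as written, replacing the densities with hypothetical discrete distributions. It assumes $w_1$ and $w_2$ are independent with a made-up three-point distribution, takes $f_1$ as the distribution of $-w_1$ and $f_2$ as that of $\max(-w_1, -w_2)$, and sets $x \sim$ Uniform(0, 1). The values of $a$ and $\delta$ are illustrative only, chosen so every threshold stays inside the support of $x$, where $E(x \mid x > \lambda) = (\bar{x} + \lambda)/2$ holds.

```python
import math
from itertools import product

def trunc_mean_uniform(lam, lo, hi):
    """E(x | x > lam) for x ~ Uniform(lo, hi); valid only for lo <= lam < hi."""
    if not lo <= lam < hi:
        raise ValueError("threshold must lie inside the support of x")
    return (hi + lam) / 2.0

def bracket_average(pmf, a, delta, lo, hi):
    """Average over r ~ pmf of E[x | x > -(a+delta) + r] - E(x | x > -a + r), as in (A-5)."""
    return sum(p * (trunc_mean_uniform(-(a + delta) + r, lo, hi)
                    - trunc_mean_uniform(-a + r, lo, hi))
               for r, p in pmf.items())

# Hypothetical distribution shared by w1 and w2 (assumed independent, for illustration).
w_pmf = {-0.2: 0.3, 0.0: 0.4, 0.3: 0.3}

f1 = {}  # distribution of r = -w1
for w, p in w_pmf.items():
    f1[-w] = f1.get(-w, 0.0) + p

f2 = {}  # distribution of r = max(-w1, -w2)
for (w1, p1), (w2, p2) in product(w_pmf.items(), repeat=2):
    r = max(-w1, -w2)
    f2[r] = f2.get(r, 0.0) + p1 * p2

a, delta = -0.5, 0.1   # hypothetical cutoff and shift
lo, hi = 0.0, 1.0      # x ~ Uniform(0, 1)
t1 = bracket_average(f1, a, delta, lo, hi)
t2 = bracket_average(f2, a, delta, lo, hi)
print(t1, t2, math.isclose(t1, t2))
```

Both averages come out at roughly $-\delta/2$. Replacing the uniform formula with, say, a normal truncated mean would make the bracketed term depend on $r$, and the two weighted averages would then generally differ, which is why the proposition is stated for uniform $x$.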

NOTES

1. This is because a linear function of the regressors is not constrained to lie between zero and one.

2. In the jargon of statistics, neither $\sigma^2$ nor $b$ is identified (assuming, in the case of $b$, that $x$ contains a constant regressor). This can be seen as follows. Multiply $\sigma$, $b$, and $\beta$ by the same positive constant. Then $P(Y_i = 1 \mid x_i)$ is unchanged. Hence it is not possible to estimate the levels of both $\beta$ and $\sigma^2$. Instead, $\sigma^2$ is typically set equal to one for estimation purposes, and $\beta$ is effectively estimated relative to the arbitrary value assigned to $\sigma^2$. As for $b$, suppose that $x$ contains a constant regressor. Then if $\beta_1$, the constant term in the regression, and $b$ are changed by the same amount, $b - x_i'\beta$ is unchanged, and hence $F(b - x_i'\beta)$ remains unchanged. As a result, for estimation purposes, $b$ is typically set equal to zero and the cutoff level is subsumed into the constant.

3. The coefficient of $u_i$ in this expression follows from the fact that if $E(y \mid z)$ is linear in $z$, then $y$ can be expressed as $y = \mu_y + [\mathrm{Cov}(y,z)/\mathrm{Var}(z)](z - \mu_z) + v$, where $\mu_y = E(y)$, $\mu_z = E(z)$, $V(v) = \sigma_y^2(1 - \rho^2)$, $\rho = \mathrm{Cov}(y,z)/(\sigma_y \sigma_z)$, $V(z) = \sigma_z^2$, and $V(y) = \sigma_y^2$.

4. The selection that occurs as a result of the imprisonment decision is somewhat different from the other selection mechanisms we have discussed. The imprisonment decision is made by the judge who also determines the length of the sentence. The formal distinction between the imprisonment decision and the determination of sentence length is thus somewhat artificial. Nevertheless, if the two decisions are viewed as separable, which is implicit in studies that investigate sentence length for individuals who have been sent to prison, then the appropriate mathematical formulation of this process is the same as the one that would be appropriate if the decisions were made by separate individuals. As a result, the same model applies.

5. We do not distinguish between jury and bench trials. The model could easily be generalized to include this option, but such a generalization would only complicate the discussion without further illuminating the points we wish to make.

6. Another relevant factor is time spent in pretrial detention. Conditions in jail are frequently worse than in prison. If the defendant opts for a trial, the time spent in pretrial detention is likely to increase.

7. The decision to charge includes the choice of whether to prosecute and the choice of which charges to file given prosecution. We consider only the former choice.

8. Dismissal can occur before or after charges have been filed. We treat dismissals that occur after charges have been filed as decisions not to charge. The term dismissal is restricted to instances in which the prosecutor declines to prosecute after an arrest has been made.

9. The factors giving rise to selection bias involve the stages preceding the sentence length decision and thus are not related to the true extent of discrimination in the sentence length decision of each judge.

10. However, we argue below that this finding may actually be the result of discrimination at the prosecution and/or conviction stage rather than in sentencing.

11. The purpose of introducing this model is merely to fix ideas. The discussion could equally well be based on a more complicated ANOVA model (one in which the effects of the discriminatory factors are viewed as nested within judges), a binary model, a binary plus a conditional continuous model, or an ordered multiple-response model.
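The two indeterminacies described in Note 2 can be checked numerically. The Python sketch below assumes a probit-style latent-index model, $Y^* = x'\beta + u$ with $u \sim N(0, \sigma^2)$ and $Y = 1$ iff $Y^* > b$, so that $P(Y = 1 \mid x) = 1 - F((b - x'\beta)/\sigma)$ with $F$ the standard normal CDF. The normal error, the function names, and all parameter values are illustrative assumptions; Note 2 itself only writes a generic $F$.

```python
import math

def std_normal_cdf(z):
    """Standard normal CDF, written with math.erf."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def prob_y1(x, beta, b, sigma):
    """P(Y = 1 | x) in a hypothetical latent-index model: Y* = x'beta + u,
    u ~ N(0, sigma^2), Y = 1 iff Y* > b, so P = 1 - F((b - x'beta) / sigma)."""
    xb = sum(xj * bj for xj, bj in zip(x, beta))
    return 1.0 - std_normal_cdf((b - xb) / sigma)

x = [1.0, 2.5]       # x[0] = 1 is the constant regressor
beta = [0.4, -0.3]   # beta[0] is the constant term
b, sigma = 0.2, 1.5

base = prob_y1(x, beta, b, sigma)

# (i) Scale sigma, b, and beta by the same positive constant c: probability unchanged.
c = 3.0
scaled = prob_y1(x, [c * bj for bj in beta], c * b, c * sigma)

# (ii) Shift the constant term and the cutoff b by the same amount d: probability unchanged.
d = 0.7
shifted = prob_y1(x, [beta[0] + d, beta[1]], b + d, sigma)

print(math.isclose(base, scaled), math.isclose(base, shifted))  # True True
```

Because the data cannot distinguish `base` from `scaled` or `shifted`, fixing $\sigma = 1$ and $b = 0$ is a normalization rather than a substantive restriction. Shifting $b$ alone, without the matching change in the constant, does change the probability.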
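The decomposition used in Note 3 can be illustrated with simulated data. The Python sketch below generates hypothetical $(y, z)$ pairs in which $E(y \mid z)$ is linear by construction, forms $v = y - \mu_y - [\mathrm{Cov}(y,z)/\mathrm{Var}(z)](z - \mu_z)$ from sample moments, and compares the variance of $v$ with $\sigma_y^2(1 - \rho^2)$. The data-generating process, sample size, and seed are assumptions for illustration only. With sample moments computed by dividing by $n$, the two quantities agree exactly in the sample, up to floating-point rounding.

```python
import math
import random
import statistics

random.seed(0)
n = 20_000

# Hypothetical data: z ~ N(1, 2^2) and y = 3 + 0.5 z + e with e ~ N(0, 1),
# so E(y | z) is linear in z, as Note 3 assumes.
z = [random.gauss(1.0, 2.0) for _ in range(n)]
y = [3.0 + 0.5 * zi + random.gauss(0.0, 1.0) for zi in z]

# Sample moments (population-style, dividing by n).
mu_y, mu_z = statistics.fmean(y), statistics.fmean(z)
var_y = statistics.pvariance(y, mu_y)
var_z = statistics.pvariance(z, mu_z)
cov_yz = sum((yi - mu_y) * (zi - mu_z) for yi, zi in zip(y, z)) / n
rho = cov_yz / math.sqrt(var_y * var_z)

# Decomposition y = mu_y + [Cov(y,z)/Var(z)](z - mu_z) + v.
slope = cov_yz / var_z
v = [yi - mu_y - slope * (zi - mu_z) for yi, zi in zip(y, z)]

# Var(v) should match var_y * (1 - rho^2) up to rounding error.
print(statistics.pvariance(v), var_y * (1 - rho ** 2))
```

The estimated slope lands near the true 0.5 used to generate the data, and $v$ has mean zero by construction.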