
4 Empirically Based Sentencing Guidelines and Ethical Considerations

Franklin M. Fisher and Joseph B. Kadane

INTRODUCTION

The U.S. Parole Board initiated the study of empirically based guidelines to describe the decision rules it had been using implicitly. The board's purpose was to inform itself about the pattern of its own decisions. As a purely descriptive device, such a study has no ethical implications. Later the research emphasis shifted from parole to sentencing and to a more normative focus on what decisions should be. Nonetheless, the technology involved in developing empirically based guidelines still bears a strong resemblance to the analysis of parole decisions. Ethical considerations in particular are avoided in these analyses.

This paper examines the philosophy of empirically based sentencing guidelines. Our basic approach is to follow an empirically based mode as far as we can, not because we are particularly attracted to the conservatism inherent in this line (whatever was done in the past must have been just, even if we cannot explain it), but because we find that surprisingly quickly our thoughts lead us to require new ethical judgments. Thus, in particular, we find that even when empirically based guidelines are expected to do no more than reduce sentence disparity, some ethical judgment is required. If past decisions may have involved ethically irrelevant factors such as race, the purging of those factors, while possible, requires more than the judgment that they should be purged. Further ethical judgments are necessarily involved.

THE SIMPLEST CASE: NO ETHICALLY IRRELEVANT VARIABLES

Consider first the simplest case, in which sentences have in the past depended on a set of independent variables, all of which are believed to be ethically appropriate. Thus, for example, variables such as those describing seriousness of offense are appropriate in sentencing; variables such as race are not. We can represent this situation by the following equation:

$$S = \beta + R\alpha + \varepsilon, \tag{1}$$

where $S$ is sentence length; $R$ is a set of ethically relevant variables; $\alpha$ is a set of unknown slope parameters; $\beta$ is an unknown constant term; and $\varepsilon$ is a random disturbance. (For ease of exposition, we deal for the present with the linear case only and restrict attention to sentence length as the variable to be determined.[1])

In this situation, if we suppose that the decisions of the past were ethically acceptable on the average, the justification for guidelines becomes the presence of the random disturbance, $\varepsilon$. That disturbance may involve factors affecting particular judges on particular days, or it may involve the factors peculiar to individual cases that lead judges to sentence differently.

There is an apparent tension here as to whether it is desirable that equation (1) fit the data well or badly. If the equation fits badly, then apparently it will provide only an uncertain guide as to what past practice actually was. If the equation fits well, then the influence of the random term, $\varepsilon$, will be small and there will be little disparity to reduce. In fact this apparent tension is not real, because there is a difference between how well the model fits and how closely the parameters $\beta$ and $\alpha$ are estimated. With large enough sample sizes or enough variation in the underlying data, it is quite possible to estimate $\alpha$ and $\beta$ with considerable precision while still having a large unexplained variance.
In that case we could estimate average past behavior quite accurately, but there would be considerable disparity in the sense of scatter around such average behavior. Note that this makes it particularly important not to use overall measures of goodness of fit such as $R^2$ as the sole or principal measures with which to assess the model. What really matters are the standard errors of the estimated parameters.[2]

If the parameters and thus past average behavior can be reliably estimated but there is considerable variation around that behavior, it may appear desirable to reduce that variation. This is the basic rationale for empirically derived guidelines. It rests on the view that judges were correct in the past on the average but that judges themselves or society would wish to reduce the extent of individual variation around those averages. If the model has been correctly specified so that all the important variables affecting the sentencing decision have been included, and if all these variables are ethically relevant ones, this may be an appealing view, provided disparity is high. While some room for individual factors and individual judgment will always be necessary, it may seem reasonable to require judges explicitly to justify any large departures from the systematic collective wisdom.

In the context of this model, this is easy to do in principle. The process of estimating equation (1) will also estimate $\sigma^2$, the variance of $\varepsilon$. We denote that estimate by $\sigma^{*2}$. Now choose a constant, $k$. Judges will be required to write an explicit justification of their actions whenever their sentence does not lie within $k\sigma^*$ of the estimated average sentence for the particular value of $R$ present in the case decided. The predicted sentence is $S^* = \beta^* + R\alpha^*$, where asterisks denote estimates.

How should $k$ be chosen?
Given the distribution of $\varepsilon$ (which can be approximated from the data), a choice of $k$ is equivalent in the above procedure to requiring that judges write explicit justifications for cases that fall farther away from the average sentence than some stated fraction (e.g., 90 percent) of cases would have done in the past. What fraction should be chosen depends on the extent to which one wishes to reduce disparity in this way. While such a choice depends in part on what one sees as the source of past disparity, it is also an ethical choice.
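As a rough illustration of the flagging rule described above, the following Python sketch fits equation (1) by ordinary least squares, estimates the residual standard deviation, and picks k so that a stated fraction of past cases falls within k standard deviations of the fitted average. Everything specific here is an assumption made for the example and is not taken from the chapter: the synthetic data, the coefficient values, the 90 percent fraction, and the function names `fit_ols`, `choose_k`, and `needs_justification`.

```python
import numpy as np

def fit_ols(R, S):
    """Least-squares fit of S = beta + R @ alpha + eps.

    Returns (beta, alpha, sigma), where sigma is the residual standard
    deviation with the usual degrees-of-freedom correction.
    """
    n, p = R.shape
    design = np.column_stack([np.ones(n), R])
    coef, *_ = np.linalg.lstsq(design, S, rcond=None)
    resid = S - design @ coef
    sigma = np.sqrt(resid @ resid / (n - p - 1))
    return coef[0], coef[1:], sigma

def choose_k(R, S, beta, alpha, sigma, fraction=0.90):
    """Return k such that `fraction` of past cases lie within k*sigma."""
    scaled = np.abs(S - (beta + R @ alpha)) / sigma
    return np.quantile(scaled, fraction)

def needs_justification(sentence, R_case, beta, alpha, sigma, k):
    """True when a sentence lies more than k*sigma from the predicted one."""
    predicted = beta + R_case @ alpha
    return abs(sentence - predicted) > k * sigma

# Synthetic past cases (purely hypothetical): two ethically relevant
# variables, e.g. an offense-seriousness score and a prior-record count.
rng = np.random.default_rng(0)
n = 2000
R = np.column_stack([rng.integers(1, 11, n), rng.integers(0, 6, n)]).astype(float)
S = 6.0 + R @ np.array([4.0, 3.0]) + rng.normal(0.0, 12.0, n)

beta, alpha, sigma = fit_ols(R, S)
k = choose_k(R, S, beta, alpha, sigma, fraction=0.90)

# Share of past cases that would have required a written justification.
flagged = np.abs(S - (beta + R @ alpha)) > k * sigma
print(f"alpha* = {alpha}, sigma* = {sigma:.2f}, k = {k:.2f}")
print(f"share of past cases flagged: {flagged.mean():.3f}")
```

In this sketch k is read off the empirical distribution of the scaled residuals rather than from a normal table, which matches the remark that the distribution of the disturbance can be approximated from the data. The fraction itself remains a choice that the code cannot make.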

That the choice of $k$ is an ethical one is perhaps seen most easily by considering the following. There is no intrinsic reason why upward departures from average sentencing behavior (harsh sentences) and downward departures (lenient sentences) should be treated identically. One might, depending on one's ethical views, choose different values of $k$, say $k_1$ and $k_2$, for the two different types of departures, using a smaller value when departures are considered worse. Plainly, the choice of such values depends on ethical considerations; those considerations cannot be avoided by restricting the choice to $k_1 = k_2$ and treating both kinds of departures symmetrically.

Before moving into more complicated cases, one point is worth making. Using models in this way requires that the model be either correct or a close approximation. (It also requires that it be estimated using the best available practice.) If, in particular, variables are wrongly omitted from equation (1) that are correlated with those included, the estimated effects will be wrong and the guidelines misleading. This will be particularly important if the omitted variables are ethically irrelevant.

To take a leading example, suppose that the true model is not equation (1) but rather

$$S = \beta + R\alpha + I\theta + \varepsilon, \tag{2}$$

where $I$ is a single ethically irrelevant variable that, for purposes of focusing discussion, we will take to be a dichotomous variable indicating race (with $I = 0$ for blacks and $I = 1$ for whites). Suppose also that among the variables in $R$ are one or more that are correlated with race. To fix ideas, suppose the variable in question is a measure of prior record. Then mistaken estimation of equation (1) instead of equation (2) when race has actually mattered directly in the past will lead to erroneous estimation of $\alpha$. Furthermore, the derived guidelines will build in the ethically irrelevant effect of race by giving (in the simplest case) an inappropriate coefficient to prior record (among other things).
In other words, such misspecification will lead to those with longer prior records being given long sentences not simply because of the effect of prior record in judicial decisions but also because those with longer prior records tend to be black. Past racism will be incorporated in the guidelines and the resulting coefficients will be biased in more than one sense.
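The misspecification argument above can be checked numerically. The sketch below simulates data from equation (2), in which race affects sentences directly and is correlated with prior record, and then compares the prior-record coefficient from the correctly specified regression with the one obtained when race is wrongly omitted. The data-generating values, the Poisson form for prior record, and the helper name `ols` are hypothetical choices made only for this illustration.

```python
import numpy as np

def ols(design, y):
    """Return least-squares coefficients for y regressed on `design`."""
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef

rng = np.random.default_rng(1)
n = 5000

# I = 0 for blacks, I = 1 for whites, following the coding in the text.
I = rng.integers(0, 2, n).astype(float)

# Assumed for illustration: prior record is correlated with race,
# with longer records on average in the I = 0 group.
prior = rng.poisson(3.0 - 1.5 * I).astype(float)

# Assumed true model, equation (2): race mattered directly in the past
# (theta < 0 means whites received shorter sentences, other things equal).
alpha_true, theta_true = 2.0, -6.0
S = 10.0 + alpha_true * prior + theta_true * I + rng.normal(0.0, 5.0, n)

ones = np.ones(n)
full = ols(np.column_stack([ones, prior, I]), S)   # equation (2)
short = ols(np.column_stack([ones, prior]), S)     # equation (1), race omitted

print(f"prior-record coefficient, race included: {full[1]:.2f}")
print(f"prior-record coefficient, race omitted:  {short[1]:.2f}")
print(f"estimated direct race effect (theta*):   {full[2]:.2f}")
```

Under these assumptions the regression that omits race assigns prior record a larger coefficient than the one used to generate the data, because part of the direct race effect is absorbed by the correlated variable.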

Other misspecifications will lead to a number of less dramatic results. In the limiting case in which the omitted variables are not correlated with any of the included ones, such omission will not lead to biased estimates of the parameters that describe average behavior. It will, however, lead to inefficient estimates of those parameters. In addition, the effects of such omitted variables will be attributed to disparity, whereas they may represent not random occurrences but precisely those explicable case-by-case variations that one would not wish to reduce. Plainly, correct specification is very important. Whether we know enough to achieve it is a separate question.

THE PRESENCE OF A SINGLE ETHICALLY IRRELEVANT VARIABLE: THE LINEAR CASE

We now face directly the question of what to do in the situation of equation (2), in which an ethically irrelevant variable such as race has influenced past decisions. (For ease of exposition we begin with the case of only one such variable, treating the more complex case below.) We have already seen how not to treat such a case: one must not delete the ethically irrelevant variable from the equation being estimated. A positive prescription is now required.

The problem can be posed as follows. The justification for empirically based guidelines resides in the view that the collective decisions of the past represent, on average, an ethically desirable standard. In the present case, however, that is manifestly untrue; such decisions, by assumption, were contaminated by the use of an ethically irrelevant criterion, race, to affect sentence length. Is it possible to purge past decisions of that contaminating effect and to use the purged estimates to inform future decisions through the construction of guidelines? The answer is yes, but the accomplishment of this task necessarily involves another ethical choice.

Begin by estimating equation (2) (in the simplest case by multiple regression).
This yields estimates of $\beta$, $\alpha$, and $\theta$, which we denote by asterisks. Note that $\alpha^*$, in particular, is an estimate of the effect of the ethically relevant variables, $R$, with the effect of race held constant. In terms of the example used above, this procedure estimates the effect of longer prior record given race, an effect uncontaminated by the fact that blacks tend to have longer records than do whites. This is useful information, for it tells us (in this linear model) what the average difference in sentence was between offenders with good and those with bad records independent of race.[3] If we can decide on the base level of sentence in the guidelines for one case, then we can use the estimates to derive levels for others.

This can be described in an equivalent but perhaps more revealing way. Suppose that we estimate equation (2) as described. We can then go on to use the estimated equation as determining the average sentence to be used in the guidelines and purge it of the racial effects by choosing a value for $I$, say $I'$, to be used for all future cases of whatever race. The average sentence used in the guidelines for cases with characteristics represented by $R$ will then be

$$S^* = \beta^* + R\alpha^* + I'\theta^*. \tag{3}$$

The effect of changes in $R$ will then be measured by $\alpha^*$, so that the choice of $I'$ is equivalent to the choice of a base level as above.

How should that choice be made? This is an inescapable ethical decision. To see this, consider what different choices of $I'$ imply. To choose any value of $I'$ is to treat all offenders in a racially neutral way, but the particular choice determines how they should be treated. Thus, to choose $I' = 0$ for guideline construction is to treat later offenders on average as blacks were treated previously. To choose $I' = 1$ is to treat them as whites were treated previously. To choose $I' = 1/2$ is to treat them as getting exactly the average of previous black and white treatment.
This is an essentially ethical choice that cannot be made simply by referring to the average of past experience.[4]

However $I'$ is chosen, note that the choice of $k$ as in the simplest case will make judges explicitly justify departures that cannot be accounted for by random variation in more than a corresponding fraction of the cases. This will force any judges who still use race in an important way to make explicit justification.
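The purging step of equation (3) can be sketched in a few lines of Python: estimate equation (2) with race included, then evaluate the fitted equation at a single common value I' for every future case. The synthetic data, the coefficient values, and the function names `fit_equation_2` and `guideline_sentence` are assumptions introduced only for this example.

```python
import numpy as np

def fit_equation_2(R, I, S):
    """Least-squares estimates (beta*, alpha*, theta*) for equation (2)."""
    n = len(S)
    design = np.column_stack([np.ones(n), R, I])
    coef, *_ = np.linalg.lstsq(design, S, rcond=None)
    return coef[0], coef[1:-1], coef[-1]

def guideline_sentence(R_case, beta, alpha, theta, I_prime):
    """Equation (3): average guideline sentence with race fixed at I_prime."""
    return beta + R_case @ alpha + I_prime * theta

# Hypothetical past cases: one ethically relevant variable (prior record)
# and a race indicator (0 = black, 1 = white) that affected sentences.
rng = np.random.default_rng(2)
n = 4000
prior = rng.integers(0, 8, n).astype(float)
I = rng.integers(0, 2, n).astype(float)
S = 20.0 + 3.0 * prior - 6.0 * I + rng.normal(0.0, 4.0, n)

beta, alpha, theta = fit_equation_2(prior[:, None], I, S)

case = np.array([4.0])  # an offender with four prior offenses
as_black = guideline_sentence(case, beta, alpha, theta, I_prime=0.0)
as_white = guideline_sentence(case, beta, alpha, theta, I_prime=1.0)
as_mid = guideline_sentence(case, beta, alpha, theta, I_prime=0.5)

print(f"I' = 0:   {as_black:.1f}")
print(f"I' = 1/2: {as_mid:.1f}")
print(f"I' = 1:   {as_white:.1f}")
```

Whatever value of I' is used, the difference between two guideline sentences that differ only in prior record is governed by the same estimated slope. Only the base level moves with I', and the code offers no basis for preferring one base level to another.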

NONLINEARITIES AND MORE THAN ONE ETHICALLY IRRELEVANT VARIABLE

This same analysis readily extends to the case in which the relationship to be estimated is nonlinear. Suppose that equation (2) is replaced by

$$S = F(R, I, \varepsilon), \tag{4}$$

where $F(\cdot, \cdot, \cdot)$ is some function, and we continue with a single dichotomous ethically irrelevant variable, $I$, for the moment (and continue the race example to fix ideas). Noting that $I$ still takes on the values of either zero or one, we can represent this relationship in an equivalent way. Define

$$F_0(R, \varepsilon) \equiv F(R, 0, \varepsilon); \qquad F_1(R, \varepsilon) \equiv F(R, 1, \varepsilon). \tag{5}$$

Then, for either of the two possible values of $I$,

$$F(R, I, \varepsilon) = (1 - I)F_0(R, \varepsilon) + I F_1(R, \varepsilon). \tag{6}$$

This corresponds to the general case in which the sentencing behavior of judges is allowed to be completely different for blacks from that for whites, that is, complete interaction; the linear case considered above is a special case of this.

In this circumstance we once again estimate the full descriptive model of sentencing behavior, equation (6). This is then purged of racial effects by applying the model for a given choice of $I$, say $I'$, to all future cases. The form of (6) now makes it apparent that the choice of $I'$ is equivalent to the necessarily ethical choice of what average between former black and former white cases is to be used. A choice of $I' = 0$ treats all offenders as if they were black; a choice of $I' = 1$ treats them as if they were white; a choice between zero and one determines an average.[5]

Note that this interpretation depends on the dichotomous nature of $I$. If $I$ were a continuous variable we would estimate (4) directly. A choice of $I'$ to use in the estimated version of (4) would then still be an ethical choice but, except in special cases, it would not correspond to a simple averaging of sentences previously given for various values of $I$.
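For the dichotomous case, the construction in equation (6) can be sketched as follows: fit a separate sentencing relationship for each group, then form the guideline as a weighted average of the two fitted relationships with weight I'. The sketch makes assumptions that go beyond the text: the disturbance is taken to be additive, each group's relationship is approximated by a quadratic in a single variable, and the data and the names `fit_group` and `guideline` are hypothetical.

```python
import numpy as np

def fit_group(r, s, degree=2):
    """Fit a polynomial in r to one group's sentences.

    The fitted polynomial stands in for the average of F_0 or F_1;
    the quadratic form is an assumption made for this sketch.
    """
    return np.polynomial.Polynomial.fit(r, s, degree)

def guideline(r, f0, f1, i_prime):
    """Equation (6) applied to the fitted averages with I fixed at i_prime."""
    return (1.0 - i_prime) * f0(r) + i_prime * f1(r)

# Hypothetical past cases in which both level and shape differ by group.
rng = np.random.default_rng(3)
n = 1500
r0 = rng.uniform(0.0, 10.0, n)   # cases with I = 0
r1 = rng.uniform(0.0, 10.0, n)   # cases with I = 1
s0 = 12.0 + 3.0 * r0 + 0.4 * r0**2 + rng.normal(0.0, 3.0, n)
s1 = 8.0 + 2.0 * r1 + 0.1 * r1**2 + rng.normal(0.0, 3.0, n)

f0 = fit_group(r0, s0)
f1 = fit_group(r1, s1)

for i_prime in (0.0, 0.5, 1.0):
    print(f"I' = {i_prime}: guideline at R = 5 is {guideline(5.0, f0, f1, i_prime):.1f}")
```

Because the two fitted curves differ in shape as well as level, the choice of I' here changes how the guideline sentence responds to R, not only its base level. That is one way in which the fully interacted case differs from the linear one.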

If more than one ethically irrelevant variable has mattered in the past, more than one ethical choice (in addition to the choice of $k$ above) must be made. Thus consider the case of two such variables that we take to be dichotomous. Suppose that $I_1$ now represents race as above and $I_2$ represents whether there was a guilty plea (assuming this to be ethically irrelevant), with $I_2 = 0$ denoting no such plea and $I_2 = 1$ denoting such a plea. Rewrite (4) as

$$S = F(R, I_1, I_2, \varepsilon). \tag{7}$$

Define

$$F_{00}(R, \varepsilon) \equiv F(R, 0, 0, \varepsilon); \quad F_{01}(R, \varepsilon) \equiv F(R, 0, 1, \varepsilon); \quad F_{10}(R, \varepsilon) \equiv F(R, 1, 0, \varepsilon); \quad F_{11}(R, \varepsilon) \equiv F(R, 1, 1, \varepsilon). \tag{8}$$

Then, similar to the construction in (6), for the possible values of $I_1$ and $I_2$ we can write

$$F(R, I_1, I_2, \varepsilon) = (1 - I_1)(1 - I_2)F_{00}(R, \varepsilon) + (1 - I_1)I_2 F_{01}(R, \varepsilon) + I_1(1 - I_2)F_{10}(R, \varepsilon) + I_1 I_2 F_{11}(R, \varepsilon). \tag{9}$$

That is, there are separate relationships allowed for blacks not pleading guilty, blacks pleading guilty, whites not pleading guilty, and whites pleading guilty.

The construction of empirically based guidelines now proceeds by estimating (9) and choosing two values, $I'_1$ and $I'_2$, to be used in the estimated equation that results. These choices, necessarily ethical, determine the weights to be used in averaging the previous average sentences of the four groups in guidelines to be used for all future offenders. Note, however, that there are only two choices to be made, not more than two, despite the fact that four groups are to be averaged. This corresponds to the fact that the weights used to average the guilty plea and not-guilty plea groups must be the same for blacks as for whites if race is to play no role in the use of the guidelines. Equivalently, the weights used to average the black and white groups must be the same for those pleading guilty as for those not pleading guilty if the presence or absence of a guilty plea is to play no role in the use of the guidelines.

Where $n$ ethically irrelevant variables are involved, $n$ ethical choices must be made. If $n$ is large, even though only $n$ such choices must be made, the view that guidelines can or should be based on past behavior rather than constructed directly from ethical or societal considerations loses much of its force, although the estimated $\alpha$ coefficients may still help to inform decisions.

CONCLUSION

We are uncomfortable with the whole enterprise of empirically based sentencing guidelines, for several reasons. First, they are by their nature unthoughtfully conservative. What is past may be prologue, but it is surely not unswervingly just. We prefer guidelines that arise from ethical principles, deducing the shape of the guidelines from those principles, as was done in Minnesota. Second, taking empirically based guidelines on their own terms leads us to require ethical judgments: For example, shall we treat blacks as we used to treat whites, or conversely, or use an average? We anticipate that ethical experts might say "neither" and propose a different punishment schedule entirely, but this would lead back to a Minnesota-type approach. Finally, there is the matter of implementation. These procedures assume that the model is correctly specified. Incorrect specification can lead to reintroduction of racial bias and other kinds of substantial injustice. We should add that correct specification is very difficult to achieve.

In conclusion, empirically based sentencing guidelines strike us as a species of computer-driven conservatism. They do not avoid hard ethical questions, and they mislead those who would construct guidelines by substituting statistical sophistication, which is useful but not essential, for ethical sophistication, which is critical.

NOTES

1. For convenience of notation we have not written out terms such as $R\alpha$. The reader is free to think of $R$ as a single variable. The more general case would have $R\alpha = \alpha_1 R_1 + \alpha_2 R_2 + \cdots + \alpha_k R_k$.

2. For a discussion of this and similar issues see Franklin M. Fisher (1980) Multiple regression in legal proceedings. Columbia Law Review 80(2):702-736.

3. Although prior record is itself a composite of several variables, we ignore this for simplicity of exposition.

4. Note in particular that a choice of $I'$ to generate the same average sentence length for all cases in the sample as actually occurred builds in a judgment that such an average was "right" despite the fact that it was influenced by the racial mix of cases in the sample. To attempt to set $I'$ empirically by estimating $I'$ together with $\alpha$ and $\beta$ to give the best fit in the sample is even worse. It can be shown to be equivalent to leaving race out of the estimated equation altogether (by absorbing $I'\theta$ into $\beta$, the constant term), the case of misspecification considered above.

There may be other ways to correct for the effects of race. For example, in a rather extreme form of affirmative action, one might wish to take account of the fact that blacks are discriminated against elsewhere. Such discrimination can mean that blacks have a worse prior record or are more likely to be unemployed than whites. One can imagine correcting variables such as prior record or unemployment by regressing them on race, then giving those variables in equation (3) the values they would have on the average if the offender were white, or the values they would have if the offender were black, or some other common value. This would involve a correction for race more extreme than simply a uniform value for $I'$ and would be likely to lead to wholesale reliance on regression rather than to analysis of individual offender characteristics.

5. Note that choices outside the range (0, 1) are also possible.
This would mean treating all offenders better than whites were treated in the past or all offenders worse than blacks were treated in the past (assuming discrimination to have been against blacks). To do so, however, is to depart fairly sharply from the notion that past judgments are ethically acceptable, which is the view that lies behind empirically based guidelines.