OCR for page 184

4
Empirically Based Sentencing Guidelines and Ethical Considerations
Franklin M. Fisher and Joseph B. Kadane
INTRODUCTION
The U.S. Parole Board initiated the study of empirically
based guidelines to describe the decision rules it had
been using implicitly. The board's purpose was to inform
itself about the pattern of its own decisions. As a
purely descriptive device, such a study has no ethical
implications. Later the research emphasis shifted from
parole to sentencing and to a more normative focus on
what decisions should be. Nonetheless, the technology
involved in developing empirically based guidelines still
bears a strong resemblance to the analysis of parole
decisions. Ethical considerations in particular are
avoided in these analyses.
This paper examines the philosophy of empirically
based sentencing guidelines. Our basic strategy is to
follow the empirically based approach as far as we can,
not because we are particularly attracted to the
conservatism inherent in it (whatever was done in the
past must have been just, even if we cannot explain it),
but because we find that, surprisingly quickly, following
it forces us to make new ethical
judgments. Thus, in particular, we find that even when
empirically based guidelines are expected to do no more
than reduce sentence disparity, some ethical judgment is
required. If past decisions may have involved ethically
irrelevant factors such as race, the purging of those
factors, while possible, requires more than the judgment
that they should be purged. Further ethical judgments
are necessarily involved.
THE SIMPLEST CASE:
NO ETHICALLY IRRELEVANT VARIABLES
Consider first the simplest case, in which sentences have
in the past depended on a set of independent variables,
all of which are believed to be ethically appropriate.
Thus, for example, variables such as those describing
seriousness of offense are appropriate in sentencing;
variables such as race are not. We can represent this
situation by the following equation:
$$S = \delta + R\alpha + \varepsilon, \qquad (1)$$
where S is sentence length; R is a set of ethically
relevant variables; α is a set of unknown slope
parameters; δ is an unknown constant term; and ε is a
random disturbance. (For ease of exposition, we deal for
the present with the linear case only and restrict
attention to sentence length as the variable to be
determined.1)
In this situation, if we suppose that the decisions of
the past were ethically acceptable on the average, the
justification for guidelines becomes the presence of the
random disturbance, ε. That disturbance may involve
factors affecting particular judges on particular days,
or it may involve the factors peculiar to individual
cases that lead judges to sentence differently.
There is an apparent tension here as to whether it is
desirable that equation (1) fit the data well or badly.
If the equation fits badly, then apparently it will
provide only an uncertain guide as to what past practice
actually was. If the equation fits well, then the
influence of the random term, ε, will be small and
there will be little disparity to reduce.
In fact this apparent tension is not real, because
there is a difference between how well the model fits and
how closely the parameters δ and α are estimated. With
large enough sample sizes or enough variation in the
underlying data, it is quite possible to estimate α and
δ with considerable precision while still having a
large unexplained variance. In that case we could
estimate average past behavior quite accurately but there
would be considerable disparity in the sense of scatter
around such average behavior. Note that this makes it
particularly important not to use overall measures of
goodness-of-fit such as R² as the sole or principal
measures with which to assess the model. What really
matters are the standard errors of the estimated
parameters.2
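The distinction between fit and precision can be seen in a small simulation. The sketch below is illustrative only: it assumes a single ethically relevant variable R, hypothetical values δ = 12 and α = 1.5, a large disturbance, and simulated rather than actual sentencing data, and it estimates equation (1) by ordinary least squares.

```python
import numpy as np

def ols(X, y):
    """Least-squares fit of y on X (X includes a column of ones).
    Returns coefficient estimates, classical standard errors, and R^2."""
    n, p = X.shape
    coef = np.linalg.lstsq(X, y, rcond=None)[0]
    resid = y - X @ coef
    sigma2 = resid @ resid / (n - p)              # estimate of Var(epsilon)
    cov = sigma2 * np.linalg.inv(X.T @ X)         # covariance of the estimates
    r2 = 1 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
    return coef, np.sqrt(np.diag(cov)), r2

rng = np.random.default_rng(0)
n = 20_000                                        # large hypothetical sample
R = rng.uniform(0, 10, n)                         # one ethically relevant variable
S = 12.0 + 1.5 * R + rng.normal(0, 20, n)         # assumed delta = 12, alpha = 1.5, large disturbance

coef, se, r2 = ols(np.column_stack([np.ones(n), R]), S)
print(f"R^2 = {r2:.3f}")                              # poor overall fit
print(f"alpha* = {coef[1]:.3f}, s.e. = {se[1]:.3f}")  # yet alpha is precisely estimated
```

Under these assumptions R² comes out well below 0.1, yet α* lies close to 1.5 with a standard error of roughly 0.05: average past behavior is estimated precisely even though the scatter around it is large.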
If the parameters and thus past average behavior can
be reliably estimated but there is considerable variation
around that behavior, it may appear desirable to reduce
that variation. This is the basic rationale for
empirically derived guidelines. It rests on the view
that judges were correct in the past on the average but
that judges themselves or society would wish to reduce
the extent of individual variation around those
averages. If the model has been correctly specified so
that all the important variables affecting the sentencing
decision have been included, and if all these variables
are ethically relevant ones, this may be an appealing
view, provided disparity is high. While some room for
individual factors and individual judgment will always be
necessary, it may seem reasonable to require judges
explicitly to justify any large departures from the
systematic collective wisdom.
In the context of this model, this is easy to do in
principle. The process of estimating equation (1) will
also estimate σ², the variance of ε. We denote that
estimate by σ*². Now choose a constant, k. Judges will
be required to write an explicit justification of their
actions whenever their sentence does not lie within kσ*
of the estimated average sentence for the particular
value of R present in the case decided. The predicted
sentence is
$$S^* = \delta^* + R\alpha^*,$$
where asterisks denote estimates.
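A minimal sketch of this rule, assuming a single ethically relevant variable and simulated past cases. The helper names `fit_guideline` and `needs_justification`, the seriousness score, and the choice k = 2 are hypothetical illustrations, not part of any actual guideline system.

```python
import numpy as np

def fit_guideline(R, S):
    """Least-squares estimates (delta*, alpha*, sigma*) for S = delta + R*alpha + eps."""
    X = np.column_stack([np.ones(len(R)), R])
    delta_hat, alpha_hat = np.linalg.lstsq(X, S, rcond=None)[0]
    resid = S - (delta_hat + alpha_hat * R)
    sigma_hat = np.sqrt(resid @ resid / (len(S) - 2))   # two estimated parameters
    return delta_hat, alpha_hat, sigma_hat

def needs_justification(sentence, r, delta_hat, alpha_hat, sigma_hat, k):
    """True if the sentence lies farther than k*sigma* from S* = delta* + r*alpha*."""
    predicted = delta_hat + alpha_hat * r
    return bool(abs(sentence - predicted) > k * sigma_hat)

# Hypothetical past cases: R is a seriousness score, S is sentence length in months.
rng = np.random.default_rng(1)
R = rng.uniform(0, 10, 5_000)
S = 12.0 + 1.5 * R + rng.normal(0, 6, 5_000)
d, a, s = fit_guideline(R, S)

print(needs_justification(40.0, 5.0, d, a, s, k=2.0))  # far above S* (about 19.5)
print(needs_justification(20.0, 5.0, d, a, s, k=2.0))  # close to S*
```

With these assumed numbers σ* is about 6, so any sentence more than about 12 months from S* would call for a written justification.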
How should k be chosen? Given the distribution of ε
(which can be approximated from the data), a choice of k
is equivalent in the above procedure to requiring that
judges write explicit justifications for cases that fall
farther away from the average sentence than some stated
fraction (e.g., 90 percent) of cases would have done in
the past. What fraction should be chosen depends on the
extent to which one wishes to reduce disparity in this
way. While such a choice depends in part on what one
sees as the source of past disparity, it is also an
ethical choice.
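One way to turn a stated fraction into a value of k is sketched below. It assumes that the residuals of the fitted equation are an adequate stand-in for the distribution of ε and takes the corresponding empirical quantile of the absolute residuals, measured in units of σ*. The helper `choose_k` and the simulated data are hypothetical.

```python
import numpy as np

def choose_k(residuals, sigma_hat, fraction=0.90):
    """k such that about `fraction` of past cases lie within k*sigma* of their
    predicted sentence, using the empirical distribution of the residuals."""
    return float(np.quantile(np.abs(residuals) / sigma_hat, fraction))

# Hypothetical past cases, fitted as in equation (1).
rng = np.random.default_rng(2)
R = rng.uniform(0, 10, 50_000)
S = 12.0 + 1.5 * R + rng.normal(0, 6, 50_000)
X = np.column_stack([np.ones_like(R), R])
coef = np.linalg.lstsq(X, S, rcond=None)[0]
resid = S - X @ coef
sigma_hat = np.sqrt(resid @ resid / (len(S) - 2))

k = choose_k(resid, sigma_hat, fraction=0.90)
share_flagged = np.mean(np.abs(resid) > k * sigma_hat)   # past cases beyond k*sigma*
print(f"k = {k:.3f}, share of past cases flagged = {share_flagged:.3f}")
```

With the normally distributed disturbances assumed here, a 90 percent fraction gives k close to 1.645, and about 10 percent of past cases fall outside kσ*. With other distributions of ε the same fraction can imply a different k, which is why the sketch uses the empirical quantile rather than a normal table.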
This is perhaps seen most easily by considering the
following. There is no intrinsic reason why upward
departures from average sentencing behavior (harsh
sentences) and downward departures (lenient sentences)
should be treated identically. One might, depending on
one's ethical views, choose different values of k, say
k1 and k2, for the two different types of departures,
using a smaller value when departures are considered
worse. Plainly, the choice of such values depends on
ethical considerations; those considerations cannot be
avoided by restricting the choice to k1 = k2 and
treating both kinds of departures symmetrically.
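Separate thresholds for harsh and lenient departures can be sketched as follows. The names `k_up` and `k_down` are hypothetical labels for k1 and k2, and the values chosen here (harsh departures treated as worse) represent only one possible ethical position.

```python
def departure_type(sentence, predicted, sigma_hat, k_up, k_down):
    """Classify a sentence against the guideline S*.
    k_up governs harsher-than-average sentences and k_down more lenient ones;
    the smaller value marks the kind of departure regarded as worse."""
    gap = sentence - predicted
    if gap > k_up * sigma_hat:
        return "upward departure: justification required"
    if gap < -k_down * sigma_hat:
        return "downward departure: justification required"
    return "within guidelines"

# Hypothetical case: S* = 20 months, sigma* = 4 months, harsh departures treated as worse.
print(departure_type(28.0, 20.0, 4.0, k_up=1.5, k_down=2.5))  # upward departure: justification required
print(departure_type(12.0, 20.0, 4.0, k_up=1.5, k_down=2.5))  # within guidelines
```

The same 8-month gap requires a justification when the sentence is harsher than S* but not when it is more lenient.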
Before moving into more complicated cases, one point
is worth making. Using models in this way requires that
the model be either correct or a close approximation.
(It also requires that it be estimated using the best
available practice.) If, in particular, variables that
are correlated with those included are wrongly omitted
from equation (1), the estimated effects will be wrong
and the guidelines misleading. This will be particularly
important if the omitted variables are ethically
irrelevant.
To take a leading example, suppose that the true model
is not equation (1) but rather
$$S = \delta + R\alpha + I\beta + \varepsilon, \qquad (2)$$
where I is a single ethically irrelevant variable, with
coefficient β, that for purposes of focusing discussion
we will take to be a dichotomous variable indicating race
(with I = 0 for blacks and I = 1 for whites). Suppose
also that among
the variables in R are one or more that are correlated
with race. To fix ideas, suppose the variable in
question is a measure of prior record. Then mistaken
estimation of equation (1) instead of equation (2) when
race has actually mattered directly in the past will lead
to erroneous estimation of α. Furthermore, the derived
guidelines will build in the ethically irrelevant effect
of race by giving (in the simplest case) an inappropriate
coefficient to prior record (among other things). In
other words, such misspecification will lead to those
with longer prior records being given long sentences not
simply because of the effect of prior record in judicial
decisions but also because those with longer prior
records tend to be black. Past racism will be
incorporated in the guidelines and the resulting
coefficients will be biased in more than one sense.
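A small simulation makes the mechanism concrete. Everything in it is assumed for illustration: the parameter values of equation (2), a race dummy coded as in the text, and prior records drawn so that they are longer on average for I = 0.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 20_000
I = rng.integers(0, 2, n)                          # race dummy: 0 = black, 1 = white, as in the text
prior = rng.poisson(np.where(I == 0, 4.0, 2.0))    # assumed: longer prior records on average for I = 0
# Assumed "true" equation (2): delta = 10, alpha = 2, beta = -5 (whites sentenced more leniently).
S = 10.0 + 2.0 * prior - 5.0 * I + rng.normal(0, 5, n)

def ols_coef(columns, y):
    """Least-squares coefficients of y on a constant and the given columns."""
    X = np.column_stack([np.ones(len(y))] + columns)
    return np.linalg.lstsq(X, y, rcond=None)[0]

alpha_eq1 = ols_coef([prior], S)[1]       # race wrongly omitted: equation (1)
alpha_eq2 = ols_coef([prior, I], S)[1]    # race included: equation (2)
print(f"alpha* from (1): {alpha_eq1:.2f}; alpha* from (2): {alpha_eq2:.2f}")
```

Under these assumptions the coefficient on prior record from equation (1) comes out near 2.6, against about 2.0 from equation (2): part of the racial effect has been loaded onto prior record.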
Other misspecifications will lead to a number of less
dramatic results. In the limiting case in which the
omitted variables are not correlated with any of the
included ones, such omission will not lead to biased
estimates of the parameters that describe average
behavior. It will, however, lead to inefficient esti-
mates of those parameters. In addition, the effects of
such omitted variables will be attributed to disparity,
whereas they may represent not random occurrences but
precisely those explicable case-by-case variations that
one would not wish to reduce.
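The uncorrelated case can be sketched the same way, again with hypothetical numbers. Here the omitted factor Z is independent of R and is meant to stand for an explicable case-by-case consideration rather than a random occurrence.

```python
import numpy as np

def fit(columns, y):
    """Least-squares coefficients and residual standard deviation sigma*."""
    X = np.column_stack([np.ones(len(y))] + columns)
    coef = np.linalg.lstsq(X, y, rcond=None)[0]
    resid = y - X @ coef
    return coef, np.sqrt(resid @ resid / (len(y) - X.shape[1]))

rng = np.random.default_rng(4)
n = 20_000
R = rng.uniform(0, 10, n)                 # included, ethically relevant
Z = rng.normal(0, 1, n)                   # hypothetical case-specific factor, independent of R
S = 12.0 + 1.5 * R + 4.0 * Z + rng.normal(0, 3, n)

coef_omit, sigma_omit = fit([R], S)       # Z omitted
coef_full, sigma_full = fit([R, Z], S)    # Z included
print(f"omitting Z:  alpha* = {coef_omit[1]:.2f}, sigma* = {sigma_omit:.2f}")
print(f"including Z: alpha* = {coef_full[1]:.2f}, sigma* = {sigma_full:.2f}")
```

Both fits give α* near 1.5, but σ* is about 5 when Z is omitted and about 3 when it is included. The variation due to Z is thus counted as disparity, and the larger σ* also means larger standard errors for α*.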
Plainly, correct specification is very important.
Whether we know enough to achieve it is a separate
question.
THE PRESENCE OF A SINGLE ETHICALLY
IRRELEVANT VARIABLE: THE LINEAR CASE
We now face directly the question of what to do in the
situation of equation (2), in which an ethically irrele-
vant variable such as race has influenced past deci-
sions. (For ease of exposition we begin with the case of
only one such variable, treating the more complex case
below.) We have already seen how not to treat such a
case--one must not delete the ethically irrelevant
variable from the equation being estimated. A positive
prescription is now required.
The problem can be posed as follows. The justification
for empirically based guidelines lies in the view
that the collective decisions of the past represent, on
average, an ethically desirable standard. In the present
case, however, that is manifestly untrue; such decisions,
by assumption, were contaminated by the use of an
ethically irrelevant criterion, race, to affect sentence
length. Is it possible to purge past decisions of that
contaminating effect and to use the purged estimates to
inform future decisions through the construction of
guidelines?
The answer is yes but the accomplishment of this task
necessarily involves another ethical choice. Begin by
estimating equation (2) (in the simplest case by multiple
regression). This yields estimates of δ, α, and β,
which we denote by asterisks. Note that α*, in
particular, is an estimate of the effect of the ethically
relevant variables, R, with the effect of race held
constant. In terms of the example used above, this
procedure estimates the effect of longer prior record
given race--an effect uncontaminated by the fact that
blacks tend to have longer records than do whites. This
is useful information, for it tells us (in this linear
model) what the average difference in sentence was
between offenders with good and those with bad records
independent of race.3 If we can decide on the base
level of sentence in the guidelines for one case, then we
can use the estimates to derive levels for others.
This can be described in an equivalent but perhaps
more revealing way. Suppose that we estimate equation
(2) as described. We can then go on to use the estimated
equation as determining the average sentence to be used
in the guidelines and purge it of the racial effects by
choosing a value for I, say I', to be used for all future
cases of whatever race. The average sentence used in the
guidelines for cases with characteristics represented by
R will then be
$$S^* = \delta^* + R\alpha^* + I'\beta^*. \qquad (3)$$
The effect of changes in R will then be measured by α*
so that the choice of I' is equivalent to the choice of a
base level as above.
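The purging step can be sketched as follows, assuming the linear model (2) estimated by least squares on simulated data. The helper names `fit_eq2` and `purged_guideline` and all parameter values are hypothetical; the essential feature is that one value of I' is used for every offender.

```python
import numpy as np

def fit_eq2(R, I, S):
    """Multiple regression of S on a constant, R and the race dummy I: equation (2).
    Returns (delta*, alpha*, beta*)."""
    X = np.column_stack([np.ones(len(S)), R, I])
    delta_hat, alpha_hat, beta_hat = np.linalg.lstsq(X, S, rcond=None)[0]
    return delta_hat, alpha_hat, beta_hat

def purged_guideline(r, delta_hat, alpha_hat, beta_hat, i_prime):
    """Equation (3): average sentence for relevant characteristics r, using the
    same chosen value I' for every future offender regardless of race."""
    return delta_hat + alpha_hat * r + i_prime * beta_hat

# Hypothetical past data: race dummy I (0 = black, 1 = white), prior record, sentences.
rng = np.random.default_rng(5)
n = 20_000
I = rng.integers(0, 2, n)
prior = rng.poisson(np.where(I == 0, 4.0, 2.0))
S = 10.0 + 2.0 * prior - 5.0 * I + rng.normal(0, 5, n)

d, a, b = fit_eq2(prior, I, S)
for i_prime in (0.0, 0.5, 1.0):   # as blacks were treated, the midpoint, as whites were treated
    print(i_prime, round(purged_guideline(3, d, a, b, i_prime), 2))
```

In this linear model the slope on R is α* whatever I' is chosen; only the level shifts, and the guideline for I' = 1/2 lies exactly midway between those for I' = 0 and I' = 1.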
How should that choice be made? This is an inescap-
able ethical decision. To see this, consider what
different choices of I' imply. To choose any value of I'
is to treat all offenders in a racially neutral way but
the particular choice determines how they should be
treated. Thus, to choose I' = 0 for guideline construc-
tion is to treat later offenders on average as blacks
were treated previously. To choose I' = 1 is to treat
them as whites were treated previously. To choose I' =
1/2 is to treat them as getting exactly the average of
previous black and white treatment. This is an essen-
tially ethical choice that cannot be made simply by
referring to the average of past experience.4
However I' is chosen, the choice of k, as in the
simplest case, will require judges to justify explicitly
any departure so large that random variation would
account for it in no more than the corresponding fraction
of cases. This will force any judges who still use race
in an important way to justify their sentences explicitly.
NONLINEARITIES AND MORE THAN ONE
ETHICALLY IRRELEVANT VARIABLE
This same analysis readily extends to the case in which
the relationship to be estimated is nonlinear. Suppose
that equation (2) is replaced by
$$S = F(R, I, \varepsilon), \qquad (4)$$
where F(·) is some function, and we continue
with a single dichotomous ethically irrelevant variable,
I, for the moment (and continue the race example to fix
ideas).
Noting that I still takes on only the values zero and
one, we can represent this relationship in a different
way. Define
$$F_0(R, \varepsilon) \equiv F(R, 0, \varepsilon); \qquad F_1(R, \varepsilon) \equiv F(R, 1, \varepsilon). \qquad (5)$$
Then for either of the two possible values of I,
$$F(R, I, \varepsilon) = (1 - I)\,F_0(R, \varepsilon) + I\,F_1(R, \varepsilon). \qquad (6)$$
This corresponds to the general case in which the
sentencing behavior of judges is allowed to be completely
different for blacks from that for whites--complete
interaction; the linear case considered above is a
special case of this.
In this circumstance we once again estimate the full
descriptive model of sentencing behavior, equation (6).
This is then purged of racial effects by applying the
model for a given choice of I, say I', to all future
cases. The form of (6) now makes it apparent that the
choice of I' is equivalent to the necessarily ethical
choice of what average between former black and former
white cases is to be used. A choice of I' = 0 treats all
offenders as if they were black; a choice of I' = 1
treats them as if they were white; a choice between zero
and one determines an average.5
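For the nonlinear case, a sketch under stated assumptions: F0 and F1 are estimated separately on the two groups, here as quadratics in R (an assumed functional form), and equation (6) is then evaluated at a common I'. The sketch works with average sentences; because the weights 1 - I' and I' do not involve ε, averaging (6) over ε weights the two groups' average sentences in the same way. The simulated data and the name `guideline` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(6)
n = 10_000
I = rng.integers(0, 2, n)                        # 0 = black, 1 = white
R = rng.uniform(0, 10, n)                        # hypothetical seriousness score
# Assumed past practice with complete interaction: a different curve for each group.
S = np.where(I == 0, 8 + 0.4 * R**2, 6 + 2.0 * R) + rng.normal(0, 3, n)

# Estimate F0 and F1 of equation (5) separately; quadratics in R are an assumed form.
F0 = np.polyfit(R[I == 0], S[I == 0], deg=2)
F1 = np.polyfit(R[I == 1], S[I == 1], deg=2)

def guideline(r, i_prime):
    """Equation (6) with the same I' for every future offender (average sentences)."""
    return (1 - i_prime) * np.polyval(F0, r) + i_prime * np.polyval(F1, r)

for i_prime in (0.0, 0.5, 1.0):
    print(i_prime, round(float(guideline(5.0, i_prime)), 2))
```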
Note that this interpretation depends on the dicho-
tomous nature of I. If I were a continuous variable we
would estimate (4) directly. A choice of I' to use in
the estimated version of (4) would then still be an
ethical choice but, except in special cases, it would not
correspond to a simple averaging of sentences previously
given for various values of I.
If more than one ethically irrelevant variable has
mattered in the past, more than one ethical choice (in
addition to the choice of k above) must be made. Thus
consider the case of two such variables that we take to
be dichotomous. Suppose that I1 now represents race as
above and I2 represents whether there was a guilty plea
(assuming this to be ethically irrelevant) with I2 = 0
denoting no such plea and I2 = 1 denoting such a plea.
Rewrite (4) as
$$S = F(R, I_1, I_2, \varepsilon). \qquad (7)$$
Define
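$$\begin{aligned}
F_{00}(R, \varepsilon) &\equiv F(R, 0, 0, \varepsilon); \\
F_{01}(R, \varepsilon) &\equiv F(R, 0, 1, \varepsilon); \\
F_{10}(R, \varepsilon) &\equiv F(R, 1, 0, \varepsilon); \\
F_{11}(R, \varepsilon) &\equiv F(R, 1, 1, \varepsilon). \qquad (8)
\end{aligned}$$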
Then, similar to the construction in (6), for the
possible values of I1 and I2 we can write
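$$\begin{aligned}
F(R, I_1, I_2, \varepsilon) = {} & (1 - I_1)(1 - I_2)\,F_{00}(R, \varepsilon) + (1 - I_1)\,I_2\,F_{01}(R, \varepsilon) \\
& + I_1(1 - I_2)\,F_{10}(R, \varepsilon) + I_1 I_2\,F_{11}(R, \varepsilon). \qquad (9)
\end{aligned}$$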
That is, there are separate relationships allowed for
blacks not pleading guilty, blacks pleading guilty,
whites not pleading guilty, and whites pleading guilty.
The construction of empirically based guidelines now
proceeds by estimating (9) and choosing two values, I'1
and I'2, to be used in the estimated equation that
results. These choices, necessarily ethical, determine
the weights to be used in averaging the previous average
sentences of the four groups in guidelines to be used for
all future offenders.
Note, however, that there are only two choices to be
made, not more than two, despite the fact that four
groups are to be averaged. This corresponds to the fact
that the weights used to average the guilty plea and
not-guilty plea groups must be the same for blacks as for
whites if race is to play no role in the use of the
guidelines. Equivalently, the weights used to average
the black and white groups must be the same for those
pleading guilty as for those not pleading guilty if the
presence or absence of a guilty plea is to play no role
in the use of the guidelines.
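The weights implied by equation (9) can be written out directly. In the sketch below, `means` holds hypothetical estimated average sentences for the four groups at one value of R, keyed by (I1, I2); the helper names are illustrative only.

```python
def group_weights(i1_prime, i2_prime):
    """Weights that equation (9) places on F00, F01, F10, F11 at (I'1, I'2).
    Keys are (I1, I2): I1 = 1 for whites, I2 = 1 for a guilty plea."""
    return {
        (0, 0): (1 - i1_prime) * (1 - i2_prime),
        (0, 1): (1 - i1_prime) * i2_prime,
        (1, 0): i1_prime * (1 - i2_prime),
        (1, 1): i1_prime * i2_prime,
    }

def purged_sentence(group_means, i1_prime, i2_prime):
    """Weighted average of the four groups' estimated average sentences at a given R."""
    weights = group_weights(i1_prime, i2_prime)
    return sum(weights[g] * group_means[g] for g in weights)

# Hypothetical estimated average sentences (months) for one value of R.
means = {(0, 0): 30.0, (0, 1): 24.0, (1, 0): 26.0, (1, 1): 20.0}
print(purged_sentence(means, 0.5, 0.5))   # 25.0
```

Because each weight is the product of a race factor and a plea factor, guilty-plea cases receive the share I'2 within both racial groups, and whites receive the share I'1 within both plea groups; two numbers fix all four weights.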
Where n ethically irrelevant variables are involved, n
ethical choices must be made. If n is large, even though
only n such choices must be made, the view that guide-
lines can or should be based on past behavior rather than
constructed directly from ethical or societal
considerations loses much of its force, although the estimated α
coefficients may still help to inform decisions.
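The same counting holds for any n: the n chosen values generate the weights for all 2^n groups as products, one factor per variable. A minimal sketch, with the hypothetical helper `pattern_weights`:

```python
from itertools import product
from math import prod

def pattern_weights(i_primes):
    """Weights on the 2**n groups defined by n dichotomous ethically irrelevant
    variables, generated from the n chosen values I'_1, ..., I'_n."""
    return {
        pattern: prod(p if bit else 1 - p for bit, p in zip(pattern, i_primes))
        for pattern in product((0, 1), repeat=len(i_primes))
    }

w = pattern_weights([0.5, 0.8, 0.2])   # three hypothetical choices
print(len(w))                          # 8
print(round(sum(w.values()), 12))      # 1.0
```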
CONCLUSION
We are uncomfortable with the whole enterprise of
empirically based sentencing guidelines, for several
reasons. First, they are by their nature unthoughtfully
conservative. What is past may be prologue, but it is
surely not unswervingly just. We prefer guidelines that
arise from ethical principles, deducing the shape of the
guidelines from those principles, as was done in
Minnesota.
Second, taking empirically based guidelines on their
own terms leads us to require ethical judgments: For
example, shall we treat blacks as we used to treat
whites, or conversely, or use an average? We anticipate
that ethical experts might say "neither" and propose a
different punishment schedule entirely, but this would
lead back to a Minnesota-type approach.
Finally, there is the matter of implementation. These
procedures assume that the model is correctly specified.
Incorrect specification can lead to reintroduction of
racial bias and other kinds of substantial injustice. We
should add that correct specification is very difficult
to achieve.
In conclusion, empirically based sentencing guidelines
strike us as a species of computer-driven conservatism.
They do not avoid hard ethical questions, and they
mislead those who would construct guidelines by substi-
tuting statistical sophistication, which is useful but
not essential, for ethical sophistication, which is
critical.
NOTES
1. For convenience of notation we have not written out
terms such as Rα. The reader is free to think of R as a
single variable. The more general case would have
$$R\alpha = \alpha_1 R_1 + \alpha_2 R_2 + \cdots + \alpha_k R_k.$$
2. For a discussion of this and similar issues see
Franklin M. Fisher (1980) Multiple regression in legal
proceedings. Columbia Law Review 80(2):702-736.
3. Although prior record is itself a composite of
several variables, we ignore this for simplicity of
exposition.
4. Note in particular that choosing I' so as to generate
the same average sentence length for all cases in the
sample as actually occurred builds in a judgment that such
an average was "right" despite the fact that it was influ-
enced by the racial mix of cases in the sample. To
attempt to set I' empirically by estimating I' together
with α and δ to give the best fit in the sample is
even worse. It can be shown to be equivalent to leaving
race out of the estimated equation altogether (by
absorbing I'β into δ, the constant term), the case of
misspecification considered above.
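For the linear model both points of this note can be checked directly. Least squares with a constant term reproduces the sample means, so that, writing bars for sample averages,
$$\bar S = \delta^* + \bar R\alpha^* + \bar I\beta^*.$$
Averaging equation (3) over the sample therefore reproduces the actual average sentence only if I' equals the sample mean of I (provided β* ≠ 0), that is, the proportion of cases in the sample with I = 1. If instead I' is fitted together with α and δ, then, because I' is a single number applied to every case,
$$S = \delta + R\alpha + I'\beta^* + \varepsilon = (\delta + I'\beta^*) + R\alpha + \varepsilon,$$
so only the sum δ + I'β* is identified, and least squares gives the same estimate of α as equation (1).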
There may be other ways to correct for the effects of
race. For example, in a rather extreme form of affirmative
action, one might wish to take account of the fact
that blacks are discriminated against elsewhere. Such
discrimination can mean that blacks have a worse prior
record or are more likely to be unemployed than whites.
One can imagine correcting variables such as prior record
or unemployment by regressing them on race, then giving
those variables in equation (3) the values they would
have on the average if the offender were white, or the
values they would have if the offender were black, or
some other common value. This would involve a correction
for race more extreme than simply a uniform value for I'
and would be likely to lead to wholesale reliance on
regression rather than to analysis of individual offender
characteristics.
5. Note that choices outside the range [0, 1] are
also possible. This would mean treating all offenders
better than whites were treated in the past or all
offenders worse than blacks were treated in the past
(assuming discrimination to have been against blacks).
To do so is to depart fairly sharply from the notion that
past judgments are ethically acceptable, however--the
view that lies behind empirically based guidelines.