Suggested Citation:"5 THE CONSTRUCTION OF SENTENCING GUIDELINES: A METHODOLOGICAL CRITIQUE." National Research Council. 1983. Research on Sentencing: The Search for Reform, Volume II. Washington, DC: The National Academies Press. doi: 10.17226/101.

5 The Construction of Sentencing Guidelines: A Methodological Critique

Richard F. Sparks

INTRODUCTION

The purpose of this paper is to discuss critically a number of conceptual and methodological problems associated with the construction of empirically based sentencing guidelines.1 Guidelines of the type with which this paper is concerned are the most recently proposed technique for attempting to deal with a problem which has been a subject of concern for at least a century: controlling the discretion of individual decision makers in the criminal justice system.2 Sentencing guidelines differ in a number of interesting and important ways from other techniques for controlling discretion in sentencing, such as sentencing codes (Ferri, 1921; Glueck, 1928), mandatory sentences, or "presumptive" sentences. For this reason, sentencing guidelines solve some of the problems associated with these other techniques, while simply bypassing others. Empirically based guidelines do raise a number of problems of their own; these are the problems of most concern in this paper.

My focus is primarily on the construction of sentencing guidelines. I do not discuss any theoretical or empirical issues relating to the implementation of guidelines in different types of jurisdictions; nor, a fortiori, do I deal with assessing the impacts (in any sense of that term) of guidelines on sentencing practice, e.g., with the complex problem of estimating "compliance" with guidelines after they have been introduced. Nor, indeed, do I address all the problems that might reasonably be said to be associated with constructing sentencing guidelines. A political scientist, for instance, would no doubt find it interesting to explore the relationships between different types of legal systems and judiciaries and the acceptance of judicially supported guidelines as a means of controlling discretion in sentencing; legal theorists and sociologists of organizations could similarly find grist for their respective mills. My concern is with what might be called the technology of developing sentencing guidelines, as that technology has been represented in a number of different American jurisdictions over the past decade or so.

In discussing some of the problems of this technology, I refer to the decision-making guidelines that have been developed and/or implemented in a few American jurisdictions in recent years. My primary purpose in doing this will be illustrative rather than evaluative. Much of the empirical research done by those who have been involved in developing guidelines in recent years has been severely flawed in methodological terms; as a result, that research has often yielded descriptions of antecedent sentencing practices that were both inaccurate and misleading. In one sense this may not have mattered much, since (in at least some jurisdictions) the findings of the empirical research carried out as a preliminary to the formulation of guidelines were substantially modified in the light of considerations of legal or social policy. I shall also argue, however, that much of this empirical work has rested on a faulty conception of the proper role of research in relation to the development of guidelines. There are indeed a number of ways in which empirical research--if it is correctly done--can be useful to those planning to introduce sentencing guidelines (or other techniques for controlling discretion).
Much research to date in this area, however, appears to have serious technical shortcomings, which in some cases may have obscured important questions of policy and in others may lead to highly undesirable consequences--including some consequences that guidelines are supposed to avoid.

The construction of empirically based sentencing guidelines has been said to involve three distinct steps (Zimmerman and Blumstein, 1979; Gottfredson et al., 1978; Kress, 1980). The first of these is the collection of data on past sentencing practice. The second is the analysis of those data aimed at producing a model of past sentencing practice; such models usually take the form of statistical equations purporting to show the relationships between such things as seriousness of offense and prior record to past sentencing outcomes. The third is the translation of the model thus obtained into a prescriptive instrument--that is, the guidelines themselves. In later sections of this paper, these three steps are discussed in some detail; each has distinctive problems associated with it, and as we shall see the three-step account itself has certain flaws. But as a preliminary, it may be useful to look briefly at the guidelines that are meant to be the end-product of this three-step exercise. If the objective of the collection and statistical analysis of data on sentencing is the construction of an instrument to guide future sentences--rather than, say, the testing of conflict or Marxist theories about the criminal justice system (e.g., Hagan, 1975; Lizotte, 1978)--then this has important implications for the kinds of data collection and analysis that are reasonable to do.

THE CONCEPT OF DECISION-MAKING GUIDELINES3

Empirically based decision-making guidelines were first proposed in the field of criminal justice by Don M. Gottfredson and Leslie T. Wilkins, for use in connection with the decisions of parole boards to release offenders from prison. The U.S. Parole Commission has used various versions of the Gottfredson-Wilkins guidelines since 1972 (see Gottfredson et al., 1975; Gottfredson et al., 1978). A feasibility study of the application of guidelines to sentencing was begun by Gottfredson and Wilkins in 1974; while not all of the guidelines subsequently developed in various American jurisdictions have followed what may be called the Gottfredson-Wilkins model, the great majority have done so. The basic concept of the Gottfredson-Wilkins model of guidelines is as follows.
Decision makers in the criminal justice system--for example, judges or parole board members--are given information about the patterns of decision making in their jurisdictions in the past; they then use this information to guide their decisions in the future. In the case of parole decision making (which is of course concerned only with offenders who are already incarcerated) the information typically consists of a range of months or years served in prison before release on parole. The parole board may release prisoners after they have served terms falling within that range without any further justification. Alternatively, the board may depart from the guidelines--setting a term of incarceration that falls outside the suggested range--if there are special factors that appear to make this appropriate, although the board must state its reasons for any such departure.

The typical form of such guidelines is a two-dimensional matrix or table, in which the rows correspond to different types of current offense (usually although not necessarily ordered by seriousness), and the columns correspond to an offender score, which is usually largely a function of prior criminal record but may also include other personal attributes (e.g., a presumed measure of social stability, such as employment status, education, or the absence of drug use). Each of the cells of the resulting matrix contains the normal range of months or years of incarceration for offenders with the particular combination of offense type and offender score defining the cell. Table 5-1--which is based on the U.S. Parole Commission's current guidelines--is an example of such a matrix.

TABLE 5-1 Customary Total Times to be Served in Prison Before Release, in Months, Under U.S. Parole Commission Guidelines

                       Parole Prognosis (Salient Factor Score)
Severity of Offense    Very Good    Good     Fair     Poor
Low                    6-10         8-12     10-14    12-16
Low Moderate           8-12         12-16    16-20    20-25
Moderate               12-16        16-20    20-24    24-30
High                   16-20        20-26    26-32    32-38
Very High              26-36        36-45    45-55    55-65

SOURCE: Adapted from data in Gottfredson et al. (1978:24-26).

This table shows that, for example, an offender who has been imprisoned for an offense of "low moderate" seriousness (examples, in the U.S. Parole Commission's matrix, include fraud involving less than $1,000 and the simple possession of marijuana) and who has a good prognosis (as measured by the commission's "salient factors" score) should normally serve between 12 and 16 months in prison before release on parole.

Evidently, a very similar kind of matrix could be used by judges in deciding what sentences to impose on convicted offenders, although there are some important differences between sentencing and parole guidelines, which follow from differences in the decisions they are meant to regulate. Before turning to those matters, however, let us consider what is distinctive about the Gottfredson-Wilkins concept of guidelines, compared with other techniques that have been proposed for regulating, controlling, or structuring discretionary decisions. Two things appear to be important:

(1) Ranges rather than points. The parole guidelines originally proposed by Gottfredson and Wilkins provided for a range of months or years to be served before release from prison; in this respect their guidelines differ from most forms of presumptive sentencing (e.g., California's Uniform Determinate Sentencing Law of 1976), under which the term to be imposed in the normal case is defined in terms of a single point or period of time, such as two years.4

(2) Nonmandatory ranges. It is essential to the Gottfredson-Wilkins concept of guidelines that judges or parole board members are not legally required to impose a sentence or fix a term within the range stipulated by the guidelines matrix. They may of course do this; if they do, then no further justification of that sentence or term is required. They may decide that it is appropriate to depart from the guidelines, if there are special features of a case that seem to justify a higher or lower term than the normal range of the matrix cell provides. They should state their reasons for doing so, citing the features of the case that in their opinion make a higher or lower sentence appropriate.
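The two features just listed can be made concrete with a small sketch. The Python below is only an illustration of how a matrix such as Table 5-1 might be consulted: the cell values are those shown in Table 5-1, but the names `GUIDELINE_MONTHS`, `guideline_range`, and `review_term`, and the choice to reject a departure that comes with no stated reasons, are assumptions made for this example and are not drawn from the Parole Commission's actual procedures.

```python
# Illustrative sketch only. Cell values are copied from Table 5-1;
# all names and the error-handling rule are hypothetical.

PROGNOSES = ("very good", "good", "fair", "poor")

# Rows: severity of offense. Columns: parole prognosis, in PROGNOSES order.
# Each cell is the customary (low, high) range of months served.
GUIDELINE_MONTHS = {
    "low":          ((6, 10), (8, 12), (10, 14), (12, 16)),
    "low moderate": ((8, 12), (12, 16), (16, 20), (20, 25)),
    "moderate":     ((12, 16), (16, 20), (20, 24), (24, 30)),
    "high":         ((16, 20), (20, 26), (26, 32), (32, 38)),
    "very high":    ((26, 36), (36, 45), (45, 55), (55, 65)),
}


def guideline_range(severity, prognosis):
    """Return the (low, high) range of months for one cell of the matrix."""
    return GUIDELINE_MONTHS[severity][PROGNOSES.index(prognosis)]


def review_term(severity, prognosis, months, reasons=""):
    """Classify a proposed term against the guideline range.

    A term inside the range needs no justification. A term outside it is
    permitted (the range is not mandatory), but only with stated reasons.
    """
    low, high = guideline_range(severity, prognosis)
    if low <= months <= high:
        return "within guidelines"
    if not reasons.strip():
        raise ValueError("departure from the range requires stated reasons")
    return "departure above range" if months > high else "departure below range"
```

Under these assumptions, `review_term("low moderate", "good", 14)` falls within the 12-16 month cell and needs no reasons, whereas a 40-month term for a "high" severity, "poor" prognosis case is accepted only when a reason is supplied. The point of the design is that the matrix structures the decision without removing the decision maker's option to depart.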
It is perhaps these two features--a range of permitted variation and the option of departure from that range in explicitly justified circumstances--that have led Gottfredson and Wilkins to describe sentencing and parole guidelines as a means of structuring decision makers' discretion, rather than limiting or eliminating it (Gottfredson et al., 1978:8). Providing a normal range of prison or jail time within which no special justification is needed does two things. First, it recognizes that for any combination of factors, such as offense type and offender score (however those are defined), there will probably still be some differences between cases--for example, in the amount of property stolen or damaged, the amount of injury inflicted, or the vulnerability of the victim--that may justify some variation in sentences imposed. A normal range also accepts the view that people may reasonably disagree about the appropriate penalty, given the facts of a particular case. Guidelines aim to set limits on that kind of difference of opinion, by providing that sentences outside the stipulated range must be specially justified.

It may be noted in passing that there are a number of other features that such a system of sentencing guidelines may have, which although not intrinsic to the process of constructing guidelines may nonetheless have some implications for the analysis of past sentencing practice. (Some of these features were suggested by Gottfredson and Wilkins; others were not, but are exemplified by guidelines now in existence.) First, there may be rules that limit the grounds on which sentences outside the normal guideline range may be justified, so that departure is permissible only if one or more of an explicit list of aggravating or mitigating factors is present.
The Minnesota guidelines, for example, are accompanied by a list of four mitigating factors and four aggravating factors that may justify departure from the prescribed range; there is also an explicit list of 11 factors that may not be used as grounds for departure.5 Second, there may be limits placed on the length or type of sentence that can be imposed by judges if they do go outside the normal range; the first set of proposed Pennsylvania guidelines, for example, limited aggravated and mitigated sentences to adjacent cells of the matrix.6 Third, the reasons given for departing from the guideline range may be incorporated into the process of appellate review of sentences, if there is such a process in the jurisdiction.7 Alternatively or additionally, information on the use of the guidelines (including departures and the reasons given for them) may be made available to the judiciary or the sentencing commission, who may then decide whether the guidelines should be modified in some respect. This kind of feedback process, involving continuous monitoring of the guidelines after their implementation, was in fact regarded by both Gottfredson and Wilkins as central to the concept of guidelines; it is what they meant by "making policy explicit" and by the "evolutionary model" they proposed (Gottfredson et al., 1975; Gottfredson et al., 1978; Gottfredson and Gottfredson, 1980). This process is a feature of the U.S. Parole Commission's current procedures and appears to have led to modifications of the commission's guidelines from time to time; it is also envisaged by the Minnesota Sentencing Guidelines Commission, although that state's guidelines have not been in operation long enough to see how it will work in practice.

What about the empirical basis of sentencing guidelines? There is certainly an impression conveyed by the literature on this subject that an analysis of past sentencing practice is intrinsic to the concept of sentencing guidelines. All the guidelines developed to date have taken as their starting point a statistical analysis of past practice, the purpose of which was ostensibly to identify those factors most strongly associated with sentencing variation in the past. Gottfredson and Wilkins have both said on numerous occasions that guidelines are "descriptive" rather than "prescriptive" (see, for example, Wilkins et al., 1976:31-32; Gottfredson et al., 1978; compare Kress, 1980:11-12). Similarly, the New Jersey guidelines state that "it should be emphasized that the purpose of sentencing guidelines is not to persuade judges regarding what is the 'right' sentence or the 'best' sentence" (McCarthy, 1978:6) and elsewhere repeat the "descriptive, not prescriptive" idea.
Given this rhetoric and its associated history, it may seem natural to assume that sentencing guidelines must be based on an analysis of past sentencing practice.8 It takes only a moment's reflection, however, to see that this is not necessarily so; and that the much-touted empirical basis of guidelines is by no means intrinsic to the construction of an instrument for controlling decision makers' discretion.9 On the contrary, a matrix like that in Table 5-1 could obviously be made up--by a legislature, a sentencing commission, or a parole board--without any reference whatever to past decision-making practice. This is in fact precisely what happened with the Oregon parole guidelines, which were first developed in 1975 and given statutory authority in 1977. No analyses of past decision-making practices were carried out before these guidelines were formulated; instead, the board, under the chairmanship of Ira Blalock, simply made up the appropriate ranges of time to be served by different types of offenders. It is in fact unclear how far Blalock and his colleagues were trying, in creating their guidelines and the associated definitional rules, to reflect past paroling practice in the state.10 What is clear is that they did not carry out any detailed analysis of those past practices, and of course they did not need to do so. They simply prescribed.

That said, it may be agreed that "obtaining an empirical description of current sentencing behavior is a reasonable first step in the process of sentencing guideline development" (Zimmerman and Blumstein, 1979:2). There are several reasons why this may be the case. First, most advocates of guidelines have been animated by a belief that these will somehow help to reduce disparity in sentencing; given this animus, it may be thought prudent to show that there has in fact been such disparity in the past.11 Second, there may be a genuine feeling that what was done in the past was by and large right (and so ought to be incorporated into anything aiming to prescribe what should be done in the future); I suspect that this view has fairly widespread support, especially among the judiciary, although it is difficult to get anyone to say so in public. It may indeed be agreed that past sentencing practice has been correct on the average, but that there has been too much variation around that average; disparity in this sense of excessive variation need not entail, of course, that sentences in the past were based on morally iniquitous factors such as race or social class. If it is felt that the judiciary's collective wisdom has in the past been generally on target, then research may give a clearer picture of what the targets in question have been; this may help judges to sentence in a more consistent fashion.
Finally, and perhaps more cynically, it may be thought that it will be comforting, especially to the judiciary, to claim that sentences in the future will not be too different from what they were in the past; a bit of preliminary number-crunching may make this politically expedient claim more plausible.12 In any event, one must start somewhere, when implementing sentencing reform; and it plainly seems better to begin with good empirical evidence than with unsupported speculation.13 After all, many people--including many judges and legislators--do not know that an offender given an "indeterminate" sentence of 5 to 15 years in prison is likely to be back out on the street in perhaps 24 months; research may help to convince them of this fact, if indeed it is a fact.14

It should not be forgotten, however, that it is perfectly possible to construct sentencing or parole guidelines in the back-of-an-old-envelope fashion followed by the Oregon parole board; these might be called guidelines by fiat. I shall have little to say about such guidelines in this paper, which is mainly concerned with the problems of carrying out empirical research on past sentencing practice. It is important to keep in mind the possibility of such guidelines, however, when considering the construction of empirically based ones. To do so may serve to remind us that there is no necessary connection between descriptions of current practice and guidelines as a prescriptive instrument.

THE DESCRIPTION OF PAST SENTENCING PRACTICE

I begin by distinguishing, definitionally, between a sentencing policy and sentencing practice. I use the term policy to refer to a description of the various things that enter consciously into the decisions of judges (or parole boards). It includes not only their (probably rather mixed) views about the proper goals of their decisions but also their (sometimes not fully articulated) views as to what they should do in a particular type of case to accomplish those goals, the features of the case that justify their doing one thing rather than another, and so on. A judge's sentencing policy, by this definition, would be described by a sincere answer to the question, "what do you generally do with cases of type X, and why?" In all probability, the answers to a number of supplementary questions would also be relevant.
Such an account of sentencing policy assumes that it basically involves the application to particular cases of some rules or recipes of the form, "If the case is of type X, then do Y"; it also implies at least minimal self-consistency on the part of individual judges over time.

The term sentencing practice, by contrast, is used to refer to what may be called an external description of judges' sentencing behavior; it does not incorporate any reference to what the judge(s) in question thought, believed, intended, etc. when imposing the sentences in question. Sentencing practices are what are described by statements like "Court A imprisoned 75 percent of its convicted burglars, whereas Court B put 95 percent of its convicted burglars on probation"--statements that can in principle be verified or falsified by summary statistics, observation, etc., which entail no reference to the conscious plans of action on the part of judges leading up to the sentences in question. The importance of this distinction is that there is, generally speaking, only one correct description of the sentencing policy followed by a judge at a particular time and place, whereas there is an infinity of correct descriptions of the judge's practice, consisting of the sentences imposed at that time and place.15

It is clear that many if not most of those who have done research on sentencing with a view to creating guidelines have wanted to influence sentencing policies. Gottfredson and Wilkins, for example, claim that their early work with the U.S. Parole Board was "making paroling policy explicit." Similarly, the sentencing guidelines developed in Minnesota and Pennsylvania were very explicit statements of policy: They were intended, one might say, as "recipes for sentencing" that judges were to follow in the future. Given that aim, it seems reasonable to suppose that the empirical research that has been carried out, as a preliminary to formulating guidelines, should have been research on previous sentencing policies. It is important to note, however, that this has almost never been the case. In almost every instance, the research on past decision making with which we are concerned has been of a kind that (at best) could only have produced an external description of past practice. The earliest research of Gottfredson and Wilkins is a sort of exception.
Gottfredson and Wilkins first obtained from parole board members their subjective ratings of a number of variables, such as seriousness of offense, risk of recidivism, institutional behavior, etc., for a number of cases. They then analyzed these (using the statistical procedure known as multiple regression) and showed that seriousness of offense and length of prior record were the two variables most strongly related to the lengths of time served to parole in the cases studied. They then categorized and cross-classified these two variables to obtain a matrix like the one shown in Table 5-1 above, calculated median times to parole in each of the cells of that table, judgmentally "smoothed" those medians, then bracketed them with more or less arbitrarily chosen ranges to produce the guidelines. That was the extent of their empiricism.

To say that this method made explicit a policy of the U.S. Parole Commission that had been in effect all along was a pretty safe claim. Seriousness of current offense and length of prior record have been said to be the most important, morally appropriate determinants of sentences in an enormous number of jurisdictions; any study of sentencing practice that does not find these things to be most strongly associated with severity of sentences (at least for adults) has probably measured something incorrectly. The inferential leap that Gottfredson and Wilkins made from practice to policy was thus not a very great one--especially since those two variables were defined in terms of the assessments of parole board members rather than in some more objective fashion.

Any more ambitious inferences, however, are perilous. For one thing, even with the two obviously relevant variables just mentioned, an accurate assessment of sentencing policy requires that we know just what judges mean by saying that, for example, an offense is serious or a prior record minor; concepts of that kind may in practice be quite complex and variable, and (perhaps surprisingly) these particular concepts are still imperfectly understood.16 To be sure, most of us have some fairly crude commonsense notions about what makes a crime serious or a prior criminal record trivial, but these notions clearly do not take us very far. Moreover, there is not at the present time anything that could be called a theory about how judges (or parole boards or analogous decision makers) actually decide what sentences or prison terms to impose. We know next to nothing, for example, about the ways in which attributes of the current offense(s) and facts about the offender's prior criminality tend to be combined, in practice, so as to influence the judge's choice of sentence. Furthermore, we know little about what other kinds of things (prospects for rehabilitation, for example) may be considered by judges or parole boards in certain cases. As is well known, the philosophy of sentencing is now in some considerable turmoil, in the United States and elsewhere; what Allen (1964) called the "rehabilitative ideal" is fast losing what little credibility it ever had, in most jurisdictions, and "just deserts" (von Hirsch, 1975) are being served up in its place. This makes it extremely difficult to infer anything about the sentencing policies of judges from objective data on cases dealt with in the past--not least because a reasonably concrete attribute like number of prior felony convictions may have one kind of effect if rehabilitation is the judge's goal, and quite another if the aim is what used to be called retribution.

Very well, it may be said; in order to construct guidelines, let us forget past policies and instead produce a description of past practices. We can obtain empirically a picture of what judges in fact did in certain types of cases in the past and use that as the basis of some rules prescribing what they should do in the future. There is something right about this; but not much. The definition of policy that I gave at the beginning of this section refers to the things that are consciously used by judges. But it plainly cannot be assumed that the only things that influence the outcomes of sentencing decisions are things of which the judge is aware; just because there are reasons that judges can give (and, we assume, sincerely give) for sentences, it does not follow that those sentences are not influenced by things of which they are not aware. For example: a judge may sincerely and deeply believe that he or she is not racially prejudiced; yet he or she may be ("unconsciously") disposed to give heavier sentences to blacks than to whites, ceteris paribus.17 There is a story, probably apocryphal, that it is regarded as more serious to shoot a cow in eastern Oregon than to shoot your wife in western Oregon. Judges from the two ends of the state might agree on the definition of seriousness, while being unaware of the regional difference (if there is one) in the application of that definition in different parts of the state. Research on sentencing in the past, if intended as a preliminary to guidelines prescribing sentencing in the future, must thus examine both policy and practice.
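To see how little machinery the Gottfredson-Wilkins procedure described earlier in this section involves, consider the following minimal Python sketch of its later steps: cross-classify cases on two categorized variables, take the median time served in each cell, and bracket each median with a range. It is an illustration under stated assumptions, not a reconstruction of their analysis. The function name `build_guideline_cells`, the fixed `half_width` used for bracketing (a stand-in for the "more or less arbitrarily chosen ranges"), and the toy cases are all invented for this example, and the multiple regression and judgmental "smoothing" steps are omitted.

```python
from collections import defaultdict
from statistics import median


def build_guideline_cells(cases, half_width=2):
    """Build guideline cells from past cases.

    cases: iterable of (offense_category, record_category, months_served).
    Returns {(offense_category, record_category): (low, median, high)}.
    The bracketing rule (median +/- half_width) is an arbitrary assumption.
    """
    by_cell = defaultdict(list)
    for offense, record, months in cases:
        # Cross-classify: group cases by the pair of category labels.
        by_cell[(offense, record)].append(months)

    cells = {}
    for key, months_list in by_cell.items():
        mid = median(months_list)
        cells[key] = (max(0, mid - half_width), mid, mid + half_width)
    return cells


# Made-up cases for illustration only; not data from any study.
toy_cases = [
    ("moderate", "good", 16),
    ("moderate", "good", 18),
    ("moderate", "good", 20),
    ("moderate", "poor", 24),
    ("moderate", "poor", 30),
]
cells = build_guideline_cells(toy_cases)
```

Nothing in such a computation refers to what the decision makers thought they were doing. It summarizes practice within whatever categories the analyst has chosen, which is why the choice of variables to cross-classify carries so much weight.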
The trouble is that, at present, we have virtually no theory about either policy or practice, once we get beyond a small number of commonsense ideas. This in turn is important, since without some kind of theory, however humble, we cannot possibly decide what information we should obtain about sentencing practice in the past. Consider these two statements:

(1) Court A imprisoned 75 percent of its convicted burglars, whereas Court B put 95 percent of its burglars on probation.

(2) Court A imprisoned 75 percent of its convicted, one-eyed, green-haired sodomites, whereas Court B put 95 percent of such convicted offenders on probation.

Suppose that both of these statements are true for some pair of jurisdictions. Which do we accept as a description of sentencing practice? The answer is that, as matters now stand, we have precious little ground for choosing between them. Of course, we may rule out statement (2) on commonsense grounds; or we may actually go out and ask judges, in an artful fashion, whether or not being monocular or green-haired is something that they ever take into account. Or we might try to get a handle on this experimentally, e.g., by systematically varying one-eyedness and green hair among burglars, robbers, con men, etc. as well as those convicted of sex offenses (perhaps there are several green-haired judges in Court B, whereas judges in Court A have an unconscious aversion to one-eyed persons, and these two peculiar attributes are correlated). My point is that both statements could be true, for some sample of offenders; without something that can reasonably be called a theory, we have no ground for preferring one to the other.

It follows that without some sort of theory, we can have no real idea what sort of information to collect in order to give a useful description of past sentencing practice. It would probably not occur to researchers in this field to collect data on green hair and monocularity (or at least I hope it would not); if it did, they would probably not succeed, since these two attributes are not, so far as I am aware, regarded as important by the probation officers and other social workers who currently write presentence reports in most American jurisdictions.18 By now the reader is probably thoroughly tired of these two far-fetched examples of possible correlates with sentencing outcomes. They are, of course, deliberately far-fetched; but that is not the point.
What is true for them--namely, that we need some kind of theory, even if only a rather vulgar one, to make sense of their correlations with severity of sentences-- is equally true of what are, on their face, much more plausible candidates for inclusion in a useful descrip- tion of sentencing practice. I give two examples, drawn from the research that my colleagues and I recently carried out in an evaluation of statewide sentencing guidelines (Sparks et al., 1982). There are some data suggesting that the recommendations

made by probation officers, as part of the pretrial or presentence reports that they prepare for judges, may exert an important influence on the judges' decisions to place convicted offenders on probation rather than incarcerating them. In field research that we carried out in Massachusetts, before that state's judicially sponsored guidelines were developed, however, we discovered that the recommendations of probation officers there cut very little ice. We also discovered, through interviews and observation, that the recommendations of prosecutors and defense counsel were important determinants of the sentences finally imposed on offenders--though in complex ways that varied considerably among judges and among the four counties in which our field research was done.19 Had we been carrying out research on past sentencing policy--or even, more humbly, on past sentencing practice--in Massachusetts, we could properly have ignored probation officers' recommendations, but we should certainly have collected data on defense counsel's recommendations. Those who carried out the research on which the Massachusetts guidelines are based did just the opposite: They recorded the irrelevant recommendations of probation officers (when these were available from records) and failed to record those of defense counsel.20

With a view to prescribing sentencing policy for the future through guidelines, research has been done on sentencing in the past in several American jurisdictions. Most of this research has not, however, investigated past sentencing policies; at best, it has produced descriptions of past practice, which have not been guided by any kind of theories about how judges actually decide what sentences should be imposed, because there are no such theories at the present time.
It is vital to see, however, that some well-founded beliefs about the way in which judges actually make sentencing decisions must be thought out (and, preferably, tested by observing and interviewing judges and others involved in the sentencing process) before research on past sentencing practice is begun. Such a set of beliefs--even if they do not amount to a theory--will largely influence the kinds of information on past practice that it is reasonable to collect. As Zimmerman and Blumstein (1979:9) have pointed out, one should ideally try to obtain information only on variables that seem theoretically reasonable or are believed (perhaps from previous research) to be empirically correlated with

sentencing practice; there is no point in going to the often-considerable expense of collecting data on variables that are irrelevant or will not be used in later statistical analyses.

Lacking any kind of theory, many if not most of those who have done research on sentencing with a view to formulating guidelines have, it seems, simply set out to collect as much data of any kind whatever as they could find in existing records and afford to have keypunched for computer analysis. The most extreme example seems to have taken place in New Jersey, where the guidelines project staff "decided that every bit of data could possibly affect sentences, and that therefore no assumptions should be made at the outset to dismiss any data" (McCarthy, 1978:10).21 For example, the New Jersey project attempted to collect information from presentence reports on "education of offender's parents/guardian." As it turned out, however, data on this variable were recorded (whether or not accurately) in only 7 percent of their cases (McCarthy, 1978:16n.). This is likely to happen with many such recondite variables, especially if the data in question are decided upon and collected after the fact, e.g., from presentence reports. A more important question, however, is what one would do with data on education of offender's parents/guardian if they were recorded in, say, 93 percent of all cases instead of 7. There is plainly no reason to think that this item of information should figure in judges' sentencing decisions in any important way. What reason is there to think that it does figure--even in the handful of cases in which the judge is aware of it?

What data on past sentencing practice should be collected? Perhaps the best answer is that data should be obtained on all variables that might reasonably be supposed to have been associated with sentencing in the past in a nontrivial proportion of the cases intended for statistical analyses. At a minimum, as Gelman et al.
(1979) have suggested, such analyses should at least consider the information base available to the judge at the time when the sentencing decision was made; and if there is doubt, it is clearly better to be inclusive at the first stage of data collection, given that items that subsequently prove to be clearly irrelevant can be discarded later. Basing analyses of past practice on the information that judges had before them conveniently skirts one of the problems of validity commonly encountered in social research, since it does not really matter

whether that information was correct, so long as the judges believed it was.23 However, there may be problems in identifying the judges' information base, especially if (as has been the case in all guidelines research projects to date) the data on past practice are collected from presentence reports or other records, after the offender has been sentenced. Presentence reports, typically compiled by probation officers, usually give no information about the offender's demeanor in court, the issues that may be found (either during a trial or at the time of a plea) to justify mitigation or aggravation of sentence, the behavior of counsel, and many other matters that may influence the sentence finally imposed.

As just noted, however, it is also important to collect data on things that might have affected sentencing decisions in the past, even though this was undesired and unintentional. A few obvious candidates, in commonsense terms, are the race and ethnicity of the offender (and victim), the region within a state or similar jurisdiction in which the case was dealt with, the sex, occupation, and social status of the offender (and the victim), and the identities of judge, prosecutor, and defense counsel involved in the case.24 These things will not, presumably, be used in prescriptive guidelines, but they may be important in analyses of past practice, if the notion of empirically based guidelines is to have any meaning.

A special problem is posed by the dependent or outcome variables typically used in studies of sentencing practice. As the next section of this paper discusses, it is necessary to keep separate at least two outcomes of the sentencing decision: (1) the in-out decision and (2) the "how long?" decision for those who are incarcerated. In most jurisdictions, determining the value of the first of these variables--i.e., whether the offender was incarcerated and if so where--seems unlikely to pose many problems.
But the second variable--duration of incarceration--is more difficult. The problem is that, in most American jurisdictions, the time to be served by incarcerated offenders is finally determined, not by the sentencing judge but by the parole board. Although there are differences from state to state, in many if not most states the judge will pronounce either a maximum term, or a maximum and a minimum term, with the amount of time actually served by the offender to be determined within the judicially imposed limits.25 In some jurisdictions, indeed, the sentence imposed by the judge--usually a maximum term--may be very much a pro forma pronouncement. If one wishes to model antecedent sentencing practice in such jurisdictions, then data on judicially imposed (maximum) terms may be of little relevance. What will matter to the length of time a prisoner stays "inside" is not what the judge says but what the parole board later decides.

It appears that it was for this reason that the Minnesota guidelines researchers drew two separate samples of offenders on which they based their analyses. The decision to incarcerate was based on a sample of 2,399 persons convicted of felonies in fiscal 1978; this sample was drawn from court records. The study of sentence duration, however, was based on all 847 of the prisoners released from state correctional institutions in 1978, either on parole or at the expiration of the sentence(s) that led to their commitment (see Minnesota Sentencing Guidelines Commission, 1980:4). This was no doubt the only feasible approach, but it raises some problems of inference.26 In general, data on a sample of parolees can provide only minimal information about judicial views as to lengths of prison terms. In very "indeterminate" jurisdictions, of course, judges may in effect have no views of their own on appropriate lengths of terms; more precisely, though they may have some views, these may not be reflected in the terms eventually served by the offenders whom they sentence, which will be fixed later by the parole board. To the extent that this is the case, the empirical analysis on which guidelines are ultimately based will not be an analysis solely of judicial behavior; instead, it will involve some composite of judicial and paroling behavior.
(The matter is further complicated by the fact that judges may have effective control over the lengths of jail terms, if these are counted as "in" sentences--in the Minnesota guidelines, which are concerned solely with state prison sentences, they are not.)

Different situations may arise in other jurisdictions. In Massachusetts, for example, judges may sentence some offenders either to the state prison at Walpole or to the reformatory at Concord (or to a local jail). For those sent to Walpole, minimum parole eligibility is either one-third or two-thirds of the judicially imposed sentence, depending on type of offense. For those sent to Concord, however, the usual rule is that the offender stays inside for 6 months for each 5 years of term

imposed; thus an offender sentenced to 15 years in Concord would normally be released after 18 months in the institution. Now, discussions with Massachusetts judges--by the Massachusetts guideline project staff as well as by my colleagues and me during our periods of field work in that state27--strongly suggested that the judges were well aware of the times that offenders would normally serve (unless penalized for institutional misconduct) in the two different institutions and that they consciously tailored the sentences they imposed, in order to try to ensure that the offender was "off the streets" for what they regarded as an appropriate period of time.28 Data from the Massachusetts Department of Corrections, moreover, show that the state's judges were generally correct in assuming that the majority of offenders were in fact released on parole at the minimum times provided by law for the various institutions in the state. Given these facts, the Massachusetts guidelines researchers took as their "length of incarceration" variable the proportion of the total sentence to Walpole or Concord prescribed by law as the minimum to parole eligibility given the institution and type of offender in question.

Such an approach seems to me entirely reasonable; but the same conditions may well not obtain in other jurisdictions. In Michigan, for example, there is an indeterminate sentencing system of a fairly conventional kind. In felony cases, judges have discretion either to grant probation or to impose a jail term or a minimum state prison sentence, which by law may not be more than two-thirds the maximum (People v. Tanner, 387 Mich. 683, 199 N.W.2d 202 (1972); see Zalman et al., 1979). Release from prison, at any time between the minimum (less good time) and the maximum, is at the discretion of the parole board.
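The release rules just described reduce to simple arithmetic, which can be sketched as follows. This is only an illustration of the rules as summarized in the text, not a statement of the statutes themselves: the function names and the `two_thirds_rule` flag are hypothetical, and the text does not say which offense types fall under the two-thirds rule at Walpole.

```python
def minimum_months_served(institution, sentence_years, two_thirds_rule=False):
    """Months to minimum parole eligibility under the rules summarized in the text.

    institution: "Walpole" or "Concord" (the two institutions discussed).
    two_thirds_rule: hypothetical flag; the text says only that the Walpole
    fraction is one-third or two-thirds "depending on type of offense".
    """
    sentence_months = sentence_years * 12
    if institution == "Concord":
        # 6 months inside for each 5 years (60 months) of term imposed.
        return sentence_months * 6 / 60
    if institution == "Walpole":
        numerator = 2 if two_thirds_rule else 1
        return sentence_months * numerator / 3
    raise ValueError("unknown institution: " + institution)


def within_two_thirds_limit(minimum_years, maximum_years):
    """Michigan rule cited in the text: minimum may not exceed two-thirds of maximum."""
    return 3 * minimum_years <= 2 * maximum_years


print(minimum_months_served("Concord", 15))          # the 15-year Concord example
print(minimum_months_served("Walpole", 9))           # one-third of 9 years
print(within_two_thirds_limit(10, 15), within_two_thirds_limit(11, 15))
```

A judge who wanted an offender held for a given period could, in principle, invert these rules to choose the nominal sentence, which is the kind of tailoring the Massachusetts judges reported.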
The researchers involved in developing Michigan's sentencing guidelines took as their measure of length of prison sentences the minimum terms imposed, on the ground that these reflected the only meaningful use of discretion by judges (Zalman et al., 1979:172). This may be so, but the minimum terms may obviously give only an imperfect indication of the terms that prisoners eventually served.

A further practical problem concerns the numbers and kinds of cases selected for study. The sentencing guidelines research done to date displays considerable variation in this respect. At one extreme, data were collected in New Jersey on all persons convicted of

crimes in the year beginning December 1976; this yielded a total of about 16,000 cases. This solution, apparently adopted on political grounds,29 has some distinct advantages from a researcher's point of view,30 but it is plainly very expensive and is unlikely to be followed in many other jurisdictions. In the other states in which guidelines research has been done to date, the analyses of past sentencing practice have been based on samples of cases dealt with in, say, a year's time. In Michigan, for example, the sample consisted of 5,909 cases of a total of 26,116 cases sentenced in calendar year 1977 (Zalman et al., 1979). In Pennsylvania, the sample contained about 2,900 cases; in Minnesota, about 2,400; in Massachusetts, about 1,500.31 Although these sample sizes might be thought adequate for many kinds of social research, it seems from project reports that they imposed some constraints on the analyses conducted in most if not all of these guideline projects, not only because of the problems of missing data inherent in court-based records in most jurisdictions but also because of the relative rarity of some kinds of cases that may be of interest. For example, in most states it will be difficult to obtain sufficient cases for analysis in small or rural counties, and, given the usual sex ratios among convicted offenders, it may be difficult to get data on enough convicted females to permit more than the most cursory analysis. Different strategies--none of them entirely satisfactory--have been adopted by different researchers to cope with this problem. In Massachusetts, for example, a few of the smaller counties in the state were simply excluded from the sampling frame;32 in Minnesota, by contrast, several rural counties, including a number with large Native American populations, were oversampled in order to provide enough cases for statistical analysis.
In addition, in Minnesota, all of the convicted female felons were included in the sample, but only 42 percent of the convicted males (see Minnesota Sentencing Guidelines Commission, 1980:4). Disproportionate sampling of this kind can cause some statistical problems, although in practice these need not be too serious.33 It cannot be intelligently done, however, unless one has some notion of the kinds of cases that are likely to be of sufficient theoretical or practical importance to require oversampling; and as I have already noted, most guidelines researchers to date seem to have had only the most rudimentary ideas about this. For

example, common sense might lead one to suppose that sentencing practice would be different for cases disposed of by trial and cases disposed of by a plea of guilty (whether or not that plea was negotiated). This was in fact found to be the case in Massachusetts, and the guidelines eventually developed in that state were primarily to be used in tried cases.34 Yet no special effort was made by the Massachusetts researchers to oversample these cases (which account for less than a tenth of the total); the result was that the guidelines were based on a small number of cases, which may have provided an unreliable or even misleading description of past practice.35

A final problem of sample size concerns the fact that any statistical model or description of sentencing practice should ideally be validated: that is, it should be tested on a fresh sample from the same population, to see how well it holds up. The reason for this is that--especially if one's research is not guided by any kind of theory--the results from the analysis of the first sample may be due in part to some idiosyncrasies of that sample, which reflect nothing more than chance variation. A variety of techniques for this kind of statistical validation exists (see, for a discussion, Mosteller and Tukey, 1977:36-40, 133-63; Larntz, 1980). The problem is that all of these require substantial numbers of cases to be selected in the first place.36 (In this respect, one must envy the New Jersey researchers; political pressures that apparently required them to collect data on all cases sentenced in the year they studied also provided them with the funds to do this.)
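One of the simplest forms of validation described above is the split-sample check: fit the model on one half of the cases and see how well it predicts the other half. The sketch below is a minimal illustration using made-up data; the sample size, the noise level, the slope of 5.5, and all function names are assumptions for the example and are not drawn from any of the guidelines projects.

```python
import random


def fit_line(xs, ys):
    """Least-squares constant a and slope b for y = a + b*x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    return mean_y - b * mean_x, b


def r_squared(a, b, xs, ys):
    """Share of variation in ys accounted for by predictions a + b*x."""
    mean_y = sum(ys) / len(ys)
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    sst = sum((y - mean_y) ** 2 for y in ys)
    return 1 - sse / sst


# Hypothetical cases: (seriousness score, prison term in months).
rng = random.Random(0)
cases = []
for _ in range(400):
    score = rng.uniform(0, 100)
    cases.append((score, 5.5 * score + rng.gauss(0, 60)))

# Split at random into a construction half and a validation half.
rng.shuffle(cases)
construction, validation = cases[:200], cases[200:]

# Estimate on the construction sample only, then test on the fresh sample.
a, b = fit_line(*zip(*construction))
r2_construction = r_squared(a, b, *zip(*construction))
r2_validation = r_squared(a, b, *zip(*validation))
print(round(b, 2), round(r2_construction, 3), round(r2_validation, 3))
```

A large drop in fit from the construction sample to the validation sample would suggest that the original results owed something to chance. The halving of the usable sample is exactly the cost in cases that the text points to.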
In fact, few of those who have so far done research on sentencing with a view to developing guidelines have paid any attention to this important issue of validation;37 as the next section discusses, this may account for some of the counterintuitive findings of their analyses, and it certainly leaves room for considerable doubt about the stability of even their apparently reasonable findings. In some cases (e.g., in Minnesota, and probably also Pennsylvania) this may have been due to the fact that research on past sentencing practice was never intended to play an important role in shaping future sentencing policy. But in other cases, guidelines have been said (at least by way of advertisement) to have an empirical basis; even so, no attempt at statistical validation was made. Yet this could easily have been done in New

Jersey, where there were over 11,000 cases in hand;38 it could also have been done (though probably not quite so easily) in Michigan, where the initially selected sample contained nearly 6,000 cases. It probably could not have been satisfactorily done in Massachusetts, with a total of only 1,400 cases,39 but that means that the Massachusetts sample was probably too small--suggesting that the issue of validation was not thought of by the Massachusetts research team before they began their work. (Although, to be fair, the decision to select only a sample of this size may well have been dictated by budgetary considerations. This does not seem to me to be an excuse; those who fund research of this sort ought to be told that they need to spend enough that the research can be done right, and that they should otherwise save their money. Few of us in the business of social research seem prepared to take this hard line, alas.)

PROBLEMS OF MODEL DEVELOPMENT

Having collected data on past sentencing practice, the next step is to analyze those data so as to come up with a model that satisfactorily describes that past practice, which can in turn serve as the basis of sentencing guidelines. This step can involve a number of technical--mostly statistical--problems, some of them of a quite formidable kind. It is not my purpose to deal with these problems, since they are dealt with at length elsewhere in this volume. But it is perhaps worth expatiating a bit at this point on what a model is, in this context, since to do so may help us to see more clearly where most of the research aimed at providing an empirical basis for sentencing guidelines has fallen short of its goal.

By a model in this context is meant a description that shows the ways in which such things as seriousness of offense, vulnerability of the victim, race of the offender, etc., are related to sentences.
Such a description, especially if it is the outcome of a statistical analysis, is often presented in the form of an equation (though it is important to note that such equations always can, at least in principle, be translated into words). In an ideal world, as I noted, this kind of model building would be based on some theory or theories about decision making, which would have entailed descriptions about the relationships between independent

variables (such as prior record) and sentence outcomes. As should by now be apparent, the world of sentencing guidelines is far from an ideal one. Even so, something can be done to summarize the data in a parsimonious and possibly informative way, to show what variables are associated with variation in sentencing (and thus to show what things are not), and--more important--to say something about the relative strength of the association of each variable, holding constant the effects of the others in the model. The aim is to find the model that accounts for the greatest proportion of variation in the dependent variable (e.g., length of prison sentence) and that includes no variables whose effect is, on average, irrelevant or trivial.

Suppose, for example, that we had some data on sentencing in the past in some jurisdiction and that these data included an offense seriousness score of some kind, which ranged from zero (for spitting in the street) to 100 (for multiple rape murders).40 Suppose further that we analyzed these data and found that the best prediction we could make of the sentences actually imposed could be obtained by multiplying the seriousness score by 5.5, so that

(predicted prison term in case i) = 5.5 * (seriousness score in case i).

(By the "best" prediction I mean the one that was nearest to being correct, on the average, in a sense to be discussed further below.) Suppose further that including other variables in the prediction equation did not improve its accuracy--perhaps because we had failed to measure the things that are really important determinants of sentence length.
This is the kind of result one might obtain by using the statistical technique known as regression analysis; in general terms, the equation representing this result is conventionally written

$$y_i = a + b x_i + e_i \tag{1}$$

where y stands for the dependent variable (in this case, predicted prison term, say in months); x stands for the independent variable (in this case, the seriousness score); a is a constant term that can be thought of as the prison term given to cases with a seriousness score of zero; e is an error term that shows, for case i, how far the prediction "missed" in that case; and, finally, b

is a regression coefficient, which shows how much the seriousness score must be weighted in order to yield our "best on average" prediction of prison terms (in the example, the value of this coefficient is 5.5). Graphically the situation is represented in Figure 5-1.

[FIGURE 5-1 Illustration of a Hypothetical Relationship Between an Offense Seriousness Score and Jail or Prison Terms Imposed. Horizontal axis: offense seriousness score, 0 to 100. Vertical axis: ticks from 0 to 60; the axis label is illegible in this copy.]

Without going into technical detail,41 a few important points may be noted here. For one thing, the relationship between offense seriousness and prison terms depicted here is a straight-line relationship; the score is assumed to have the same effect on prison terms throughout its range (which we have assumed to be between 0 and 100). Second, as might be expected, very few of the cases plotted in Figure 5-1 lie exactly on the straight line that represents the regression equation; in some cases the sentence actually imposed was higher than the equation predicted; in some cases, lower. Thus for cases whose actual terms are above the regression line, the error term e would be positive; for those below the line, e would be negative. Third, the mathematics of regression as a statistical technique are such that they yield the coefficients a and b, which will produce the straight line that is "closest" to the observed data points, in the sense that the line minimizes the sum of the squared deviations (roughly speaking, the e's) that represent the extent to which the "best on average" prediction misses its targets. I return to this last point in a moment.

Of course, in actuality it would be unreasonable to try to predict sentencing outcomes using only one other variable--even one so reasonable as our seriousness score. In all likelihood, the "best" prediction of sentences might make use of the information from, say, three or four variables--including, say, prior criminal record, race of offender, vulnerability of victim, etc. In this case, the equation whose coefficients would be estimated statistically would take the form

$$y = a + b_1 x_1 + b_2 x_2 + b_3 x_3 + b_4 x_4 + \cdots + b_n x_n + e \tag{2}$$

(In this equation I have omitted the subscript i for the sake of simplicity.) With a number of independent variables rather than just one, the mathematics get more complicated (and the computing bill increases); in general, however, the principles are the same as in the one-predictor case. One important difference is that, with more than one independent variable, each regression coefficient represents the effect of its associated variable, holding constant the effects of the other variables included in the equation. That is, coefficient $b_1$ represents the weight to be given to variable $x_1$, controlling for the effects of variables $x_2$, $x_3$, etc.; and the same for the other b's.

Again, for purposes of this paper I neglect a good many technical issues. One, however, must be emphasized. Statistical procedures like multiple regression can tell you what things may safely be left out of an equation (or model),42 but they cannot by themselves tell you what variables or sets of variables should be put into such an equation, to be tested against the data, in the first place.
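The idea of a coefficient "holding constant" the other variables can be made concrete with a small sketch. The code below estimates equations of the form (1) and (2) by ordinary least squares on six invented cases. The data, the second coefficient of 6.0, and the function name `ols` are assumptions made for the illustration; a real analysis would use a statistical package rather than this hand-rolled solver.

```python
def ols(rows, ys):
    """Ordinary least squares via the normal equations.

    rows: one tuple of predictor values per case; ys: observed outcomes.
    Returns [a, b1, ..., bn], where a is the constant term.
    """
    X = [[1.0] + [float(v) for v in row] for row in rows]
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(x[i] * x[j] for x in X) for j in range(k)]
         + [sum(x[i] * y for x, y in zip(X, ys))]
         for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [v - factor * p for v, p in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]


# Invented cases in which terms depend on seriousness AND prior convictions.
scores = [10, 20, 30, 40, 50, 60]
priors = [0, 1, 0, 2, 1, 3]
terms = [5.5 * s + 6.0 * p for s, p in zip(scores, priors)]

# Equation (2) with two predictors: each coefficient is net of the other.
both = ols(list(zip(scores, priors)), terms)

# Equation (1) with seriousness alone: the slope absorbs part of the
# effect of prior record, because the two predictors are correlated.
alone = ols([(s,) for s in scores], terms)

print([round(c, 3) for c in both])
print([round(c, 3) for c in alone])
```

In this toy example the seriousness coefficient is 5.5 when prior convictions are in the equation and somewhat larger when they are left out, which is the sense in which each b in equation (2) is estimated controlling for the others.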
Of course, a researcher may try all possible combinations of the variables in the data, say, three or four at a time; again, however, it would be utterly unreasonable to do this with, say, the 874 variables in the New Jersey sentencing data.43 Plainly the researcher must make some choices; here again, it would be helpful to have some kind of theory to guide those choices.

How have those who have done research on sentencing with a view to developing guidelines handled these

matters? Their statistical analyses have not always been as clearly or completely described as one might like, but most of them seem to have proceeded in more or less the way outlined below.

Step (1) Carry out univariate analyses of all variables for which data have been obtained, omitting those that turn out to have high proportions of missing data,44 and also excluding those with highly skewed distributions, which make them unsuitable for further analysis. For example, dichotomous categorical variables that are split more extremely than 70:30 are likely to give unreliable statistical results (J. Davis, 1971). Interval-level variables that are badly skewed--numbers of prior arrests or convictions, for example--may be transformed by taking logarithms or square roots, so as to make their distributions more nearly normal and thus statistically more tractable.45

Step (2) Test all bivariate relationships among those candidate independent or explanatory variables that survive Step (1) and between those variables and the dependent or outcome variables (e.g., incarceration or not; length of prison term), again omitting all those that show no association with the outcomes one hopes to predict. One is then left with a subset of the original candidate explanatory variables, each member of which has been shown to be associated by itself, more strongly than one might expect by chance,46 with some sentencing outcome or outcomes.

Step (3) Attempt to combine the survivors from Step (2) in some kind of multivariate analysis (compare Zimmerman and Blumstein, 1979:10) to find the combinations of variables that best predict the outcome variables in which one is interested. This is, of course, the model building process I discussed earlier, aimed at producing something like equation (2) above.
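The first two screening steps can be sketched in code, which also makes it easier to see how much they throw away before any multivariate model is tried. Everything in the example is hypothetical: the ten cases, the variable names, the 25 percent missing-data cutoff, and the use of a fixed correlation threshold of 0.2 as a crude stand-in for a significance test. Only the 70:30 rule comes from the text, and the transformation of skewed variables mentioned in Step (1) is omitted for brevity.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient; 0.0 if either variable is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def step1_univariate(cases, candidates, max_missing=0.25, max_split=0.70):
    """Drop variables with too much missing data or too lopsided a dichotomy."""
    kept = []
    for var in candidates:
        values = [case.get(var) for case in cases]
        present = [v for v in values if v is not None]
        if 1 - len(present) / len(values) > max_missing:
            continue  # e.g. a variable recorded in only a few cases
        levels = set(present)
        if len(levels) == 2:
            largest = max(present.count(level) for level in levels)
            if largest / len(present) > max_split:
                continue  # split more extreme than 70:30
        kept.append(var)
    return kept


def step2_bivariate(cases, candidates, outcome, min_abs_r=0.2):
    """Keep variables whose correlation with the outcome reaches a threshold."""
    kept = []
    for var in candidates:
        pairs = [(c[var], c[outcome]) for c in cases if c.get(var) is not None]
        xs, ys = zip(*pairs)
        if abs(pearson(xs, ys)) >= min_abs_r:
            kept.append(var)
    return kept


# Ten invented cases.
columns = {
    "seriousness": [10, 20, 30, 40, 50, 60, 70, 80, 90, 100],
    "prior_convictions": [0, 0, 1, 0, 2, 1, 3, 2, 8, 9],
    "odd_docket_number": [0, 1, 0, 1, 0, 1, 0, 1, 0, 1],
    "female": [0, 0, 0, 0, 0, 0, 0, 0, 0, 1],
    "parent_education": [None] * 9 + [12],
    "term_months": [6, 10, 18, 20, 28, 30, 40, 41, 50, 52],
}
cases = [dict(zip(columns, row)) for row in zip(*columns.values())]
candidates = [name for name in columns if name != "term_months"]

after_step1 = step1_univariate(cases, candidates)
after_step2 = step2_bivariate(cases, after_step1, "term_months")
print(after_step1)
print(after_step2)
```

Note that `female` is discarded at Step (1) purely because of its 90:10 split, whatever its relevance to the research question. Step (3) would then fit a multivariate model to the survivors only.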
This three-step process is by no means unusual in nonexperimental social research; but it can lead to highly misleading results, especially if it is not done with some care and sophistication--qualities that are unfortunately missing from many of the analyses of past sentencing done by guidelines researchers. To begin with, the models used by all those researchers, so far as I am aware, have been of the simplest possible kind; they have, in fact, been linear additive ones like that described by equation (2) above. They assume that the

effects of the various independent variables simply add to one another to produce (in effect) a straight-line increase in, say, lengths of sentences. Such simplicity is both pleasing and useful in many contexts. Yet there is surely no reason to believe that judges' sentencing practice is really like that; indeed, there are plenty of reasons to doubt this. For example, such a model has as a consequence the fact that the weight (the b coefficient, estimated from the data) will be the same for all cases, so that, for example, each prior conviction is supposed to have the same average effect on, say, prison terms. It is at least as plausible to suggest that after a certain number, prior convictions have successively less influence on sentences, and that after a certain point--the upper threshold of badness--they cease to have any further effect at all.

Moreover, the models used by most sentencing researchers have assumed that the variables used in them have independent effects only; each one exerts its own separate push on sentencing outcomes. It may well be that there are in fact interactions between some variables, so that, for example, if two are present in a particular case they have a greater effect than the sum of what each would have separately. (If, for example, an offender uses a weapon and inflicts severe injury, this may lead to a heavier sentence than the separate effect of either factor would suggest.47) There are statistical techniques for detecting this if it happens, but those techniques seem not to have been used by any guidelines researchers, in part because their use requires at least some hunches, if not theory, about the kinds of interactions that are reasonable to look for. Moreover, having thrown out a number of variables at steps (1) and (2) of their model building, they could not have considered some of their interactions.
It could be the case that a variable has no correlation with the dependent variable when considered by itself yet will be seen to have an effect if some other variable's effect is held constant; again, however, hunches or theory are required to sort this out.48

Finally, the crudely empirical procedures used by many researchers in this field can lead to apparently nonsensical results, especially if (as has generally been the case) no statistical validation is carried out to see if the results obtained may just be the result of chance variation. An example is found in the research done by Zalman et al. (1979) in Michigan as a preliminary to

developing that state's guidelines. Zalman and his colleagues assumed that sentences were a function of three kinds of variables: some relating to offenses, some relating to offenders, and another category including such things as race, region of the state, and social status. They left the third category aside for most of their analyses, on the ground that these were not explicitly considered by the judges (a point to which I return below). They carried out regression analyses for each of 10 major categories of offenses, in which sentences were predicted using whatever offense and offender variables had survived Step (2) of their work. Table 5-2, which is based on data from Zalman et al. (1979:95), summarizes the results they obtained in analyzing the in-out decision for their category of sex crimes.

TABLE 5-2 Statistically Significant Variables in the In-Out Regression Equation for Sex Crimes in Michigan Sentencing Research

Variable                                   b        beta     F
Offense variables
  Seriousness (stat. max.)                 .0009    .186     49.1
  Extent of mental trauma                  .1390    .086     12.0
  Bodily beatings                         -.0720   -.080      9.8
Offender variables
  Number of incarcerations                 .093     .198     56.5
  Relation to criminal justice system      .093     .189     55.4
  "Good moves" since arrest                .204     .218     69.8
  Type of work                            -.085    -.130     25.7
  Reason for leaving school                .108     .108     18.1
  Drug use status                          .093     .079      9.6
  Alcohol use                              .045     .084     10.7
  Number juvenile violent felonies        -.318    -.087     12.2
  Residential stability                    .042     .077      8.8
  Detainers outstanding                    .133     .071      7.7
Adjusted R2 = .31

SOURCE: Zalman et al. (1979:95).

The coefficients shown in this table were all statistically highly significant,49 yet it is clear that some of them are counterintuitive if not downright nonsensical. For example, the coefficient for "bodily beatings" of the victim is negative, meaning that such beatings had the effect of reducing sentences; similarly, the negative coefficient for "number of juvenile violent felonies" suggests that the more such crimes the offender had committed, the shorter the sentence received. There is no reason to believe that either of those things is true. These results could have been due to a statistical fluke (since no separate validation was performed); they may have been due to the effects of measurement error or to correlations between the suspect variables and some other things; Zalman et al. seem, however, to have accepted them as being what the data show.

The Michigan researchers found that their models did not explain very much of the variation in sentences in their data; indeed, when predicting in-out sentences for sex crimes, they were wrong more often than right (see Zalman et al., 1979:97). They then concluded, at several places in their report, that there was a lot of "disparity" or unjustified variation displayed by sentencing in Michigan (see, e.g., pp. 170, 270-72, 277-78). This sweeping conclusion is not justified by their analyses; that the data did not fit their models may merely have shown that their models were wrong. (The counterintuitive coefficients they found certainly suggest this.) Such findings may furnish a handy stick with which to beat the judiciary, if one is intent on developing guidelines; judges are, after all, typically unschooled in multivariate statistics. But Zalman and his colleagues certainly did not demonstrate the existence of excessive or inexplicable variation in sentences in Michigan; more probably, they simply should have rejected their model.

How Many Models?
How many models of the kind we are considering need to be developed, in an analysis of sentencing practice that is aimed at the construction of sentencing guidelines for the future? This is a somewhat complex question.50 Reculer pour mieux sauter: The object of the exercise is to identify (without benefit of theory, or of clergy either) those factors that appear to have been important

determinants of past sentencing practice so that some of those factors can be incorporated into prescriptive instruments that, if followed, will result in sentences in the future that are more or less like those in the past. This does not mean that the description of past practice needs to be very detailed; indeed, as I noted earlier, it is one of the strengths of the Gottfredson-Wilkins concept of guidelines that it makes do with a relatively small number of offense and offender variables, leaving room within the prescribed ranges for judges to make minor adjustments and allowing them to go outside those ranges in appropriate cases. What is important is that the model(s) on which the guidelines rest should be accurate; that is, they should not omit things that were important determinants of past sentences, nor should they include things that were not. Furthermore, the statistical analyses of past sentencing should yield weights that reflect, at least approximately, the relative strengths of the "effects" on sentencing outcomes associated with included factors. These weights do not need to be terribly precise, since they will almost certainly be simplified (e.g., rounded to one decimal place) in the guidelines themselves and may be explicitly modified on grounds of social policy.51 They should not, however, be wrong.

Unfortunately, a good many of the analyses done by guidelines developers to date do seem likely to have yielded results that were wrong in important respects. I have already noted that most of the statistical "models" used by these researchers were of the simplest possible (linear, additive) kind. That apart, it seems to have been thought by many of those working in this field that a single "model" of past sentencing practice will suffice; but there are reasons for thinking that this is probably not the case. To begin with, sentencing involves at least two different kinds of decisions, both of which guidelines may purport to regulate.
On one hand, there is the decision whether to incarcerate; on the other, there is the decision, for those to be incarcerated, as to the length of incarceration.52 The two decisions are not psychologically distinct;53 the problem is that they apply to two different sets of offenders, the first--referred to as the "in-out" decision--being asked for all sentenced offenders, the second arising only for that subset of sentenced offenders who are incarcerated. The first decision thus essentially involves a dichotomous

outcome;54 the second, an outcome in numbers of months or years. The optimal statistical machinery for predicting or describing these two kinds of outcomes is different. Ordinary least-squares multiple regression can be used with a dichotomous-outcome variable (such as "in" or "out"); if this is done, then the dependent variable (y in equation (2) above) is interpreted as a probability of incarceration. Each individual's score on this variable is 1 if he or she is incarcerated, and 0 if not. The regression weights (the b's in the equation) then reflect changes in that probability, for unit changes in each independent variable (e.g., number of prior convictions). There are some theoretical objections to this procedure, which can be overcome by using some alternative statistical techniques, most of which are less well known, more complicated, and more expensive computationally than conventional regression; in practice the use of these more sophisticated methods does not seem to yield very different results.55

A more important reason for considering these two sentencing decisions separately is that they may well be governed by quite different factors. Once a judge has decided to incarcerate an offender, he or she may well consider a further set of facts about the case in deciding how long a sentence should be imposed. Even if both decisions are to an important extent influenced by the same factors (e.g., seriousness of offense, however defined), the weights given to those factors--to be estimated by regression equations--may be different; this is especially likely since, as noted earlier, the length-of-sentence decision should be estimated from data only on those offenders incarcerated, and not on all of those sentenced. This point has been neglected by many guidelines researchers. Thus, for example, despite having called attention to the supposedly bifurcated nature of the sentencing decision, Wilkins et al.
(1976) in fact fitted models to "the sentencing decision . . . treated as an interval variable" (1976:84, emphasis added). All prison sentences in their sample were given scores equal to the number of months of incarceration involved, whereas nonincarceration sentences were given a value of zero; the same thing, it appears, was done by the Massachusetts researchers.56 Of course it may be that in some jurisdictions, the same factors--with the same weights--apply to both the decision to incarcerate and the "how

long" decision. But this, if true, can only be discovered by analyzing the two decisions separately in the first place. Similarly, the variables that predict sentences in cases disposed of by pleas of guilty may be different from--or have different weights than--those that predict sentences in cases that go to trial. This seems to have been the case in Massachusetts, where guidelines to be used on tried cases were in fact based on analyses of all cases, including the much more numerous cases disposed of by guilty pleas.57

Another aspect of the "how many models" question concerns the choice between developing a single guidelines instrument (like the matrix reproduced as Table 5-1 above) and developing separate offense-specific prescriptions for separate categories of offenses. The former strategy is exemplified by the Massachusetts, Pennsylvania, and Minnesota guidelines; the latter strategy was employed in New Jersey (McCarthy, 1978) and is currently being tested in Michigan (Zalman et al., 1980). The latter approach has a number of advantages. For one thing, on the assumption that the severity of the prescribed sentence will be some function of the seriousness of the current offense, this seriousness in turn will be a function of things that are not, or are not necessarily, the same across all categories of offenses. To take an obvious example, the relative seriousness of offenses against the person, such as assault, rape, or robbery, may be a function of the degree of physical injury threatened or inflicted, and the physical vulnerability of the victim(s); these would not be relevant to most offenses classified and dealt with by the courts as burglary, theft, or fraud. In the latter offenses, however, the value of property stolen or damaged might well be a factor taken into account by the courts, although this would not normally be relevant to crimes against the person.
In the Michigan guidelines, for example, matrices are presented for 11 different categories of offenses (each of which is in turn the result of a grouping of several similar offenses as defined by statutes). For each category of offenses, the matrix is defined by a number of rows headed "offense severity," which are in turn defined by the presence or absence of factors relevant to that category of offenses; the columns are defined by categories of "prior record." But the "severity" (row) variables are based on somewhat different factors,

depending on the category of offense concerned. In the case of sex crimes, for instance, the "offense severity" variable depends on (1) the presence, type, and use of a weapon; (2) physical attack and/or injury; (3) whether the victim was carried away or held captive; (4) the total number of victims; (5) the vulnerability of the victim; (6) the total number of offenders; and (7) the degree of injury to the victim. These factors are given scores, which are said to be based on the results of the earlier analysis by Zalman et al. (1979) of felony sentencing in Michigan--although, as we shall see, there is in fact little correspondence.

In the Michigan guidelines, the prior record variable (which defines the columns of the various matrices) is calculated in the same way across all offense groups. This is obviously a defensible approach to the question, as it can be argued that the number of an offender's prior arrests or convictions is likely to have the same weight in determining the sentence, regardless of the type of the latest offense. However, it might well be that in some cases courts looked not only at the numbers of prior arrests or convictions, but also at the types of those offenses--and regarded repeated convictions for offenses of the same kind (e.g., violence against the person) as more serious than they would an equally lengthy "mixed" record. If so, this should be detected by an offense-specific approach to modeling like that done in Michigan. In the New Jersey guidelines, the "offender" variables included vary for different categories of offenses; even when variables are called the same thing in two or more different cases, the definitions of the factors concerned often differ. Here, however, it seems likely that these variations--which purport to be purely descriptive of previous sentencing practice in New Jersey--would not stand up to closer statistical scrutiny (in particular, validation in the statistical sense explained earlier).
An analysis that Bridget Stecher and I carried out some time ago showed that the different offender variables used for different offense categories in the New Jersey guidelines did not distinguish patterns of incarceration different from what would have been obtained if the same offender variables had been used in all cases (see Sparks and Stecher, 1979). The offense-specific approach to developing guidelines permits finer discriminations than may be possible with analyses in which all types of offenses are lumped together. Guidelines based on statistical analyses done

separately for rape, robbery, burglary, etc. may thus better reflect the prior sentencing practice they are supposed to perpetuate. They have the obvious practical disadvantage that many more cases will be needed for statistical analysis; even with their relatively large sample (about 6,000 cases), Zalman et al. (1979) seem occasionally to have felt the pinch of small numbers, which would have been more painful had they carried out the statistical validation that they should have done.

A further advantage of the offense-specific approach is that it makes it unnecessary to develop a measure of offense seriousness that cuts across different categories of crime, e.g., burglary and robbery. If all previously sentenced cases are analyzed together in the model-building exercise, then some measure of seriousness will be needed to discriminate between, e.g., rape and overtime parking--especially since this concept is so widely used, by judges, parole boards, and the public, to justify the severity of sentences. In this case, how might such a measure be devised? There are several possibilities, exemplified by the guidelines so far developed:

(1) A score supposed to reflect offense seriousness may be devised by the researcher. This will probably reflect an ordering of a commonsense kind of different categories of crimes, possibly influenced somewhat by statutory maximum penalties. This appears to be what was done in Massachusetts, for example, by Wilkins et al. (1976), and by the Michigan researchers.58

(2) Some more empirically derived measure of perceived seriousness of various offenses may be used, for example, like those derived from survey data by Sellin and Wolfgang (1964), Rossi et al. (1974), or Sparks et al. (1977).
However, apart from doubts as to the extent to which such perceptual ratings reflect real differences in offense seriousness or sanction severity, and further doubts as to whether they really provide interval-level measures (as some have claimed) rather than mere rank orderings, it is far from clear that there is much consensus in the population--even in a particular jurisdiction or at a certain time and place--as far as such assessments are concerned. If there is not, whose views should prevail?59

(3) The most purely descriptive method of estimating relative seriousness is to create what are called dummy variables for the various offense types, which in effect

make it possible to distinguish rape, robbery, etc. from all other offense types, to see how much those categories affect such outcomes as lengths of prison terms. Thus the dummy variable for robbery will have one weight associated with it; that for rape, another, and so on. This procedure, though it has more complications than this description suggests, can work pretty well; it has not, however, been used (so far as I know) by any guidelines researchers.

The analysis of prior record poses similar though much less difficult problems, in part because most variables of this kind (e.g., number of prior arrests or felony convictions) come naturally in the form of an interval-level variable. But there may be problems of deciding what to count--do we treat prior arrests, prior convictions, or prior incarcerations as the "best" measure of prior criminality? The answer to this is almost certainly not to throw all three of these things into the same regression equation. Rather, it is better to find the variable or combination of variables that provides the most robust and strongest explanatory power; whether this variable or combination of variables is later included in the guidelines is another matter.

What Variables Should be Included?

Another question to be asked at the model-building stage concerns the candidate explanatory variables that should be allowed to enter into analyses of past sentencing practice, if the construction of decision-making guidelines is the ultimate object of the exercise. Should one--following the example of Zimmerman and Blumstein (1979) and other researchers--exclude variables such as sex and race from all modeling efforts, on the ground that such variables are (to put it mildly) unlikely to be regarded as acceptable for inclusion in the guidelines that are meant to be the final product of the analysis? It seems to me that the answer to this question is no, for several reasons.
To begin with, if the analysis of past sentencing behavior is to have any point at all in this context, it must surely reflect some degree of fidelity to the data on antecedent sentencing practices; otherwise, why do it? To see this clearly, let us consider a situation in which an unacceptable variable (from a guidelines

constructor's point of view) has in fact been influential in sentencing decisions in the past: Race is probably a good example. Suppose that in jurisdiction X data on past sentencing practice are collected and analyzed, and it is found that blacks or other racial minorities were given markedly heavier sentences than whites--controlling for everything else that might be relevant. Surely this is something that morally sensitive guidelines developers ought to be eager to show, in order to promote the case for their brand of sentencing reform? The concept of sentencing guidelines has not infrequently been attacked, on the ground that it will lead to the institutionalization of injustices (like racism) that have characterized sentencing practice in the past. This criticism loses its force if the distinction between description of (past) sentencing practice and prescription of (future) sentencing practice is recognized and clearly maintained.

Moreover, the exclusion of a generally influential variable--even a morally iniquitous one like race--from a multivariate analysis of past sentencing practice may lead to incorrect estimates of the effects of other variables included in the model; any guidelines constructed on the basis of such a model will thus do precisely what is not intended: they will institutionalize the effects of race. Thus, to take a simple example, suppose that we fit a linear additive model to the data and find that expected terms of incarceration y* are given by

y* = 5 * (Offense Score) + 2 * (Prior Arrests) - 3 * (Race, 1 = white).    (3)

This says approximately that, on the average, given comparable offenses and prior records, white offenders receive lighter sentences. Evening up this injustice when constructing guidelines would involve setting the regression coefficient for race to zero, so that whites and nonwhites would get the same expected terms, given their offense scores and prior records.
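The arithmetic behind equation (3) can be checked with a small sketch. Everything in it is an assumption made for illustration: the data are invented, generated exactly from equation (3) with no noise, and arranged so that white offenders tend to have higher offense scores and fewer prior arrests. The helper `ols` is a hypothetical pure-Python least-squares routine, not anything used by the researchers discussed here. The sketch fits the model once with race included and once with race left out, and then forms guidelines weights by zeroing the race coefficient of the full model.

```python
def ols(rows, y):
    """Least-squares coefficients (intercept first) via the normal equations."""
    X = [[1.0] + [float(v) for v in r] for r in rows]
    k = len(X[0])
    # Augmented matrix for (X'X) b = X'y.
    A = [[sum(x[i] * x[j] for x in X) for j in range(k)]
         + [sum(x[i] * yi for x, yi in zip(X, y))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        A[c] = [v / A[c][c] for v in A[c]]
        for r in range(k):
            if r != c:
                f = A[r][c]
                A[r] = [vr - f * vc for vr, vc in zip(A[r], A[c])]
    return [A[i][k] for i in range(k)]

# Hypothetical balanced grid: four cases for each offense score / prior count.
# The number of white offenders per cell rises with offense score and falls
# with prior arrests (an assumption mirroring the text's hypothetical).
rows, y = [], []
for offense in (1, 2, 3, 4):
    for prior in (0, 1, 2, 3):
        n_white = 2 + (offense >= 3) - (prior >= 2)
        for i in range(4):
            white = 1 if i < n_white else 0
            rows.append((offense, prior, white))
            # Sentence generated exactly from equation (3).
            y.append(5 * offense + 2 * prior - 3 * white)

full = ols(rows, y)                        # intercept, offense, prior, race
reduced = ols([r[:2] for r in rows], y)    # same data, race left out

# Guidelines weights: keep the full-model weights, set the race weight to zero.
guideline_weights = full[:3] + [0.0]

print("race included:", [round(b, 3) for b in full])
print("race omitted: ", [round(b, 3) for b in reduced])
```

In this constructed example the full model recovers the weights 5, 2, and -3, while the model that omits race gives roughly 4.7 for offense score and 2.3 for prior arrests. Zeroing the race coefficient and dropping race from the analysis are therefore not the same operation.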
Suppose, however, that race were associated with both offense score and prior record, e.g., that blacks tended to commit less serious crimes but to have more prior convictions than whites. If this is the case, then an equation that does not include race as an independent variable will yield different coefficients for offense score and prior record, from those obtained from an equation in which race is included. This difference is

precisely that due to the effect of race on prior sentencing practice. (In the situation just hypothesized, a model that excluded race would underestimate the effect of offense score and overestimate the effect of prior record, which should obtain if race were ignored. Translation of those effects into guidelines would thus build in an effect of race.)

The main objective of this modeling stage, then, should be to try to obtain estimates of the relative effects of the various variables which, in the past, have had an appreciable effect on sentencing decisions. Some of these may be included in the guidelines that will later be developed; some (e.g., race) will not, but care must be taken to exclude the indirect effects of these when it comes time to make up the guidelines themselves. Overall, the statistical models developed at this stage may not account for an overwhelming amount of the total variation in previous sentences, even if the statistical work has been better done than that of many guidelines researchers. This may indeed be because there was not much consistency in previous sentencing practice; but it may also in part be because the models themselves, which deliberately incorporate only a few of the most important determinants of previous practice, can yield only a broad-brush picture of the ways in which sentencing was done in the past. Given the fact that sentencing guidelines (of the Gottfredson-Wilkins type) themselves will have a relatively simple structure, containing enough flexibility to permit judges to make finer discriminations on their own, this should not matter.

There is, however, a final and important point, which (so far as I can determine) has received no attention in research aimed at developing guidelines but needs careful attention at the model-building stage. This concerns the ways in which empirically derived models of past practice have failed to describe it.
Suppose, for example, that statistical models have been fitted to length-of-term decisions in some jurisdiction, and the best-fitting model is able to account for 60 percent of the variance in lengths of terms. That means that 40 percent is still unaccounted for; where is it? To answer this question, it is useful to look at the "residuals" (observed sentence minus that predicted by the model), which is often best done by plotting these against the predicted values themselves (compare Mosteller and Tukey, 1977:Ch.16). How does the model miss?
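A minimal sketch of the residual calculation just described follows. The cases are invented for illustration, the single-predictor line stands in for whatever model has actually been fitted, and `fit_line` is a hypothetical helper. Because the sketch uses only the standard library, it lists each residual against its predicted value instead of drawing the plot.

```python
def fit_line(xs, ys):
    """Simple least-squares line; returns (intercept, slope)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

# Hypothetical cases: (offense score, sentence in months). Not real data.
cases = [(1, 4), (1, 8), (2, 9), (2, 15), (3, 12), (3, 20),
         (4, 18), (4, 30), (5, 22), (5, 60)]
scores = [c[0] for c in cases]
observed = [c[1] for c in cases]

intercept, slope = fit_line(scores, observed)
predicted = [intercept + slope * s for s in scores]

# Residual = observed sentence minus the sentence predicted by the model.
residuals = [o - p for o, p in zip(observed, predicted)]

# Share of the variance in sentences that the model accounts for.
mean_obs = sum(observed) / len(observed)
ss_total = sum((o - mean_obs) ** 2 for o in observed)
ss_resid = sum(r ** 2 for r in residuals)
r_squared = 1 - ss_resid / ss_total

# Text stand-in for the scatterplot: residuals listed against predicted values.
for p, r in sorted(zip(predicted, residuals)):
    print(f"predicted {p:6.1f}   residual {r:+6.1f}")
print(f"R-squared = {r_squared:.2f}")
```

Reading down the listing shows whether the misses are spread evenly or concentrated at particular predicted values, which is the question the plot is meant to answer.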

[FIGURE 5-2 Residuals plotted against predicted sentence (months), Massachusetts guidelines construction data.]

In our reanalysis of the Massachusetts guidelines construction data, Bridget Stecher and I carried out a number of analyses of this kind; the results of one such analysis are illustrated in Figure 5-2. This scatterplot shows, first, that the sentences "predicted" by the Massachusetts guidelines were not all that close to the sentences actually imposed, even in the construction data; most of these residuals are not that near to the zero line. Also apparent from Figure 5-2 is the fact that in a small number of cases--about 40 of over 1,400--the length of term actually imposed was wildly different from that "predicted" by the guidelines model. In other words, there were evidently a few cases in which the sentences actually imposed were very different from what one would predict from the "best" account that could be given of sentencing practice over the sample as a whole.

Such extremely deviant cases obviously make an inordinate contribution to unexplained variance. It is very important to ask: What are these cases like? Why do they differ so markedly from the mine-run of cases dealt with? Our approach to answering this question consisted of listing all the salient factors we could think of for each of the cases in question, and eyeballing the data to see if any plausible reasons appeared for such gross departures from the norms applicable to the rest of the sample. In a few cases, we found factors that seemed to supply such reasons; for example, one of the extreme outliers had had no fewer than 19 previous prison sentences. But such satisfying reasons could not be found, at least in the data available to us, for all of the cases in question.
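The listing step can be sketched as follows. This is an illustration under stated assumptions, not the procedure used in the reanalysis, which relied on inspection by eye. The records are invented, and the cutoff rule (a residual more than five median absolute deviations from the median residual) is an assumed convention chosen only to make "grossly deviant" operational.

```python
from statistics import median

# Hypothetical records:
# (case id, prior prison sentences, observed months, predicted months)
records = [
    ("A", 0, 6, 8), ("B", 1, 12, 10), ("C", 2, 14, 15), ("D", 0, 9, 7),
    ("E", 3, 20, 22), ("F", 1, 11, 13), ("G", 19, 120, 24), ("H", 2, 16, 14),
]

# Residual = observed sentence minus predicted sentence, keyed by case id.
residuals = {cid: obs - pred for cid, _, obs, pred in records}

# Robust yardstick: median absolute deviation (MAD) of the residuals.
center = median(residuals.values())
mad = median(abs(r - center) for r in residuals.values())

CUTOFF = 5.0  # assumed multiple of the MAD; a judgment call, not from the source

flagged = [rec for rec in records
           if abs(residuals[rec[0]] - center) > CUTOFF * mad]

# List the salient factors for each flagged case so they can be inspected.
for cid, priors, obs, pred in flagged:
    print(f"case {cid}: {priors} prior prison sentences, "
          f"observed {obs} months, predicted {pred} months")
```

A robust yardstick is used because the ordinary standard deviation is itself inflated by the very cases one is trying to find.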
The general point here, I believe, is that in estimating a model that will satisfactorily describe and/or explain past sentencing practice, it is important to exclude any egregious cases in which the imposed sentence is grossly different from what would be expected, given the general pattern of antecedent sentencing. It seems to me that this is so, whether or not a plausible explanation for those deviant cases can be found in the available data. It would no doubt be comforting to find such a plausible explanation; but in the nature of things, such factors as "judge temporarily insane," "judge had indigestion," "prosecutor new to the job," etc. are unlikely to be recorded in the data available for analysis. Despite this, it seems reasonable to regard such gross departures--if any are found--as abnormal in some respect, and therefore to exclude them

from an attempt to model the majority of normal cases. A failure to exclude such grossly deviant cases may well result in misleading estimates of the general effect of explanatory variables (such as seriousness of offense and prior record) on the bulk of cases.60 It should be noted that no analysis of residuals--or, analogously, of mistaken classifications along the "in-out" dimension--has been presented by any of those who have so far carried out empirical research on sentencing with a view to developing guidelines.

FROM MODELS TO GUIDELINES

After an analytical model has been found that reasonably characterizes past sentencing practice, the next step (according to the original Gottfredson-Wilkins concept of guidelines) is the construction of a prescriptive instrument that can be used to guide sentencing in the future. The various guidelines developed to date illustrate a number of ways in which this has been done; in all of these, however, the results of the empirical analyses have been heavily overlaid with policy considerations. Thus, for example, in the Denver demonstration model (Wilkins et al., 1976:41) six independent variables--number of offenses of which the offender was convicted, number of prior incarcerations, seriousness of the offense (as defined by research staff), weapon usage, legal status of the offender at time of conviction, and employment history--were found to be significantly associated with the sentencing decision. The guidelines themselves contained a matrix or grid for each of eight groups of felonies and misdemeanors; within each group, offenses were further classified by estimated seriousness, based on rankings by research staff; to this seriousness rating was added a "harm/loss modifier" ranging in value from zero for a victimless crime to five for death, though injury to victim was not significantly associated with sentence in the regression analysis.
The offender score that defined the columns of the matrix was based on prior adult incarcerations, parole or probation revocations, legal status at time of offense, prior convictions, and employment history. The second and fourth of these were not significantly related to sentence in the regression analysis, and the weights assigned to each seem to have been purely judgmental.62

The Michigan felony sentencing project (Zalman et al., 1979) produced "empirical sentencing matrices" that tolerably well reflected the regression analyses that had previously been carried out (ignoring, for the moment, the methodological defects of those analyses discussed earlier). These empirical matrices were then used to construct guidelines. In this case, however, the guidelines differ in so many respects--size and shape of the matrices, variables included, weights assigned to them--that the empirical basis is hard to find; so hard, in fact, that a judge or legislator who had been sold such guidelines in part on the strength of their empirical basis might well feel that he or she had bought a pig in a poke instead. For example: an offender convicted of violent rape, who had two prior convictions of which one was also a sex crime, would (on certain not unreasonable assumptions) have fallen into a cell in the appropriate empirical matrix with a median of 53 months and a range of 6 to 180 months; the same offender would have fallen into a cell in the guidelines that had a median prescribed term of about 108 months, with a "normal" range of 96 to 120 months.63

Other states' guidelines, though yielding less bizarre results, also show substantial departures from the results of empirical analyses, on what are avowedly grounds of policy. Thus, for example, the Minnesota sentencing guidelines were developed after analyses that showed seriousness of current offense (as ranked by the commission) and prior record to be the most important determinants of sentence severity; employment status--which was "marginally associated" with the decision to incarcerate in the construction data--was deliberately excluded from the offender score used in the guidelines.
As noted earlier, the Massachusetts guidelines do not take a matrix form, but consist rather of a fairly straightforward transformation of (unstandardized) regression coefficients into weights that permit calculation of an "expected" sentence. Table 5-3 shows that the weights finally adopted in the guidelines are, with the exception of the one for weapon use, fairly close to the coefficients obtained by regressing sentences in months on those variables, counting (incorrectly) all "out" cases as zero. However, the variables included in the guidelines themselves and the scoring of the "offense seriousness" variable were not purely empirically derived; instead, they were based on policy decisions by the project's judicial steering committee.
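Why counting "out" cases as zero months matters can be seen in a small constructed example. The data below are invented and are not the Massachusetts data: six cases at each seriousness score, with the number incarcerated rising with the score, and terms for those incarcerated set to exactly 10 + 2 * score months. The helper `fit_line` is a hypothetical least-squares routine. The sketch compares the weight for seriousness obtained from all cases, with zeros for "out" sentences, against the weight obtained from incarcerated cases only.

```python
def fit_line(xs, ys):
    """Simple least-squares line; returns (intercept, slope)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

# Hypothetical cases: (seriousness score, incarcerated?, months).
cases = []
for score in range(1, 7):
    for i in range(6):
        incarcerated = i < score          # more "in" decisions at higher scores
        months = 10 + 2 * score if incarcerated else 0
        cases.append((score, incarcerated, months))

# (a) All cases, with every "out" sentence counted as zero months.
all_fit = fit_line([c[0] for c in cases], [c[2] for c in cases])

# (b) Length-of-term model fitted only to those actually incarcerated.
in_cases = [c for c in cases if c[1]]
in_fit = fit_line([c[0] for c in in_cases], [c[2] for c in in_cases])

print(f"all cases, out = 0: intercept {all_fit[0]:.2f}, slope {all_fit[1]:.2f}")
print(f"incarcerated only:  intercept {in_fit[0]:.2f}, slope {in_fit[1]:.2f}")
```

In this constructed example the pooled regression gives a slope of about 4 months per point of seriousness and a negative intercept, although terms among those incarcerated rise by only 2 months per point. The pooled weight mixes the in-out decision with the length-of-term decision.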

TABLE 5-3 Unstandardized Regression Coefficients From Analysis of Massachusetts Guideline Construction Data and Weights Given to the Same Factors in the April 1980 Version of the Massachusetts Guidelines

Factor                            Unstandardized      Weight in Massachusetts
                                  b Coefficient       Guidelines (April 1980)
Current offense seriousness        1.26                2.1
Use of dangerous weapon            2.13*               9.0
Degree of injury to victim         9.54                9.0
Seriousness of prior record        1.34                1.6
(Intercept)                       -1.18                -

* p = .118; all other coefficients significant below .05.

I am certainly not suggesting that it is in some sense wrong for considerations of social policy, morality, or whatever to enter into the formulation of guidelines--even if empirical models of past practice are the primary determinants of the sentences the guidelines prescribe (which is, of course, itself a policy decision). Even supposing that the modeling of past practice has been carefully and correctly done, there is bound to be a fair amount of "smoothing" of the results of that modeling exercise involved in the translation of those results into workable guidelines. In particular, with guidelines presented in matrix form (Table 5-1 above), the rows and columns will typically have to be defined by grouped offense and offender scores, so that even quite substantial alterations in scoring may have little effect on the classification of cases within the matrix. Technical matters of this kind need not involve explicit alteration of the results of the empirical analyses--like that involved in, say, eliminating the effects of racial discrimination or excessive regional variation that may have characterized sentencing in the past.

There are, however, two very fundamental respects in which guidelines--even if they purport to be empirically based in a very strict sense--are necessarily shaped by judgmental or policy considerations. These concern the

in-out decision and the width of the prescribed normal range of jail or prison sentences.

The Decision to Incarcerate

In most if not all of the analyses reported to date, the probability of incarceration increases directly, and in a fairly orderly fashion, with seriousness of the current offense and prior record (see, for example, Zalman et al., 1979; Wilkins et al., 1976; Parent, 1979 [personal communication]; Zimmerman and Blumstein, 1979). But a probability of imprisonment is of very little use, when guidelines are concerned. Suppose that a statistical analysis of past sentencing practice showed that 70 percent of all cases falling within a given cell in a guidelines matrix had in the past been given "out" sentences such as probation or a fine. How can judges be instructed to comply with this finding in sentencing in the future? They cannot send 30 percent of the offender to prison--at least unless more elegant forms of "split sentence" can be invented than now exist in most jurisdictions. Nor can they easily comply with a prescription to the effect that only 30 percent of the group of offenders falling into that cell in future should be incarcerated. It may be that some further criteria (beyond those used to construct the matrix) can be found that will distinguish the 70 percent of "out" cases in the cell from the 30 percent going "in." This is by no means guaranteed, since the 70-30 split may reflect, e.g., random variation among judges. The only purely statistical way of complying with the empirical findings would be to toss a biased coin--designed to come up heads 7 times out of 10, on average--when dealing with cases in that cell; such a procedure is unlikely to commend itself to anyone. The only alternative, however, is to declare that cases falling into this cell shall presumptively be treated as "out" cases. It is possible to do this
and still provide a range of months or years to be served if the presumption is overridden; both Minnesota's and Pennsylvania's guide- lines, for example, do this. The need to rely on a _ _ _ , _ , _ . presumptive "in" or "out" decision, however, does away with the flexibility inherent in the concept of a normal range, which was said earlier to be a distinctive feature of the Gottfredson-Wilkins concept of guidelines (and which of course remains intact in the case of parole

236 decision making, in which the concept was originally developed). Moreover, the choice of which cells of the matrix to treat as "in" and which to treat as "out" is obviously a matter of judgment, not something capable of being settled empirically. (In Zimmerman and Blumstein's (1979) reanalysis of the Denver data, cells containing 51 percent of cases incarcerated were arbitrarily classified as "in" cells in order to test the predictive accuracy of Her morel. ;~ in Al ik~l" that this cutting point would , be accepted in practice.) Finally, even if an analysis of antecedent practice revealed a fairly sharp split between "ins and "out" cases (70-30, say, or even 65-35) it may be difficult to declare that cases receiving the less common outcome after implementation of the guide- lines are departures from the guidelines--unless the grounds for departure are quite strictly specified (as they are, for example, in Minnesota and Pennsylvania). It may be thought that the presumptive character of the "in-out decision can be avoided by designating ~out" sentences as being of zero months and including them in prescribed guidelines ranges; as Table 5-4(a) shows, this is done in the current Michigan guidelines. Similarly, under the Massachusetts guidelines it is possible to have an expected sentence of zero; it (i.e., nonincarceration) is the lower range limit for cases with a guideline score Fee ~ ~- ~ & ~ ~ ~ ~ ~ ~ ~ ~ or expected sentence of between one and five months. But the difficulty with this approach is that it gives virtually no guidance on a crucial question, Should this offender be incarcerated or not? A guidelines matrix containing a range of 0-18 constrains only the upper end of that range; an "out" sentence is by definition not a departure from the prescribed range, but neither is any sentence of incarceration of 18 months or less. 
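
The point about zero-based ranges can be illustrated with a minimal sketch. The function name `is_departure` and the coding of an "out" sentence as zero months are assumptions made for illustration only; they are not taken from any jurisdiction's rules.

```python
def is_departure(sentence_months, low, high):
    """Return True if a sentence lies outside the prescribed range [low, high].

    Assumption for this sketch: an "out" (nonincarcerative) sentence
    is coded as 0 months.
    """
    return not (low <= sentence_months <= high)


# With a 0-18 range, neither an "out" sentence nor 18 months in prison
# counts as a departure, so the range is silent on the in-out decision.
print(is_departure(0, 0, 18))    # False
print(is_departure(18, 0, 18))   # False
print(is_departure(24, 0, 18))   # True

# A range that starts above zero (here 12-30) does constrain the
# in-out decision: an "out" sentence is a departure.
print(is_departure(0, 12, 30))   # True
```

Under this coding, only a range whose lower limit is above zero says anything about whether to incarcerate.
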
Even the New Jersey guidelines, which show the proportions of offenders (in the construction data) who were not incarcerated, do a better job of structuring discretion than this.

In summary, the problem is that empirical analysis of past sentencing can yield only probabilities of imprisonment, conditional on various offense and offender attributes; it is difficult to turn these probabilities into effective prescriptions for future sentencing, since it is not easy to follow a rule that says something like, "Do such-and-such 35 percent of the time." It may well be, as Zimmerman and Blumstein (1979) have suggested, that one can identify three groups of cases: a group with very high rates of incarceration (presumptively "in" in the guidelines); a group with very low rates (presumptively to be "out"); and a middle group with rates of incarceration around 40-60 percent (in which no presumption would be made). The difficulty remains, however, that designating some cases as presumptively "in" or "out" is likely to lead to changes in sentencing practice. Consider a cell in which 80 percent of preguidelines cases were imprisoned. If this cell is designated presumptively as "in," the proportion of cases imprisoned in this cell after the guidelines are implemented seems likely to rise, unless it should happen that judges will find grounds to rebut the presumption in just 20 percent of the cases; it is not easy to see how they can be given guidance of a kind that is likely to bring this about.

TABLE 5-4 Michigan Sentencing Guidelines for Burglary and Residuals from an Additive Model

                           Prior record
Offense            A       B       C       D       E       F
severity          (0)    (1-2)   (3-4)   (5-6)   (7-8)   (9+)

(a) Michigan sentencing guidelines for burglary offenses with statutory maximum terms of 180 months; figures in table indicate minimum sentences, in months

Low (0-3)         0-12    0-18    0-18    6-24   12-30   18-36
Medium (4-6)      0-18    0-18    6-24   12-30   24-42   36-48
High             12-30   24-48   36-60   48-60   48-60   60-120

(b) Midpoints of ranges in (a)

Low                 6       9       9      15      20      26
Medium              9       9      15      20      32      42
High               20      36      48      54      54      90

(c) Residuals (in months) from fitting additive model to data in (b), and row and column effects in months

                    A       B       C       D       E       F      Row Effects
Low                2.5     5.5     0       0       0     -10.5    12.0    -5.5
Medium             0       0       0.5    -0.5     6.5     0      17.5     0
High             -20.0    -4.0     2.5     2.5    -2.5    17.0    48.5    31.0
Column effects    -8.5    -8.5    -3.0     3.0     8.0    24.5     --     17.5

SOURCE: Zalman et al. (1980).

Widths of Prescribed "Normal" Ranges

There also seems no way to answer the question "How wide should 'normal' ranges be?" merely by an analysis of past sentencing practice. Guidelines developed to date display wide variations in this respect. Those in Minnesota and Pennsylvania, at one extreme, average plus or minus 5 percent or so around midranges; in Massachusetts, by contrast, the range of permitted variation is plus or minus 50 percent around the calculated guidelines sentences. Simple inspection of the frequency distributions of lengths of terms in particular cells may show that these cluster within a reasonably narrow range in most cells; and an examination of cases falling outside that range may show that they have features that would justify their being treated as "departures." But it may also turn out that this is not the case; if it is not, then decisions as to the widths of "normal" cell ranges will necessarily be made purely on grounds of policy, unless they are completely arbitrary.

In summary, most of the supposedly empirically based guidelines that have been developed to date appear to have modified the results of their empirical analyses, to a greater or lesser degree, in terms of the choice of modeled variables to be included in the guidelines, and the weights used to calculate offense seriousness ratings and prior record scores. It is impossible to say just how different the resulting guidelines are from those that would have emerged from a stricter transformation of empirical models.
While seriousness of current offense and length of prior record are the major dimensions of most guidelines developed to date, the definitions of these factors, and the scoring methods used to classify cases into guidelines matrix cells, seem in most cases to have been suggested rather than dictated by the analyses of antecedent sentencing practice earlier carried out. Of course, this is not necessarily a bad thing; careful empirical research on past sentencing can provide valuable guidance to policy makers in a variety of ways, even if the resulting guidelines are shaped by explicit considerations of policy--as was the case, for example, with the in-out lines in the Minnesota and Pennsylvania guidelines, which were largely determined by a notion of just deserts and a desire to limit incapacitation. But this kind of guidance suggests a very different role for research from that described by some of those who have advocated empirically based guidelines (e.g., Gelman et al., 1977); it also suggests a need for different kinds of research from what has been done for most guidelines that have so far been developed.

ASSESSING THE STRUCTURE AND IMPACT OF GUIDELINES

Evaluating the impact of sentencing guidelines may mean many things. Perhaps the most obvious of these concerns the question: "Do guidelines make any difference?" That is, if sentencing or parole guidelines have been introduced in a particular jurisdiction, do patterns of decision making in that jurisdiction subsequently change in ways desired by those who implemented the guidelines? What other consequences do guidelines have, e.g., on case flow, prosecutorial decision making, police practices, or other phases of the criminal justice system? These are questions of the "wait and see" variety; they entail before-and-after comparisons, of a kind with which this paper is not concerned. There are other evaluative questions, however, which are not of this kind: questions that make no assumptions, or only the simplest assumptions, about the changes in behavior that may or may not take place after the guidelines are introduced; for example, they may rest on the assumption that the guidelines are strictly and rigorously complied with. Even if this assumption is made, there is still plenty of room for a question of the form, "So what?" In other words, suppose we neglect, for the moment, the variety of techniques discussed earlier in this paper for constructing guidelines; suppose, moreover, that we assume that the guidelines--whatever form they may happen to take--are rigidly complied with after their introduction.
What can we say now--before the guidelines take effect--as to their likely consequences, under those assumptions? There are in fact several things which may be said relevant to the guidelines as constructed, rather than to the guidelines as they may (or may not) be consistently applied in practice. This section discusses some of these issues and some analytical methods that can be used to deal with them.

To begin with, how may one assess the structure of a set of sentencing guidelines? Typically, guidelines have taken the form of a matrix with rows and columns, defined by offense and offender scores of some kind, and cells containing "normal" ranges in which incarceration is prescribed. Do these ranges "step up" in a reasonably orderly fashion? Are the effects of offense and offender score reasonably consistent across the matrix--or are there some cells that--for whatever reason--contain ranges that are markedly different from what one would expect? Does the offender score, which is usually largely a function of prior record, have the same effect on prescribed sentences for the less serious offenses as it does for the more serious ones--or is it (for example) having more of an effect when the offense is less serious?

It may be that those involved in constructing guidelines will decide, upon reflection, that what seemed like anomalies were in fact justifiable. For instance, it may be that there are some offense-offender combinations for which a very much heavier (or lighter) sentence than would be suggested by the general pattern of the matrix is reasonable. But they will not be able to reach this conclusion unless the apparent anomaly is pointed out to them. And it may not be obvious from simple inspection of the matrix itself.

A set of techniques recently developed by Tukey (1977) and his colleagues can be used to address some of these questions. Suppose we represent the ranges stipulated in guideline matrices by their midpoints, on the assumption (which is explicit in the Minnesota guidelines, and not unreasonable in others) that, all other things being equal, cases falling into a particular cell should normally expect to be given a term in the middle of the stipulated range.
On this assumption, each cell in the matrix is represented by a single number (the midrange); and we can seek the relations between these midranges, as we move across and up or down the grid. Briefly, Tukey's method involves computing "effects" associated with each row and column of the matrix, and subtracting these from the cell midranges themselves to leave "residuals," which are the (positive or negative) amounts in each cell midrange that cannot be accounted for by the row and column effects.

Table 5-4, which is based on the matrix in the Michigan guidelines for burglary offenses with a 120-month statutory maximum, illustrates this procedure. Table 5-4(a) gives the guidelines ranges themselves; Table 5-4(b), the midranges. In Table 5-4(c), the row and column effects are displayed outside the grid; the cells of the grid themselves contain the residuals that are left after these two effects--which, in this case, relate to offense seriousness and prior record--are removed. In essence, the model fitted here is an additive one, in which the midrange for any particular cell can be represented by a row (offense) effect, plus a column (prior record) effect, plus or minus a residual that cannot be accounted for by the simple sum of those effects.64 If this model fits the data given by the midranges of the matrix, then the residuals ought to be more or less zero; and as Table 5-4(c) shows, this is by and large the case. Thus we might say that the Michigan guidelines for this group of burglaries prescribe midrange terms of about 17.5 months, plus or minus an effect depending on the seriousness of the particular offense, plus or minus an effect reflecting the offender's prior record, with generally small residuals. For example, for the least serious offenses of this kind, and for offenders with prior records in the "C" category, the middle of the prescribed range is 12 months minus three months, or 9 months; equivalently, it can be thought of as 17.5 months (the middle term across the whole of the matrix), minus 5.5 months for being in the least serious offense category, minus another three months for being in the "C" prior-record category--in each case, there is no residual, so that the overall effects reproduce the cell midrange perfectly.

Analyses of this kind are useful in several ways. For one thing, simple additive models may not adequately reproduce the structure of the cell midranges; instead, the offense and offender effects may be related multiplicatively rather than additively.65 For another, it may be that in some cells the residuals--that is, the difference between guidelines midranges and what would be expected given the general structure of the table--may be large rather than negligible or small.
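
The additive decomposition just described can be checked with a short sketch. The midpoints and effects below are transcribed from Table 5-4(b) and (c); the sketch does not re-estimate the effects (Tukey's median polish is one way of doing so), and the 10-month threshold used to flag large residuals is an arbitrary assumption made for illustration, not a figure from the text.

```python
# Prior-record categories (columns) of the Michigan burglary matrix.
PRIOR_RECORD = ["A", "B", "C", "D", "E", "F"]

# Cell midpoints in months, transcribed from Table 5-4(b).
MIDPOINTS = {
    "Low":    [6, 9, 9, 15, 20, 26],
    "Medium": [9, 9, 15, 20, 32, 42],
    "High":   [20, 36, 48, 54, 54, 90],
}

# Overall value, row effects and column effects from Table 5-4(c).
OVERALL = 17.5
ROW_EFFECT = {"Low": -5.5, "Medium": 0.0, "High": 31.0}
COL_EFFECT = dict(zip(PRIOR_RECORD, [-8.5, -8.5, -3.0, 3.0, 8.0, 24.5]))


def residuals(midpoints, overall, row_effect, col_effect):
    """Residual = observed midpoint - (overall + row effect + column effect)."""
    out = {}
    for severity, row in midpoints.items():
        for record, observed in zip(PRIOR_RECORD, row):
            fitted = overall + row_effect[severity] + col_effect[record]
            out[(severity, record)] = observed - fitted
    return out


def flag_anomalies(resid, threshold=10.0):
    """Cells whose residual is at least `threshold` months in absolute size.

    The default threshold is a hypothetical choice for this sketch.
    """
    return {cell: r for cell, r in resid.items() if abs(r) >= threshold}


resid = residuals(MIDPOINTS, OVERALL, ROW_EFFECT, COL_EFFECT)
for cell, r in sorted(flag_anomalies(resid).items(), key=lambda kv: kv[1]):
    print(cell, r)
```

With these inputs the flagged cells are High/A (-20.0), Low/F (-10.5) and High/F (17.0); a different threshold would flag a different set.
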
Inspection of Table 5-4(c) shows that this is the case for the cell for high offense severity and "A" prior record, for which the observed midrange is 20 months less than an overall additive structure for the matrix predicts; similarly, in the cell for high offense severity and "F" prior record, the observed midrange is some 17 months greater than the overall additive model predicts. An analysis like that of Table 5-4(c) readily displays such anomalies, and enables us to ask why they occur and if they are defensible.66 Parallel analyses can be carried out for other aspects of a guidelines matrix structure, e.g., the ratios of cell ranges to midranges, when these are or purport to be derived empirically rather than being laid down by fiat (as in Minnesota).67

The impact of a set of sentencing guidelines on the overall pattern of dispositions in a jurisdiction--even assuming that the guidelines are strictly adhered to--will in part be a function of the structure of the guidelines, e.g., the ranges and midranges prescribed by various cells; in part, however, it will be determined by the numbers of cases falling into the various cells. Thus, for example, Table 5-4(c) suggests that for the Michigan burglary guidelines, the lower righthand cell (high offense severity, prior record category "F") prescribes terms that on average are almost a year and a half heavier than the overall structure of the matrix would suggest. As I have noted elsewhere (Sparks, 1981), this tendency to produce guidelines structures that promise to thump the worst cases appears in several different jurisdictions; it may be explained by the fact that cases falling in those cells really are much more serious (or at least that they were in construction data); it may, however, reflect nothing more than a guidelines developer's wish to appear suitably ferocious in dealing with arch-criminals. Either way, the fact that that cell prescribes heavier-than-average terms will make no difference if no cases of that kind are ever dealt with after the guidelines are implemented.

The importance of this can be seen by considering the distribution of cases (in the construction data) in the cells of the Minnesota matrix (Minnesota Sentencing Guidelines Commission, 1979). No less than 60 percent of those cases fell into the lowest criminal history category; only 8 percent of those offenders were imprisoned. Similarly, 78 percent of the cases (sentenced in 1978) had been convicted of crimes falling into seriousness levels 1 through 4--that is, the least serious crimes covered by the matrix. In fact, the 4-by-2 submatrix in the upper lefthand corner of the Minnesota matrix contained almost two-thirds of the felons sentenced in Minnesota in 1978. For these cells, and several of their neighbors, the matrix prescribes a presumptive "out" sentence. Elsewhere in the matrix, heavy presumptive terms are prescribed; for example, those convicted of second-degree murder with a criminal history score of six (the worst possible) are presumptively to be sentenced to 27 years in prison. But such cases are very rare; and that cell will thus have only a slight impact on the overall pattern of sentencing in Minnesota under the guidelines. This fact was well appreciated by the Minnesota commission, whose legislative mandate directed it to have regard to institutional overcrowding in devising the guidelines. A computer program for projecting not only the size but also the composition of the Minnesota prison population was developed by the commission's research staff and was used to illustrate the consequences of different policy choices concerning the "in-out" line and lengths of presumptive prison terms (see Minnesota Sentencing Guidelines Commission, 1980); it was thus possible for the commission to choose from the several options available to it and to design guidelines that were consistent with the aim of keeping the prison population at an appropriate level.68

CONCLUSIONS

It has not been the intention of this paper to criticize the research that has been done to date by those who have been involved in constructing guidelines; there is little profit, and even less fun, in doing that. It seems more important to ask what the future role of empirical research might be, not only in constructing guidelines but also in sentencing reform generally.

It should be remembered that, as originally conceived of by Gottfredson and Wilkins, the notion of decision-making guidelines was a very simple one: all that they wanted to do was to "make explicit" a policy that the U.S. Parole Board had in fact (despite its denials) been following. The "policy" consisted of according relatively great weight to offense seriousness and prior record in parole decisions. It does not take very elaborate research to show that.
The question is, is it worth doing more elaborate modeling of sentencing behavior if the object is merely to develop guidelines and (perhaps) to focus public and judicial attention on questions of policy and principle that may not emerge from data analysis but may be deliberately adopted because they are believed to be just, efficacious, or both?

The answer to this question is not clear to me, but the question does seem to have consequences. If it is agreed that such modeling should be done, then clearly the best available research methods and analytical techniques should be employed. This would mean (for example) the prospective collection of data rather than reliance on case records; the use of estimation procedures other than ordinary least squares when modeling dichotomous outcomes; and the careful development of some theory about judicial decision making as a preliminary to these and other things. If it is decided that highly rigorous modeling is not necessary, then this does not mean that empirical research has no role at all in assisting sentencing reform. I suspect, however, that different tasks and different techniques will be relevant. For example, more attention may be paid to the residuals from models than to estimation of the parameters of those models; there may be more concern with exploratory data analysis than with statistical inference; and an interest may be taken in research on the impact of guidelines on the rest of the system--illustrated, for example, by the Minnesota research on projecting institutional populations.

The political role of the research that has been done to date, and the importance of providing a seemingly empirical basis for sentencing guidelines, should not be overlooked. It may well be that, without an analysis of past practice as a starting point, the use of guidelines as a technique for structuring discretion would not have achieved even its present measure of judicial and public acceptance. Whether that justifies the research that was done--as distinct from that which could have been done--is not an easy question to answer.

NOTES

1. For convenience, I refer for the most part to sentencing guidelines throughout this paper. But as will be seen, guidelines very similar in concept may be and indeed are used for many other decision points in the criminal justice system, e.g., bail and institutional classification; for a detailed discussion, see Gottfredson and Gottfredson (1980).

2. An overview of the history of concern about the control of discretion and "disparity" (often defined in rather different ways) is contained in Chapter 2 of the Final Report of the Evaluation of Statewide Sentencing Guidelines Project (henceforth cited as Sparks et al., 1982).

3. Portions of this section are adapted from Sparks (1983).

4. The structure of the California law is in fact somewhat more complicated than this brief description suggests; there are three base terms from which the sentencing court may choose, the middle one being the presumptive term subject to rules promulgated by the state's judicial council. In addition, it is possible in certain circumstances to enhance a sentence (i.e., aggravate the chosen base term by adding on extra years of imprisonment), although there are no parallel provisions for reducing sentences below the lower base term if the court decides to imprison at all.

5. The list of mitigating and aggravating factors (which is said to be nonexclusive, which may mean nonexhaustive) actually includes four grounds for mitigation and four grounds for aggravation; however, the last of the aggravating factors (which refers to "major economic offenses") requires two of a list of five further conditions to be met. Initially the commission had proposed to specify only the five grounds on which departure would not be permissible; this position was changed early in 1980 (letter from Dale Parent to Andrew von Hirsch dated 24 September 1979).

6. According to s.303.4(e)(1) of the Pennsylvania rules, the departure range for aggravation is limited to one cell in the righthand (heavier) direction, unless the guideline cell is the rightmost in its row; then the movement is one cell above, which is also in a heavier direction. The rules for mitigation are the mirror image of this. So far as I am aware, Pennsylvania's guidelines rules are the only ones that provide for such a limitation; in the absence of this kind of provision, of course, a court that decided to depart from the stipulated guideline range might impose literally any legal sentence.

7. See Minnesota Laws (1978:Ch.723, s. 244.10).
In Massachusetts, appellate review of sentences to Walpole State Prison also exists; at the time of this writing it is not known how these appeals will be affected by that state's guidelines. For a discussion of the Massachusetts and Connecticut appeal procedures in relation to sentences, see Zeisel and Diamond (1976).

8. There may be other reasons for this belief. For example, both Gottfredson and Wilkins had previously made distinguished contributions to the literature on criminological prediction, and the model building analyses that preceded their formulations of guidelines (and those of others) have many affinities with prediction problems in the field of criminology.

9. Part of the reason for a belief to the contrary may lie in a bogus distinction between description and prescription. If I say "The stuff in this bottle is poison" or "There is a mad bull in this field" (or even put up a sign saying "Bull") I am making a descriptive statement that has a truth value, etc., but I may thereby intend to warn others; warning is a species of prescription (see Sparks, 1979; for a general discussion of the linguistic point see Austin, 1962). The prescriptive nature of guidelines is briefly discussed in Gottfredson et al. (1978:141,159).

10. In interviews with me in 1979, Blalock asserted that there had not been a deliberate attempt to mirror past practice, on the grounds that there had not been a consistent practice prior to the guidelines. He then explained that the matrix had been constructed in part by reference to the maximum time that an offender would have to serve, given full "good time," and the board's desire to make the longest terms (i.e., those in the lower righthand corner of the matrix) sufficiently shorter to induce prisoners to leave the institution on parole rather than "maxing out" without parole supervision.

11. Thus, for example, Zalman et al. (1979), in their study of sentencing in Michigan, came to the conclusion that "there is not much predictability in sentencing, since similar cases are being treated very differently" (1979:142).
As we shall see below, there is good reason to doubt that Zalman and his colleagues did in fact find this; their claim to have done so, however, undoubtedly helped them to argue for guidelines as the best alternative to what they described as "the current sentencing morass" in Michigan (p. 17).

12. This may have seemed especially important to Gottfredson and Wilkins when they were conducting their initial feasibility study; as each has pointed out to me in a personal communication, there was at that time little prospect of legislative mandate for change (of the kind subsequently to emerge in Minnesota), and self-regulation by the judiciary seemed the best bet--quite apart from the concept (which they considered important on the basis of their work with parole guidelines) of making policy explicit. For a similar statement of the importance of involving judges, see Press (1980).

13. It may be for this reason that the Minnesota legislature directed that state's sentencing commission to ". . . take into substantial consideration current sentencing and release practices . . ." in devising its guidelines (Minn. Laws 1978, ch. 723; Minn. Stat. ch. 244 et seq.; see Minnesota Sentencing Guidelines Commission, 1980:1). It appears that no similar injunction was contained in the Pennsylvania Sentencing Commission's legislative mandate.

14. In at least one state, however (namely Pennsylvania), the dissemination of this information appears to have been counter-productive politically: see the discussion in Martin (in this volume).

15. For good discussions of the many ways in which this variety of descriptions may be true, see, e.g., Austin (1961), D'Arcy (1963), Anscombe (1961), Wisdom (1959). Lawyers are well aware of this: see the discussion in Hart and Honoré (1959).

16. Both concepts, of course, have clear-cut examples, but both have a large and vaguely bounded middle ground in which there is a lot of room for dispute, not only among lawyers but also among others. For example, does being drunk while you commit a crime mitigate (on the ground of lessened self-control) or aggravate (on the ground that you shouldn't have let yourself get into that state)?
Should hitherto blameless characters receive less censure for a first lapse--or more, on the ground that they should be held to the higher standards they have previously shown themselves to have been capable of meeting? Examples of the English courts' different approaches to these and kindred questions are found in Thomas (1972).

17. This issue is discussed at greater length in Stecher and Sparks (1982).

18. On the relations between information in presentence reports (and probation officers' preconceptions as well), and the sentences imposed by judges, see, e.g., Emerson (1968), Davis (1971), Cicourel (1968), Carter and Wilkins (1967). Cicourel's work makes clear the advantage to most offenders that they are the primary sources of information about themselves that is likely to play any part in their fates. They, at least, never learned to "interpret" their behavior in the way that many social workers can, and they are sometimes fairly skilled at lying about it.

19. The complexities in question would no doubt be even greater in most states, in which a small group of judges hear cases in a single county or similar jurisdiction only. In Massachusetts, by contrast, there remains something of the circuit system still in use in England and formerly found in many American states. Our observation was that this system was a bit rigid, even in Massachusetts; and of course even judges who travel throughout the state may have modified their sentencing policies in response to what they see as local community attitudes. (The same may be true for public defenders, who in Massachusetts are organized and paid by a state organization; prosecutors, however, are elected at county level.) This may seem to be too microscopic to bother with. I believe it is not, however: Attention to such details might enable us to sort out the consequences of judicial role behavior from those attributed (as too many probably are) to personal idiosyncrasy.

20. For a further discussion of the bargaining processes, which in Massachusetts often led to both prosecutor and defense counsel making recommendations as to sentence, see Sparks et al. (1982:Ch.6).

21. This conclusion was said to be based on inspection of an initial sample of 500 presentence reports and on consultation with probation officers involved in the preparation of those reports (McCarthy, 1978:10-11). This surely illustrates vividly the caution needed in dealing with this information source.

22. A New Jersey judge of my acquaintance once confided that he often decided whether or not to incarcerate a convicted offender by looking at the man's wife or girlfriend. A beatific air usually led to probation, a slatternly look to the jail; this curious rule was based on a theory of sorts about what a "good woman" can do for a man, etc. Stranger theories have been espoused by judges--in books yet (see, for example, Alexander and Staub, 1956).

23. There may, however, be problems of validity surrounding the available data on judges' and others' beliefs. There may also be problems concerning the consistency with which such data are recorded. A comprehensive discussion is found in Belson (1963) and Hood (1964); again, Cicourel (1968) has informative illustrations.

24. Analyses of variations in sentencing between judges are reported by Rich et al. (1980) and Zalman et al. (1979); this method of identifying "disparity" in sentencing was also the focus of the earliest studies in this field, e.g., Gaudet et al. (1933). Studies that have claimed to find substantial variation of this kind have done little to explain why it occurs. For example, do the judges in question differ in their perceptions of certain sorts of cases, in what they believe to be appropriate objectives for those cases, or in their beliefs concerning the sanctions best suited to accomplish those objectives? Interesting discussions of this problem are found in Hogarth (1971) and, concerning juvenile court judges, Wheeler et al. (1968).

25. In addition, of course, in some jurisdictions the minimum term to parole eligibility is determined by the minimum sentence imposed by the judge (with or without allowance for "good time"). Even so, it may be necessary to take into account judges' beliefs about paroling practices in deciding on the appropriate definition of length of term.

26. The extent to which estimates of time served will vary according to whether admission or release samples are used is not easy to predict.
There may not be much difference if paroling rates and term-setting policies remain reasonably constant over time. However, since the stock of prisoners available to be paroled depends in part on the numbers and types of prisoners admitted in preceding years, and since these are unlikely to remain constant in most jurisdictions, the times served by those

released in any year may still differ from the expected times to be served by those admitted in the same year--which is presumably what is to be reflected in the sentencing guidelines. Even worse problems of inference will arise if a sample of the prison population is used to estimate lengths of terms; long-term prisoners are even more heavily overrepresented (see Sparks, 1971, for a discussion). There are some demographic methods (e.g., life tables, demographic input-output) that are useful in tackling some aspects of this problem (see, for example, Stone, 1972; Keyfitz, 1977), but these do not seem to me to be of much help in dealing with the issue involved here.

27. These field studies were carried out in June-August 1979 and July-August 1980 (see Sparks et al., 1982).

28. The judicially imposed sentence to Walpole or Concord did not in fact mean that the offender spent time in the designated institution; this was in the end determined by the Department of Corrections. As an example, we observed a case in which a slightly built white youth was convicted of apparently irrational aggravated assaults with a hammer on a number of persons. Prosecution and defense counsel had agreed on a recommendation of 15 years in Concord, which would have meant that the offender was eligible for parole in about 18 months; the judge sentenced the offender to 15 years in Walpole, which would have meant parole eligibility after 10 years. In an interview after the sentencing hearing the judge stated that he had passed a "Walpole sentence" precisely because of the difference in parole eligibility rules; he was confident that the defendant would not be kept by the Department of Corrections in Walpole State Prison, where (as it seemed to all concerned) he might have been subjected to sexual attack, etc.

29. This assertion is based on personal communication with the New Jersey guidelines project director, John P.
McCarthy, Esq., at the very beginning of the project; it was thought necessary to base the guidelines on all cases sentenced in the year, rather than on a sample, if the resulting guidelines were to be credible to the state's judges.

30. Especially since it is extremely important when carrying out statistical modeling to validate one's

findings in the technical sense of seeing whether they hold up in a fresh sample from the same population; there is always a nonzero probability (which tests of statistical significance minimize but do not eliminate) that a model--especially if it is based on little or no theory--merely reflects some idiosyncrasies of the first sample from which it was derived. Moreover, the larger the sample, the greater the chance that rare events (e.g., multiple rapes, in this context) will be represented in it.

31. In less-than-statewide studies, much smaller samples have been used: e.g., in the Denver study (Wilkins et al., 1976) the analysis was ostensibly based on about 200 cases, though because of missing data the number actually used seems to have been between 50 and 80 (compare Rich et al., 1980; Hewitt and Little, 1981).

32. A sampling frame is technically the list of units from which the sample is chosen; for example, a roster of organizational members, a list of census tracts, or a set of registers containing court convictions. Excluding some blocks of units at random from the frame will not necessarily introduce bias into one's results; doing so in a systematic way (e.g., excluding the small counties in Massachusetts) may well do so, and it is safest to conclude that the findings simply do not apply to the excluded blocks (in this case, the small counties). Since these may well differ in important respects, they ought to be included, and oversampled (as the Minnesota and Michigan researchers in effect did), rather than thrown out.

33. It requires weighting the cases finally selected in such a way that they will represent, numerically, the actual population. A careful example is Zalman et al. (1979).

34. For a further discussion see Sparks et al. (1982:Ch.7-8). As noted earlier, guilty pleas often had negotiated (and sometimes agreed) recommendations for sentence by prosecution and defense.
In addition, we were told by a number of judges, during our Massachusetts field work, that they paid little or no attention to information in presentence reports in cases in which there had been a trial, since they felt that by the end of the trial they usually knew what sort of person the defendant was.
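The weighting described in note 33 can be illustrated with a minimal sketch. The stratum labels, population counts, sample sizes, and sentence lengths below are hypothetical, chosen only to show the arithmetic; they are not figures from any of the studies cited. Each sampled case is weighted by the inverse of its stratum's sampling fraction, so that an oversampled stratum (here, the small counties) does not distort the estimate for the population as a whole.

```python
def stratum_weights(population_counts, sample_counts):
    """Weight per stratum = cases in the population / cases actually sampled."""
    return {s: population_counts[s] / sample_counts[s] for s in sample_counts}


def weighted_mean(cases, weights):
    """cases is a list of (stratum, value) pairs; weights maps stratum -> weight."""
    total = sum(weights[s] * v for s, v in cases)
    denom = sum(weights[s] for s, _ in cases)
    return total / denom


# Hypothetical numbers (assumptions for illustration only):
# small counties hold 10% of all cases but make up 25% of the sample.
population_counts = {"large_county": 9000, "small_county": 1000}
sample_counts = {"large_county": 300, "small_county": 100}

# Hypothetical sentence lengths in months: 24 in large counties, 12 in small ones.
cases = [("large_county", 24.0)] * 300 + [("small_county", 12.0)] * 100

weights = stratum_weights(population_counts, sample_counts)
unweighted = sum(v for _, v in cases) / len(cases)
weighted = weighted_mean(cases, weights)

print(f"unweighted sample mean: {unweighted:.1f} months")
print(f"weighted estimate:      {weighted:.1f} months")
```

In this made-up example the unweighted mean understates the population figure because the oversampled small counties, with their shorter terms, count for too much; the weights restore each stratum to its actual share.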

35. The results of an analysis based on trial and plea cases lumped together will be--as might be expected--an amalgam; in this case, one dominated by the much more numerous cases disposed of by guilty pleas. In Massachusetts, the differences between the two types of cases were not insignificant (see Sparks et al., 1982:Ch.8).

36. See Gelman et al. (1979), in which precisely the wrong account of this matter is given; the authors confuse statistical validation (which requires a sample from the original population) with checking to see whether things have changed since the first sample.

37. Although it is an issue on which both Gottfredson and Wilkins insisted (see, e.g., Gottfredson et al., 1978; Gottfredson and Gottfredson, 1980; Mannheim and Wilkins, 1955).

38. The figure of approximately 11,000 refers to the main categories of offenses for which the New Jersey guidelines were developed; the remaining 5,000 or so cases were a miscellany, including (if I remember correctly) three cases of "setting fire to paramour's bed."

39. The exact total, and the ways in which these cases were selected, are unclear from the Massachusetts project's reports and the information they provided to us. The figure of 1,400 excludes cases sentenced to "life without parole" and a few others unusable for analysis (see Sparks et al., 1982).

40. Technically, we also need to suppose that this scale is a genuinely "interval-level" one, with properties like those of the natural number system. This assumption is of course often violated (or, as economists tend to say, "relaxed") in practice.

41. Clear and concise discussions of regression techniques include Blalock (1972), Cohen and Cohen (1975), and Walker and Lev (1953); a more advanced treatment will be found in Mosteller and Tukey (1977).

42. That is, variables whose coefficients are no larger than might have been expected purely by chance (and thus are not statistically significant) thus make no contribution to the prediction when other things are held constant.

43. If all of those variables made sense and had sufficient nonmissing values (which, as we have seen, is far from the case), there would be 381,501 different pairs of variables--candidate x's--to be tried; triplets, foursomes, etc., would make matters even worse. There are some sensible techniques for carrying out what Mosteller and Tukey (1977:Ch.15) have called guided regression in situations of this kind, in which one knows literally nothing about what variables ought to be considered. But it is better not to get oneself into such a situation in the first place.

44. Such data should not, of course, have been collected to begin with. The question of what is a missing value can get a little complicated. In the nature of things, there are some stigmata--certain sexual deviations, for example, and gross physical peculiarities--that are apt to be mentioned if present, but whose absence would be pedantic to record. Thus the safe coding of a questionnaire item such as "defendant into frottage" or "defendant is a Siamese twin" is almost certainly "no" rather than "not known," if explicit mention is not made. Yet, vagueness aside, what is normal is very much conditioned by the preconceptions of the beholder. Probation officers and other social workers, whose professional training typically contains a healthy if diluted dollop of Freudianism, seem able to see peculiarities that humbler folk do not; conversely, they often display a capacity to explain to their own satisfaction (and thus to treat as normal, at least sometimes) many things on which others would be inclined to comment. An illuminating study of institutional records on this point is Belson (1963); see also Cicourel (1968).

45. See, for a discussion, Mosteller and Tukey (1977:Ch.4-6).
In some cases, a logarithmic transformation may be theoretically reasonable--it may be reasonable to assume, for example, that prior arrests or convictions have a diminishing effect, perhaps after a threshold has been reached. This is one reason why the grouping of such things as prior arrests (which is often accomplished in constructing offender scores used in guidelines) may introduce relatively little error into the calculation of expected sentences. This kind of

transformation is to be distinguished from that which is involved if it is assumed that relations between outcome and explanatory variables are multiplicative rather than simply additive (as is the case, for instance, with some kinds of "interactions"--see below).

46. This refers, again, to the issue of statistical significance. It cannot be too often repeated that this kind of significance does not license any conclusions about meaningfulness (see the discussion below of the analyses done by Zalman et al., 1979).

47. Unfortunately, the term interaction is sometimes used by statisticians to refer to other things, in particular the situation in which a set of relationships (e.g., between offense and offender variables and sentences) differs between subpopulations (e.g., whites and nonwhites). A situation of this kind, and the example given in the text, are by no means necessarily equivalent.

48. In such a case, the other variable is sometimes called a suppressor (see, for example, Rosenberg, 1968; J. Davis, 1971). But it makes no sense to test all pairs of variables that seem to display no association with each other, to see if this kind of suppression is taking place--not least because it may look that way, purely by chance, if enough candidate suppressors are tested.

49. Almost all of the "significant" relationships reported by Zalman et al. had a probability of occurring purely by chance (according to statistical theory) of less than 1 in 1,000. A more common level of this kind of significance uses a probability of chance occurrence of less than 1 in 20 as a criterion. Neither is proof against nonsense, however. If one looks at 500 bivariate associations, for example, the latter criterion means that one should expect, on average, 25 associations of the requisite strength, just by chance. If one ends up with 26 such associations, which is not just a fluke?

50.
This question should be distinguished from the question of the number of alternative but equally suitable models that one should seek for the same decision, e.g., lengths of terms given to those imprisoned after a trial and conviction. Statistical analyses may (and crudely empirical ones almost certainly will) yield several such models of about equal explanatory power (see Gelman et al., 1979).
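The arithmetic behind note 49 can be checked with a short sketch. It assumes, purely for illustration, that the 500 tests are independent and that no real associations exist, so that the number of chance "significant" results follows a binomial distribution; neither assumption is claimed for the Michigan data. The last line applies the same reasoning to the 381,501 pairs mentioned in note 43, a count that happens to equal the number of pairs among 874 items (the figure 874 is inferred from that count, not taken from the text).

```python
from math import comb


def expected_false_positives(n_tests, alpha):
    """Expected number of chance 'significant' results among n_tests null tests."""
    return n_tests * alpha


def prob_at_least(k, n_tests, alpha):
    """P(X >= k) for X ~ Binomial(n_tests, alpha); assumes independent tests."""
    below = sum(
        comb(n_tests, i) * alpha**i * (1 - alpha) ** (n_tests - i)
        for i in range(k)
    )
    return 1.0 - below


expected_1_in_20 = expected_false_positives(500, 0.05)     # note 49's example
expected_1_in_1000 = expected_false_positives(500, 0.001)  # the stricter level
p_26_or_more = prob_at_least(26, 500, 0.05)

n_pairs = comb(874, 2)  # 874 is an inferred, illustrative figure (see lead-in)
expected_among_pairs = expected_false_positives(n_pairs, 0.05)

print(f"expected by chance at 1 in 20:    {expected_1_in_20:.1f}")
print(f"expected by chance at 1 in 1,000: {expected_1_in_1000:.1f}")
print(f"P(26 or more by chance alone):    {p_26_or_more:.2f}")
print(f"pairs: {n_pairs}, expected chance hits at 1 in 20: {expected_among_pairs:.0f}")
```

Under these assumptions, finding 26 "significant" associations out of 500 is entirely unremarkable, which is the point of the rhetorical question in note 49.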

51. Quite commonly, for example, offense and offender variables that emerge from regression models will be combined into what are sometimes called Burgess scales (in honor of their use in the first parole prediction study by Burgess et al. (1931)): that is, each included factor will simply be given a score of +1 rather than a weight estimated by regression or some other procedure. The scale scores thus derived may further be grouped into categories (e.g., 0-2, 3-5, etc.) in guidelines. Such scores are quite robust in the sense that they tend to hold up on cross-validation (for a discussion, see Wainer, 1976, 1978). They obviously permit only crude categorizations of offenders into matrix cells; but, as I noted earlier, the concept of guidelines has enough flexibility that this does not much matter. Such smoothing or rounding techniques need to be distinguished, however, from modifications of the results of analyses of past practice that are explicitly based on considerations of policy, e.g., removing the effects of racial discrimination or regional variation.

52. Strictly speaking, guidelines may also prescribe the place of incarceration, e.g., jail or prison. The New Jersey guidelines do in fact give a hint to judges about this, although no more (see Sparks et al., 1982).

53. Wilkins seems to believe that they are (see, for example, Wilkins et al., 1976:2-3; contrast, however, Gottfredson et al., 1978:Ch.5). At any rate, neither he nor anyone else to my knowledge has presented psychological evidence in support of this view.

54. Further guidelines may be developed to deal with each category defined by the first decision: Thus guidelines that aim to regulate the decision to incarcerate can coexist with durational guidelines, which may be used by another agency, e.g., a parole board (for a further discussion see Sparks et al., 1982:Ch.2,3,11).

55.
For descriptions of some of these methods--LOGIT and PROBIT models, and logistic regression--see Fienberg (1977); Bishop et al. (1975); Cox (1970). Applications to criminal justice problems include Solomon (1976); Larntz (1980); Zimmerman and Blumstein (1979); Gottfredson and Gottfredson (1980). The finding--e.g., by Zalman et al. (1979) and Gottfredson and Gottfredson (1980) that the results of using such procedures do not differ substantially from those of simpler and better

known techniques--may be largely due to the crudeness with which many criminal justice data are measured (contrast Rhodes, 1981, who takes a different view).

56. It is important to note that this scoring of nonincarceration sentences as zero, at the modeling stage of guidelines development, is quite a separate matter from the use (or the misuse) of zero to represent such sentences in the guidelines themselves. This problem is discussed below.

57. See above, notes 34 and 35. In fact, the Massachusetts guidelines are (or initially were) "advisory" in cases in which there was not an agreed recommendation following a plea of guilty. Cases of this type, which would seem to have a sort of intermediate status in the adversary process, might themselves be modeled separately, since the determinants of sentences in such cases could well be different from both those operating in those cases that went to trial and those for which there were agreed guilty pleas. This matter is currently being studied by Bridget Stecher and me, using the Massachusetts data.

58. In the Michigan study (Zalman et al., 1979) offenses were grouped into broad categories of similar sorts of behavior (e.g., sex crimes); within each of these categories, the various offenses were given a seriousness score that was the maximum sentence provided by statute, in months.

59. Marvin Wolfgang and his colleagues at the University of Pennsylvania have recently completed a survey of perceived crime severity using a large national probability sample (drawn from respondents in the National Crime Surveys); preliminary results from this study, as yet unpublished, suggest that there is in fact considerable variation in the numerical scores assigned to offense descriptions among subgroups of the population.
For the view that such differences may reflect variations in the use of the natural number scale as well as the sparseness of the descriptions typically used in this kind of research, see Shelly and Sparks (1980).

60. The situation seems exactly analogous to that of von Bortkewitsch, who showed that the Poisson distribution fitted the observed distribution of deaths from horse

kicks in 10 corps in the Prussian army over 20 years. There were in fact 14 corps, but von Bortkewitsch excluded four that had abnormally large numbers of deaths, thus sparing himself the necessity of fitting negative binomials or something similar instead of the Poisson (see Coleman, 1964:291). No doubt it is nice to have reasons--if not theories--to justify such exclusions; the point is that such abnormal cases should be excluded, whether or not an apparent reason for their abnormality is present. The basis for deciding that a case is abnormal is, of course, somewhat subjective if no such theory is available.

61. See, however, the discussions of "in/out" predictions by Zimmerman and Blumstein (1979), Rich et al. (1980), and Zalman et al. (1979), and criticism of their techniques by Sparks et al. (1982:Ch.11). For several reasons, a cutoff of exactly 50 percent is too peremptory a measure of "in" versus "out."

62. They were agreed after discussion with the project's Steering and Policy Committee, which consisted mostly of judges. No pun is intended.

63. Further details of this analysis are reported in Sparks et al. (1982:Ch.9). It may well be, of course, that such changes in outcome are precisely what is wanted, on grounds of social policy. However, it seems to me important to try to estimate (at a minimum) what the aggregate consequences of such a change in sentencing practice would mean, e.g., for prison populations; as I note below, only the Minnesota researchers have so far considered this issue.

64. The midranges are thus treated as a "response" or dependent variable, which is assumed to be determined by the variables that define the rows and columns; the effect of the technique is thus rather like that of the analysis of variance. See also Mosteller and Tukey (1977); McNeil and Tukey (1975); Fairley (1978); and for applications of this method to parole guidelines matrices see Perline and Wainer (1980); Sparks (1983).

65.
A multiplicative model of this kind involves the same techniques applied to the logarithms of the midranges rather than to the midranges themselves (see Tukey, 1977). The value of such a model is that the

effect of, say, prior record, differs according to the level of seriousness of the offense one is considering. Both the Minnesota and Pennsylvania sentencing guidelines display such a structure (see Sparks et al., 1982:Ch.9).

66. I am not suggesting that such anomalies must be indefensible; perhaps there really is a case for a very much heavier or lighter prescribed term in this or that cell, than what the best-fitting overall structure would dictate. But if so, why? The point of the techniques discussed here is that they may help to make perspicuous matters that may otherwise remain unnoticed. To the extent that they succeed in doing this, they surely contribute to what Gottfredson and Wilkins primarily had in mind when they sought to make paroling policy "explicit," which is not the same thing as "making paroling policy."

67. It is open to argument whether range widths within cells should be evaluated in terms of absolute numbers of months (in which case the heavier midranges will usually seem to have the wider ranges), or in terms of cell ranges standardized by their midranges, i.e., in "plus or minus" percentages around the midrange (in which case the greatest latitude will often be elsewhere in the matrix, probably in those cells prescribing on average the lightest terms). For example, in a cell with a prescribed range of 12-18 months, an offender getting the maximum "normal" term will serve half again as long as one receiving the minimum; in other words, around the midrange this is equivalent to a plus-or-minus permissible variation of 20 percent. Compare the situation in a cell prescribing a range of 96-120 months (plus or minus about 11 percent, around a midrange of 108 months). In which case is there more variability?

68.
At present, however, this computer program (which takes initial inputs, e.g., conviction patterns, as relatively static) looks forward only five years; longer-term projections are needed for many purposes, including planning for prison capacity. The program is, however, easily modifiable to permit this. (Minnesota Sentencing Guidelines Commission, 1981, gives details and a program listing; the commission's research director, Kay Knapp, should be contacted for further information.)
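The comparison of range widths in note 67 can be reproduced with a minimal sketch. The function name is hypothetical; the two ranges are the ones given in that note. Each range is expressed both in absolute months and as a "plus or minus" percentage of its midrange.

```python
def relative_half_width(low, high):
    """Return (midrange, half-width as a fraction of the midrange) for a cell range."""
    midrange = (low + high) / 2
    half_width = (high - low) / 2
    return midrange, half_width / midrange


for low, high in [(12, 18), (96, 120)]:
    midrange, relative = relative_half_width(low, high)
    print(
        f"{low}-{high} months: width {high - low} months, "
        f"midrange {midrange:g}, plus or minus {relative:.0%}"
    )
```

In absolute terms the 96-120 cell is four times as wide (24 months against 6), yet relative to its midrange it allows about half the latitude of the 12-18 cell, which is why the two ways of measuring variability can point in opposite directions.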

REFERENCES

Alexander, Franz, and Hugo Staub
  1956 The Criminal, the Judge and the Public. Revised edition. Glencoe, Ill.: Free Press.
Allen, F. A.
  1964 The Borderland of Justice. Chicago: University of Chicago Press.
Anscombe, G. E. M.
  1961 Intention. Oxford, England: Basil Blackwell.
Austin, J. L.
  1961 Philosophical Papers. Edited by J. O. Urmson and G. J. Warnock. Oxford, England: Oxford University Press.
  1962 How to Do Things with Words. Oxford, England: Oxford University Press.
Belson, William
  1963 The Development of Stealing in Adolescent Boys. Unpublished paper. London School of Economics.
Bishop, Yvonne M. M., Stephen E. Fienberg, and Paul W. Holland
  1975 Discrete Multivariate Analysis: Theory and Practice. Cambridge, Mass.: M.I.T. Press.
Blalock, Hubert M.
  1972 Social Statistics. 2nd edition. New York: McGraw-Hill.
Burgess, Ernest W., Andrew A. Bruce, Albert J. Harno, and John Landesco
  1928 Parole and the Indeterminate Sentence. Springfield, Ill.: Illinois State Board of Parole.
Carter, Robert M., and Leslie T. Wilkins
  1967 Some factors in sentencing policy. Journal of Criminal Law, Criminology and Police Science 58(4):503-514.
Cicourel, Aaron
  1968 The Social Organization of Juvenile Justice. New York: John Wiley.
Cohen, Jacob, and Patricia Cohen
  1975 Applied Multiple Regression/Correlation Analysis for the Behavioral Sciences. New York: John Wiley.
Coleman, James
  1964 Introduction to Mathematical Sociology. New York: Free Press.
Cox, D. R.
  1970 The Analysis of Binary Data. London: Methuen.
D'Arcy, Eric
  1963 Human Action. London: Routledge and Kegan Paul.

Davis, James
  1971 Elementary Survey Analysis. Englewood Cliffs, N.J.: Prentice-Hall.
Emerson, Robert M.
  1968 Judging Delinquents: Context and Process in the Juvenile Court. Chicago: Aldine.
Fairley, William B.
  1978 Accidents on Route 2: two-way structures for data. In William B. Fairley and Frederick Mosteller, eds., Statistics and Public Policy. Reading, Mass.: Addison-Wesley.
Ferri, Enrico
  1921 Report and Preliminary Project for an Italian Penal Code. Translated by Edgar Betts. London: His Majesty's Stationery Office.
Fienberg, Stephen
  1977 The Analysis of Cross-Classified Categorical Data. Cambridge, Mass.: M.I.T. Press.
Gaudet, F. J., G. S. Harris, and C. W. St. John
  1933 Individual differences in the sentencing tendencies of judges. Journal of Criminal Law, Criminology and Police Science 23(5):811-817.
Gelman, A. M., Jack M. Kress, and Joseph Calpin
  1979 Developing Sentencing Guidelines. Washington, D.C.: U.S. Department of Justice.
Glueck, Sheldon
  1928 Principles of a rational penal code. Harvard Law Review 41(4):453-482.
Gottfredson, D. M., P. B. Hoffman, M. H. Sigler, and L. T. Wilkins
  1975 Making paroling policy explicit. Crime & Delinquency 21:34-44.
Gottfredson, Don M., Leslie T. Wilkins, and Peter B. Hoffman
  1978 Guidelines for Parole and Sentencing: A Policy Control Method. Lexington, Mass.: Lexington Books.
Gottfredson, Michael R., and Don M. Gottfredson
  1980 Decision-Making in Criminal Justice: Toward the Rational Exercise of Discretion. Cambridge, Mass.: Ballinger.
Hagan, J.
  1975 Extra-legal attributes and criminal sentencing: an assessment of a sociological viewpoint. Law & Society Review 8:357-383.
Hart, H. L. A., and A. M. Honore
  1959 Causation in the Law. Oxford, England: Oxford University Press.

Hewitt, J. D., and B. Little
  1981 Examining the research underlying the sentencing guidelines concept in Denver, Colorado: a partial replication of a reform effort. Journal of Criminal Justice 9:51-62.
Hogarth, John
  1971 Sentencing as a Human Process. Toronto, Ont.: University of Toronto Press.
Hood, Roger
  1964 Sentencing in Magistrates' Courts. London: Sweet and Maxwell.
Keyfitz, Nathan
  1977 Introduction to Population Mathematics. Reading, Mass.: Addison-Wesley.
Kress, Jack M.
  1980 Prescription for Justice: The Theory and Practice of Sentencing Guidelines. Cambridge, Mass.: Ballinger.
Larntz, Kinley
  1980 Linear logistic models for the parole decision-making problem. In S. E. Fienberg and A. J. Reiss, Jr., eds., Indicators of Crime and Criminal Justice: Quantitative Studies. Washington, D.C.: U.S. Government Printing Office.
Lizotte, Alan
  1978 Extra-legal factors in Chicago's criminal courts: testing the conflict model of criminal justice. Social Problems 5:564-580.
Mannheim, Herman, and Leslie T. Wilkins
  1955 Prediction Methods in Relation to Borstal Training. London: Her Majesty's Stationery Office.
McCarthy, John P.
  1978 Report of the Sentencing Guidelines Project to the Administrative Director of the Courts. Administrative Office of the Courts, Trenton, N.J.
McNeil, D. R., and J. W. Tukey
  1975 Higher-order diagnosis of two-way tables, illustrated on two sets of demographic empirical distributions. Biometrika 31:487-510.
Minnesota Sentencing Guidelines Commission
  1979 Summary Report: Preliminary Analysis of Sentencing and Releasing Data. St. Paul, Minn.: Minnesota Sentencing Guidelines Commission.
  1980 Report to the Legislature. St. Paul, Minn.: Minnesota Sentencing Guidelines Commission.
Mosteller, Frederick, and John W. Tukey
  1977 Data Analysis and Regression: A Second Course in Statistics. Reading, Mass.: Addison-Wesley.

Perline, Richard, and Howard Wainer
  1980 Quantitative approaches to the study of parole. In S. E. Fienberg and A. J. Reiss, Jr., eds., Indicators of Crime and Criminal Justice. Washington, D.C.: U.S. Government Printing Office.
Rhodes, William
  1981 Comments on the Methodology Used in the Construction of Sentencing Guidelines. Unpublished paper prepared for the Panel on Sentencing Research, National Research Council.
Rich, William D., L. Paul Sutton, Todd Clear, and Michael J. Saks
  1980 Sentencing Guidelines: Their Operation and Impact on the Courts. National Center for State Courts, Williamsburg, Va.
Rosenberg, Morris
  1968 The Logic of Survey Analysis. New York: Basic Books.
Rossi, Peter H., E. Watie, C. Rose, and R. E. Berk
  1974 Seriousness of crimes: normative structure and individual differences. American Sociological Review 39:224-237.
Sellin, Thorsten, and Marvin E. Wolfgang
  1964 The Measurement of Delinquency. New York: John Wiley.
Shelly, Peggy L., and Richard F. Sparks
  1980 Crime and Punishment. Paper presented at the annual meetings of the American Society of Criminology, San Francisco.
Solomon, Herbert
  1976 Parole outcome: a multidimensional contingency table analysis. Journal of Research in Crime and Delinquency 13:107-126.
Sparks, Richard F.
  1971 Local Prisons and the Crisis in the English Penal System. London: Heinemann Educational Books.
  1979 Prediction and Guidelines. Paper presented at the annual meetings of the Academy of Criminal Justice Sciences, Cincinnati, Ohio.
  1981 The structure of the Oregon parole guidelines. Chapter 9 in Sheldon Messinger, Richard F. Sparks, and Andrew von Hirsch, eds., Final Report on the Strategies for Determinate Sentencing Project. National Institute of Justice, Washington, D.C.

  1983 Sentencing guidelines. In Encyclopedia of Crime and Justice. New York: Free Press.
Sparks, Richard F., and Bridget A. Stecher
  1979 The New Jersey Sentencing Guidelines: An Unauthorized Analysis. Paper presented at the annual meetings of the American Society of Criminology, Philadelphia.
Sparks, Richard F., Hazel G. Genn, and David J. Dodd
  1977 Surveying Victims. London: John Wiley.
Sparks, Richard F., Bridget A. Stecher, Jay S. Albanese, and Peggy L. Shelly
  1982 Stumbling Toward Justice: Some Overlooked Research and Policy Questions Concerning Statewide Sentencing Guidelines. School of Criminal Justice, Rutgers University.
Stecher, Bridget A., and Richard F. Sparks
  1982 Removing the effects of discrimination in sentencing guidelines. In Martin Forst, ed., Sentencing Disparity. Beverly Hills, Calif.: Sage.
Stone, Richard
  1972 Mathematics and the Social Sciences. London: Chapman and Hall.
Thomas, David A.
  1972 Principles of Sentencing. 2nd edition. London: Heinemann Educational Books.
Tukey, John
  1977 Exploratory Data Analysis. Reading, Mass.: Addison-Wesley.
von Hirsch, Andrew
  1975 Doing Justice: The Choice of Punishments. New York: Hill and Wang.
Walker, H. M., and J. Lev
  1953 Statistical Inference. New York: Holt.
Wainer, Howard
  1976 Estimating coefficients in linear models: it don't make no nevermind. Psychological Bulletin 83:213-217.
Wheeler, Stanton, et al.
  1968 Agents of delinquency control: a comparative analysis. In S. Wheeler, ed., Controlling Delinquents. New York: Wiley.
Wilkins, Leslie T., Jack M. Kress, Don M. Gottfredson, and Joseph Calpin
  1976 Sentencing Guidelines: Structuring Judicial Discretion. Final report of the feasibility study. Albany, N.Y.: Criminal Justice Research Center.

Wisdom, John
  1959 Philosophy and Psychoanalysis. Oxford: Basil Blackwell.
Zalman, Marvin, C. W. Ostrom, Jr., P. Guilliams, and G. Peaslee
  1979 Sentencing in Michigan: Final Report of the [...]. Lansing, Mich.: Michigan Office of Criminal Justice.
Zeisel, Hans, and Shari Diamond
  1976 The search for sentencing equity: sentence review in Massachusetts and Connecticut. American Bar Foundation Research Journal 881-940.
Zimmerman, Sherwood E., and Alfred Blumstein
  1979 The Construction of Sentencing Guidelines. Paper presented at the annual meetings of the Academy of Criminal Justice Sciences, Cincinnati, Ohio.


