National Academies Press: OpenBook

Criminal Careers and "Career Criminals,": Volume II (1986)

Chapter: 6. Accuracy of Prediction Models

Suggested Citation:"6. Accuracy of Prediction Models." National Research Council. 1986. Criminal Careers and "Career Criminals,": Volume II. Washington, DC: The National Academies Press. doi: 10.17226/928.


6
Accuracy of Prediction Models

Stephen D. Gottfredson and Don M. Gottfredson

Stephen D. Gottfredson is executive director, Maryland Criminal Justice Coordinating Council, Baltimore, Md., and Don M. Gottfredson is professor, School of Criminal Justice, Rutgers University.

Any decision made under uncertainty with respect to future events, behaviors, activities, resources, trends, demands, or outcomes is a predictive one. If the goal of the decision being made is utilitarian, prediction certainly is critical to the decision-making process. Accordingly, the concept of prediction is central to traditional crime-reduction or crime-preventive concerns of the criminal justice system, such as deterrence, incapacitation, and rehabilitation (S. D. Gottfredson and D. M. Gottfredson, 1985). Prediction is implicit in the decisions made but rarely is that explicitly recognized. It is quite possible, however, to characterize the American criminal justice system as a network of interrelated decision points (M. R. Gottfredson and D. M. Gottfredson, 1980b); when this is done, the ubiquity of prediction to most of the decisions encountered is made clear.

This paper concerns the accuracy of prediction in criminal justice settings and the utility of statistically developed decision-making tools intended for practical implementation. We have been forced to limit our review in several ways. First, our principal focus is the prediction of criminal or delinquent behavior. Thus, we do not address a variety of important criminal justice prediction problems involving resource allocation, criminal population projections, estimation of rates of offending and the length of criminal careers, and many others, except as they are relevant to assessing the impacts of some proposed decision-making devices (e.g., those proposed for selective incapacitation strategies).
Second, we omit detailed discussion of work concerning the psychological or psychiatric assessment of offenders, even though much of this clearly is of a predictive nature. We also give less attention to predicting the behavior of criminal justice system functionaries (e.g., judges, prosecutors, parole board members) than to predicting the behavior of offenders. Since the accuracy of prediction models cannot responsibly be assessed in a vacuum, however, some attention to the behavior of functionaries is necessary.

Detailed critical reviews concerning several distinct and important issues have been published recently. Given the ready availability of this information, we do not give detailed attention to the prediction of violence (reviewed by Monahan, 1978, 1981; Monahan and Klassen, 1982), to longitudinal studies bearing on prediction issues (reviewed by Farrington, 1979, 1982), or to the prediction of sentencing decisions (reviews are available in Hagan, 1974; L. Cohen and Kluegel, 1978; Garber, Klepper, and Nagin, 1983; Hagan and Bumiller, 1983; Klepper, Nagin, and Tierney, 1983).

Because insufficient information is available to allow reliable generalizations, we ignore the areas of policing and corrections, although the nature of decisions made in these settings often clearly is predictive. Our focus is on bail and pretrial release decision studies and on decisions involving prosecution, sentencing (although as noted above, we do not provide a detailed review of these), and parole. We give attention to efforts designed to provide advice, based on scientific principles of assessment and prediction, to those confronted daily with the variety of decision-making tasks considered.

In the first section of this paper we discuss the nature of decisions generally, and in criminal justice settings in particular. Because the accuracy of predictive decision making is of concern, we discuss some of the issues involved in such assessments. In the next section we discuss both descriptive and (where appropriate) normative prediction studies for each of the decision arenas under consideration.
Special attention is given to items of information commonly observed to be predictive, the general level of accuracy of these (both in the bivariate case and when considered in conjunction with other predictors), and the general level of predictive accuracy achieved in equations or models of the decisions under consideration. Then, we summarize the preceding discussion by focusing on predictors commonly observed across the decision arenas studied. We provide a summary of those variables found to predict the decisions of functionaries and those found to predict the behavior of offenders and show how they differ. Next, for each of the decision arenas considered, we examine the efficacy of statistically developed decision-making tools that are in use, or have been proposed for use, in a number of jurisdictions. Finally, we discuss ways to improve the accuracy and hence the utility of prediction tools designed for application in criminal justice settings.

PREDICTIVE DECISION MAKING

The Logic of Prediction

Any decision has three components: a goal, the existence of alternatives, and information upon which the decision may be based (M. R. Gottfredson and D. M. Gottfredson, 1980b). Decisions cannot rationally be made (or studied) if decision-making goals are unstated or unclear. Unfortunately, goals for criminal justice decisions rarely are explicitly stated, and often they are complex. (Rarely is a single goal for a decision given.)¹ Without alternatives, there can be no decision problem; and without information on which to base the decision, the "problem" reduces to reliance on chance. As we shall see, decision makers often are not sufficiently attentive to the relation of information used to the goal desired, which results in decisions being made that would have been better left to chance.

¹ See D. M. Gottfredson and Stecher (1979) for an example within the context of sentencing.

It is in the relation of information used to the goal desired that prediction studies are of most value to the criminal justice decision maker. If decision makers desire to minimize errors in the decision process, prediction studies also are to be desired, for it is this that they are designed to accomplish. In brief, prediction simply refers to the utilization of informational items, singly or in combination, to estimate the probable future occurrence of some event or behavior (known as the criterion). Methods of using the informational items (known as independent or predictor variables) may be intuitive, clinical, or subjective, or they may be statistical or "actuarial." If of the latter type, any of a wide variety of approaches may be used. The specification of these is beyond the scope of this paper, but we assume the reader has some familiarity with the more common methods.²

² For general discussions of the logic of prediction, see Sarbin (1943), Gough (1962), and D. M. Gottfredson (1967).

The Nature of Decisions

Decisions involve choice, because of the requirement that alternatives be available. Much of psychology, economics, and philosophy concerns the study of choices that people make. What determines the amount of money one will pay this fall for a house? What is responsible for the selection of a Labrador retriever over a Chihuahua as a family pet? Why does one (generally) obey the law? What is the role of unconscious motivation, of altruism, of superstition, of morality, or of value in the choices made? Clearly, detailed discussion of the nature of human choice behavior is beyond the scope of this paper. We do, however, briefly consider decision-making study that has as a premise the notion that human decision makers value rationality (for a delightful discussion of rationality in decision making, see Lee, 1971).
Following Lee, decision theory considers the rational person to be one who, when confronted with choice, makes the decision that is "best"; this decision is the optimal or rational one. This decision (1) must be one of those available, (2) will depend on the decision principles under study (thus, different studies, proceeding from different bases, may identify different optimal choices), (3) may differ among persons (e.g., due to differing utilities assigned to alternatives, differing subjective probability estimates), but (4) must depend on the information available to the decision maker.

Behavioral decision theory (Edwards, 1954, 1961; Becker and McClintock, 1967; Rapoport and Wallsten, 1972; Slovic, Fischhoff, and Lichtenstein, 1977; R. M. Hogarth, 1980; Einhorn and Hogarth, 1981; Pitz and Sachs, 1984), "cognitive algebra" (Anderson, 1968, 1974, 1979), utility theories (Lee, 1971:Chapter 5), and "game theories" and their assessments of strategies (e.g., minimax and maximin principles) (von Neumann and Morgenstern, 1947; Luce and Raiffa, 1957; Lee, 1971) are examples of general considerations of ways in which one may model the choice or decision behavior of the rational person (Lee, 1971, and R. M. Hogarth, 1980, review much of this vast literature).

We note this literature to make two points. First, there is a distinction to be drawn between normative and descriptive decision studies (Lee, 1971). Normative studies concern the decisions that people should make in a choice situation, regardless of the decisions that they actually make. Descriptive studies concern the decisions actually made, regardless of those that should be made. This distinction, although clear, may become blurred in practice, particularly when the goal is to improve rational decision making. We believe that studies of both sorts may be of considerable value and, accordingly, we report on both in the sections that follow.

The second point to be raised is that very often human decision makers do not appear to behave optimally, regardless of the particular strategies under study. We elaborate on this point later; here, we simply suggest that for this reason we believe the provision of decision-making tools for criminal justice applications is necessary and desirable.³

³ There are other reasons also, such as the desirability of making the decision process explicit. See M. R. Gottfredson and D. M. Gottfredson (1980b) for discussion of these.

Francis Bacon observed: "We do ill to exalt the powers of the human mind, when we should seek out its proper helps" (quoted in R. M. Hogarth, 1980). Indeed, in most decision-making situations, it has been found that actuarially developed predictions outperform human judgments. This is true with respect to psychiatric judgments (e.g., Meehl, 1954; Gough, 1962; Ennis and Litwack, 1974); graduate school admissions (e.g., Dawes and Corrigan, 1974; Dawes, 1979); and in other areas (Goldberg, 1970). Later, we review results of these and other studies and suggest how human judgments and actuarial predictions can profitably be used together; here, suffice it to say that normative decision studies appear to have the potential to improve decisions made in criminal justice settings significantly. Although we do "exalt the powers of the human mind," we also believe in attempts to provide it with "proper helps."

Problems of Measuring "Accuracy"

An obvious question to be asked when considering predictive information is "how good is it?" The answer is "it depends."
The predictive accuracy of infor- mation is a function of many things: among the more salient are the reli- abilities of the items of information used, the methoc3(s) used to combine items of information, the reliability ofthe criterion variable chosen, the kinds of measure- ments usecI, the base rate, the selection ratio used, and the representativeness of samples employed. Two questions should be acIdressec3: one considers the accuracy of inclivi(lual items of informa- tion; the other refers to the accuracy of items in combination with one another. Our discussion requires that we first out- line the nature of the issues aIreacly raised. Reliability Reliability refers essentially to the sta- bility with which measurements may be made, and statistical validity-here im- precisely considered as "accuracy"-is constrained by the reliability win which both criterion and predictor measure- ments are made. No prediction device can be better than the data from which it is constructed. Often, attention is given to the reliabilities of the predictor items but the reliability of the criterion is ne- glected.4 Methods of Combining Information Many statistical methods have been used in criminological prediction studies, including the simple inspection of cross- cIassification tables (e.g., Wamer, 1923), multiple regression (e.g., D. M. Gottfred- son and Bonds, 1961; D. M. Gottfredson, Wilkins, and Hohinan, 1978), multiple dis- criminant-function analysis (e.g., Brown, 40ne would be wise to view measurements of a table with skepticism if the yardstick used is made of rubber elastic. The careful investigator would want to ensure as well that the table is not elastic.

1978),5 multidimensional contingency-table analysis (e.g., Solomon, 1976; van Alstyne and Gottfredson, 1978), tobit analysis (e.g., Palmer and Carlson, 1976), and a variety of clustering approaches (e.g., Ballard and Gottfredson, 1963; D. M. Gottfredson, Ballard, and Lane, 1963; Fildes and Gottfredson, 1968).6 For a variety of statistical and practical reasons, one or another approach may be preferred, and the technique used theoretically could have dramatic consequences for the accuracy of resultant prediction devices. In criminal justice applications this potential unfortunately remains largely theoretical. Several researchers have attempted to demonstrate the relative utility of different statistical approaches to criminal justice prediction problems (e.g., D. M. Gottfredson and Ballard, 1964a; Babst, Gottfredson, and Ballard, 1968; Simon, 1971, 1972; Wilbanks and Hindelang, 1972; Farrington, 1978), and the potential advantages of different approaches have been discussed by Wilkins and MacNaughton-Smith (1964; see also Simon, 1971; S. D. Gottfredson and D. M. Gottfredson, 1979, 1980). S. D. Gottfredson and D. M. Gottfredson (1979, 1980) compared the relative utility of six of the more commonly used or promising methods, concluding (as did the other studies cited) that "no clear-cut empirical advantage in prediction is provided by one or another method" (1979:63). Reasons for this disappointing observation have been suggested by Farrington (1978), S. D. Gottfredson and D. M. Gottfredson (1979), and Loeber and Dishion (1983). In addition to serious problems of criterion measurement, problems of the reliability of predictor information and the consequences of this for certain of the methods (particularly least-squares methods; see Wainer, 1976) especially are deserving of mention.

5 It should be noted, however, that when the criterion measure is dichotomous, as in the example cited, Fisher's discriminant function is equivalent (within a transformation) to the multiple regression approach; see Porebski (1966).

6 For discussions of clinical methods of combining items of information, see Gough (1962) or Monahan (1981).

Meehl (1954) and Gough (1962) provide good reviews of specific actuarial methods that have been used widely in the behavioral sciences generally, often with reference to problems and application in criminal justice system settings. Mannheim and Wilkins (1955), Simon (1971, 1972), and S. D. Gottfredson and D. M. Gottfredson (1979) have provided reviews of methods typically used in criminology.

The Base Rate

The base rate for any given event is defined as the relative frequency of occurrence of that event in the population of interest.7 Typically, base rates are expressed as proportions or percentages. In many criminal justice applications, which traditionally have treated criterion measures as dichotomous, the base rate is found simply through inspection of the appropriate marginal distribution of the expectancy table.

7 This discussion is adapted from S. D. Gottfredson and D. M. Gottfredson (1979).

The difficulty of predicting events of interest increases as the base rate differs from .50 (Meehl and Rosen, 1955). Thus, the more frequent or infrequent an event, the greater the likelihood of inaccurate prediction. (While this seems intuitively true for rare events, it must be remembered that the occurrence of very frequent events requires the simultaneous occurrence of very rare events unless the probability of an event is precisely 0 or 1.) As an example of the difficulty of such prediction, suppose that the base rate for failure on parole is .20. Given this information alone, one would make correct predictions 80 percent of the time if one simply predicted that no one will fail on parole. One would also, of course, be wrong 20 percent of the time. (Note that given only the base rate as a guide, there is no way of estimating which 20 percent will fail.)

Now assume that a predictive device has been developed that allows one to predict parole outcomes with 78 percent accuracy. Even given this apparently powerful device, one would still be better off in expecting that no one will fail on parole, that is, in "predicting" performance on the basis of the base rate alone. Although the predictive device does beat a naive chance rate (50 percent), the true chance rate is considerably higher, and in fact is greater than the power of the predictive device.

Those concerned with the development of predictive tools for use in criminal justice applications (and in other areas) often have failed to consider base rates in the development process and, consequently, have made classifications or predictions based on criteria that produce larger errors than would the simple use of the base rate. In 1955 Meehl and Rosen summarized the consequences of failure to consider base rates and concluded that then-contemporary research reporting neglected the base rate, making evaluation of utility difficult, if not impossible. Although Reiss (1951c) clearly and dramatically illustrated this point more than 30 years ago in a classic review of Glueck and Glueck's Unraveling Juvenile Delinquency (1950; see also Hirschi and Selvin, 1967), failure to consider base rates remains an unfortunately common practice (but such studies are now found rarely in the published literature).

Selection Ratios

The selection ratio is simply the proportion of individuals or events studied and identified by the prediction method as belonging to the criterion classification of interest. In delinquency studies, for example, the selection ratio is the proportion of persons studied and selected as expected delinquents by means of some prediction instrument (see Loeber and Dishion, 1983, for a discussion). Thus, the base rate provides one marginal distribution for an expectancy table, and the selection ratio (essentially) provides the other; together, the marginal distributions determine the chance expectancies for the table. Selection ratios may be altered through manipulation of the cutting score, which has obvious but sometimes unrecognized consequences for prediction (Cronbach, 1960). These may be particularly dramatic if the bivariate distribution is heteroskedastic (J. Fisher, 1959).

Representativeness of Samples

If accuracy of prediction is desired, samples used in constructing selection devices must be representative of the population on which the device is intended to be used.8 This ensures that the appropriate base rate is considered and minimizes subsequent shrinkage of power from the construction to the operational samples.

8 Note that this is not the same as saying that the sample must be representative of the population as a whole.

The adage that no two people are exactly alike properly is extended to groups: no two groups of people are identical.9 If, however, the groups have been selected by some appropriate mechanism (such as random sampling), they can be expected to have a great deal in common in terms of both their overall characteristics and the interrelations of various individual characteristics. It is this similarity of relations within different groups of people on which all statistical predictions ultimately rely. If in one group of subjects the young do better in relation to some outcome, it can be assumed that in a similar group of subjects the young again will do better. Prediction methods are intended to estimate, on the basis of some group of people available for study, how members of other similar groups will behave. There is a danger, however, of overestimating the extent to which relations found in one sample can be used to explain relations in a similar sample. Within the original sample alone, there is no adequate way to distinguish how much of the observed relation is due to characteristics and underlying associations that will be shared by new samples and how much is due to unique characteristics of the first sample. This is because the apparent power of a prediction device developed on a sample of observations derives from two sources: the detection and estimation of underlying relations likely to be observed in any similar sample of subjects and the peculiar or individual properties of the specific sample on which the device has been created. Cross-validation is important in estimating the relative importance of these two sources of predictive power. This is particularly advisable when the prediction study is intended for practical application in new samples. If not done, the utility of the instrument as a predictor in new samples is likely to be overestimated.

9 Portions of this discussion are adapted from S. D. Gottfredson and D. M. Gottfredson (1979).

Cross-Validation

Cross-validation is simply an empirical approach to the problem of obtaining an unbiased estimate of the accuracy of prediction (whether based on a single item of information or on some combination of items). Typically, this is accomplished by dividing the sample at hand in two, constructing the device on one, and using the other to estimate predictive accuracy.
Horst (1966) refers to this general procedure as the "sample fractionation" approach and argues, quite correctly, that there are serious disadvantages to it. First, the stability of estimates is dependent on the number of cases on which they are made. Thus, dividing the sample reduces the reliability of the device constructed, which, as already noted, may reduce validity. Second, the approach gives only one estimate (from a potentially large universe of estimates). In effect, one regards coefficients that result from cross-validation as an estimate of the average expected validity in independent samples and expects those validities to be normally distributed. Accordingly, one is as likely to underestimate as overestimate that validity, but a single sample offers weak empirical evidence of shrinkage (Horst, 1966).

There appears to be no "best" answer to the cross-validation problem; rather, a trade-off of concerns is raised. Sample fractionation procedures do constrain validity (unless the sample obtained is very large, which is unusual in criminal justice research). A single estimate of shrinkage is not optimal, is unlikely to represent the actual mean validity, and is as likely to underestimate as overestimate that value. As noted by Horst, one can obtain two estimates by examining expected validities from each sample on the other (in the traditional fractionation approach), but one is then left with deciding which of the devices actually to use. Similarly, one could further fractionate the sample and develop several empirical estimates. Again, however, one encounters problems of reliability as the sample size decreases. To meliorate this, one could recombine the subsamples and create a device on the full sample, relying on the subsample estimators to provide an index of shrinkage (see Horst, 1966:380).
It seems likely, however, that the validity of the device developed in this fashion will be underestimated (perhaps seriously) given that the samples from which validity is estimated are much smaller than is the sample on which the final device is constructed.
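The split-sample procedure discussed above can be illustrated with a small simulation. The following Python sketch is not drawn from any of the studies cited: the data generator, the item-selection rule (keep the five items most strongly correlated with the outcome and weight them equally), and every function name are assumptions made purely for illustration. It constructs a device on one half of a hypothetical sample and then compares its validity, taken here as the correlation between score and outcome, in the construction and validation halves.

```python
import random
from statistics import mean, pstdev


def correlation(xs, ys):
    """Pearson correlation; returns 0.0 when either variable is constant."""
    sx, sy = pstdev(xs), pstdev(ys)
    if sx == 0 or sy == 0:
        return 0.0
    mx, my = mean(xs), mean(ys)
    cov = mean((x - mx) * (y - my) for x, y in zip(xs, ys))
    return cov / (sx * sy)


def simulate_cases(n, n_items=20, n_real=3, seed=0):
    """Hypothetical data: binary items, only the first n_real relate to failure."""
    rng = random.Random(seed)
    cases = []
    for _ in range(n):
        items = [rng.randint(0, 1) for _ in range(n_items)]
        p_fail = 0.15 + 0.10 * sum(items[:n_real])  # assumed relation
        outcome = 1 if rng.random() < p_fail else 0
        cases.append((items, outcome))
    return cases


def build_device(cases, keep=5):
    """Pick the `keep` items most correlated with the outcome (unit weights)."""
    outcomes = [y for _, y in cases]
    strength = []
    for j in range(len(cases[0][0])):
        r = correlation([items[j] for items, _ in cases], outcomes)
        strength.append((abs(r), j, 1 if r >= 0 else -1))
    strength.sort(reverse=True)
    return [(j, sign) for _, j, sign in strength[:keep]]


def validity(device, cases):
    """Correlation between the device score and the observed outcome."""
    scores = [sum(sign * items[j] for j, sign in device) for items, _ in cases]
    return correlation(scores, [y for _, y in cases])


def split_sample_validation(cases, seed=0):
    """Construct on one random half; report validity in both halves."""
    shuffled = cases[:]
    random.Random(seed).shuffle(shuffled)
    half = len(shuffled) // 2
    construction, validation = shuffled[:half], shuffled[half:]
    device = build_device(construction)
    return validity(device, construction), validity(device, validation)


if __name__ == "__main__":
    r_con, r_val = split_sample_validation(simulate_cases(400))
    print(f"construction r = {r_con:.3f}, validation r = {r_val:.3f}")
```

Because item selection capitalizes on chance relations in the construction half, the construction coefficient will usually exceed the validation coefficient. Repeating the split with different seeds gives a sense of how variable a single estimate of shrinkage can be.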

Some argue for a "longitudinal" validation approach (e.g., Horst, 1963, 1966) in which one develops a device on the largest sample available and applies the device in operational use. Validity is assessed over time, and research is integrated into the administrative process. It seems to us that the central issue has to do with (1) the types of decisions to be made on the basis of a predictive device and (2) the expected validities of the devices used. For certain relatively benign applications, when expected validities may be relatively high, we would not object to such a procedure. When the decisions to be made involve consequences of liberty, however, and when expected validities are low (as commonly is the case in criminal justice applications), we would object. Wright, Clear, and Dickson (1984) recently illustrated that the consequences (in terms of reduced validities) of the wholesale adoption in several jurisdictions of devices developed in one locale can be dramatic.

Measures of Predictive Accuracy

The issues considered so far can affect the accuracy of a predictive device, but we have not yet considered how best to assess that accuracy. This section focuses on such a consideration.

In selection applications, predictive devices reduce to a dichotomy resulting in a decision situation, with actual outcomes considered, that can be represented by a 2 x 2 contingency table (Figure 1). The cutting score decided on determines the selection ratio and the marginal distribution of the columns in Figure 1. The base rate determines the marginal distribution of the rows. Together, these determine the distribution of cases within the table, subject to one degree of freedom. They also determine the distribution of cases within the table to be expected by chance.
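The way the two marginals fix the chance expectancies, and the sense in which only one cell is free to vary, can be sketched in a few lines of Python. The function names and the illustrative figures (100 cases, 20 failures, 20 persons predicted to fail) are assumptions for this sketch, not values taken from any study cited here.

```python
def chance_table(base_rate, selection_ratio, n):
    """Cell counts expected if predictions were unrelated to outcomes."""
    predicted_fail = selection_ratio * n
    predicted_succeed = n - predicted_fail
    return {
        "positive_hits": base_rate * predicted_fail,
        "false_negatives": base_rate * predicted_succeed,
        "false_positives": (1 - base_rate) * predicted_fail,
        "negative_hits": (1 - base_rate) * predicted_succeed,
    }


def chance_percent_correct(base_rate, selection_ratio):
    """Percentage of correct predictions expected by chance alone."""
    hits = base_rate * selection_ratio + (1 - base_rate) * (1 - selection_ratio)
    return 100 * hits


def complete_table(n, n_fail, n_predicted_fail, positive_hits):
    """Fill the 2 x 2 table from both marginals and a single cell."""
    false_negatives = n_fail - positive_hits
    false_positives = n_predicted_fail - positive_hits
    negative_hits = n - n_fail - false_positives
    cells = {
        "positive_hits": positive_hits,
        "false_negatives": false_negatives,
        "false_positives": false_positives,
        "negative_hits": negative_hits,
    }
    if min(cells.values()) < 0:
        raise ValueError("cell value is inconsistent with the marginals")
    return cells


if __name__ == "__main__":
    # Predicting that no one fails (selection ratio 0) with a base rate of .20
    print(chance_percent_correct(0.20, 0.0))
    # Chance expectancies when 20 percent are predicted to fail
    print(chance_table(0.20, 0.20, 100))
    # Once the marginals are fixed, one observed cell determines the rest
    print(complete_table(100, 20, 20, 12))
```

With a selection ratio of zero the chance figure reduces to the accuracy obtained from the base rate alone, which is the situation described in the parole example earlier in the chapter.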
Although statistics such as χ² are useful in assessing independence in tables such as this, the value of χ² is a function of the dimensionality of the table and the number of cases considered, as well as of the relation beyond that expected by chance. Further, χ² is used to assess statistical significance; directly, it tells the investigator nothing about the magnitude of the effect discovered. It gives an assessment of "accuracy" to the extent that the investigator may be confident of the reliability of the effect discovered, but it does not depict the degree of relation associated with that effect. A variety of statistics are available to help in this assessment (e.g., the contingency coefficient or Cramer's V; see Hays, 1963:60~606), but none completely overcomes the dimensionality problem.

FIGURE 1 The selection decision problem. [A 2 x 2 table of predicted behavior (succeed, fail) against actual outcome; the cells are positive hits, negative hits, false positives, and false negatives.]

The use of φ (phi coefficient) (Hays, 1963:604) is meliorative when used for tables with one degree of freedom. Since the practical application of predictive tools for selection purposes often reduces to such a table, φ (which is simply $\sqrt{\chi^2/N}$) would appear to be an attractive choice for an index of predictive efficiency. The marginal distributions of a table with only one degree of freedom, however, constrain φ by imposing an upper limit on the possible relation observed in the table (Guilford, 1965).10 Moreover, φ is subject to a limitation common to correlational measures: it is sensitive to the base rate.

10 This does not appear to be true for the point-biserial, as commonly applied to 2 x k tables (B. F. Green, Jr., personal communication, 1979).

As noted by Richardson (1950), the standard error of prediction provides an immediate, but incomplete and potentially misleading, answer to the question of the predictive value of a selection device. This statistic is given by:

$$\sigma_y \sqrt{1 - r_{xy}^2},$$

where $\sigma_y$ is the standard deviation of the criterion measure and $r_{xy}$ is the correlation between predictor and criterion. As we have noted, most selection applications of predictive devices use some cutting score, essentially reducing the predictor scale to a dichotomy. As commonly used, however, the standard error of prediction assesses the predictive device and the criterion measured continuously and may, in fact, result in an underestimation of the power of the selection device, since the device as used simply is predictive of success or failure. The standard error of prediction, however, is a function also of degrees of
success or failure; that is, it requires an assessment of just how good a success, or how bad a failure, an individual is (Richardson, 1950). Further, the standard error of prediction also is sensitive to variations in the base rate and, hence, may be of little value in assessing the relative merits of devices used on different populations.

A number of indices are intended to provide an estimate of the "proportionate reduction in error" resulting from use of some selection or predictive device. In general, these indices are designed to offer an evaluation of predictive power above that afforded by simple use of the chance rate. Ohlin and Duncan (1949), among the first to give practical attention to the problem in the criminal justice field, suggested an "index of predictive efficiency" (see also Horst, 1941; Reiss, 1951a; Goodman, 1953a, b; McCord, 1980; Loeber and Dishion, 1983), which is defined simply as the percentage reduction in error gained by use of a predictive device over that achieved by knowledge of the base rate alone. The index of predictive efficiency also has the limitation of sensitivity to the base rate. Thus, it has little utility for the examination of accuracy across different situations.

Considering specifically cases such as that diagrammed in Figure 1 (in which one essentially wishes to predict membership in one or the other of two mutually exclusive categories), Berkson (1947) noted that there are utilities, defined as true positives and negatives, as well as costs, defined as false positives and negatives, associated with the decision made. Arguing that predictive devices should be evaluated with respect to a comparison of costs and utilities, he developed an index of effectiveness (which may be used at any utility) called "mean cost" and defined the "mean cost rating" (MCR) to allow the index to vary from 0 to 1. The MCR is less sensitive to the base
rate than is φ or the point-biserial coefficients. The index was introduced to criminologists by Duncan et al. (1952), and it has seen widespread use since as a measure of the predictive efficiency of a selection device. It recently was shown that the MCR is related to Kendall's tau, providing a method of testing the statistical significance of the index (Lancucki and Tarling, 1978); and Fergusson, Fifield, and Slater (1977) have shown the relation between the MCR and the familiar proportion of area under a receiver operating characteristic (ROC) curve, which provides a grounding for the index in the framework of signal detection theory (Green and Swets, 1966).

For the two-by-two decision case (which represents the "fairest" test of a predictive device as used in selection decisions), Loeber and Dishion (1983) developed an index called the RIOC (relative improvement over chance), which considers chance occurrence within the table as well as the maximum correct value that prediction could achieve given applicable selection-ratio and base-rate conditions. Since this statistic is more recent than others described and less common in the criminal justice literature, we describe it further. The RIOC is defined as

$$\mathrm{RIOC} = \frac{\%\mathrm{IOC}}{\%\mathrm{MC} - \%\mathrm{RC}} \times 100,$$

where the numerator represents the percentage improvement over chance (IOC) and the denominator is the difference between the maximum percentage correct (MC) that could be achieved and the percentage required by chance (RC), both given the joint marginals observed. Although not independent of either the base rate or the selection ratio, the RIOC correlates much less highly with either than does the simple index of predictive efficiency (Loeber and Dishion, 1983).

None of the indices yet developed, however, can answer completely the question of how accurate a predictive device is.
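To make the indices described above concrete, the following Python sketch computes several of them for a single 2 x 2 table. It is an illustration under stated assumptions rather than a reproduction of any author's computations: the function name and the two example tables are hypothetical, the maximum percentage correct is taken as the largest number of hits the observed marginals permit (the total less the absolute difference between the number who fail and the number predicted to fail), and the index of predictive efficiency is taken as the percentage reduction in errors relative to predicting the more frequent outcome for everyone.

```python
from math import sqrt


def accuracy_indices(pos_hits, false_pos, false_neg, neg_hits):
    """Summary indices for a 2 x 2 table of predicted by actual outcome."""
    n = pos_hits + false_pos + false_neg + neg_hits
    actual_fail = pos_hits + false_neg        # base-rate marginal
    predicted_fail = pos_hits + false_pos     # selection-ratio marginal
    actual_succeed = n - actual_fail
    predicted_succeed = n - predicted_fail
    if 0 in (actual_fail, actual_succeed, predicted_fail, predicted_succeed):
        raise ValueError("every marginal total must be nonzero")

    pct_correct = 100 * (pos_hits + neg_hits) / n
    # Percentage correct expected by chance, given both marginals (RC)
    pct_chance = 100 * (actual_fail * predicted_fail
                        + actual_succeed * predicted_succeed) / n ** 2
    # Maximum percentage correct the marginals allow (MC); an assumption
    pct_max = 100 * (n - abs(actual_fail - predicted_fail)) / n
    rioc = 100 * (pct_correct - pct_chance) / (pct_max - pct_chance)

    # Accuracy from the base rate alone: predict the more frequent outcome
    pct_base_only = 100 * max(actual_fail, actual_succeed) / n
    errors_base = 100 - pct_base_only
    errors_device = 100 - pct_correct
    efficiency = 100 * (errors_base - errors_device) / errors_base

    phi = (pos_hits * neg_hits - false_pos * false_neg) / sqrt(
        actual_fail * actual_succeed * predicted_fail * predicted_succeed)

    return {
        "pct_correct": pct_correct,
        "pct_chance": pct_chance,
        "pct_max": pct_max,
        "rioc": rioc,
        "pct_base_only": pct_base_only,
        "predictive_efficiency": efficiency,
        "phi": phi,
    }


if __name__ == "__main__":
    # Hypothetical device that improves on the base rate (.20 failure rate)
    print(accuracy_indices(12, 8, 8, 72))
    # Hypothetical device that is 78 percent accurate, below the base rate
    print(accuracy_indices(8, 10, 12, 70))
```

The second table mirrors the parole illustration given earlier: the device classifies 78 percent of cases correctly, yet its index of predictive efficiency is negative because predicting that no one fails would be correct 80 percent of the time. The same table nonetheless shows a positive RIOC and a positive φ, which illustrates why the choice of index matters.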
Correlational indices and indices such as the RIOC and the index of predictive efficiency suffer because they are affected by variations in the base rate. Thus, they do not readily allow a comparison of devices (or items) across base-rate conditions. The MCR does allow this, but it is not often that one wishes to evaluate a specific predictive device regardless of base-rate conditions, although this is the most common application of this index (S. D. Gottfredson and D. M. Gottfredson, 1979; Hoffman, 1983).

Measures that are sensitive to base rates and those that are not can lead to dramatically different conclusions concerning the value of predictive devices (Fergusson, Fifield, and Slater, 1977). The former (e.g., correlation measures) describe the performance of the instrument in application with given populations and decision rules; the latter (such as the MCR) essentially give an indication of the general power of the device without respect to constraints of base rates and selection ratios.

Which to use depends on the question at hand. If one seeks to evaluate the relative power of different devices developed on different populations (for which the base rates may well be different), indices that are less sensitive to base rates would seem preferable. If, however, one wishes to estimate the power of a particular device, administered with particular decision rules on a particular population, base-rate-dependent indices will be more informative.

Other Problems Concerning "Accuracy"

The practical application of predictive tools in criminal justice raises other problems related to the "accuracy" question. One almost always is attempting to construct, validate, and assess the accuracy of devices under circumstances that already have required some selection: thus, true base rates often cannot be known, nor "accuracy" assessed relative to them. One cannot, for example, know the true base rate for parole violation for all offenders considered for parole. Since not all are in fact paroled, one can at best identify the base rate for known violations by paroled inmates.

Problems exist also in the area of assessing the relative contributions of specific predictor variables to the overall accuracy of a predictive or selection device. Items that may be highly predictive under some base-rate conditions may be much less so under other base-rate conditions (this is most likely to be the case when the distribution of the predictor variable itself is skewed). Items that may prove predictive for some defined populations may be less (or more) predictive when the composition of the population is different (e.g., the item "race" may be predictive of criminal convictions in some large urban populations and not at all predictive in suburban or rural populations). Items that are predictive during some age ranges may not be predictive if other age ranges are considered. As we have pointed out elsewhere (S. D. Gottfredson and D. M. Gottfredson, 1979), such issues are meliorated if one remembers that

the greatest limitation of prediction methods [is] that the devices . . . are developed and validated with respect to specific criteria, using available data, in a specific jurisdiction, during a specific time period. Thus, any generalizations to other outcomes of interest, or after modifications of the item definitions used, or to other jurisdictions or populations, or to other time periods, are to be questioned.

Still, the question of the "best" predictors is an important one, both for providing guidance for those who wish to construct predictive devices and for theoretical development.
Several criteria of "best" could be considered: (1) most powerful (in unique contribution to prediction), (2) most stable (e.g., from population to population), (3) most readily available (e.g., age, sex), or (4) most ethically or legally defensible. In the discussion that follows, each of these will be considered. The "most powerful" criterion, however, is difficult to apply for several reasons. First, few authors have provided sufficient information to allow a comparison of the predictive efficiency of items across an adequate variety of situations. Ideally, one would like to calculate RIOCs or MCRs to assist in this evaluation; the data provided usually are insufficient for this. Second, devices constructed following a simple unweighted linear model (and there are many of these) provide no assessment of the relative value of individual items of information. Third, although devices constructed using multiple regression methods do provide information for such an assessment, studies on which these are based almost always have used a dichotomous criterion. Under such circumstances, beta weights are quite unstable (Palmer and Carlson, 1976) and cannot be relied on to provide unbiased estimates of the unique contributions of the variables considered. Other regression methods that would be meliorative (e.g., the logistic model) are not used often.

Two kinds of errors will be made in any predictive decision-making situation: some persons predicted to belong to criterion classification A in fact will not (false positives), and some persons predicted to belong to criterion classification B in fact will not (false negatives) (Figure 1). Each of the various indices discussed above considers that the two types of errors are equivalent. In practice, of course, they may not be, whether measured in monetary, social, or ethical terms.
In most practical decision-making situations, and particularly those in criminal justice settings, the social, ethical, or programmatic consequences of one type of error may be dramatically different from the other. Although one typically evaluates devices without respect to this "weighting" of errors in a statistical fashion, political, ethical, and policy arguments tend not to ignore the differential consequences of the types of errors made (von Hirsch and Gottfredson, 1984). Loeber and Dishion (1983) have demonstrated that the relative evaluation of predictions made can change dramatically depending on the consequences assigned to one or the other type of error. Often recommended in personnel selection situations (Cronbach and Gleser, 1957; Rorer, Hoffman, and Hsieh, 1965; Wiggins, 1973), determining the expected utility of predictive devices based on a differential weighting of errors is common, although not in justice system settings.

RESEARCH EVIDENCE: THE POWER OF PREDICTION

Bail and Pretrial Release Decision/Prediction Studies

A number of prediction studies concerning bail and pretrial release/detention have been conducted. Given the enormous consequences of decisions made at this stage of the criminal justice process, however (see President's Commission, 1967; Goldkamp, 1979; M. R. Gottfredson and D. M. Gottfredson, 1980a,b, for discussion of these), it is somewhat surprising that more attention has not been focused on the area. Goldkamp (1979), M. R. Gottfredson and D. M. Gottfredson (1980a,b:Chapter 4), and Goldkamp and Gottfredson (1980, 1981a,b) have provided detailed reviews of most of this literature, and we draw heavily on these reviews in the discussion that follows.

Descriptive Studies

The early "bail reform" movement and subsequent legislation (e.g., as outlined by Freed and Wald, 1964; American Bar Association, 1968; Angel et al., 1971; National Advisory Commission, 1973; see also Goldkamp, 1979; M. R. Gottfredson and D. M.
Gottfredson, 1980b; Goldkamp and Gottfredson, 1981a,b) focused attention on factors deemed legitimate or appropriate for consideration in bail and pretrial detention decisions. The identification and specification of these factors prompted several investigators to attempt a determination of the extent to which they actually were considered by judges making these decisions.

Bock and Frazier (1977) studied the setting of bond11 in a six-court district in Florida. Five types of informational variables related to the defendant, recommended for consideration by the American Bar Association (ABA) and the National Advisory Commission on Criminal Justice Standards and Goals, were studied; these included the length and character of residence in the community; employment status and history and financial condition; family ties; reputation, character, and mental condition; and prior criminal record. Bock and Frazier operationally defined these rather nonspecific recommendations in several ways. In all, 18 variables reflective of the five recommendations were studied, and each recommendation was represented by at least 2 variables. Five of the variables examined (currently on probation, presence of a juvenile record, the seriousness of the first charge, defendant's appearance, and defendant's demeanor) were related significantly to the bond decision made; only one of these, the seriousness of the charge, approached a magnitude suggesting that it may be of practical significance (φc = .37; φ for the remainder ranged from .12 to .21). The multivariate procedures used were not described in the report, and no overall assessment of the utility of these items of information was given. Neither race, sex, age, adult criminal record, the total number of charges, whether the defendant currently was on parole, any of seven indices of the defendant's financial and employment status or condition, nor any of four indices of the nature and quality of the defendant's ties to the local community were related statistically to the bond decision outcome. Assessments of defendants' demeanor and appearance, however, were related statistically to the disposition (φ = .18 and .12, respectively).12

11 Operationally defined as release on personal recognizance, with bond set at less than $500, $500-$4,999, and $5,000 or more.

12 The last two measures were based on observations made by passive observers. Information concerning the reliability of the assessments is not given.

Goldkamp (1979) examined release and bail-setting practices in Philadelphia using a sequential model. More than 50 variables were available for analysis, and many of these had statistically significant zero-order relations with a release-on-recognizance criterion (see Goldkamp's Table 7-2, pp. 14~147, for examples). Only five variables, however, added at least 1 percent to the overall R² observed when multiple regression techniques were used. (The best equation developed, using 51 variables, resulted in an R² of .43; only 2 added more than 1 percent: the seriousness of the charge and the seriousness value of the most serious prior arrest.) Goldkamp demonstrated that a probable best estimate of the unique contribution of the seriousness of the charge is about 14 percent of the variance in the decision made and that this single variable is about seven times as powerful as
its nearest competitor (which has to do with the seriousness of the defendant's prior record).

The same 51 variables were used to "predict" the amount of cash bail set for those defendants for whom such a determination was made. Here, only two variables added at least one percentage point to the explanation of the amount of variation in bail required beyond that explained by the "best" predictor: whether there were weapons charges (accounting alone for 23 percent of the variance in bail amount).13 Goldkamp was able to demonstrate that, although first-order effects were not powerful (for example, using all 51 variables only about 26 percent of the variance could be explained), the inclusion of interaction terms, particularly those involving offense characteristics, improved prediction substantially.

13 These were number of transcripts (indicating extent of criminal processing) and number of prior arrests.

In a small but carefully designed and analyzed questionnaire/simulation study, Ebbesen and Konecni (1975) did observe a sizable effect for community ties on the setting of bail (respondents were 18 members of the judiciary; stimuli were contrived "robbery" cases with a variety of independent variables that were manipulated systematically), and lesser, but statistically significant, effects for prior record and for the bail recommendation made by the district attorney. No effect was observed for the defense attorney's recommendation, nor were any of the interaction terms significant. Ebbesen and Konecni followed this simulation study with a passive observational study of 106 cases actually judged by five of the subjects of the simulation study; they observed significant effects (on bail amount) for (in order of magnitude) the district attorney's recommendation, the defense attorney's recommendation, the interaction of these, the interaction of the defense attorney's recommendation and the seriousness of the charge, and the seriousness of the charge itself. By far, the district attorney's recommendation had the greatest effect. Local ties (measured on two levels) were not significantly associated with the amount of bail set. In a clever post-hoc analysis, these investigators demonstrated that the seriousness of the crime and local ties were important to the judges' decisions, but that these also were important to the district attorneys' recommendations. They posit that the judges are aware of this, and that these factors therefore indirectly (through the district attorneys' recommendations) do influence the decisions made.

In a sample limited to persons eligible for release to one of three "release programs," Bynum (1976) found that only prior record consistently related significantly to the probability that the defendants would actually be released to the programs; demographic variables and community ties were found to have little impact. Similarly, Roth and Wice's (1978) large study of some 11,000 pretrial releasees in Washington, D.C., demonstrated that charge seriousness and prior record were significantly related to judges' pretrial release decisions, but that race, sex, age, employment status, and residence were not. In multivariate analyses (with a criterion of release without financial conditions versus financial conditions), the charge and prior record remained consistently related to the decisions made, as did the judge and the capacity of the District of Columbia jail.

Goldkamp and Gottfredson (1981a,b, 1985) used a large sample (approximately 4,800 defendants appearing before the Philadelphia courts between 1977 and 1979) stratified by decision maker (20 judges) and the seriousness of the charge (six levels, ranging from misdemeanors to felonies).
Following Goldkamp (1979), they developed a sequential model of the decision-making process. In essence the model treats the bail decision as a contingent, two-part process, in which "the judge first weighs whether a defendant merits outright release pending trial (ROR); if a defendant does not meet the judge's criteria for ROR, the second decision task becomes the selection of a particular amount of cash-bail" (1981b:192). Thus, the ROR decision may be treated as binary, and the cash-bail decision may be treated as a continuous variable. Logit analysis was used to study the former; multiple regression (on a logarithmic transformation of bail amount) was used to investigate the latter.

Forty-three variables that either had been shown to be related to the ROR decision in prior work or had been purported to play a role in those decisions were examined at the bivariate level and in combination (via the logit procedure). Variables considered included victim and offense characteristics, community ties, prior criminal history, and offender demographic characteristics. On the bivariate level, information concerning victim characteristics appeared largely unrelated to the ROR decision, regardless of the charge category considered (the charge category, in this case, largely reflects seriousness level). Within charge categories, other offense characteristics also appeared largely unrelated to the decision. Evidence (again, at the bivariate level) concerning community ties was mixed; some variables examined (e.g., employment, on welfare or not) appeared promising, others did not (e.g., marital status, length of present residence). Offender demographic characteristics were significantly related to the decision for some charge categories but not for others. Only sex appeared rather consistently to be related to the ROR decision regardless of the charge category considered. Finally, variables reflective of criminal history did appear to be related to the decision, and typically, this was the case regardless of the charge category considered (Goldkamp and Gottfredson, 1981b:19~195). The charge category itself, which largely reflects a seriousness weighting, strongly influenced the ROR decision. Based on examination of these bivariate relations, eight dichotomous variables and the charge classification were selected for further examination using the logit procedure.[14] These reflected race, sex, attachment, employment, arrests, pending charges, prior failures to appear, felony convictions, and charge (six categories). The final reduced model decided on considered only males[15] and heavily weighted the charge. Variables reflective of prior record were also represented in the model. Attachment and employment, although represented, were given little weight (see Goldkamp and Gottfredson:206). Apparently, the ROR decision is "based primarily on charge, secondarily on prior record, and tertiarily on community ties" (p. 205). The model developed was found to be a significant predictor of the amount of bail set: a regression equation including (essentially) only the model of the ROR decision and a judge dummy vector accounted for about 32 percent of the variance in bail amount set. The regression equation decided on as "best" included the charge seriousness, the number of charges, the seriousness of injury, whether there was a personal victim of the crime, two criminal history variables, age, and a dummy vector for judges; it accounted for 48 percent of the criterion variance.

[14] One practical difficulty is that the dimensionality of the multidimensional cross-classification table quickly can become unmanageable and the procedure unstable as the number of empty cells increases.

[15] Other things equal, females were more likely to be granted ROR.

Normative Studies

Normative prediction studies concerning bail have been constrained by a substantial base-rate problem. As examples, failure to appear (FTA) based on officially reported rates ranged only from 4 to 24 percent in the 72 cities surveyed by Wice (1973), and almost 90 percent of the jurisdictions sampled reported FTA rates of less than 10 percent. In a survey of 20 cities that covered a several-year period, Thomas (1976) reported FTA rates of from 1 to 15 percent in 1962 (median = 6) and 3 to 17 percent for 1971 (median = 11). With respect to a recidivism criterion, Locke et al. (1970) found that 17 percent of those released pretrial in Washington, D.C., were rearrested later, and M. R. Gottfredson (1974) found that only 5 percent of those released in Los Angeles were rearrested for crimes against the person.

Besting a chance rate under these conditions has proven difficult indeed. For example, Locke et al. (1970) could not discriminate, among felony and misdemeanor offenders, those likely to fail on release based on a variety of background characteristics. Similar results were observed by Feeley and McNaughton (1974) with respect to failure to appear for trial and for rearrest while on release. Angel et al. (1971) also had little success in studying the predictive validity of the District of Columbia's preventive detention codes (in Boston). Not only was this study constrained by a low base rate, but many of the potential predictors also showed remarkably low variance (see Angel et al.:306-309). The act under study specified a number of criteria that should be taken into account in detention decisions, and Angel and colleagues operationalized them with 26 variables thought reflective of the criteria. No variable considered correlated higher than .23 with crime committed while on bail (see p. 392), and more than half (54 percent) correlated at .10 or less. Variables that correlated greater than this with bail recidivism included age at first incarceration (.23), number of prior defaults (.22), number of charges (.13), "dangerous" crimes in past 10 years[16] (.22), years of education (.12), number of misdemeanor convictions (.15), release status at time of initial arrest (.16), amount of prior incarceration (.21), number of arrests for drunkenness (.11), age at first court appearance (.16), juvenile record (.15), and "violent" crimes in the past 10 years (.14). None of the community-ties variables considered correlated better than .10 with the criterion (indeed, most of these were approximately zero). Multivariate analyses suggested that all 26 variables considered together accounted for only 13 percent of the outcome variance.[17] Although the equation results in prediction that is modestly above the base rate (indeed, as Cureton, 1957, has demonstrated, any valid continuous predictor, properly used, must provide advantage over the base rate), it is far from desirable: under assumptions of the equation, one would have to detain about 10 persons for every pretrial release crime to be prevented.

[16] These are defined with reference to the District of Columbia act and are somewhat odd. Definition of "crimes of violence" is even more peculiar; for example, burglary is included, while assault and battery is not.

[17] It is not clear whether the approach used was a discriminant function or multiple regression. Probably, it was the latter; in any event the two are functionally equivalent in this case.

In the Los Angeles study mentioned above, M. R. Gottfredson (1974) was able to explain 16 percent of the variation in FTA rates and 21 percent of the variance in arrests on release. The study used a relatively short "release period" (90 days), however, because time on release and failure were substantially correlated (.53) (see also Clarke, Freeman, and Koch, 1976). The bulk of the power of the FTA equations derived from variables concerning the present offense or offense history; little weight was given other factors, although some "community ties" variables were predictive (e.g., employment, living arrangements, and relatives in the area). The same is true for the equation developed to predict arrest on release. When examined on a validation sample, however, the most powerful model explained only about 3 percent of the variance in outcome considered.

Clarke, Freeman, and Koch (1976) studied 756 defendants released on bail in Charlotte, N.C., in 1973, and found that "court disposition time, defined ... as the amount of time elapsing from the defendant's release until the disposition of his case by the court (or until he fails to appear or is rearrested, if either of these occurs before disposition) must be considered the variable of greatest importance. Among the defendants studied, the likelihood of 'survival' -- avoidance of nonappearance or rearrest -- dropped an average of five percentage points for each two weeks their cases remained open" (p. 34). Criminal history (measured in terms of prior arrests) and the form of release (e.g., cash bail, bondsman) also were significantly associated with risk on bail (considered as either failure to appear or rearrest). Offense type, seriousness (felony or misdemeanor), sex, age, race, and income were not observed to be related to the outcome. In light of the finding that form of release is associated with outcome, it would be of interest to know what determined that, but the issue was not studied by Clarke and colleagues.
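
The base-rate difficulty described in this section (for example, the estimate that about 10 persons would have to be detained for every pretrial release crime prevented) follows from simple arithmetic. The sketch below is illustrative only: the function names and the base rate, sensitivity, and specificity values are hypothetical and are not taken from any study cited here, and it assumes that detaining one correctly identified defendant prevents exactly one failure.

```python
def positive_predictive_value(base_rate, sensitivity, specificity):
    """Share of persons flagged as likely failures who would actually fail.

    base_rate:   proportion of released defendants who fail (hypothetical input)
    sensitivity: proportion of true failures the predictor flags
    specificity: proportion of non-failures the predictor correctly clears
    """
    for value in (base_rate, sensitivity, specificity):
        if not 0.0 <= value <= 1.0:
            raise ValueError("all inputs must be proportions between 0 and 1")
    true_positives = base_rate * sensitivity
    false_positives = (1.0 - base_rate) * (1.0 - specificity)
    flagged = true_positives + false_positives
    if flagged == 0.0:
        raise ValueError("the predictor flags no one")
    return true_positives / flagged


def detained_per_failure_prevented(base_rate, sensitivity, specificity):
    """Persons detained for each failure prevented (assumes one failure
    is prevented for each correctly flagged defendant)."""
    ppv = positive_predictive_value(base_rate, sensitivity, specificity)
    if ppv == 0.0:
        raise ValueError("no flagged person would have failed")
    return 1.0 / ppv


def overall_accuracy(base_rate, sensitivity, specificity):
    """Proportion of all defendants classified correctly by the predictor."""
    return base_rate * sensitivity + (1.0 - base_rate) * specificity


if __name__ == "__main__":
    # Hypothetical values chosen only to illustrate a low base rate.
    base_rate, sensitivity, specificity = 0.05, 0.50, 0.75
    print(round(positive_predictive_value(base_rate, sensitivity, specificity), 4))
    print(round(detained_per_failure_prevented(base_rate, sensitivity, specificity), 1))
    # Accuracy of the predictor versus simply predicting that no one fails.
    print(round(overall_accuracy(base_rate, sensitivity, specificity), 4),
          round(1.0 - base_rate, 4))
```

With these hypothetical inputs, roughly 10.5 persons are flagged for each true failure, and the predictor's overall accuracy (about .74) falls below the .95 obtained by predicting that no one fails. This is one way of seeing why besting the base rate is difficult when failure is rare.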
Roth and Wice (1978) included a study of the predictors of failure to appear and arrests while on pretrial release in their report concerning Washington, D.C., and found information concerning offense type, employment, and drug use to be associated with the former. Those same variables (with different offense categories being predictive), along with information about the use of weapons and offense history, were also associated with arrest while on release.

In the Philadelphia study described earlier, Goldkamp and Gottfredson (1981a,b) also sought to predict failure to appear and rearrests on pretrial release. Again, logit analyses were used, since the dependent variables considered were dichotomous. Examination of bivariate relations showed that, generally, those variables associated with rearrest also were related to failure to appear, but that the relations usually were not as strong for the latter. Variables related to criminal history were markedly better than those of other types in relation to the rearrest criterion; this did not appear to be as true for the FTA criterion. Community-ties variables also were related to both criterion measures, and the relations appeared somewhat stronger for the FTA criterion than for rearrests. Of the personal characteristics considered, only drug use and age appeared likely to prove useful.

As already noted, other investigators (e.g., Roth and Wice, 1978) have observed that the type of instant offense, rather than its seriousness, seems to be related to bail outcomes. This also was found in the Goldkamp and Gottfredson studies; the relations of offense seriousness to the criteria examined were inconsistent and weak, while those for type were more consistent and powerful. Multivariate analyses were carried out on four criteria: FTA, rearrest, rearrest for "serious offenses" (homicide, rape, arson, robbery, burglary, aggravated assault, and the manufacture-delivery and sale of drugs), and a combined rearrest and FTA index. As would be expected from examination of the bivariate relations (and from range attenuation in the case of the "serious offense" criterion), the final logit models developed were to some extent similar, and to some extent different. Type of offense, age, prior FTAs, pending charges, recent arrests, and the interactions of some of these (e.g., over age 44 × prior FTAs) were important in terms of impact on expected log odds of the combined index. Summarizing differences between the models descriptive of the rearrest and FTA criteria, Goldkamp and Gottfredson (1981b:311-312) reported that "the two criteria of flight and rearrest did share common correlates; most of the defendant attributes and the prior criminal history variables that were associated with failure to appear at trial were also associated with rearrest. There were, however, some significant exceptions . . . charge seemed to play a different role in the two phenomena. Gambling charges were indicative of a low FTA probability, but a high rearrest probability. And prostitution charges appeared to be associated with rearrest probabilities, but failed to reach significance in the FTA model. Employment correlated with rearrest but not with FTA. However, age, pending charges, prior FTAs, recent arrests, and the charge of serious personal offense were all associated both with the probability of FTA and rearrest." When the serious arrest criterion was considered, only four items (age, employment, pending charges, and recent arrests) were significant.

Prosecution Decision Studies

Despite the enormity and importance of prosecutorial decision making, empirical studies of the charging decision are not common (M. R. Gottfredson and D. M. Gottfredson, 1980b; Adams, 1983). Observational studies (e.g., Miller, 1970), self-report or introspective studies (e.g., Kaplan, 1965), reports based on structured and unstructured interviews (e.g., Cole, 1970; Jacoby, Ratledge, and Turner, 1979), and simulation studies (e.g., Lagoy, Senna, and Siegel, 1976) have given a number of solid clues about the manner in which prosecutors appear to use information in their decision making. As noted by M. R. Gottfredson and D. M. Gottfredson (1980b:153), however, "if systems are to be designed to enhance rationality . . . it is important also to know what factors are the primary influences in most cases. This requires systematic empirical study based upon representative samples and quantifiable data."

Descriptive Studies

In a study of more than 1,200 males arraigned for felonies during a 5-month period in New York City, Bernstein, Kelly, and Doyle (1977) attempted to identify factors that influenced decisions to prosecute or to terminate cases by dismissal. Forty percent of the cases were dismissed (cf. Forst, Lucianovic, and Cox, 1977). Most important to the dismissal decision was a charge-reduction variable: the likelihood of dismissal increased substantially if a defendant's felony charge was reduced to a misdemeanor at the latest possible opportunity (at or after the preliminary hearing). Unfortunately, charge reductions themselves were not the subject of study in this investigation. Also significantly related to the dismissal decision was the nature of the offense charged (the likelihood of dismissal increased if the most serious arrest charges were burglary or assault), the total number of arrest charges (those with fewer were more likely to be dismissed), and pretrial detention status (those detained prior to final disposition were more likely to be dismissed). None of the demographic variables studied was related significantly to the decision (these included age, race-ethnicity, time employed, education, and marital status), nor were a variety of criminal history variables (e.g., a weighted index of prior convictions, the time elapsed since the most recent prior arrest). Bernstein and colleagues interpreted these findings as suggestive that evidentiary issues primarily were considered (i.e., witnesses are rare in burglary cases, and a large number of charges may indicate that a strong case can be made).

In a separate study involving both male and female defendants, Bernstein and colleagues (Bernstein, Kick, et al., 1977) did study the issue of charge reductions. More than 1,400 cases involving burglary, assault, larceny, and robbery charges were studied. The dependent variable, charge reduction, was defined relative to the absolute reduction possible. Separate analyses are reported for cases disposed at first presentation and those not so disposed. In neither case was prediction powerful (R² = .19 for the former and .13 for the latter). Considering only cases disposed at first presentation, seriousness of the first charge, offense type (burglary, assault), weapon charge, age, and criminal record were significant predictors; no demographic variable other than age appeared related to the criterion. For cases not disposed at the first presentation, the seriousness of the first presentation charge, resisting arrest, race, and criminal record were significant predictors. In both equations the greater the criminal record, the greater the reduction in charges.

These studies raise two intriguing issues. These concern the influence of evidentiary issues on the charging decision and prosecutorial treatment of recidivistic offenders. Using data available through the PROMIS system,[18] Forst, Lucianovic, and Cox (1977) found that 21 percent of the more than 17,000 arrests studied were rejected by prosecutors at initial screening and that witness and evidentiary reasons were given by the prosecutors for about 59 percent of those rejections (cf. Brosi, 1979). For cases dismissed later, witness problems remained important to the decision (about 13 percent), but evidentiary issues infrequently were cited as important to the decision. Also using the PROMIS system, Adams (1983) studied the relation between evidentiary factors and charge reductions. Significant, but very modest, effects were observed for the recovery of property or physical evidence (φ = .05),[19] arrest made at the scene of the offense (φ = .08), the relation between the victim and the offender (φ = .10), and the number of witnesses (low or high; φ = .06). When considered by offense category, relations differed both in terms of significance and magnitude.

[18] See Hamilton and Work (1973) for a general description of this management information system.

[19] These are approximate values, calculated by us from summary tables reported in the article.

Jacoby (1977) also has observed that victim-offender relations, evidentiary factors, and offenders' prior records are important to charging decisions. Williams (1978), using PROMIS data, has not only shown the importance of victim-offender relations to the charging decision but also that the effect varies with type of offense considered.

Forst and Brosi (1977:190-191) examined both evidentiary and recidivistic issues in relation to the charging decision and concluded that "the study provides strong support for the hypothesis that the prosecutor attaches importance to the strength of evidence in a case. More prosecutive attention was also given to cases involving more serious offenses, although the prosecutor's decision to carry a case forward appears to have been about an order of magnitude more sensitive to strength of evidence than to crime seriousness.... The findings, on the other hand, provide no empirical support to the hypothesis that the prosecutor attempts to give more attention to cases involving defendants with extensive arrest records." This conclusion may be questioned, however, because prior record was included in the "strength of evidence" variable.
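
The φ values reported above for Adams (1983) are coefficients of association for 2 × 2 tables, described in the footnote as approximate values calculated from summary tables. As a minimal sketch of how such a coefficient can be obtained from a fourfold table, the function below applies the standard formula; the function name and the cell counts in the example are hypothetical and are not Adams's data.

```python
import math


def phi_coefficient(a, b, c, d):
    """Phi coefficient for a 2 x 2 table of counts.

    Table layout (hypothetical labels):
                         charge reduced   charge not reduced
        factor present         a                  b
        factor absent          c                  d
    """
    row_present, row_absent = a + b, c + d
    col_reduced, col_not_reduced = a + c, b + d
    denominator = math.sqrt(
        row_present * row_absent * col_reduced * col_not_reduced
    )
    if denominator == 0:
        raise ValueError("phi is undefined when a row or column total is zero")
    return (a * d - b * c) / denominator


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    phi = phi_coefficient(30, 20, 20, 30)
    n = 30 + 20 + 20 + 30
    print(round(phi, 3))           # association in the table
    print(round(n * phi ** 2, 3))  # the corresponding chi-square statistic
```

Because φ is the product-moment correlation between two dichotomous variables, φ² can be read as a proportion of shared variance. A φ of .05 thus corresponds to about one-quarter of 1 percent, which is consistent with the description of these effects as significant but very modest.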
Normative Studies

Normative prediction studies of prosecutorial decisions are very rare. Given the absence of offender behavioral outcomes to study, the first question to be addressed is "what is it that should be predicted?" If, in general, prosecutors wish to "win" cases, perhaps a criterion of "conviction obtained" is a reasonable one. This issue received attention in a study by Rhodes (1978), who used probit analysis to estimate the probability of conviction given that cases were accepted for prosecution. Once cases were accepted for prosecution, Rhodes found it difficult to predict whether they would lead to a conviction at trial. R² for equations developed for assault, robbery, larceny, and burglary cases ranged from .10 (for larceny) to .37 (for robbery). Although differences were observed across offense types (see Rhodes:80), the following variables were found to be significantly associated with the probability of conviction: age, whether the defendant was arrested the same day that the offense occurred, whether physical evidence was available, the number of charges, whether the defendant was arrested at the scene of the crime (although not necessarily at the time of the offense), the number of lay witnesses, whether the defendant was released on recognizance pretrial, whether the defendant was granted a third-party release, if there was corroboration that a crime was committed, and whether exculpatory evidence was present.

Sentencing Decisions

It is in the sentencing of convicted offenders that discretionary decision making in the criminal justice system is most publicly apparent, and it also is in this area that the relation of desired goals to decisions made can most readily be explicated (M. R. Gottfredson and D. M. Gottfredson, 1980b). There is a large and controversial literature on the goals and proper purposes for the sentencing of criminal offenders (cf. H. L. A. Hart, 1968; Kleinig, 1973; Morris, 1974; Dershowitz, 1976; von Hirsch, 1976; Mueller, 1977; Grossman, 1980). Four traditional goals have been central to this debate: rehabilitation or treatment, desert or retributive punishment, deterrence (general or specific), and incapacitation. Each has a long history in practice, in moral philosophy, and in legal discussion and debate. Philosophical and legal debate concerning sentencing purposes and practices, however, is far more extensive than research on those purposes and practices. Although considerable research has been focused on the correlates of sentencing decisions (e.g., Galton, 1895; Gaudet, Harris, and St. John, 1933; J. Hogarth, 1971; Pope, 1976, 1978; Sutton, 1978; see reviews in Hagan, 1974; L. Cohen and Kluegel, 1978; Garber, Klepper, and Nagin, 1983; Hagan and Bumiller, 1983; Klepper, Nagin, and Tierney, 1983), rather less has been focused on the purposes and consequences of those decisions.

Of the goals cited, only one does not require prediction. The goal of deterrence involves the prediction that punishment of known offenders will discourage others from crime, or, in the case of specific deterrence, that the offender punished will be deterred from future criminal involvement. The goal of treatment or rehabilitation involves the prediction that offenders may be changed to reduce the likelihood of repeated offending; and that of incapacitation requires the prediction of new offenses if offenders are not restrained from committing them.
Only the goal of desert (the application of punishment in proportion to the gravity of the harm done and the culpability of the offender) seems to require no prediction (S. D. Gottfredson and D. M. Gottfredson, 1985).[20]

[20] Other less commonly cited goals, such as retribution or retaliation, also do not appear to us to require prediction (O'Leary, Gottfredson, and Gelman, 1975).

As noted earlier, this paper is concerned primarily with the prediction of offender's individual-level behavioral outcomes. It is possible, we believe, to treat sentencing decisions within a selection framework, but this is not often done. For selection to be effective, the goal of the selection decision must be explicit. Ideally, decision makers would agree not only on the goal for the selection decision, but also on the criteria on which the decision will be based. One has but to review the literature cited above to realize quickly that no such agreement exists. We do not find it surprising, therefore, that evidence concerning the effectiveness of rehabilitation or treatment efforts has proven discouraging (Lipton, Martinson, and Wilks, 1975; Sechrest, White, and Brown, 1979; cf. M. R. Gottfredson, 1979b) or that the efficacy of deterrence and incapacitation has proven difficult to estimate (Blumstein, Cohen, and Nagin, 1978).

Rarely are the intents of a sentencing decision unitary. Not only do judges apparently seek to deter some offenders, punish others, incapacitate some, and rehabilitate still others, but also these "simple" intents may in fact be melded in a sanctioning decision even with respect to a single offender. These need not be, and probably are not, independent concerns on either the aggregate or the individual level. D. M. Gottfredson and Stecher (1979) studied the purposes given by 18 judges in imposing criminal sanctions on almost 1,000 adult offenders. The judges usually did not assign any single goal as the purpose for the sentence imposed; rather, they generally distributed the sanction among several purposes.[21] Rehabilitation was the purpose given the principal weight in the largest proportion of cases (36 percent), followed closely by "other purpose, including general deterrence" (34 percent). Retribution was assigned the principal weight in 17 percent of the cases; special deterrence, in only 9 percent.[22] Surprisingly, only 4 percent of the cases reportedly had incapacitation as a primary intent (although imprisonment was not, of course, the only sanction applied). Based on multivariate analyses, however, the authors (D. M. Gottfredson and Stecher, 1979:179) report:

The one item that appeared from the discriminant analysis to have the strongest association with the choice of primary aim (in the context of all the items included) was the judge's prediction of recidivism by an offense against persons. This suggests that the relatively infrequent selection of incapacitation as a principal goal may be misleading and that judges may employ this concept without necessarily labeling it as such. Alternatively, it may suggest that, for those judges, utilitarian purposes may provide a partial justification for retributive aims.

In any case, these data support the contention that all the main purposes of sentencing play a role in the choice of alternative sanctions. The specific purposes related to judgments are rarely specified explicitly, however, and such identification is required if it is desired to learn how the rationality of such decisions can be improved.

[21] The judges were asked to distribute 100 points among several commonly cited purposes or to assign this value to any single purpose. The only constraint imposed was that the total points assigned sum to 100.

[22] The judges had decided on the purposes to be studied.

Regardless of the actual proportion of cases for which an incapacitative intent is primary, it is clear that judges can rather easily apportion a sanction in terms of its compound intents. Further, it is clear that at some level at least, judges make an intuitive or clinical judgment of the risk (particularly risk associated with recidivistic harm to persons) associated with the offender.

S. D. Gottfredson and Taylor (1983) recently demonstrated that in a sample of 86 criminal court judges, half (51.4 percent) reported that rehabilitation should be the principal purpose for sanctions imposed; the remaining half, however, were as likely to avow any one of the remaining goals studied (incapacitation, retributive punishment, or general deterrence) as any other. Hale (1984) subsequently demonstrated that these "goal preferences" are related to the lengths of terms imposed on offenders even after controlling for offense and offender characteristics.[23] Not surprisingly, interactions of goal preferences and offender and offense characteristics also were identified as determinants of the term imposed.

[23] Information concerning sanctions other than imprisonment was not available.

Descriptive Studies

Given the recent and extensive reviews of the correlates of judicial decisions (Blumstein et al., 1983; Garber, Klepper, and Nagin, 1983; Hagan and Bumiller, 1983; Klepper, Nagin, and Tierney, 1983), we do not consider descriptive studies in detail here. Our own reading of the literature, however, leads to agreement with Garber, Klepper, and Nagin (1983:133-134): "The conclusions of the various studies of final case outcome can be summarized as follows. First, virtually all the studies that include a variable measuring the charge found that the seriousness of the offense is the most important factor affecting case outcome. This is most evident for studies that analyze only convictions. Second, all the studies conclude that the prior record of the defendant is important. Third, all the studies that include a variable denoting whether the defendant makes bail infer that it is an important factor in case outcome. Fourth, most of the studies that include legal representation found that it affects case outcome, but the nature of this effect varies considerably among the studies.... Fifth, type of conviction generally seems to be important: Defendants who plead guilty fare worse on average than those who plead not guilty . . . but fare better than defendants who are convicted at trial.... The inferences concerning the role of extralegal characteristics [e.g., race, socioeconomic status] differ considerably across studies. One point of agreement is that if extralegal characteristics affect outcome, their quantitative significance is small compared with other factors discussed above."

Despite the consistency of observed effects, particularly for offense seriousness and prior record, the bulk of the variation in sentencing decisions remains unexplained; studies in which R² exceeds .30 to .35 are uncommon.

Normative Studies

Given the lack of clarity of goals that was discussed above, it is difficult to conceive of the optimal normative sentencing-decision study. With respect to the goal of rehabilitation, one could attempt to assess offenders with respect to amenability for treatment,[24] and selection devices then could be developed and their accuracy and operational efficacy assessed. With respect to the goal of specific deterrence, which may be considered a subproblem within the rehabilitation orientation, the operational definition of an adequate criterion measure is exceedingly complex (Manski, 1978). In practice, it likely would reduce to an unsatisfactory recidivism measure of some sort. How one would set about assessing offenders' differential amenability to a specific-deterrence effect is not clear to us. But it should be noted that the general selection problem is the same whether persons are to be selected, on the basis of amenability, for the treatments of confinement, education, therapy, or some other procedure intended to modify the criminal behavior of the offender.

[24] This, of course, quickly could become complex given the wide variety of rehabilitative treatments that have been proposed.

It is with respect to the goal of incapacitation that normative prediction studies may be of most value. (Or at least, most immediate value; we continue to cling to a concern for the goal of rehabilitation, for which such tools can be important.) Judges do appear to include a risk consideration in the setting of sanctions, and we do know something (unfortunately not enough) about the assessment of risk. Indeed, recent proposals for "selective incapacitation" (Greenwood and Abrahamse, 1982; Forst, 1983; cf. also Greenberg, 1975; von Hirsch and Gottfredson, 1984) rely heavily on statistical assessments of the risk of recidivism. Accordingly, these (and other) studies may properly be treated within our normative decision-study framework. Examination of the efficacy of the proposals, however, depends heavily on critical estimates of rates of offending (Blumstein and Cohen, 1979; Blumstein and Graddy, 1982; J. Cohen, 1983b).
In other portions of this paper, we summarize what is known about the accuracy and validity of normative recidivism-prediction studies, and we also consider proposals for selective incapacitation in detail in a later section.

Parole Prediction-Decision Studies

As we have noted, prediction studies involving criminal populations or relating in some way to concerns of the criminal

justice system are voluminous. This is especially true of normative studies concerning paroling decisions.25 Schuessler (1954) outlines the historical development of such studies from the early 1920s (beginning with the work of H. Hart, 1923) through the mid-1950s (e.g., Glaser, 1954; Kirby, 1954). Mannheim and Wilkins (1955) review research efforts to about 1953, and Rose (1966) and D. M. Gottfredson (1967) summarize research in parole prediction through the mid-1960s. Simon (1971) offers a very careful and detailed review of more than 40 of the more prominent studies (e.g., Vold, 1931; Glueck and Glueck, 1950; Ohlin, 1951; Mannheim and Wilkins, 1955; D. M. Gottfredson and Beverly, 1962; Glaser, 1964). Mannheim and Wilkins (1955) and D. M. Gottfredson, Wilkins, and Hoffman (1978) provide brief historical reviews that show the parallel development of such efforts in the English-speaking and European literatures (e.g., Shiedt, 1936; Trunck, 1937; Kohnle, 1938; Meywerk, 1938; Gerecke, 1939; Frey, 1951); the 1978 review includes some detail concerning developments during the 1970s.

25 Savitz (1965) compiled a bibliography of such studies that contains more than 600 entries.

Descriptive Studies

Descriptive studies of parole decision makers are rare and have tended to be primarily ethnographic (e.g., Dawson, 1969). The earliest such effort was that of Warner (1923), in which tables summarized the relations of 67 items of information (then available to decision makers at the Massachusetts Reformatory) to the parole decision and parole outcome. Warner did not test the significance of any of these relations, yet concluded that the decision makers attended well to salient information and that "poor as the criteria
now used by the Board are, the Board would not improve matters by considering any of the sixty-odd pieces of information placed at its disposal, which it now ignores" (Warner:196). In a quick rebuttal, H. Hart (1923:405) suggested "that the percentage of violations of paroles among men paroled from the Massachusetts Reformatory could be reduced one-half through scientific utilization of data . . . is the conclusion which should have been reached by the analysis of statistical data presented by Professor Warner." In fact, it is quite likely that neither Warner nor H. Hart was correct: Warner had systematically sampled equal numbers of successes and failures and examined "only 80 cases of prisoners not paroled . . . because a larger number of cases with complete records could not be found" (p. 176, footnote 3). Although one might be able to reweight cases from other information presented by Warner, the relatively small sample sizes, particularly of persons not paroled, probably would make this risky. In any event neither Warner, in his analyses by inspection, nor H. Hart, who made use of very recently developed statistical methods, attended to the base rate and other sampling concerns. Still, H. Hart is usually, and appropriately, credited with first introducing the concept of the experience table for parole prediction (Schuessler, 1954). Warner was, we believe, the first to attempt to compare the practices of parole decision makers with the potential power of "statistics."

Although he did not specifically address the question of factors apparently used by decision makers, Glaser (1955, 1962) demonstrated the relative superiority of an actuarially derived predictive device to decisions made by sociologists and psychiatrists. The prognostic judgments were of likely parole outcome; actual parole outcome was the criterion. Predictions made by the sociologists

studied were marginally more accurate than those made by the psychiatrists (MCRs were .19 and .14, respectively); and the decision makers' overall assessments were more accurate than was a classification based on ratings of a number of personality factors. Still, a simple statistical combination of items was most accurate (MCR = .35). Similarly, D. M. Gottfredson (1961, D. M. Gottfredson and Beverly, 1962) demonstrated that, although both subjective prognostic parole judgments and a simple actuarial device correlated significantly with actual outcomes, the device was the more powerful predictor (r = .48 versus .20). Further, when the subjective judgments and the statistical information were combined, "the subjective ratings added nothing to the predictive accuracy of the simple checklist" (1962:5S).

There is evidence to suggest that, when differences in cases judged are controlled, parole decision makers tend to make very similar decisions (D. M. Gottfredson and Ballard, 1966). Whether this results from the similar subjective treatment of similar items of information was not investigated. Parker (1972, cited in Kastenmeier and Eglit, 1973) surveyed parole board members for opinions of "the general worth" of a variety of prisoner characteristics for "predicting the success of a man on parole," and compared those opinions with the ranges of actual success rates of parolees showing these characteristics (relative to the base rate). Characteristics thought to be prognostic of parole outcome included a history of frequent intoxication, age (but only in one direction; the decision makers correctly believed
that older inmates tend to succeed, but failed to report that younger inmates tend to fail as they do), juvenile record, whether the inmate left home at an early age, whether the inmate's family showed active interest in the inmate during his imprisonment, narcotic addiction, employment history, constructive use of prison time, whether the inmate was a "leader" in the commitment of the crime for which he was imprisoned, probation violations, and offense type (they were wrong more often than right, with respect to the latter). How these judgments related to actual decisions made is not known.

Scott (1974) studied parole decisions in a "midwestern state" during 1968, a period in which indefinite or indeterminate sentencing was in effect. Thus "not only [did] the parole board have the responsibility of determining the proper length of incarceration for each offender [given] an indefinite sentence, but . . . they [had] the prerogative to overrule legislatively enacted minimum sentences, or judicially imposed minimum or definite sentences, and release inmates when they [felt] the inmates should be released" (p. 215). By studying the factors associated with time served, Scott was in effect studying paroling decisions, with the advantage that a continuous outcome criterion could be used. Six of the variables studied had significant zero-order correlations with time served: the seriousness of the crime (defined as the legal minimum sentence, in months, imposed by the court; .84), disciplinary reports (the number received while incarcerated; .24), age (.59), education (-.27), IQ (as measured by the revised beta; -.16), and sex (females served less time; -.16). Practice in this jurisdiction was such that only inmates' files were reviewed in making a paroling decision; inmates did not appear before the board until after the decision had been made.
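A zero-order correlation of the kind reported for these six variables is simply the Pearson correlation between one predictor and the criterion, with no other variable controlled. The following is a minimal sketch under stated assumptions: the function name `pearson_r` and all of the records are hypothetical and are not Scott's data.

```python
from math import sqrt


def pearson_r(x, y):
    """Zero-order (Pearson) correlation between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length, at least 2 items")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of cross-products and squared deviations about the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant variable")
    return sxy / sqrt(sxx * syy)


# Hypothetical records (invented for illustration only): months served,
# legal minimum sentence in months, and sex coded 1 = female, 0 = male.
months_served = [14, 30, 22, 60, 9, 41, 18, 75]
minimum_sentence = [12, 24, 24, 48, 6, 36, 12, 60]
female = [1, 0, 0, 0, 1, 0, 1, 0]

r_seriousness = pearson_r(minimum_sentence, months_served)
r_sex = pearson_r(female, months_served)
print(f"seriousness vs. time served: r = {r_seriousness:.2f}")
print(f"female vs. time served:      r = {r_sex:.2f}")
```

Each coefficient is computed one predictor at a time, so it carries no information about overlap among predictors; that overlap only becomes visible when the variables are entered together in a regression.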
Of the factors available in the files and studied by Scott, only those listed were significantly related to the decisions made. When they (and others) were studied in a multiple regression framework, the zero-order effects for education and IQ did not hold up, and effects (in order of relative magnitude) for socioeconomic

status, marital status, and prior record were observed. The remaining zero-order effects remained significant in the multiple regression equation (R² = .79). By far, the measure of offense seriousness used had the greatest effect on time served (beta = .64), followed by age (.31), disciplinary reports (.18), sex (-.17), socioeconomic status (-.10), marital status (.08), and prior record (-.06). The negative effect observed for prior record reportedly was due to a policy of paroling inmates against whom detainers had been filed; these inmates also typically had longer records.

Evidence that parole decision makers are influenced by institutional variables (e.g., punishment received for infractions while incarcerated, escapes) also is available. Using data from a series of studies concerning federal parolees (D. M. Gottfredson, Hoffman, et al., 1975), M. R. Gottfredson (1979a) assessed the effects of these variables on the length of time served, after this had been residualized with respect to the original sentence length; both the number of "prison punishments" received and escape history explained significant proportions of the variation in time served, once this had been residualized for the term set. Using both items, 28 percent of the remaining variance in time served was explained.

Elion and Megargee (1979) studied parole decisions made relative to 958 black and white men incarcerated at the Federal Correctional Institution in Tallahassee over a 4-year period (197~1974). Using multiple discriminant function analysis, they found that the maximum term imposed by the court (Wilks' Λ = .84), a scale reflecting adult maladjustment and deviance (.79), a rating of the violence of the instant offense (.75), the rate of disciplinary reports (.72), and juvenile conviction record (.71) significantly predicted the parole decisions made.
Complete data were available for only 310 offenders, but the function correctly identified 79 percent of them.

Adapting Wilkins' "information board" methods (Wilkins and Chandler, 1965), D. M. Gottfredson, Cosgrove, et al. (1978) sought to understand parole decision makers' use of case-file information. Only three items of information were always requested by decision makers: offense, age, and alcohol history; the first two typically were requested early in the decision process, the third typically was used later. In general, decision makers "paroling" and those "not paroling" sought different informational items. Further, "the same decision often was made on entirely different bases; that is, different information was used by different people to arrive at the same conclusion" (p. 182).

In a separate analysis, D. M. Gottfredson, Cosgrove, et al. (1978) used multiple regression methods to examine the influence of decision makers' subjective judgments of the seriousness of the instant offense, institutional program participation, the offender's institutional disciplinary record, and risk of parole violation on two decision criteria: continuance (in months, with "parole" treated as zero) and a recommendation of time to be served prior to the next review. Neither the judgments of disciplinary record nor program participation (which were highly correlated) were significant predictors of either decision. The subjective assessment of the seriousness of the commitment offense and the risk prognosis together explained about half the variance in each decision studied; but offense seriousness alone accounted for a vastly disproportionate amount of that variation. Similarly, Daiger et al. (1978) found a measure of offense seriousness and predictions of future behavior to be related to parole decisions.

Carroll and colleagues (Carroll, 1977, 1978a,b; Carroll and Payne, 1976, 1977a,

b; Carroll et al., 1982) have studied parole decisions from the framework of attribution theory. One study (Carroll and Payne, 1977a) involved tape-recording parole decision makers as they "thought out loud" about the cases being reviewed. Attributional statements represented the single largest category of statements made (beyond the factual information read). Often, these were causal attributions concerning the "instant" criminal event or the offenders' criminal histories (see Carroll, 1978a). These causal attributions were found to be significantly associated with decision-making outcomes: offenders whose crimes were attributed to stable, enduring causes (e.g., serious drug abuse) were considered worse parole risks than other offenders and received less favorable parole consideration (Carroll, 1978b). Carroll et al. (1982) found that for the Pennsylvania Parole Board, institutional behavior and "predictions" of future risk and rehabilitation, in addition to causal attributions, were important to paroling decisions. On follow-up, however, these predictions were found to be virtually unrelated to actual post-release outcomes.

A descriptive study of parole board decisions in California, a setting characterized at the time of the study by wide indeterminacy in sentencing and broad authority of the board to set terms and to grant or withhold parole, was completed by D. M. Gottfredson and Ballard (1964b). Various decision outcomes were modeled for male and female offenders (who had separate parole boards) in terms of attributes of the offenders. The decision outcomes used as criteria included: total terms set, months to be served in prison, months to be served on parole, and months to be served in prison after the minimum parole eligibility date.
The minimum parole eligibility date was a legal constraint, varying among offenders and determined by the law, on the time the offender would be required (by the parole board) to remain in prison. Thus, the last criterion listed above is of most interest in terms of the discretion of the board.

For males, an R² of .45 was found, in a validation sample, for prediction (by multiple regression) of prison sentences beyond the legal constraint. Items most closely associated with that criterion were classification of the legal offense of conviction, an offense seriousness rating, the number of prior prison confinements, and a history of opiate drug use. Based on a clustering method that suggested a marked decrease in heterogeneity of the sample when offenders were classed as with and without prior prison terms, separate equations were developed for those two groups. This improved prediction overall.

For men who had been in prison before, the legal offense class and the number of prior prison confinements were most closely associated with the criterion (prison time beyond minimum). For men not in prison before, the best predictors were the offense seriousness rating and the history of opiate drug use (although the record of prior incarcerations also was found to be a useful predictor).

For offenders generally, when the length of time required on parole (for those who were paroled) was the criterion, the best predictors were the number of months required before the minimum eligible parole date (the legal constraint on time to be served in prison but not on parole) and the history of opiate use.

For female offenders, separate analyses were done with data for three groups of women. These groups were defined by a clustering method (intended to reduce the heterogeneity of the total sample) that resulted in three subgroups (D. M. Gottfredson and Ballard, 1965a). These were called: "conventional offenders" (women with no prior incarceration in

either jails or prisons); "persistent offenders" (women with prior incarceration but no history of heroin use); and "persistent offender-users" (women with prior incarceration and a history of heroin use). When prediction models were developed for the total group and for the three groups separately, the offender characteristics studied accounted for about one-third of the variation in terms beyond the legal constraints on the board's decisions. The three groupings by themselves were good discriminators of both the parole board's decisions as to time required to be served and a recidivism criterion.

Also studied were decisions as to the granting of parole. If parole was not granted, consideration was usually postponed, although the person could, in most cases, be discharged. (Of 14,682 men who appeared before the board in the fiscal year 1962-1963, parole was granted to 39 percent, consideration was postponed for 57 percent, and 4 percent were discharged.) Differences among the groups paroled and not paroled included, for example: the type of commitment (original, parole violator), the legal offense, prior board appearances, assaultive history, use of weapons, opiate use history, custody classification, disciplinary infractions, work assignments, participation in various institutional programs, and aspects of the person's parole plans.

Analyses aimed at modeling the parole board's decisions in North Carolina, Virginia, Louisiana, Missouri, California (Youth Authority), Washington, and New Jersey had somewhat similar results, although these varied by jurisdiction (D. M. Gottfredson, Cosgrove, et al., 1978). Case evaluation forms were completed by the decision makers at the time of the hearings, and a number of items reflected their subjective judgments (e.g., "parole prognosis," an estimate of the risk of parole violation if paroled).
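Associations between a grant/deny decision and a rating of this kind are commonly summarized with a point-biserial coefficient, which is the Pearson correlation between a 0/1 variable and a numeric one. The sketch below is only an illustration under stated assumptions: the function name `point_biserial`, the 1-to-5 rating scale, and the ten cases are invented and are not taken from the studies.

```python
from math import sqrt


def point_biserial(decision, rating):
    """Point-biserial correlation between a 0/1 decision and a numeric rating.

    Uses r_pb = (M1 - M0) / s * sqrt(p * q), where M1 and M0 are the mean
    ratings of the two decision groups, s is the population standard
    deviation of all ratings, and p and q are the group proportions.
    """
    if len(decision) != len(rating):
        raise ValueError("decision and rating must have the same length")
    if any(d not in (0, 1) for d in decision):
        raise ValueError("decision must be coded 0 or 1")
    granted = [r for d, r in zip(decision, rating) if d == 1]
    denied = [r for d, r in zip(decision, rating) if d == 0]
    if not granted or not denied:
        raise ValueError("both decision groups must be present")
    n = len(rating)
    mean_all = sum(rating) / n
    sd_all = sqrt(sum((r - mean_all) ** 2 for r in rating) / n)
    if sd_all == 0:
        raise ValueError("correlation is undefined for a constant rating")
    p = len(granted) / n
    mean_gap = sum(granted) / len(granted) - sum(denied) / len(denied)
    return mean_gap / sd_all * sqrt(p * (1 - p))


# Hypothetical cases: 1 = parole granted, 0 = denied; prognosis rated
# from 1 (poor) to 5 (favorable). Invented values for illustration only.
parole_granted = [1, 1, 0, 1, 0, 0, 1, 0, 1, 0]
prognosis_rating = [5, 4, 2, 4, 1, 3, 5, 2, 3, 1]

r_pb = point_biserial(parole_granted, prognosis_rating)
print(f"point-biserial r = {r_pb:.2f}")
```

Because the coefficient depends on the proportion of cases in each decision group, values from jurisdictions with very different grant rates are not strictly comparable.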
In North Carolina the following correlates (point biserial coefficients) of decision-maker ratings with the decision to parole or not were observed: parole prognosis (.60, N = 2,968), institutional discipline (.49, N = 2,968), program participation (.53, N = 2,520), social stability (.39, N = 2,974), prior record (.32, N = 2,980), assaultive potential (.27, N = 2,963), and prior criminal record (.32, N = 2,980). The rated seriousness of the offense, the maximum sentence, the number of prior hearings, and the time already served were not related to the decision to grant or deny parole (p. 42).

Similar results were observed in Virginia. Decision-maker ratings were correlated with the decision to grant or deny parole as follows: parole prognosis (.77, N = 1,685), institutional discipline (.39, N = 1,641), program participation (.38, N = 1,532), social stability (.37, N = 1,663), prior criminal record (.33, N = 1,680), and assaultive potential (.28, N = 1,670). Ratings of the offense seriousness were correlated with the decision outcome also, but slightly (.08, N = 1,688). The time served, the maximum sentence, and the number of prior hearings were not correlated with the decision (D. M. Gottfredson et al.:75). Findings from Louisiana and Missouri were similar to those just noted (Gottfredson et al.:107-108, 13~136).

In Washington state, for reasons associated with the legal structure at the time of the study, which resulted in wide discretion for the parole board, the analyses focused on the setting of the minimum sentence and the time required to be served in prison. A multiple regression equation to predict the minimum sentence set, including classifications of the offense and maximum sentences, together with ratings of the seriousness of the offense, resulted in a study sample R² of .63 (N = 502). An equation to predict the time served by offenders paroled, which included four items, resulted in an
An equation to preclict the time server] by offenders paroled, which included four items, resulted in an

R² of .43 (N = 530). The four items were drug sales offense with a maximum sentence of 20 years or more, nonviolent offense with maximum sentences of more than 1G but less than 20 years, and decision-maker ratings of the seriousness of the offense and the prior criminal record (D. M. Gottfredson et al.:223-224).

In New Jersey, multiple regression equations were calculated for the dependent variables "months served in prison by offenders paroled" and "parole grant/deny." In the case of the first, five items provided an R² of .88 (N = 233) in the study sample. These were maximum sentence, rated offense seriousness, program participation, prior criminal record, and parole plan. With the second criterion (parole or not) an R² of .48 (N = 504) was found when these items were used: maximum sentence, rated offense seriousness, parole prognosis, program participation, and quality of the parole plan (D. M. Gottfredson et al.:24~249).

Although the correlates of parole board decisions vary among jurisdictions (as do legal structures and paroling authority goals), common correlates include decision makers' judgments about the offenders' prior criminal records and institutional adjustment, whether the latter is assessed in terms of disciplinary infractions or participation in programs or both, and about the likelihood of new offenses if paroled (particularly the estimated probability of violent crimes). Often, and differing by jurisdiction, ratings of the seriousness of the offense of conviction are correlated with decision outcomes, as is the time that already has been served when the decision is made.

Normative Studies

Given the ready availability of several detailed reviews of this voluminous literature (e.g., Mannheim and Wilkins, 1955; Rose, 1966; D. M. Gottfredson, 1967; Simon, 1971; D. M. Gottfredson, Wilkins, and Hoffman, 1978), we will not repeat that effort.
Rather, we focus in this section on two issues: the identification of specific variables that have been found to have predictive utility across a range of samples and studies and a consideration of the general degree of accuracy obtained in such studies. We therefore do not give detailed attention to individual studies (as in previous sections). We were greatly assisted in this effort by the reviews cited, particularly those of Simon (1971) and D. M. Gottfredson (1967), and by a comparative summary prepared by Glaser and O'Leary (1966).

Our focus here is on behavioral and demographic correlates; thus, we largely ignore several extensive research traditions, which also largely have been ignored in previous reviews. In particular, we do not treat research relating to psychological or psychiatric prognostications, tests, or other personality assessments. Nor do we treat research concerning the impacts of large-scale social and economic forces (e.g., Ehrlich, 1973, 1974; Forst, 1976; Vandaele, 1978). Finally, we do not review research concerning the areal or ecological correlates of crime and recidivism, despite growing evidence that inclusion of these factors may do much to improve the prediction of recidivism (S. D. Gottfredson and Taylor, 1986). For reviews, see Baldwin (1979) and S. D. Gottfredson and Taylor (1986); for suggestions of the likely importance of situational factors, see Monahan (1978, 1981) and Monahan and Klassen (1982).

Past Criminal Behavior. It is a psychological truism that the best predictor of future behavior is past behavior. Not surprisingly, one of the best predictors of future criminal conduct is past criminal conduct, and the parole-prediction literature amply supports this fact. From the earliest studies (e.g., Burgess, 1928; Vold,

1931) to the latest (e.g., Palmer and Carlson, 1976; D. M. Gottfredson, Cosgrove, et al., 1978; Schmidt and Witte, 1979; Carroll et al., 1982; S. D. Gottfredson and Taylor, 1986), indices of prior criminal conduct consistently are found to be among the most powerful predictors of parole violations, arrest for the commitment of new offenses, and conviction and reincarceration for these.

This generalization tends to hold regardless of the measure of prior criminal conduct used or of specific operational definitions of that conduct. For example, the previous arrest history, the prior conviction history, the record of commitments to jail or to prison, the length of "gaps" in the arrest or conviction history (e.g., time free without arrests), the history of prior probation or parole violations, the age at first arrest, the number of commitments to correctional institutions, the number of prior court dispositions of any type, and the types of prior offenses all provide examples of variates often found predictive of future arrests or convictions. The apparent strength of associations with the criteria of interest varies among samples and criteria, but it is nevertheless commonly found that such items are among the best predictors identified. Some are more reliable than others, some are more readily extracted from the records, and some (depending on the intended application) present legal or ethical objections. All these factors would, of course, be important to consider in the selection of predictor candidates.

Although the means of assessing prior criminal involvement have varied widely, we know of no prediction study in which a measure of criminal history, if available for assessment, did not emerge as significantly associated with the outcome criterion (which also has varied widely). In most studies, prior record appears to be the most powerful of the variables examined, although this leaves much to be desired.
Because few studies CRIMINAL CAREERS AND CAREER CRIMINALS have used common criteria or definitions, it is difficult to provide an adequate sum- mary of the relation between past and future criminal behavior; this difficulty is exacerbated by the fact that samples also have varied. Finally, a wide variety of methods have been used to examine these relations, and they often are not readily comparable. As examples: Mann- heim and Wilkins (1955) used a contin- gency coefficient adjusted for restriction and observed values of from .31 to .24; Vold (1931) used unadjusted contingency coefficients and observed a value of .28 for the relation of prior record and parole outcome. This index may be readily cal- culated from data given by D. M. Gottfredson, Wilkins, and Hoffman (1978) and results in coefficients of.23 to .21, depending on the item assessed. Tibbets (1931) and Borden (1928) reported values of Pearson's r of between .15 and .20, depending on the definition of prior crim- inal conduct used. Several authors report values of MCR for items Le.g., Glaser, 1955 (.21 to .201; Babst, Inciardi, and laman, 1971 (.2211; others report univari- ate F-ratios, discriminant weights, or as- ymptotic t-ratios (e.g., Kirby, 1954; Palmer and Carlson, 1976; Brown, 1978; Schmidt and Witte, 19791; some report no indices at all (e.g., Hakeem, 1948~. In general, considering adult samples, the relation between prior record and future criminal activity, both measured variously, appears to be on the order of.2, whether assessed by the correlation coef- ficient, by a related contingency coeffi- cient, or via the MCR. The relation changes little whether only men are stud- ied (e.g., Borden, 1928; Tibbets, 1931; Kirby, 1954; Glaser, 1955; Babst, Inciardi, and Jaman, 1971) or if women are in- cluded in the sample (e.g., Brown, 1978; D. M. Gottfredson Wilkins, and Hoff- man, 1978; Carroll et al., 19821. 
Restricting the sample to certain types of offenders, however, appears to reduce the effect. For example, Babst, Koval, and

Neithercutt (1972) studied a large national sample of paroled burglars and observed MCRs relating prior record and parole outcome of from .08 to .14 (depending on the definition of prior record used). In a study of institutionalized narcotics addicts, Inciardi (1971) did not find prior criminal record to be among the salient predictors of parole outcome. In further support of the truism noted earlier, however, the variable "number of previous treatments for narcotics use" was found predictive.

Prior record is similarly predictive of probation outcomes (e.g., Monachesi, 1932; Caldwell, 1951; Simon, 1971). For both probation and parole, such variables are found predictive in American, British, and European (e.g., Shiedt, 1936; Trunck, 1937) samples and for youths (e.g., Mannheim and Wilkins, 1955) as well as for adults.

Age. Information concerning offender age appears consistently to be related to parole outcomes, although there are contrary examples. Age alone, usually measured at or shortly before release, has variously been found positively related to outcome (studies finding that older releasees more often are successful include, as examples, Burgess, 1928; Kirby, 1954; Palmer and Carlson, 1976; Brown, 1978; Schmidt and Witte, 1979); unrelated to outcome (studies finding no, or very little, relation include Borden, 1928; Vold, 1931; Babst, Inciardi, and Jaman, 1971; Simon, 1971; Babst, Koval, and Neithercutt, 1972; S. D. Gottfredson and D. M. Gottfredson, 1979); and even negatively related to outcome (e.g., Tibbets, 1931). When found to be positively related with release outcome, the effect of age usually is small, although statistically significant in the studies cited. The zero-order correlation reported by Kirby is 0.08; the mean difference of about 25 months reported
by Brown is associated with an F-ratio of 70.5 on 1, 638 degrees of freedom (half that of the most powerful zero-order predictor, which was offense type); in the multivariate model, however, it emerged as the most salient predictor. Age at release had by far the smallest coefficient in Schmidt and Witte's (1979) truncated log normal analysis, and one of the smallest in Palmer and Carlson's (1976) study, which used the same methods. Studies that we have classified as showing no relation actually do show small, nonsignificant, but positive coefficients (.004 to about .06 to .08); the significance of the single negative relation noted was not assessed, and inspection of the distribution shows it to be slight and inconsistent (Tibbets, 1931:37).

To summarize, the evidence available seems to suggest that age, usually measured at time of release, is positively associated with outcomes, but that the relation is slight, particularly when considered in multivariate contexts. In the literature reviewed, its statistical significance often appears largely to be a function of sample size. Babst, Koval, and Neithercutt (1972) found no zero-order effect for age, but did find that the interaction of age with other variables (drug or alcohol abuse and criminal record) was highly significant (although still only marginally predictive).

Many studies have examined the age variable in relation to the onset of noticed (or official) criminal behavior, and here, the evidence is compelling: the earlier the onset of criminal activity, the poorer the prognosis. Kirby (1954) reports a correlation of .21 between age at first arrest and failure on parole; we calculate a contingency coefficient of .14 between age at first commitment and failure from data presented by D. M. Gottfredson, Cosgrove, et al. (1978); Mannheim and

26 Unofficial delinquency proxies also have been used.
For example, Glaser (1954) reports an MCR of .22 for the relation between the age at which the offender first left home for a period of at least 6 months and failure on release.

Wilkins (1955) report an adjusted contingency coefficient of .19 between age at first finding of guilt and failure; Simon (1971) reports a ~ of .13; S. D. Gottfredson and D. M. Gottfredson (1979) report point-biserial correlations of .18 for age at first arrest, .17 for age at first conviction, and .18 for age at first commitment. Although not large, the effect is at least consistent (and is not remarkably smaller than zero-order effects cited above for criminal history variables). When examined in multivariate contexts, the relation usually remains significant, although the unique contribution is small (S. D. Gottfredson and D. M. Gottfredson, 1979).

Marital Status. Marital status occasionally has been found predictive of parole outcomes; single offenders do more poorly on follow-up (Burgess, 1928; Vold, 1931; Kirby, 1954; S. D. Gottfredson and D. M. Gottfredson, 1979). The zero-order relations are slight (the correlations are about .15, varying, of course, with the study), and usually, but not always, disappear in multivariate analyses (S. D. Gottfredson and D. M. Gottfredson, 1979; cf. Kirby, 1954; Palmer and Carlson, 1976). Marital status is collinear with age variables (which are rather more powerful) and with variables that assess release plans (e.g., planned living arrangement). Simon (1971) found no effect for marital status, but her sample was very young. In general, the unique contribution of marital status appears modest in relation to the assessment of parole outcomes.

Sex. Most studies reported in the literature have been restricted to samples of males. Those that included both men and women (e.g., D. M. Gottfredson, Wilkins, and Hoffman, 1978; S. D. Gottfredson and D. M. Gottfredson, 1979; Schmidt and Witte, 1979; Carroll et al., 1982) either find or report no significant effect for sex.
An exception is Brown (1978), who found that sex remained statistically significant in a multiple discriminant function analysis. The variable's unique contribution, however, is very slight (see p. 98). S. D. Gottfredson and D. M. Gottfredson (1979) systematically studied the effect of sex and found it to be negligible. In part, this likely is due to the small number of women available for study, even when overall sample sizes are large.

Race-Ethnicity. Although some of the earliest studies paid detailed attention to race or ethnicity (e.g., Tibbets, 1931, studied the zero-order relations between 20 racial and ethnic classifications and parole outcome), few later studies specifically report on or appear to have examined these variables. Either the variables were not available for study (e.g., Brown, 1978), or investigators appear to have ignored them. It also may be that investigators simply have not reported finding no effect. Some (e.g., S. D. Gottfredson and D. M. Gottfredson, 1979) had an expressed goal of developing operationally useful prediction tools and, hence, excluded the variable from consideration. (We consider the ill-advised wisdom of this in a later section.) In one multivariate study (Schmidt and Witte, 1979), a zero-order race effect failed to reach significance when considered in combination with other factors; in others (Kassebaum, Ward, and Wilner, 1971; Palmer and Carlson, 1976) the effect (substantially diminished) remained significant. Perhaps the best that may be said at this point is that race and ethnicity effects appear to have been understudied in relation to parole outcomes.27

27 In a descriptive parole-prediction study, Elion and Megargee (1979) found little evidence for the effect of race on parole decisions made, but more evidence for racial differences in the severity of sentences imposed.
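Several of the zero-order coefficients cited in this section are point-biserial correlations, that is, Pearson correlations between a dichotomous outcome (for example, failure on parole coded 0 or 1) and a continuous predictor (for example, age at first arrest). The following is a minimal sketch of that computation in Python. The data, the function name `point_biserial`, and the variable names are illustrative assumptions and are not drawn from any study reviewed here; the sign of the result depends only on how the outcome is coded. The final line squares a coefficient of .18 to show why effects of this size are described as small.

```python
import math

def point_biserial(outcome, predictor):
    """Pearson correlation between a 0/1 outcome and a continuous predictor."""
    n = len(outcome)
    if n != len(predictor) or n < 2:
        raise ValueError("need two equal-length samples of size >= 2")
    mean_y = sum(outcome) / n
    mean_x = sum(predictor) / n
    # Sums of cross-products and squares about the means.
    cov = sum((y - mean_y) * (x - mean_x) for y, x in zip(outcome, predictor))
    var_y = sum((y - mean_y) ** 2 for y in outcome)
    var_x = sum((x - mean_x) ** 2 for x in predictor)
    if var_y == 0 or var_x == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return cov / math.sqrt(var_y * var_x)

# Hypothetical sample (NOT study data): 1 = failed on parole, 0 = succeeded.
failed = [1, 1, 1, 0, 1, 0, 0, 0]
age_at_first_arrest = [14, 15, 16, 17, 18, 19, 21, 24]

r = point_biserial(failed, age_at_first_arrest)

# A coefficient of .18 corresponds to roughly 3 percent of outcome variance.
variance_explained = 0.18 ** 2
```

In this made-up sample the failures are concentrated among those first arrested at younger ages, so the coefficient is negative under the 0/1 coding used; published reports often give only the magnitude.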

Employment History. Employment history consistently is found predictive of parole outcomes (although there are exceptions, e.g., Tibbets, 1931). The zero-order relations are modest (correlation coefficients of .21, .12, .17 to .14, .17, and .13 to .16 have been reported by Borden, 1928; Vold, 1931; Kirby, 1954; Simon, 1971; and S. D. Gottfredson and D. M. Gottfredson, 1979, respectively; contingency coefficients of .25 to .22 and .12 were observed by Mannheim and Wilkins (1955) and by D. M. Gottfredson, Cosgrove, et al. (1978), respectively; and an MCR of .17 was reported by Glaser, 1954). In general, variables that measure the stability of employment appear to be modestly more predictive than do other means of assessing employment history (Simon, 1971; S. D. Gottfredson and D. M. Gottfredson, 1979). Employment history variables generally retain a unique contribution in multivariate analyses, but the effect is small. Occupational classifications may be somewhat more powerful (Palmer and Carlson, 1976).

Offense. The nature of the commitment offense and, in some studies, the nature of the offender's offense history consistently are predictive of parole outcomes: those who offend against property are poorer risks than are those who have offended against persons (Vold, 1931; Kirby, 1954; Mannheim and Wilkins, 1955; Babst, Inciardi, and Jaman, 1971; Palmer and Carlson, 1976; Brown, 1978; D. M. Gottfredson, Wilkins, and Hoffman, 1978; S. D. Gottfredson and D. M. Gottfredson, 1979; Schmidt and Witte, 1979; Carroll et al., 1982; cf., however, Simon, 1971). Brown (1978) systematically studied a number of offense classification schemes, finding that a simple "person/property" dichotomy was about as efficient as any other. Such a measure is most commonly used, although some (e.g., D. M. Gottfredson, Cosgrove, et al., 1978; S. D. Gottfredson and D. M.
Gottfredson, 1979) have found specific combinations of property-type offenses to be predictive of parole outcomes. Zero-order relations typically observed are in the .15 to .25 range (cf. Mannheim and Wilkins, 1955; D. M. Gottfredson, Cosgrove, et al., 1978; S. D. Gottfredson and D. M. Gottfredson, 1979). When considered in multivariate models, offense type typically does make a unique, but small, contribution to explained variation in outcome (cf. Kirby, 1954; Brown, 1978; S. D. Gottfredson and D. M. Gottfredson, 1979; Schmidt and Witte, 1979; Carroll et al., 1982).

Alcohol and Drugs. A history of problematic alcohol use is correlated with parole outcomes (Vold, 1931; Hakeem, 1948; Ohlin, 1951; Mannheim and Wilkins, 1955; Glaser, 1964; D. M. Gottfredson and Ballard, 1965b; D. M. Gottfredson, 1967; Babst, Koval, and Neithercutt, 1972; Palmer and Carlson, 1976; Brown, 1978; S. D. Gottfredson and D. M. Gottfredson, 1979; Schmidt and Witte, 1979), but the relation is slight. In multivariate models, variables indicative of alcohol use occasionally make small unique contributions (e.g., D. M. Gottfredson, 1961; Palmer and Carlson, 1976; Brown, 1978); just as often, however, they appear to share sufficient variance with other (more highly predictive) variables that no multivariate effect is observed (Schmidt and Witte, 1979; S. D. Gottfredson and D. M. Gottfredson, 1979). The evidence about drug abuse, particularly of natural or synthetic opiates, is less mixed. Most studies investigating the issue observe statistically significant, although modest, zero-order effects (e.g., Vold, 1931; D. M. Gottfredson and Bonds, 1961; Babst, Inciardi, and Jaman, 1971). In large samples of federal offenders (e.g., D. M. Gottfredson, Cosgrove, et

al., 1978; S. D. Gottfredson and D. M. Gottfredson, 1979), in extremely large samples based on the Uniform Parole Reports data base (e.g., Babst, Inciardi, and Jaman, 1971; Brown, 1978), and in a sizable Michigan sample (Palmer and Carlson, 1976) variables reflective of drug usage do make a modest unique contribution; in one sample, however, drug usage did not remain significant when tested in a multivariate model (Schmidt and Witte, 1979).

Education. Education (variously defined and studied, but most typically measured in terms of attainment) seems to be associated with parole outcomes in the bivariate case (e.g., Vold, 1931; Kirby, 1954; Glaser, 1955; Babst, Inciardi, and Jaman, 1971; D. M. Gottfredson, Wilkins, and Hoffman, 1978; S. D. Gottfredson and D. M. Gottfredson, 1979).28 Multivariate models suggest that the contribution to explained variance made by education is negligible (e.g., Kirby, 1954; S. D. Gottfredson and D. M. Gottfredson, 1979).

28 However, Simon (1971) observed no zero-order relation between education and outcome. A measure of school conduct, however, was modestly correlated with recidivism.

Other Predictors. Dozens of other variables have been examined for association with parole outcome, and they usually provide support for the null hypothesis. For listings of many of these, see Mannheim and Wilkins (1955), Simon (1971), or S. D. Gottfredson and D. M. Gottfredson (1979). A few have shown sufficient promise to mention here, although they often are supported by few studies. A record of punishments (reprimands, reports, misconduct citations, et cetera) received while incarcerated has proven prognostic on occasion (e.g., Borden, 1928; Tibbets, 1931; Vold, 1931; Kirby, 1954; Mannheim and Wilkins, 1955; S. D. Gottfredson and D. M. Gottfredson, 1979; Carroll et al., 1982).
Zero-order relations are low to moderate (.03 to .23 range), but multivariate analyses suggest that the small contribution made is relatively unique. Whether the offender acted alone in the commitment offense or acted with accomplices has been found modestly predictive in some studies (e.g., Tibbets, 1931; Kirby, 1954); association with criminal gangs appears moderately more predictive (Simon, 1971), and the latter remains predictive in a multiple regression framework. A variety of "assessment scales" have proven predictive in some studies [e.g., Burgess's "social types"; see Burgess, 1928; Hakeem, 1948; Ohlin, 1951; or Glaser's (1955, 1964) "social development pattern"] but have proven difficult for others to score reliably.

COMMON CORRELATES

Our review of descriptive and normative decision studies across a variety of criminal justice system settings suggests that decision makers tend to rely with some regularity on a few common items of information regardless of the decision being made. Likewise, there is considerable commonality among items found useful in normative prediction studies, again regardless of the decision for which the prediction is made. Finally, it appears that the descriptive and normative studies seem to recommend different items of information as predictive. Table 1 provides a general summary of those variables found to predict the decisions of functionaries and those found to predict the behavior of offenders for a variety of criteria and across the decision arenas studied.

Some caveats with respect to this summary are in order. As discussed earlier, few of the studies we reviewed provided

TABLE 1 Common Correlates of Criminal Justice Decision Making

(For each decision stage and criterion, the salient predictors are listed separately for descriptive studies and for normative studies.)

Decision Stage: Bail; pretrial release

  Criterion: Failure to appear (FTA) for trial
    Descriptive studies: Seriousness of charge, seriousness of prior charges, prior record, "community ties"
    Normative studies: Offense type, prior record, "community ties," drug use, prior FTAs, pending charges

  Criterion: Cash bail
    Descriptive studies: Seriousness of charge, weapons charge, juvenile record, age, personal victim of crime?, "community ties,"[a] D.A. recommendation,[a] defense attorney recommendation[a]
    Normative studies: N.A.

  Criterion: Recidivism on pretrial release
    Descriptive studies: N.A.
    Normative studies: Offense type, prior record, employment, age, "community ties," weapons use, pending charges, prior FTAs

  Criterion: Failure to appear or recidivism
    Descriptive studies: N.A.
    Normative studies: Type of release,[b] court disposition time,[b] offense type, age, pending charges, recent offense history, prior FTAs

Decision Stage: Prosecution

  Criterion: Charge
    Descriptive studies: Witness and evidentiary factors, victim-offender relation, seriousness of charge
    Normative studies: N.A.

  Criterion: Charge reduction
    Descriptive studies: Seriousness of offense, type of offense, age, prior record
    Normative studies: N.A.

  Criterion: Prosecute fully or dismiss
    Descriptive studies: Charge reductions, offense type, number of charges, pretrial detention status
    Normative studies: N.A.

  Criterion: Conviction obtained
    Descriptive studies: N.A.
    Normative studies: Offense type, evidentiary and witness factors, pretrial status, age

Decision Stage: Sentencing

  Criterion: Various[c]
    Descriptive studies: Seriousness of offense, prior record, pretrial status, counsel and representation, type of conviction, various extralegal factors
    Normative studies: N.A.

Decision Stage: Parole

  Criterion: Time served
    Descriptive studies: Seriousness of offense, maximum term set, subjective risk assessment, institutional behavior, prior record, age, sex, socioeconomic status, marital status, juvenile record
    Normative studies: N.A.

  Criterion: Parole/no parole
    Descriptive studies: Seriousness of offense, subjective risk assessment, prior record, attributions regarding offender and offense, institutional behavior, alcohol history, age
    Normative studies: Prior record, offense type, age, particularly "age at onset," employment, marital status, alcohol-drug use, education, institutional behavior, criminal associates

NOTE: The first two or three entries in each cell represent, in order, the most powerful predictors. Subsequent factors vary sufficiently from study to study to prohibit conclusions with respect to relative accuracy.
[a] Based on a simulation study.
[b] Not deemed useful for most practical applications of prediction tools.
[c] The most powerful predictors appear to be seriousness and prior record, regardless of the particular criterion used (e.g., sentence type, sentence length, measures of sentence "severity"). Accordingly, we have not differentiated criteria for purposes of this summary table.

sufficient detail to allow us the degree of specificity desired. Some studies provided detailed information concerning bivariate relations, but no (or little) information concerning those relations in a multivariate context. When the latter was provided, often the former was not. Comparable statistics are not reported for many studies, whether bivariate or multivariate in nature. We are not the first reviewers to make this lament (see Hagan and Bumiller, 1983, for a discussion of the difficulties of cumulating information from a variety of studies) nor, we are certain, will we be the last. With Hagan and Bumiller, we note that we intend no criticism of the authors whose reports we have reviewed; indeed, on occasion we found ourselves among the worst culprits. We do believe that there remains promise in meta-analytic methods (Glass, 1976; Glass, McGaw, and Smith, 1981), despite well-recognized difficulties (e.g., B. F. Green and Hall, 1984), and had hoped to provide a "quantitative literature review." Unfortunately, we cannot.

Entries in Table 1 are intended to represent constructs, and it should be remembered that these have been operationally defined in many ways in the literature reviewed. This is true both for entries under the heading "Salient Predictors" and for those listed under "Criterion."

The first two or three entries in each cell of the table represent, in order, the most powerful predictors of the relevant criterion. The power of variables represented by subsequent entries varies sufficiently from study to study to prohibit conclusions with respect to relative accuracy. We already have noted difficulties encountered in attempting to assess the predictive accuracy of items of information across (and often within) studies.
Accordingly, with the exception of the first two or three entries in each cell, we do not have confidence in the relative ordering of predictive factors listed.

These caveats made, the table rather clearly shows our original impression to have been more or less correct. Reading down columns in the table, items of predictive information are remarkably consistent across decision settings (with the possible exception of the prosecutorial stage, at which evidentiary factors become important both to decisions made and to trial outcomes). This is true for both descriptive and normative studies. Reading across rows, however, the descriptive and normative studies regularly tend to recommend that attention be paid to different items of information. This is particularly true with respect to information concerning the offense: decision makers tend to focus on seriousness (which generally is not predictive of behavioral outcomes), while normative studies focus on offense type, which is predictive of behavioral outcomes.

HOW SUCCESSFUL ARE PREDICTION-BASED SELECTION RULES?

The evidence just summarized suggests that with respect to the criteria investigated, at any rate, criminal justice functionaries likely do not make optimal decisions. We have noted that the normative studies also hardly may be said to be optimal, in that by far the largest proportion of criterion variance remains unexplained. Still, we have identified a number of factors that appear to have some predictive utility across a variety of settings, and it appears that decision makers do not pay heed to those factors. Rather, they appear to focus on items of information that demonstrably are not statistically related to the behavioral outcomes of interest. Despite substantial base-rate problems, most investigators have achieved normative prediction that exceeds the

chance rate and that, if implemented, should improve criminal justice decision making.29

29 It is important to remember the cautions of previous sections: implementation of prediction instruments may conflict, wholly or in part, with other objectives of the decisions being discussed. Those objectives are multiple, often conflicting, and usually poorly articulated. It is because prediction of "risk" (of failure to appear for trial, or of new offenses, or of parole or probation violations) is only one of the apparent objectives of decisions that the question of "improvement" of criminal justice decision making is problematic in relation to prediction alone.

In virtually every decision-making situation for which the issue has been studied, it has been found that statistically developed predictive devices outperform human judgments (reviews are available in Meehl, 1954, 1965; Gough, 1962; Goldberg, 1965, 1968, 1970; Sawyer, 1966; Dawes and Corrigan, 1974; Dawes, 1979). This is one of the best-established facts in the decision-making literature, and to find otherwise in criminal justice settings would be surprising (at best) and suspicious or very likely wrong (at worst).

Meehl (1954) originally established the "ground rules" for making comparisons of clinical and statistical predictions, which really are minimal. One rule is that both the clinical predictions and those of the statistical model are to be made on the basis of the same information (for obviously, the statistical model is disadvantaged if information is not to be made available to it). In fact, this "rule" may not even be necessary, since even when it is disregarded, the models almost always are more valid than clinical predictions. Even "bootstrapping" studies, in which a statistical model of clinical assessments is constructed, show that the models developed (even though they are of the decision makers' judgments) outperform the original judgments, often by substantial amounts.

The limited information available concerning criminal justice settings would not, we think, disappoint those on the "statistical" side of this continuing (but unproductive) argument. Already noted were the studies by Glaser (1955, 1962), in which an actuarially derived device was shown superior to prognostic judgments made by sociologists and psychiatrists relative to a parole-violation criterion, and those of D. M. Gottfredson (1961; D. M. Gottfredson and Beverly, 1962), in which a statistical combination of items proved substantially more accurate than judgments made by parole board members. Recently, Holland et al. (1983) found that a statistical composite consistently outperformed mental health professionals and correctional case workers in the prediction of recidivism.30 Carroll et al. (1982) found parole board members' judgments of risk to be virtually uncorrelated with offender behavioral outcomes and that a simple statistical model, although not powerful, outperformed the decision makers.

30 However, after a correction for range restriction was applied, the human judges did better than the instrument in identifying indices of violent recidivism.

The relative superiority of statistical to intuitive methods of prediction is due to many factors. For example, human decision makers often do not use information reliably (e.g., Ennis and Litwack, 1974), they often do not consider base rates (Meehl and Rosen, 1955), and this has been specifically illustrated in criminal justice decision making (Carroll, 1977); they may inappropriately weight items of information that are predictive, or they may assign weight to items that in fact are not predictive (as our review shows; see also Ebbesen and Konecni, 1981); and they may be overly influenced by causal attributions (e.g., Carroll, 1978a) or spurious correlations (Monahan, 1981). In fairness, it should be pointed out that there

may be advantages to intuitive judgments as well. For example, human decision makers can make use of information that cannot be made available to a statistical device (at least readily). Demeanor during an interview may be one such example. Other factors in favor of intuitive judgments are reviewed in Dawes (1975).31

31 See also Cronbach and Gleser's (1957) discussion of the relative advantages and disadvantages of "narrow band" and "broad band" assessment procedures.

Due in part to the demonstrable superiority of statistical prediction methods, a great deal of effort has been expended in attempts to provide criminal justice functionaries with tools to aid them in the decision-making process. We review several of these in the next section.

APPLICATIONS OF PREDICTION IN STRUCTURING DISCRETION

This section focuses on recent attempts to provide structure for a variety of discretionary criminal justice decisions. Our charge from the Panel on Research on Criminal Careers was to "review research findings on existing prediction-based rules for structuring criminal justice decisions, with special attention to their adequacy in terms of predictive accuracy, efficiency, and validity, and to the relative contribution of individual predictor variables to adequacy." Since the most commonly used devices have been based on studies very similar (or identical) to those reviewed earlier in this paper, we can provide a simple response: (1) they are of low-to-moderate predictive accuracy; (2) they usually therefore are not very efficient (in a predictive sense), and they are at best modestly valid; (3) it commonly is observed that only a few variables, notably those concerning offense type and offense history, make a substantial contribution to the prediction attained; and (4) this appears to be true regardless of the decision arena investigated.

This "simple" response is unsatisfactory, however. The panel
also asked us to assess "the success of prediction-based rules in affecting the behaviors they are intended to affect (e.g., have prediction rules used in structuring parole decisions reduced the prevalence of failure on parole?)." This is not a simple question, although it is an obvious and important one. Had the parenthetical example not been included, our response simply would be: when properly implemented, apparently they can be successful.

We will review the evidence for our assertion later; here, we wish to point out that in evaluating the efficacy of attempts to structure discretionary decision making in criminal justice settings, it is first necessary to examine the purposes underlying the innovations. Criminal justice system functionaries typically make decisions relative to compound (and complex) goals. In the context of sentencing, for example, we noted that judges may seek to apply a criminal sanction for rehabilitative, deterrent, incapacitative, or desert purposes; often, they report seeking to satisfy more than one of those concerns at once (D. M. Gottfredson and Stecher, 1979). Decision-making goals of paroling authorities also are complex, and vary widely among decision makers and across the country (O'Leary and Hall, no date). Although it is commonly perceived that paroling authorities have the minimization of recidivism risk as a principal goal, that simply is not the case. For example, the Maryland parole board has the stated purpose of ensuring just deserts (A. Hopkins, personal communication, 1983); and the U.S. Parole Commission asserts three goals (related to accountability for the crime, institutional behavior, and risk of parole violation) (D. M. Gottfredson, Cosgrove, et al., 1978). Thus, prediction is not a stated concern for the Maryland

board; and prediction is only one of several concerns for the federal board.

Still, the concept of prediction generally is central to the decisions made in most of these settings. Accordingly, many of the attempts to provide structure for those decisions do have a predictive component. However, we are aware of no attempt to structure the decisions discussed that involves only a predictive component. In practical application, decision makers invariably seek not only to structure decisions with respect to prediction, but with respect to other goals as well (e.g., the satisfaction of just deserts). As we shall see, such a choice invariably constrains (often very seriously) the predictive component of the tools developed.

Second, evaluating the "success" of any innovation requires that comparisons be made. James Thurber, when once asked how his wife was, reportedly answered "Compared to what?" (Einhorn and Schact, 1975). The needed comparisons may be made essentially in three ways: with respect to past practice; with respect to other innovations (including, desirably, a "no innovation" condition); and with respect to some ideal standard. Obviously, the criteria on which the comparisons would be made must be stated, and, if the exercise is to have other than academic utility, those criteria must be related to the goals identified for the innovation(s) studied.

In justice system settings, comparisons relative to an ideal standard are doomed to failure and thus are trivial. Debates concerning differing "ideal" standards and purposes for sentencing decisions (for example) are accelerating, as we have noted in a previous section.
The ideal standard of one who advocates a just deserts perspective is radically different from that advocated by proponents of "selective incapacitation"; succinct reviews and summaries of these differences can be found in a recent "debate" between Greenwood and von Hirsch (NIJ Reports, 1984). Similar arguments could be made for ideal standards based on other philosophies. Comparisons made relative to ideal standards of the type mentioned are not scientifically interesting; indeed, they essentially are not matters of science. Although science may inform the ethical and philosophical debate, and although this debate is of obvious interest and importance, scientific comparisons of an innovation relative to an ideal will become important only when society eventually comes to consensus on what that shall be. We do not think this likely for some time to come.

Comparisons made with past practice are of value, but that value is constrained by well-known limitations of simple pre- and post-test research designs (Campbell and Stanley, 1963; Cook and Campbell, 1979). In brief, a finding that the effects anticipated for the innovation are observed does not, of course, mean that the innovation produced the effects. Without controls for many potential threats to validity, one cannot rule out the possibility that the effects result from something else, even something completely exogenous to the innovation and the research setting. For the same reasons, a finding that the effects anticipated for the innovation are not observed does not mean that the innovation produced no effect. Although one is used to thinking about alternative hypotheses (usually with a view toward discrediting them) when observing a presumed effect, one is not used to thinking about them when an effect is not observed. This, of course, is critical when the research design is a simple pre-post comparison.
With the exception of the case study, the simple pre-post test is the weakest of all commonly used experimental designs. And with one exception, it is the only kind of comparison made to date concerning the utility of devices designed to structure discretionary decision making in criminal justice settings.

The very first question that must be asked concerns whether the innovation in fact has been implemented. An influential report recently concluded that in several jurisdictions studied, an attempt to provide decision makers with devices to assist in the structuring of sentencing decisions was unsuccessful, in that the devices were not, in fact, implemented (Rich et al., 1982). Unfortunately, the authors exceeded the bounds of common sense by reporting also that the innovation had no effect. An unimplemented innovation cannot be expected to have an effect; to observe otherwise would obviously be spurious.

Bail and Pretrial Release Prediction-Based Tools

Beginning in the early 1960s, numerous federal and state jurisdictions engaged in attempts to provide bail and pretrial-release decision makers systematically with information relevant to the decisions to be made (Freed and Wald, 1964, describe several of these). The pioneering and most widely known (and emulated) of these programs was the Vera Institute of Justice's Manhattan Bail Project, begun in the fall of 1961 and subsequently modeled by several other jurisdictions (Freed and Wald, 1964; M. R. Gottfredson, 1974; D. M. Gottfredson, 1975). In this project a scale (clearly designed to be predictive of risk of failure to appear, but not empirically derived) was applied to defendants to determine release recommendations. The risk evaluation was based on information concerning residential stability, family ties and contacts, employment history, and prior criminal record. An arbitrary weighting scheme was used, which resulted in a total "risk" score, according to which recommendations were made concerning release. Considerable success was claimed for this and related projects.
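The additive structure of a point scale of this kind can be sketched in Python as follows. The item names follow the four information areas named above, but the point values, the cutoff, and the names `HYPOTHETICAL_POINTS`, `point_total`, and `recommend_release` are hypothetical placeholders chosen for illustration; they are not the Vera scale's actual items or weights, which are not given in this chapter.

```python
# Hypothetical weights for illustration only: NOT the actual Vera scale values.
HYPOTHETICAL_POINTS = {
    "residential_stability": 2,
    "family_ties": 2,
    "employment_history": 2,
    "no_prior_record": 1,
}

# Assumed minimum total needed for a release recommendation (hypothetical).
HYPOTHETICAL_CUTOFF = 4

def point_total(defendant):
    """Sum the points for every item the defendant satisfies."""
    return sum(points
               for item, points in HYPOTHETICAL_POINTS.items()
               if defendant.get(item, False))

def recommend_release(defendant):
    """Recommend release when the point total reaches the assumed cutoff."""
    return point_total(defendant) >= HYPOTHETICAL_CUTOFF

# Example defendant: stable residence, family ties, no prior record, no job.
defendant = {
    "residential_stability": True,
    "family_ties": True,
    "employment_history": False,
    "no_prior_record": True,
}
score = point_total(defendant)            # 2 + 2 + 1 = 5
recommended = recommend_release(defendant)
```

In a scheme like this the weights are set by judgment rather than estimated from outcome data, so whether the total actually predicts failure to appear can only be settled by empirical study.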
For example, Freed and Wald (1964:62) report that "the Manhattan Bail Project and its progeny have demonstrated that a defendant with roots in the community is not likely to flee, irrespective of his lack of prominence or ability to pay a bondsman. To date, these projects have produced remarkable results, with vast numbers of releases, few defaulters and scarcely any commissions of crime by parolees in the interim between release and trial." Of course, the predictive utility of the scale is an empirical, rather than an experiential, question, and, when finally empirically studied (over a decade after the implementation and widespread transfer of the innovation), it was demonstrated that, in all likelihood, the validity of the Vera scale had little, if anything, to do with the success claimed (M. R. Gottfredson, 1974). As already discussed, the base rate alone (when failure to appear is the criterion) could well provide the results and claims such as those made by Freed and Wald. In the M. R. Gottfredson study (described in a preceding section), Vera scale scores were found to account (at best) for 2 percent of the variance in either failure to appear or arrest rates. Further, considerable colinearity of individual Vera scale items was observed (e.g., between points assigned for family ties and for residence), which suggests that the weighting scheme intuitively developed was highly inappropriate (on empirical grounds). The plan worked in the sense of starting a social movement; the scale, however, did not work in predicting failure to appear.

As described earlier, in his Los Angeles study, M. R. Gottfredson (1974) attempted to construct normative predictive devices for both failure to appear and arrest criteria, with fair success. On validation, however, the power of the devices constructed reduced approximately to the low level observed for the Vera scale.

Goldkamp and Gottfredson (1985) recently completed a study of guidelines for pretrial release and bail decisions that are based, in part, on an empirical assessment of risk. The general approach to guidelines development that they followed was patterned after D. M. Gottfredson, Wilkins, and Hoffman (1978), and the empirical work on which the experimental project was based is described in Goldkamp and Gottfredson (1981a,b).

The study was essentially a policy experiment; it was not intended to provide an empirical test of the relative power of empirically derived prediction instruments and unguided or intuitive predictions. Three guideline models were developed: a purely descriptive model, a purely normative (actuarial) model, and a model that attempted to combine the descriptive and normative approaches to guidelines development. Depending on the goals of the experiment, any of these could be compared with unguided practice; all such comparisons would be of considerable interest, but different results, of course, would be expected. The descriptive model essentially provides judges with normed information concerning past practices and summarizes experience concerning those factors thought most influential to past decisions. Because it does not explicitly address risk of future behavior, the model is not designed to be predictive (in the sense we have been using this term). One might anticipate, however, that the provision of this information would serve to constrain variability in subsequent decisions made, relative to those made in unguided practice. A comparison of the normative models with unguided practice would directly address the question of relative accuracy; but that was not attempted in this experiment. Rather, it was the third
Rather, it was the third guidelines model, that which combined experiential and predictive concerns, that was implemented and experimentally studied.

The judges of the Philadelphia Municipal Court very directly were "partners" in the development, modification, and experimental study of the guidelines selected for implementation (for discussions of the importance of such "partnerships," see D. M. Gottfredson, Wilkins, and Hoffman, 1978; Galegher and Carroll, 1983). Without this partnership, it is highly unlikely that any guidelines models could have been developed, and it is a virtual certainty that the experimental study of these could not have been achieved. After reviewing the models, the judges chose the combined approach but also required modifications based on a series of policy-development meetings.

The judges chose a guidelines model that simultaneously considered the seriousness of the charge (which, as described above, is not associated with subsequent risk, either of failure to appear or of pretrial arrests, but is predictive of judges' decisions) and statistical risk. With respect to the latter, however, the judges chose a prediction model developed with respect to a combined criterion measure. That is, rather than separately considering risk of failure to appear and risk of new offenses, they chose an outcome measure that combined both. As described earlier, different independent variables are associated with the two criteria, and the models developed concerning the combined outcome measure were less powerful than those predictive of a single criterion. In at least these two ways, the judges' choice of models constrained the likely predictive accuracy of the guidelines implemented: seriousness of charge was to receive approximately equal weight as considerations of risk, and the prediction model chosen, based on an outcome measure that reflects two
quite distinct prediction goals, was not optimal.

Sixteen Municipal Court judges participated in the experiment; they were randomly assigned to treatment (use of the guidelines model) and control (no training, no use of guidelines) groups. Cases, stratified by six charge-seriousness categories, were screened and assigned to judges (20 per stratification level). Follow-up for all cases was achieved for a 90-day period. The random assignment and stratification plan sought to ensure, and subsequent analyses demonstrated, that the treatment and control group cases were similar.

Goldkamp and Gottfredson (1985; see also D. M. Gottfredson, Cosgrove, et al., 1978; M. R. Gottfredson and D. M. Gottfredson, 1980a) suggest that four general concepts are of central importance in the implementation and evaluation of decision-making guidelines: visibility, rationality, equity, and effectiveness. These are related but may be treated separately for purposes of discussion and for construction of testable hypotheses.

Decision tools seek, among other things, to make explicit the goals, nature, and outcome of the decision-making process (see especially D. M. Gottfredson, Hoffman, et al., 1975). As we described in the introduction to this paper, this is of great importance in criminal justice settings, where many of the decisions made are clearly predictive in nature, although this fact is not commonly recognized. Further, it is the "hidden" nature of decisions made, lack of explicit goals and policies, and a lack of information concerning the effectiveness of the decision process that result (in part) in claims of unwarranted disparity and ineffectiveness and in appeals for reform (Kastenmeir and Eglit, 1973; Harris, 1975).

The concept of rationality suggests that guidelines should assist in relating decisions made to the goals specified. These may be predictive in nature (e.g., associated with desired offender outcomes), but they may be of another nature (e.g., of ensuring just deserts or of increasing equity). In neither of these examples is prediction (in the sense that we have been using the term) an issue.

The concept of equity does suggest that guidelines should reduce the disparate treatment of similarly situated individuals, both within and across decision makers. To the extent that reductions in unwarranted disparity are achieved, equity may be said to be increased.

Finally, the questions posed by the panel stressed that guidelines should increase the effectiveness of the decisions made. It must be remembered, however, that the question of effectiveness must be addressed relative to the goals sought by the designers of the innovation. Clearly, any of the three concepts briefly discussed above (visibility, rationality, and equity) may be evaluated relative to some effectiveness criterion. Goldkamp and Gottfredson (1985) primarily address the rationality and equity concerns.

An important but often overlooked issue that must be addressed in any study purporting to evaluate the impact of guidelines (whether or not they use prediction methods) is whether the innovation was in fact used. The availability of coding sheets and a scoring grid does not ensure that decision makers understand or make use of the tools. Neither, of course, will simple debriefing sessions prove of much help in finding out if the tools are used. It is well known that experimental subjects typically attempt to provide the investigator with the information sought. The question of compliance, particularly with a voluntary program, is a complex one. The problem of complexity is exacerbated in most guidelines applications by the provision that decision makers may, at their discretion, apply a sanction or make a decision other than
that recommended by the device (that is, a decision "outside the guidelines" may reflect compliance with the general model). Thus, simple monitoring is not very effective in addressing the compliance issue. Goldkamp and Gottfredson address the compliance issue in a straightforward way: in addition to training sessions, monitoring, and debriefing, it is assumed that, if the guidelines are found to be effective, compliance, at least to some degree, must have been achieved. This does not assume, of course, that compliance was complete, or that greater compliance might not have resulted in increased effectiveness, but the logic is straightforward. If the innovation is used, and if it "works," effectiveness may be demonstrated. If it is not used, it cannot be found effective; if it does not "work," it cannot be found effective even if used. The point is a simple one, but we stress it because prior attempts to evaluate other guidelines systems appear not to have paid attention to the issue (e.g., Rich et al., 1982).

Experimental group judges in the Philadelphia study do appear to have used the guidelines: 76 percent of the decisions made fell within the range suggested by the innovation; this varied from a "compliance" rate of 91 to 64 percent when individual judges were considered.32 Exceptions to the guidelines do not appear to have been random; they were less frequent in ROR and ROR/low-cash-bail zones, and more frequent in higher cash-bail zones, than would be required by chance. Given that the guidelines studied were purposefully in large part descriptive, however, one would not expect, necessarily, that decisions made under the innovation would depart markedly from those made in the unguided condition. When considered in the aggregate, this was found to be the case. Approximately equal proportions of the samples were treated in similar manners by judges in the experimental and control groups. However, when cases judged by the control group were assigned, post hoc, to a "guidelines recommendation," only 57 percent of the decisions actually made fell within the recommendation (as compared with 76 percent for the experimental group). Further, only 13 percent of the experimental group's decisions resulted in more severe detention consequences; 29 percent of the unguided decisions resulted in a consequence more severe than that which would have been recommended by the innovation. Deviations in the opposite direction were about equally likely to be made by either group (11 percent for the experimental group, 14 percent for the control group).

32 Analyses and subsequent debriefing demonstrated that one experimental group judge completely misconstrued the experiment and purposefully did not consult the guidelines until after his decision was made. Accordingly, these data were not considered further in the analyses reported. However, Goldkamp and Gottfredson (1985) report that analyses that include these data are little different from those presented, and they offer to provide tables documenting this on request.

The Philadelphia judges specifically sought the goal of increased equity in their decisions. This was addressed through two classifications of decisions: based on charge (the six stratification levels) and the 75-cell guidelines matrix (codetermined by charge seriousness and risk and intended, by the judges, as an operational definition of "similarly situated"). If equity is increased through application of the innovation, the variability of decisions made should be reduced, for appropriate classifications of offenders, relative to decisions made in an unguided fashion.
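The variability comparison described here can be made concrete with a small sketch. The bail amounts below are hypothetical and are not taken from the Philadelphia data; the group labels and the `iqr` helper are likewise illustrative assumptions. The idea is simply to compute the interquartile range of bail amounts within one classification (for example, one charge-seriousness level) separately for guided and unguided decisions, and then to compare the two spreads.

```python
from statistics import quantiles


def iqr(values):
    """Interquartile range (Q3 - Q1), using the 'inclusive' quartile method."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return q3 - q1


# Hypothetical cash bail amounts (dollars) within one charge-seriousness level.
bail = {
    "guidelines": [500, 500, 750, 1000, 1000, 1500],
    "control": [300, 500, 1000, 1500, 2500, 5000],
}

# Spread of decisions within each group for this classification.
spread = {group: iqr(amounts) for group, amounts in bail.items()}

# How many times more variable the unguided decisions are in this example.
ratio = spread["control"] / spread["guidelines"]

print(spread, round(ratio, 2))
```

A ratio above 1 would indicate less variable decisions under the guidelines for that classification. The quartile convention chosen (here the "inclusive" method) affects the exact values in small samples, so any real comparison should state which convention is used.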
This was observed to be the case for both classifications considered (i.e., based on offense seriousness and on the guideline matrix). With respect to the former, variability in the amount of cash bail required was similar for treatment and control groups at lower levels of seriousness but was greatly different for higher ranges of offense seriousness. This difference (in interquartile ranges) was almost twofold for the penultimate seriousness category, and almost threefold for the most serious category. Similarly, it was in the cash-bail zone that reductions in interquartile ranges were observed when the guidelines matrix provided the offender classification. When variances (rather than interquartile ranges) were studied (for matrix cells having sufficient cases to permit the analysis), significant reductions in the expected direction were observed for 80 percent of the cells; the overall (across cells) effect for variance reduction also was significant. Goldkamp and Gottfredson (1985:174) conclude that "in short, we can safely say that variability appears to have been systematically reduced under the guidelines or experimental bail format."

A second goal of the Philadelphia judges directly involved prediction: they sought to increase the effectiveness of decision making relative both to failure to appear and pretrial arrests. If the guidelines "work" relative to these goals, "the bail decisions of the experimental judges should be more effective in result (FTAs, rearrests) than those of the control judges who decided bail in the normal fashion" (Goldkamp and Gottfredson:176). We are less optimistic. Given the modifications noted earlier, concerning a choice of less-than-optimal prediction tools and the inclusion, with equal weight, of the seriousness dimension, we would be somewhat surprised to find effectiveness with respect to identification of FTAs and pretrial arrests demonstrated. (As will be described shortly, the seriousness dimension actually received greater weight than did the risk dimension.)

Despite the demonstration that guideline-structured decision making differed in important respects from unguided decision making, Goldkamp and
Gottfredson found little differential effect (on the amount of bail set) for the influence of charge seriousness and the risk dimension. Zero-order relations were similar for both groups, and resulting R²s differed little (but in the expected direction; that is, the influence of these factors was slightly greater for the experimental group's decisions).

With respect to failure to appear and to arrests while on pretrial release, decisions made under either condition appear equally effective. No advantage, with respect to either criterion, could be demonstrated for guidelines-based versus unguided decisions.

Did the guidelines "work"? With respect to an effectiveness criterion involving equity, the answer appears to be yes. With respect to the predictive criterion, apparently the answer is no. Again, however, we stress the design issues discussed earlier and point out also that although the risk and seriousness dimensions that constitute the innovative matrix were intended to receive equal weight, they did not; the variance of the latter is considerably (three times) that of the former. Thus, in addition to problems associated with the prediction model chosen by the judges (developed nonoptimally, with respect to two goals at once), and partial reliance on a dimension known not to be associated with risk, seriousness received disproportionate weight in the guidelines grid. It therefore is appropriate to note that, despite these limitations, and in addition to achieving the goal of increased equity, the guidelines-based decisions were no worse than unguided decisions relative to the risk considerations.

Sentencing Decision Tools

In an earlier section we noted that although descriptive studies of judicial decision making are common, normative studies are not. Indeed, since normative
prediction studies require the availability of a measurable criterion variable and since these are problematic in the sentencing area, it is not surprising that normative studies of judicial decisions are not available. As we have argued, normative studies concerning the goals of incapacitation and rehabilitation would appear most likely to be potentially fruitful, but the undertaking and completion of such studies would be difficult indeed. We are not aware of any normative prediction study concerning judicial decisions, although we think that these should be conducted.

There are, however, studies that have made claims for the potential utility of prediction devices for sentencing decisions (e.g., Greenwood, 1982) and studies that attempt to provide some structure for sentencing decisions based in part on an assessment of risk (e.g., the various "guidelines" studies recently reviewed by Rich et al., 1982; Sparks, 1983; J. Cohen and Tonry, 1983). In this section we comment on these.

Proposals for "Selective Incapacitation"

The concept of selective incapacitation (Greenberg, 1975; Greenwood, 1982) has received wide attention in the public press (Newsweek, 1982; New York Times, 1982a,b; U.S. News and World Report, 1982) and in criminal justice policy debates (J. Cohen, 1983a,b; NIJ Reports, 1984; von Hirsch and Gottfredson, 1984). The concept provides a clear illustration of the relevance of the prediction of offenders' future criminality to policy choices.33

33 Although, as we stressed in the introduction to this paper, prediction is central to any crime control strategy. Prediction of events is a requisite to their control.

It is useful to make a distinction between collective and selective incapacitation strategies: the former would assign the same (or a very similar) sanction to all persons convicted of common offenses; the latter involves sentences based on predictions of future rates of offending (J. Cohen, 1983a,b). Studies of collective incapacitation effects are rare, and they report widely varying effects (with estimated crime-reduction effects ranging from 1 to 25 percent, depending on crime-rate assumptions and the crime types considered) (J. Cohen, 1983a:12). When mandatory terms are considered, crime-reduction estimates are somewhat larger, but impacts on prison populations appear unacceptable given the modest impact on crime (J. Cohen, 1983a:23, 30).

Studies of selective incapacitation also are rare, and they also report varying impacts on crime (and on prison populations) (Blumstein and J. Cohen, 1979; J. Cohen, 1983a; Greenwood, 1982). In general, these strategies are of two types: those that make use only of information concerning criminal history and current offense based on aggregate estimates (e.g., the J. Cohen and Blumstein approach), and those that make use of a wider variety of predictive information measured at the individual level (e.g., the Greenwood approach). The latter has been criticized on both ethical and empirical grounds (see, for example, J. Cohen, 1983a; von Hirsch and Gottfredson, 1984); the former requires estimates of average individual arrest and crime rates, as well as estimates of the average length of criminal careers. Although we do not address the ethical arguments in this paper, it should be noted that although the J. Cohen and Blumstein approach meliorates some ethical concerns, it still is incompatible with a strict just deserts position (since offender history is used). Either approach depends heavily on (1) predictive power and (2) the accuracy of the other estimates made. Our concern is with the former. Since our focus has been on individual-level prediction, we will concentrate on that approach. It must be noted, however, that although the nature of the prediction problem is somewhat different in the Cohen approach, it involves prediction nonetheless (cf. J. Cohen, 1983a:73).

Detailed critical reviews of the report by Greenwood (1982) are available in J. Cohen (1983a), in von Hirsch and Gottfredson (1984), and in Visher (this volume). Since these reviews contain extended discussion of both empirical and ethical issues concerning that study, we focus specifically on the issue of accuracy.

The analyses reported in Greenwood (1982) are retrospective only: no prospective analyses were conducted. Thus, even if the instrument could be shown to have substantial retrospective predictive accuracy, its utility for prospective application also would have to be shown before the scheme could be applied responsibly in practice. Moreover, the report essentially contains no consideration even of retrospective accuracy. J. Cohen (1983b) and von Hirsch and Gottfredson (1984) do provide such a consideration, with results on the accuracy issue that are disappointing. Although the scale is fairly accurate with respect to low-rate offenders (76 percent correct prediction for predicted low-rate offenders), J. Cohen adds (1983b:48-49):

The scale's performance is more uniformly poor for high-rate offenders. Among those predicted to be high-rate offenders, only 45 percent actually were high-rate offenders. This involves a false-positive rate of 55%. For purposes of selective incapacitation, where predicted high-rate offenders will be subject to longer prison terms than all other offenders, much better discrimination of the high-rate offenders would seem to be required.

J.
Cohen also compared the "accuracy" of the scale relative to current practice, as implied by sentence lengths given, and found that "the seven-point scale does only marginally better overall and results in slightly more false-positives than existing subjective judgments in distinguishing offenders by their crime commission rates" (p. 50).

Predictive accuracy as just considered involves the construction sample alone. Another criticism of the Greenwood study is that no validation was attempted. If this ever is done and if the typical result is observed, predictive accuracy in new samples will be even lower. Thus, in addition to the concerns already raised about prospective prediction and the lack of validation with respect to this issue, even retrospective validation on a separate sample was not attempted.

Other criticisms could be made. For example, colinearity among predictor items was not investigated, nor was the weighting scheme designed in an optimal fashion. It must be noted, however, that in practice, this has seemed to make little difference (S. D. Gottfredson and D. M. Gottfredson, 1979), and the items used are of the type generally observed to be predictive of future criminal behavior. In the retrospective construction sample, the device does appear to have predictive power similar to that commonly observed. As noted, however, its accuracy in prospective or cross-validation samples is not known.

Sentencing Guidelines

Sentencing guidelines recently were considered in some detail by Rich et al. (1982), by Galegher and Carroll (1983), and by the National Research Council (Blumstein et al., 1983). Methodological limitations concerning the development of sentencing guidelines (Rich et al., 1982; F. M. Fisher and Kadane, 1983; Sparks, 1983), ethical concerns (F. M. Fisher and Kadane, 1983), issues of implementation (Martin, 1983), and of efficacy (J. Cohen and Tonry, 1983) have been discussed.
Elsewhere (M. R. Gottfredson and D. M. Gottfredson, 1984), we have provided a "partisan review" of these critiques, and we invite attention to the issues we raise there. Here, we concentrate on the adequacy, in terms of predictive accuracy, of "prescriptive" sentencing guidelines.

Distinctions have been made between sentencing guidelines that are intended to be "descriptive" and those intended to "prescribe" sentencing practices (D. M. Gottfredson, Cosgrove, et al., 1978; J. Cohen and Tonry, 1983). This differentiation parallels an important organizing principle of this paper. Previously, we made a distinction between predictive decision studies that are descriptive and those that are normative. The parallel, we believe, would equate the descriptive prediction studies and the descriptive guidelines approaches on the one hand, and the normative prediction studies and prescriptive guidelines approaches on the other.

In practice, the distinction between descriptive and normative prediction studies often becomes blurred, especially when the goal is to improve rational decision making. Thus, for example, D. M. Gottfredson, Wilkins, and Hoffman (1978:10) stressed that "the research that undergirds the guidelines developed and the guidelines themselves are essentially descriptive, not prescriptive; yet the very term [guidelines] implies prescription." (The referent is parole guidelines, but the statement applies equally to sentencing guidelines.) Although the distinction may become blurred, it nonetheless is an important one to bear in mind, for the consequences of emphasis on one or the other of the two approaches for issues such as that addressed by the Panel on Research on Criminal Careers will be very different.

Some (e.g., F. M.
Fisher and Kadane, 1983) have criticized descriptive sentencing guidelines precisely because they are intended to be descriptive of past practice; others have criticized them because they are insufficiently descriptive of past practice (e.g., Rich et al., 1982; Sparks, 1983); and some have criticized them because they are insufficiently prescriptive [see discussion by Sparks (1983:238-239) concerning the widths of "prescribed normal ranges"].

The first criticism suggests that descriptive sentencing guidelines are "unthoughtfully conservative" and reduce to "a species of computer-driven conservatism" (F. M. Fisher and Kadane, 1983:192). Preferable, it is suggested, is a deduction of guidelines from ethical principles. Finally, it is suggested that the empirical approach avoids hard ethical questions but that the approach advocated would not.

As F. M. Fisher and Kadane correctly point out, the empirical approach can attempt to tackle hard ethical questions, but this has not, to our knowledge, been done. Rather, guidelines developers have taken a much less sophisticated approach to the elimination of ethically questionable predictors; as nicely illustrated by F. M. Fisher and Kadane, this may lead to misspecification of the descriptive prediction models developed, which leads to further ethical difficulties. Even following the approach recommended, it is clear that ethical decisions must be made in the specification problem.

Descriptive guidelines are conservative, in the sense that dramatic changes in the nature of past practice are not expected; rather, the attempt is to improve on past practice by providing structure for future decisions. That structure, however, is based on models of past practice. J. Cohen and Tonry (1983:415) asserted that "descriptive/voluntary guidelines are likely to involve the smallest impact on sentencing.
Since descriptive guidelines recommend essentially no departure from current practice for the court as a whole, only those judges who deviate widely from current practice are expected to change their sentences." If the guidelines are voluntary, the extent of expected compliance from this deviant group may be questioned. As originally envisioned, however, the descriptive guidelines model proposes a routine feedback mechanism that is intended specifically to allow decision makers to change (probably incrementally, and it is to be hoped, for the better) the guidelines themselves and, hence, the nature of the decisions made.

Persons certainly may differ with respect to a preference for gradual improvement or radical change; guidelines developers appear to have preferred the more thoughtfully conservative approach or at least to have believed the approach taken to be preferable on pragmatic grounds. Radical proposals for change often are rejected by those in authority.

The suggestion by F. M. Fisher and Kadane (1983) that a better model involves a deduction of guidelines purely from ethical considerations is debatable. Requisite to such development would be some demonstrable societal consensus with respect to the variety of ethical concerns that invariably must arise in the exercise. Absent that consensus, the empirical approach holds considerable further promise.

The second general line of criticism of descriptive sentencing guidelines is that such guidelines are insufficiently and (more damaging) imprecisely descriptive of past practice (Rich et al., 1982; Galegher and Carroll, 1983; Sparks, 1983). Although these reviews vary considerably in detail, common themes arise in each. These have to do with sampling issues, statistical modeling issues, and implementation issues. Also apparent is some misunderstanding of the distinction made here and elsewhere concerning the descriptive and prescriptive nature of decision studies.
Each of the three general issues raised can have important consequences for the potential accuracy of prediction models.

The sampling issue, as raised in the reviews cited, is most important with respect to the appropriate unit(s) of analysis concerning which decisions should be modeled. It has been demonstrated that systematic variation due to (unknown differences in) judges may be observed in sentencing (e.g., Rich et al., 1982) and in bail-setting (e.g., Goldkamp and Gottfredson, 1981a,b) decision situations. The evidence in other areas is not clear: for example, D. M. Gottfredson and Ballard (1966) found no differences associated with parole decision makers after controlling for differences in cases seen. For some decision-study purposes, the individual decision maker may be the appropriate unit of analysis; for other purposes, it may not be. If one seeks to describe court behavior, rather than the behavior of individual judges, decisions aggregated across judges would seem to be preferable. It is the case, however, that, if substantial between-judge variability is discovered, perhaps the analysis properly should be conducted on the individual case data, residualized with respect to judge effects. To our knowledge, this has not been done. Whether substantially different models would result remains an empirical question. It seems clear, however, that models of individual decision makers, if they are very different from one another, would do little to constrain the disparity associated with court discretion now so widely criticized.

Statistical models used in the descriptive modeling of sentencing practices also have been criticized. Important issues concerning potential misspecification resulting from insufficient attention to ethical concerns already have been mentioned. The other principal criticism has
to do with the use of standard multiple regression methods for decisions that are dichotomous. The criticism, which is correct, is that reliance on the simple regression model may lead to an inappropriate model of past practice: regression weights and the overall measure of fit (R²) are unstable (the latter may even exceed a value of 1.0). Other regression models (e.g., probit or tobit) are to be preferred but have not often been used.

Two observations may be made. First, the models as applied in practice will be imprecise anyway, since (1) the weights usually are smoothed to simplify practical application and (2) the decision makers to whom a device may be recommended often rather arbitrarily change the weights in an attempt to reflect some policy concern. Second, the recommended regression procedures have been used in a number of studies (e.g., Palmer and Carlson, 1976; Solomon, 1976; Forst and Brosi, 1977; Rhodes, 1978; van Alstyne and Gottfredson, 1978; S. D. Gottfredson and D. M. Gottfredson, 1979, 1980; Schmidt and Witte, 1979; Goldkamp and Gottfredson, 1981a,b, 1985), all but one of which predate the criticisms made (Rich et al., 1982; Galegher and Carroll, 1983; Sparks, 1983). The net result of these several studies is a demonstration that the results of the models are little different. The best available methods should, of course, be used, and the proper specification of past practice is to be desired. Given the poor quality of presently available data, however, it appears that the power inherent in the models of choice often is not realized. Indeed, if the data are sufficiently poor, it may be observed that less sophisticated methods can be preferable (Wainer, 1976; D. M. Gottfredson, Cosgrove, et al., 1978; S. D. Gottfredson and D. M. Gottfredson, 1979).

In short, descriptive guidelines often have not been developed using the best and most recent methods available.
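To illustrate why regression on a dichotomous outcome can misdescribe practice, the following sketch fits an ordinary least-squares line and a logistic model to the same small, made-up data set (a hypothetical 0-9 predictor score and a 0/1 decision). It shows one well-known symptom of the linear approach, fitted values below 0 and above 1, rather than the instability of weights and R² noted in the criticism. The data, the variable names, and the plain gradient-ascent fitting loop are all assumptions made for illustration, and the logistic model merely stands in for the probit-type models mentioned above.

```python
import math

# Hypothetical data: a 0-9 predictor score and a dichotomous decision (0 or 1).
scores = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
decisions = [0, 0, 0, 0, 1, 0, 1, 1, 1, 1]
n = len(scores)

# Ordinary least squares with a single predictor (closed-form solution).
mean_x = sum(scores) / n
mean_y = sum(decisions) / n
sxx = sum((x - mean_x) ** 2 for x in scores)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(scores, decisions))
slope = sxy / sxx
intercept = mean_y - slope * mean_x
linear_fit = [intercept + slope * x for x in scores]

# Logistic regression fitted by plain gradient ascent on the mean log-likelihood.
b0, b1 = 0.0, 0.0
for _ in range(5000):
    probs = [1.0 / (1.0 + math.exp(-(b0 + b1 * x))) for x in scores]
    grad0 = sum(y - p for y, p in zip(decisions, probs)) / n
    grad1 = sum((y - p) * x for x, y, p in zip(scores, decisions, probs)) / n
    b0 += 0.1 * grad0
    b1 += 0.1 * grad1
logistic_fit = [1.0 / (1.0 + math.exp(-(b0 + b1 * x))) for x in scores]

print("linear fit range:  ", round(min(linear_fit), 3), round(max(linear_fit), 3))
print("logistic fit range:", round(min(logistic_fit), 3), round(max(logistic_fit), 3))
```

In this toy case the straight line yields "probabilities" outside the 0 to 1 range at both ends of the score, while the logistic fit cannot. Whether such differences matter for a given guidelines application is the empirical question the text goes on to address.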
As a practical matter, however, it probably has not made much difference, either to the specification of the models or to their accuracy.

The third general criticism of descriptive guidelines is that they are insufficiently prescriptive. In general, attention has focused on the widths of ranges offered in the guidance schemes. Although we think it odd that the tools would be criticized for this reason, it is quite possible, and potentially quite desirable, that the criticism be extended. If prescription with respect to predictive accuracy is desired, it is through normative decision study that practice should be altered. We can envision considerable advantage to a purely normatively based guidelines approach, and we think that resulting accuracy would be much improved.

It must be remembered that in the descriptive case the issue of accuracy has to do with the accuracy with which past practice is modeled. If prescriptive accuracy is desired, normative decision study is desired. To our knowledge, no guidelines have been developed in this fashion.

J. Cohen and Tonry (1983) suggest that prescriptive guidelines are exemplified by those developed and implemented in Minnesota. Neither dimension of the grid used, however, was intended to be predictive; such an intent was explicitly excluded by the Minnesota Sentencing Guidelines Commission (1982). It is the case that one of the axes (the "criminal history score") bears a remarkable resemblance to many instruments that are designed with a predictive intent; and items repeatedly found to be predictive, such as prior felony sentence, a prior felony-type juvenile record, and prior nontraffic misdemeanor or gross misdemeanor sentences, are used to construct this scale. One could, of course, assess the predictive utility of the criminal-history score; but this hardly could be viewed as germane to an evaluation of the Commission in achieving its goals.34 So far as we know, no such analysis has been done.

34 The relation between items of information acceptable under a just desert orientation and those found predictive of future criminal behavior was discussed in D. M. Gottfredson, Cosgrove, et al., 1978:149: "So far as the major dimension of the proposed just-desert sentencing procedure is concerned, the prescription is very similar indeed to that of the United States Parole Commission. The Goodell Committee (von Hirsch, 1976) specifically rejected any predictive basis for their sentencing determination; but, of course, the fact that they wished to take into account the prior record of the offender, in fact, provided a predictive dimension."

It is notable (and, we believe, laudable) that the commission sought to ensure that its sentencing guidelines "be neutral with respect to the race, gender, social, or economic status of convicted felons" (Minnesota Sentencing Guidelines Commission, 1982:1). This admirable objective, which may be shared by those who would include an explicit predictive intent, is difficult to achieve, especially in view of correlations among offense or criminal-history (or other) items thought legitimate for inclusion and measures of race, gender, or socioeconomic status not desired to be bases for decisions. This point must be discussed further, along with the contribution of F. M. Fisher and Kadane (1983) already noted; suffice it to say here that this problem may remain whether or not there is a predictive intent.

In summary, our charge was not to assess the impact of sentencing guidelines per se, be they descriptive, prescriptive, or some combination of these. As noted by Martin (1983), complex implementation issues must be addressed if sentencing guidelines are to survive in evaluatable form. J. Cohen and Tonry (1983) did attempt such an evaluation, and others (Rich, Sutton, et al., 1982) also make evaluative statements (although without first ensuring that some innovation had been seriously implemented). Currently, Abt Associates is engaged in an evaluation of voluntary guidelines in several states; but preliminary reports of this evaluation study could not be made available to us in time to be included in this review (D. Carrow, personal communication, 1984). In general, these evaluations likely will focus on issues of compliance and of disparity reduction; little in the way of achievement relative to a predictive component is likely to be assessed because, as we have suggested, little in the way of a predictive component is provided by these guidelines attempts.

Tools to Structure Parole Decision Making

The "guidelines" approaches described in the two preceding sections were first developed in parole decision-making settings. The model used in the early studies is more similar to that used in the Philadelphia bail experiment than to those discussed relative to sentencing decisions. Unlike the latter, the parole and bail guidelines do make use of an empirical assessment of risk.

It is not our intent here to discuss in detail the development and implementation of parole guidelines, nor to provide an assessment of their utility for the purposes originally intended for them. Rather, our focus is on one component of the guidelines of the U.S. Parole Commission, the Salient Factor Score, since it is in regard to that score that an assessment of predictive accuracy can be made. Because we were specifically requested to assess a parole-risk screening instrument recently developed in Iowa, that too is provided.
A complete description of the proposals for parole guidelines and their original development can be found in D. M. Gottfredson, Wilkins, and Hoffman (1978) and in D. M. Gottfredson, Cosgrove, et al. (1978).

The Salient Factor Score

Parole guidelines were developed in the early 1970s for consideration by the U.S. Board of Parole (now the U.S. Parole Commission) and were first implemented by that body in 1972. They were formally adopted for national use in 1973. The guidelines are in part descriptively based and in part based on a normative prediction study. One axis of the decision-making tool reflects the seriousness of the commitment offense; this was developed in an iterative process of judgments by the responsible parole board members, which resulted in ordinal classifications on this dimension.

The other axis is based on an empirically derived assessment of recidivism risk. The instrument on which this axis is based is called the Salient Factor Score (Hoffman and Beck, 1974). This device was developed, as were the guidelines themselves, in collaboration with members of the parole board. Although other models of constructing normative predictive tools were presented to the commission (e.g., the regression-based "base expectancy" scales developed in California; see D. M. Gottfredson and Beverly, 1962), the board preferred a simple, unweighted, additive model (similar to the approach originally advocated by Burgess and used for years by the Illinois parole board; this subsequently was modified and evaluated by Ohlin, 1951). Accordingly, this model was followed in the development of the Salient Factor Score.

The original Salient Factor Score was developed on a 25 percent sample (N = 902) of all persons released from federal prisons by parole, mandatory release, or expiration of sentence during the first 6 months of 1970. Two validation samples were used: a different 25 percent sample of persons released during the same time period (N = 919), and a 20 percent sample of persons released during the latter half of 1970 (N = 662).
Sampling was conducted in a manner that allowed a reasonable assumption that randomness was approximated. More than 60 items of data concerning the offenders' criminal and social histories, demographic characteristics, living arrangements (past and anticipated), and prison conduct were coded from case records for each individual; follow-up data (based on a 2-year period) were based on parole board records and on "rap sheets" made available by the Federal Bureau of Investigation.

A criterion measure was developed that could be used regardless of an offender's type of release and that was acceptable to the parole board collaborators. An unfavorable outcome, for example, was considered to have occurred if any of the following were observed: a new conviction that resulted in a sentence of 60 days or more; a return to prison for a technical violation of release conditions; or an outstanding warrant for absconding from supervision. Otherwise, the outcome was classified as favorable.

Variables were selected for inclusion in the additive model based simply on the inspection of bivariate relations with the criterion measure described. The selection criteria used were: that the measure be significantly associated with the outcome (based on chi-squared tests with α = .05); that the variable not pose ethical problems; and that it appear frequently enough to be useful for most cases, but not appear to overlap substantially with other variables to be included (D. M. Gottfredson, Cosgrove, et al., 1978:48-49). Using these criteria (some of which clearly involved subjective judgment on the part of the investigators), nine variables were selected for inclusion in the model initially used. Each of these was dichotomized to reflect presence or absence of the attribute represented, except for two, which were trichotomized (these were prior convictions and prior incarcerations).

The items used in the original Salient Factor Score model, and their relations with the criterion described (in the construction sample), are (1) prior convictions as an adult or a juvenile (.21); (2) prior incarcerations as an adult or juvenile (.23); (3) age at first adult or juvenile conviction (.14); (4) commitment for auto theft (.20); (5) parole revocation or commitment for a new offense while on parole (.21); (6) history of heroin, cocaine, or barbiturate dependence (.13); (7) completion of twelfth grade or receipt of general equivalency diploma (GED) (.08); (8) verified employment (or full-time school attendance) for a total of at least 6 months during the last 2 years in the community (.12); and (9) release plan to live with spouse or children (.16).35 Thus, it may be seen both that the types of items considered are similar to those found predictive in most settings and that the general level of predictive accuracy of these is on a par with that commonly observed. Two of the items referenced above were not originally examined in the form described (i.e., parole revocations and drug usage); these were modified, based on consideration by the parole board, into the format we have described here.

35 These are contingency coefficients calculated by us from data presented in D. M. Gottfredson, Cosgrove, et al. (1978:50-51).

In the construction sample, the Salient Factor Score was observed to correlate significantly with the outcome criterion (point-biserial correlation = .32; MCR = .36); some shrinkage was noted when the device was applied to the two validation samples (on the first sample, the point-biserial was .28 and MCR = .33; on the second, these values were .27 and .32, respectively).

In operational use the device is collapsed from a 0 to 11 scale to a 4-category scale.
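As a rough illustration of the simple, unweighted, additive model described above, the following Python sketch sums item points over the nine items. The item names, the direction of scoring, and the cutpoints used to collapse the 0 to 11 scale into four categories are illustrative assumptions on our part; the text reports only that seven items were dichotomized, two were trichotomized, and the operational scale has four categories.

```python
# Maximum points per item: seven dichotomous items (0-1) and two
# trichotomous items (0-2), so the total ranges from 0 to 11.
# Item names are hypothetical labels for the nine items listed in the text.
ITEM_MAX = {
    "prior_convictions": 2,
    "prior_incarcerations": 2,
    "age_at_first_conviction": 1,
    "auto_theft_commitment": 1,
    "parole_revocation": 1,
    "drug_dependence": 1,
    "education": 1,
    "employment": 1,
    "release_plan": 1,
}


def additive_score(item_points):
    """Sum unweighted item points after checking each is within range."""
    total = 0
    for name, max_points in ITEM_MAX.items():
        points = item_points[name]
        if not 0 <= points <= max_points:
            raise ValueError(f"{name} must be between 0 and {max_points}")
        total += points
    return total


def collapse(score, cutpoints=(4, 6, 9)):
    """Map a 0-11 score to a category 0-3 (cutpoints are assumed, not sourced)."""
    return sum(score >= c for c in cutpoints)


if __name__ == "__main__":
    example = {name: 1 for name in ITEM_MAX}  # one point on every item
    score = additive_score(example)
    print(score, collapse(score))
```

Because every item carries at most one or two points and no regression weights are estimated, a model of this kind has little to overfit, which is consistent with the modest shrinkage reported on validation.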
The collapsed score, when combined with a 6-category seriousness-of-offense ranking, gives the guidelines matrix actually used.

Since the adoption of the guidelines, the Salient Factor Score has been validated on new samples a number of times (cf. Hoffman and Beck, 1976; Hoffman, Stone-Meierhoefer, and Beck, 1978) and recently has been revised in light of further ethical concerns (Hoffman, 1983). Each validation effort has provided results substantially equivalent to the first such efforts; the device has held up well in prospective validation samples. The reconstruction effort and its validation (Hoffman, 1983) show little change in performance.

The level of predictive accuracy of the scale thus may be considered to be rather firmly established. But what of the additional question raised by the Panel on Research on Criminal Careers: Has use of the instrument as a component of the decision guidelines led to a reduction in recidivism? We know of no study that has sought to test this hypothesis.36 And it seems clear such a study would be fraught with methodological difficulties that could only be overcome at best by a careful quasi-experimental or experimental design of some sort.

36 There is one report (Janus, 1984) that appears to show the potential for this, but it is not clear whether the sample used is of paroled persons or the general federal prison population.

But it also may be asked why such a result would be expected. It is not known to us that the parole board claimed this as an objective. Nearly all inmates of all prisons eventually are released, and, most commonly, they are released on parole. Unless time served in prison reduces the probability of reoffending, a proposition not supported by the literature (see M. R. Gottfredson and D. M. Gottfredson, 1980b, for a review), an effect on recidivism rates would not be expected. It is plausible, however, that one could expect (and speculate that it has been the intent of the parole board to achieve) a selective incapacitation effect. Assessments of such an effect, so far not published to our knowledge, must address the myriad problems noted in the recent National Research Council report on the topic (Blumstein et al., 1983).

Finally, we would note that here, as with the bail guidelines study described earlier, decision-maker preferences for the inclusion of competing goals in the guidelines device adopted may well constrain the potential predictive utility of the model. In the parole guidelines adopted by the federal board, as in the bail guidelines adopted by the Philadelphia judges, a competitive tension exists between seriousness of the offense, included probably to satisfy a just desert motivation, and the empirically derived risk assessment. The extent to which these effects constrain one another has not been adequately investigated to date.

The Iowa Instrument

In light of claims made for dramatic improvements in the accuracy with which offender risk assessments may be made (Chi, 1983; Fischer, 1983, 1984), we were asked to pay special attention to the instrument developed and used in Iowa. Since the Bureau of Justice Statistics has indicated interest in exploring the transferability of the device (Fischer, 1984) and some jurisdictions (e.g., Washington, D.C.) are engaged in this process, a critical review of the development and accuracy of the system was seen to be desirable.

To our knowledge, no information concerning the development, validity, or use of the instrument is available in the published literature; accordingly, in the review that follows we rely on unpublished planning and research documents made available by the Panel on Research on Criminal Careers. No document available to us contained sufficient information concerning the development of the device to allow comment on the statistical models used.37 Similarly, we cannot comment on the predictive value of specific items of information used. (We will, however, comment on the appropriateness of some of the items in a later section.) We first discuss the original scheme developed and then the more recent versions of this scheme.

37 One report (Statistical Analysis Center, 1983:106) notes only that "new methods, such as configural analysis, were incorporated with well-established techniques to maximize predictive efficiency."

The risk-assessment system developed in Iowa appears to be based on an excellent, and relatively untried, concept. It long has been stressed that sample heterogeneity may constrain validities of predictive devices. Correlation matrices for various subsamples often do not provide accurate estimates of the parameters for the larger sample; thus, the correlations providing the basis for the equations are inadequate for estimates made for the subsamples (D. M. Gottfredson and Ballard, 1966; D. M. Gottfredson, 1967). This is particularly problematic given use of regression-based prediction methods that do not include interaction terms and is only partly meliorated by use of configural approaches or log linear models. It appears that those who developed the system in Iowa approached the problem rather directly, in that the assignment to risk categories seems actually to be based on the application of several risk-assessment instruments. Cases first are classified with respect to age (18, 19, 20, 21-24, 25-29, 30+); within age classifications, other criteria are applied (e.g., prior arrests) to further subdivide the sample. In all, 12 subsamples are developed (Statistical Analysis Center, 1980:96). Depending on subgroup membership, different combinations of one of seven "general" risk-assessment instruments and one of four "violence" risk-assessment instruments are applied to a given case. All cases are subject to a "supplementary" risk assessment (Statistical Analysis Center:109); in combination, these devices determine a "risk" category. An undefined and unexplained "smoothing function" then is applied, which results in a final assignment to one of eight risk categories.38 Finally, classification with respect to "violence" may be further refined (through classification with respect to current offense type), which results in classification to one of nine "violence" risk categories (Statistical Analysis Center:113).

38 Reports do suggest that the "smoothing function" compensates for low-frequency cells; it may also adjust small reversals (the latter is our supposition only).

The statistical adequacy of any of these several devices is not discussed in the available reports. If, as we may speculate, the devices have about the same validity as other such devices, in combination they may well be expected to demonstrate considerably more power; indeed, it is probably use of this bootstrapping technique that accounts for the improved validity noted for the final classification.

To summarize, it appears that the final classification is based on a very good idea: devices are constructed for several more-homogeneous subgroups and the resulting classifications are combined in a final "expectancy table." It is important to note that persons may be classified into a given category based on very different combinations of predictor variables. We are, of course, concerned that the classification relies, in part, on certain items of information that many find objectionable (both on ethical and legal grounds; see Underwood, 1979; von Hirsch and Gottfredson, 1984) for use in applications such as those proposed for this classification (Chi, 1983; Statistical Analysis Center, 1983; Fischer, no date). This concern is exacerbated when we are told that these "are among the best predictors" (Statistical Analysis Center, 1983:16).

Exaggerated Claims, Improved Accuracy, or Both? Several reports (e.g., Chi, 1983; Fischer, 1983; Fowler, 1983) aimed at the practitioner audience have hailed the "unprecedented accuracy" of the Iowa classification scheme. Chi (1983:8) reports that "values of the Mean Cost Rating (MCR = .637) and the Coefficient of Predictive Efficiency (CPE = .807) demonstrated in Iowa are much higher than for risk assessment devices elsewhere." In addition to some probable increase in accuracy, a number of other factors combine to provide the basis for this remarkable claim. As we discuss below, both of the figures cited above are at best misleading; at worst, they are meaningless for the purposes intended. The source of the figures cited by Chi (1983) is Statistical Analysis Center (1980), which forms the basis for much of the discussion to follow.

The classification scheme described above was developed on a construction sample of 4,704 adult offenders released from probation and parole in Iowa during the 3-year period 1974-1976. Time at risk varied (and averaged 11.7 months) but does not appear to have been controlled for in the analyses. Follow-up data included (1) information concerning up to three new criminal charges (if any), (2) type of release (discharge, revocation, escape/abscond), and (3) jail time prior to release. The classification was validated on a sample of 7,813 offenders released during 1977-1979 (time at risk is not specified for this sample).

An outcome measure designed to reflect rearrest and the number and seriousness of charges was developed (called a weighted outcome measure). The development of this measure is not detailed in reports available; we do not know if the scaling is arbitrary, but it appears to be (see Statistical Analysis Center, 1980:2). The index heavily weights felony offenses against persons and gives little weight to technical violations. The maximum achievable score is 17 (15 points for three felonies against persons, plus 2 points for a revocation of probation or parole); the minimum is zero (for a discharge without new charges or jail time for technical violations). The mean value of this index for the construction sample is 1.18; it is 1.22 for the validation sample.

It is with respect to this index that the classification scheme was developed (Statistical Analysis Center, 1980:3). Twenty-five variables were reported to have significant associations with the index: type of current offense(s), age, age at first arrest, prior arrests, juvenile convictions, juvenile commitments, prior adult convictions, prior adult jail terms, prior adult prison commitments, prior probations (juvenile or adult), prior convictions (juvenile or adult), prior adult incarcerations, prior incarcerations, prior jail terms/juvenile commitments, prior jail/prison/probation, known aliases, history of drug or alcohol problem, narcotics use, employment status (most recent in community), possession of employable skills, possession of high school diploma/GED, years of school, legal marital status, pretrial services or detention, and probation time in jail/residence. These items must be highly collinear, but the extent of this as a problem in the development of the classification scheme cannot be determined since the nature of that development is not specified.
(However, the scheme does not appear to use weighted variables, and so the issue probably is not terribly important.)

The ordinal (perhaps interval) criterion measure should provide advantage in terms of predictive power (cf. S. D. Gottfredson and Taylor, 1986); however, the criterion measure is not used directly in evaluating the accuracy of the classification scheme. The rank-order (or other) correlation between levels of classification and the criterion (for both the construction and the validation samples) would be of considerable interest, but it is not provided. In fact, the potential power of the index is not used. Cases in each classification level are assigned the mean criterion value for that level, thus discarding all within-group (or level) variance; only between-group variance remains to be assessed. Clearly, this provides substantial advantage in demonstrating the "accuracy" of the device (indeed, since there are no reversals, the rank-order correlation will be 1). From here, the developers define a new "outcome index" for each classification level as the mean index value for that level, divided by the mean value for the highest risk-classification level. The resulting proportion is changed to a percentage. This has the effect, of course, of making the transformed mean for the highest risk group equal to 100 percent;39 means for the other risk levels are a percentage of this "base group" mean. The authors correctly noted that "this change of scale in no way alters the relative degree of success or failure of any of the risk categories" (Statistical Analysis Center, 1980:5), but they apparently failed to recognize that the original problem remains; they have discarded within-cell variability. In a traditional assessment strategy, the simple correlation would be examined, or, if the dependent measure is a dichotomy (as it usually is), the proportion of success or failure is examined directly. In either case, within-cell variability remains. Here, all cases in the highest risk group are treated (in essence) as "failures" (whether they were or not), and the percentages examined simply reflect cell means as a percentage of that of the base group.

39 This manipulation was made based on the construction sample, and the mean "weighted outcome" score for the highest risk group is used in transformations for the validation sample as well. Although this meliorates the variance reduction problem, it does not obviate it.

The authors acknowledged some of the difference between these two types of "percentages" in a brief and rather confused discussion (Statistical Analysis Center, 1980:8) involving "units" of success and failure and concluded that "with the preceding convention we can now talk about the predictive efficiency of the general risk assessment and the extent to which we fall short of perfect prediction in terms of the distribution of our units of success and failure among the eight risk levels." Values of the MCR are calculated, on the percentages described above, to be .65 for the construction sample, .639 for the validation sample, and .637 for the combined construction and validation samples. The authors also calculated MCRs for a variety of hypothetical base-rate conditions; here, a slight embarrassment occurs when the percentage for the high-risk group rises above 100; however, a quick "down-scaling" (Statistical Analysis Center, 1980:14) handles this. The MCRs for the construction, validation, and combined samples reported by Hoffman and Adelberg (1980) using the Salient Factor Score on federal samples are offered for comparison (these are, respectively, .33, .37, and .35). Thus it is asserted that, even under varying base-rate conditions, the advantage of the Iowa classification is demonstrated.
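For readers unfamiliar with the statistic, a minimal Python sketch of one common way to compute a Mean Cost Rating from a dichotomous outcome follows. It assumes the formulation in which the MCR equals twice the area between the diagonal and the curve of cumulative failures plotted against cumulative successes, with risk levels ordered from highest to lowest risk. The counts are invented for illustration and are not the Iowa or federal data, and the function name is ours. This is the traditional dichotomous calculation, not the procedure in the Iowa reports, which applied the MCR to rescaled mean index values.

```python
def mean_cost_rating(failures, successes):
    """MCR from per-level counts, levels ordered from highest to lowest risk.

    Computed as twice the area between the diagonal and the curve of
    cumulative failure proportion against cumulative success proportion.
    """
    if len(failures) != len(successes):
        raise ValueError("need one failure and one success count per level")
    total_f = sum(failures)
    total_s = sum(successes)
    if total_f == 0 or total_s == 0:
        raise ValueError("both outcomes must occur at least once")

    mcr = 0.0
    cum_f = 0.0  # cumulative proportion of all failures seen so far
    cum_s = 0.0  # cumulative proportion of all successes seen so far
    for f, s in zip(failures, successes):
        next_f = cum_f + f / total_f
        next_s = cum_s + s / total_s
        # Shoelace term for the segment between consecutive curve points.
        mcr += cum_f * next_s - next_f * cum_s
        cum_f, cum_s = next_f, next_s
    return mcr


if __name__ == "__main__":
    # Hypothetical three-level classification (high, medium, low risk).
    print(round(mean_cost_rating([30, 20, 10], [10, 20, 30]), 3))
```

Under this formulation the value is 0 when failures and successes are distributed identically across levels and 1 when the levels separate them perfectly, which is why treating every case in the top level as a "failure" inflates the result.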
When the Salient Factor Score (in the examples used) is given the same "logical" advantage provided the Iowa classification scheme, however, we calculate an MCR of .711 (for the construction sample). This is achieved simply by "rescaling" in the same manner as used in the Iowa studies; that is, each of the levels is considered simply as a proportion of the failures observed in the highest risk group. To use the terminology of the authors of the Iowa report, this is a "lofty value" indeed. It also is essentially meaningless.

The authors also developed and used a "coefficient of predictive efficiency," defined as "the variance of the outcome indices (or rates of failure) of the risk levels divided by 2500, where the base (overall) index for the study group has been adjusted to 50%" (Statistical Analysis Center, 1980:15; see also Statistical Analysis Center, 1984). This coefficient is used to describe the "accuracy" of the Iowa model in several reports; in some of the most recent reports available, only this coefficient is used (e.g., Statistical Analysis Center, 1984). Accordingly, a brief exploration of its properties is required.

In essence, this description and equations given in Statistical Analysis Center (1984) provide a shorthand method for calculating the variance of the means of expectancy cell observations, when the distribution of means has been transformed such that the grand mean is equal to 50. Once the variance has been found, its value is "unencoded" by dividing the coded-score variance by 2,500 (the square of the transformed base rate). The result, of course, is not the variance of the cell means. To obtain the true variance, one would divide the coded-score variance by the square of the weighting factor used to create the transformation; very rarely would this be 50.
Using the data provided in Statistical Analysis Center (1980:7), we calculated the variance of the distribution of cell means to be 566.18. When we encoded the distribution so that the grand mean was 50, we observed a coded-score variance of 2,016.41. To obtain the unencoded variance, we divided this by the square of the factor used to transform the distribution (50/26.5) (since we are told that the "variance" is of interest), which, within hand-calculator rounding error, of course gives the variance of the original scores (566.41).

The Iowa investigators do not do this; they unencode the coded-score variance by dividing by the square of the transformed base rate and obtain the value reported (.807). Under the same conditions, the federal parole prediction method achieves a value of only .198. It is suggested that "for 'perfect' prediction in the Iowa sample, using a 50 percent outcome index, we would have [a value of CPE] = 1.00. Thus, using CPE as a measure of predictive efficiency, we can think of the Iowa system as roughly 81% of perfect, remembering, of course, that 'perfect' in this sense does not necessarily mean the ideal 0% - 100% prediction" (Statistical Analysis Center, 1980:16). In a footnote, readers are advised that the "CPE can theoretically be greater than 1 if the net effect of prediction is greater than the ideal 0 - 100% result" (Statistical Analysis Center, 1980:15). In fact, all that is necessary for the index to exceed 1 is that the variance of the coded scores exceed 2,500; this can occur for many reasons that only tangentially are related to the prediction problem.

We see nothing of value in the coefficient used to assess the "accuracy" of the Iowa model, but we see many reasons why it should not be used. First, it is a least-squares measure (of a peculiar sort); accordingly, it gives disproportionate weight to extreme scores.
Although the developers appear to desire this,40 the use of a least-squares measure of variability when the distribution is markedly skewed is not advised (Guilford, 1965; Minium, 1970). Simple inspection shows that skew is marked for this and others of the Iowa samples. Second, we fail to understand why a squared index term is useful. Usually, when one wishes to interpret an index of variability, one relies on the standard deviation (which, of course, may be interpreted in the original metric). Third, and related to the two concerns already raised, the index is not independent of scale value. In general, the larger the scores, the larger the value of the index. In comparing the Iowa and the federal models, for example, markedly different scale values are observed. The highest encoded score for the federal sample is 75.7; for the Iowa model it is 179.8. Sums of squares must be larger (all else equal) for the latter distribution. Thus, depending on the outcome metric used, values of the CPE will vary.

40 They report that "one difficulty in using MCR to measure predictive efficiency [is that] it doesn't reward the researcher for isolating extremely high risk groups, that is, groups with performance at least twice as unfavorable as the overall sample performance" (Statistical Analysis Center, 1980:15).

In general the index appears roughly to be nonsense for the purpose intended; in any event it is very different from the usual index of predictive efficiency as described earlier in this paper. Although, as noted, that index is not problem free, it does at least have a clear, specific, and useful meaning; for the Iowa data we calculated it (in a manner to be described below) to be about 13 to 19 percent (depending on the criterion measure considered). The value for the Salient Factor Score (as described in Hoffman and Adelberg, 1980) is about 11 percent.
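The arithmetic behind the coefficient can be checked with a short Python sketch. The first part uses only figures quoted in the text (a coded-score variance of 2,016.41 and a transformation factor of 50/26.5); the variable names are ours. The second part, which shows how rescaling a set of cell means changes their variance, uses invented cell means and, as a simplifying assumption, an unweighted population variance.

```python
import statistics

# Figures quoted in the text.
CODED_VARIANCE = 2016.41   # variance of cell means after rescaling to a grand mean of 50
FACTOR = 50 / 26.5         # multiplier that moves the grand mean to 50

# "Unencoding" as in the Iowa reports: divide by the square of 50.
cpe = CODED_VARIANCE / 50 ** 2

# Unencoding by the square of the factor actually used in the transformation.
recovered_variance = CODED_VARIANCE / FACTOR ** 2

# Hypothetical cell means (not the Iowa data): multiplying scores by k
# multiplies their variance by k squared.
cell_means = [5.0, 10.0, 20.0, 45.0]
k = 50 / statistics.mean(cell_means)
coded_means = [m * k for m in cell_means]
ratio = statistics.pvariance(coded_means) / statistics.pvariance(cell_means)

if __name__ == "__main__":
    print(round(cpe, 3))
    print(round(recovered_variance, 2))
    print(ratio, k ** 2)
```

Dividing by 2,500 rather than by the square of the factor is what produces a figure on a 0-to-1-looking scale; it is a different quantity from the variance of the original cell means.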
To summarize, the accuracy of the Iowa classification system as considered in Statistical Analysis Center (1980) and touted by Chi (1983) and others is wrong and exaggerated. Not only are the values of the MCR and the "coefficient of predictive efficiency" reported based on the combined construction and validation samples, but the former is calculated relative to an absurd criterion, and the latter, despite its familiar-sounding name, is essentially meaningless for the purpose intended.41

41 For a description of an index that is conceptually similar but that is not subject to these limitations, see John (1963).

How Accurate Is the Classification Scheme? Unfortunately, we cannot answer this question well, but we can provide some clues. Statistical Analysis Center (1980:vii; see also Chi, 1983:5) provides a table that gives outcome distributions relative to a revocation/absconder criterion and to a rearrest criterion. Also given in the table is a "threat to public safety" criterion, which is the unfortunate criterion described above and the index of choice of the authors. We make the assumption in the discussion that follows that the rates (percentages) reported in the first two columns of the table have not been "adjusted" in the same or a similar manner as has the third column. If this assumption is warranted, MCRs can readily be calculated for these data. We have done this, and obtained values of .55 (for the absconder criterion) and .58 (for the rearrest criterion).42 These values are impressive and suggest that the classification scheme developed in Iowa does have substantial potential power. Unfortunately, however, the data are presented only for the combined construction and validation samples, and hence the values cited above likely are overestimates. In addition, there remains the problem of varying time at risk, which we do not believe the researchers addressed.

42 In a later report (Fischer, 1981), we find these coefficients reported for a rearrest and a program-failure criterion, based on the sample of 12,517.

The values shrink only a bit (to .51 and .54) when a collapsed (three-category) version of the classification is considered. Considered as a selection device (in our use of this term), the classifications result in an index of predictive efficiency of 12.5 and 19.3 percent for the revocation/absconder criterion and the rearrest criterion, respectively. Values of the RIOC index are 46.7 and 48.5 percent. For comparison, the index of predictive efficiency of the Salient Factor Score (using the data provided by Hoffman and Adelberg, 1980) is 11 percent, and the value of the RIOC index is 40.5 percent.

Later Iterations: the 1983 and 1984 Models. Again, insufficient information concerning issues of sampling, measurement, and device construction is contained in available reports to allow us to provide detailed comment. In general, it appears that modification to the scheme resulted from criticisms of the choice of predictor items (as raised above). Objectionable items of information appear not to be included in the newer devices, and, as found by many others, predictive accuracy does not appear to have suffered dramatically (S. D. Gottfredson and D. M. Gottfredson, 1985). Rather than essentially repeat earlier discussion, let us raise some reservations that have not been resolved (and in some cases are exacerbated) by information concerning the newer models.

First, we are concerned about potential Type I error problems in the development of the devices used. It is clear that a great many statistical tests have been used and a great many devices constructed on the same samples of cases. Since we do not know how many tests have been used or devices developed, we cannot provide an assessment of the Type I error problem, but we can note that one ought to be sensitive to it. Consequences of this problem will, of course, be observed on validation; but we are not convinced that this has been achieved properly.

We are also concerned about scaling and measurement issues, particularly with respect to the outcome criteria used.

As noted above, the original outcome measure was weighted in an apparently arbitrary manner with respect to the seriousness of offenses alleged. Although time at risk was not considered in the early version of the scheme, it does appear to have been included in development of the outcome measure used to develop and assess the later versions (Statistical Analysis Center, 1984). The seriousness issue also appears to be addressed in a manner different from that originally developed. Neither the treatment of time at risk (Stollmack and Harris, 1974; Maltz and McCleary, 1977; Levy, 1978; Lloyd and Joe, 1979; Maltz, McCleary, and Pollock, 1979; S. D. Gottfredson and Taylor, 1986) nor the measurement of offense seriousness (Thurstone, 1927; Sellin and Wolfgang, 1964; Rossi et al., 1974; S. D. Gottfredson and Goodman, 1983) is a trivial or easy matter. Each is fraught with considerable methodological and practical difficulties, not one of which appears to have been considered. For example, the seriousness measure used in the later reports appears highly arbitrary (indeed, simple multiples of an initial "weight" are applied based on statutory maximum penalties; see Statistical Analysis Center, 1984); this results in some rather peculiar possibilities (e.g., an alcohol offense may receive the same score as a homicide). (In fairness it should be noted that this is not likely to occur.) Given that the scheme remains heavily weighted toward felonies, distribution of the outcome measure is highly skewed. Not surprisingly, when the "CPE" is calculated on such measures, it is large. The MCRs, as calculated by us, are much lower (but still are larger than typically observed).

We do not believe that the comparisons of the utility of the Iowa model and several others (e.g., INSLAW, Rand, Salient Factor Score, Michigan) offered in one report (Statistical Analysis Center, 1984) are of value.
First, they appear to compare the efficiency of all models using the Iowa data, which provides an advantage to the model developed on those data. Second, the outcome index used appears to be that also developed in Iowa. Again, since the other devices were not constructed relative to that peculiar criterion, they are disadvantaged. Third, the "CPE" is the only index made available for comparative purposes; as described earlier, it is not meaningful for the purpose intended. In short, the comparisons provided are inappropriate.

We are concerned that the validation efforts described give insufficient information regarding sampling methods used. One report on recent validation of the model suggests that "the data collection was limited to offenders for whom quality presentence investigations giving comprehensive criminal histories were available in inmate files" (Fischer, 1983:18). We cannot determine if this is part of the sample reported later (indeed, one problem is that the "sample," with the exception of the large, early samples reported on in 1980, seems to keep changing), nor do we know what other selection may have occurred. If the selection described above indeed occurred, it could well be expected to have serious biasing effects. At a minimum, if one is to have confidence in the model and in the validations reported on, a great deal more information concerning the samples and their selection must be available.

Finally, for all these reasons (and others; see S. D. Gottfredson and D. M. Gottfredson, 1979, for a discussion), we would urge that, prior to applications in other jurisdictions, the methods be defined more explicitly, the sampling issues be clarified, the validation evidence be presented in conventional terms and with commonly used measures, and tests of validity in the jurisdictions of interest be performed.
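As one illustration of a commonly used selection measure of the kind cited above, the RIOC index can be computed directly from a two-by-two table of predicted against observed outcomes. The following is a minimal sketch only. It assumes that RIOC is defined in the usual way, as the improvement of the observed number of correct predictions over the number expected by chance, divided by the maximum improvement that the marginal totals allow. The function name and the cell counts are hypothetical and are not taken from the Iowa reports.

```python
def rioc(pos_hits, false_pos, false_neg, neg_hits):
    """Relative improvement over chance for a 2x2 prediction table.

    Assumed definition (not quoted from the reports discussed):
        RIOC = (observed correct - chance correct)
               / (maximum correct - chance correct)
    """
    n = pos_hits + false_pos + false_neg + neg_hits
    predicted_fail = pos_hits + false_pos   # selection ratio * n
    actual_fail = pos_hits + false_neg      # base rate * n

    observed = pos_hits + neg_hits
    # Correct predictions expected if prediction and outcome were independent.
    chance = (predicted_fail * actual_fail
              + (n - predicted_fail) * (n - actual_fail)) / n
    # Best attainable number correct when the selection ratio
    # and the base rate differ.
    maximum = n - abs(predicted_fail - actual_fail)

    if maximum == chance:
        raise ValueError("RIOC is undefined for this table")
    return (observed - chance) / (maximum - chance)


# Hypothetical counts: 30 positive hits, 20 false positives,
# 10 false negatives, 40 negative hits.
example_rioc = rioc(30, 20, 10, 40)
```

Because the maximum term depends on the marginal totals, the same table can look quite different under this index than under a simple percentage-correct comparison, which is one reason we urge that validation results be reported with several conventional measures.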

Summary

This section has considered a number of models designed for application in criminal justice decision making in the areas of bail and pretrial release, sentencing, and parole. In the sentencing area, guidelines models have included a predictive component, but these have been descriptive, rather than normative. The guidelines model as implemented in Minnesota explicitly was intended not to be predictive per se, but the offender-history dimension undoubtedly is predictive to some extent given the nature of the items used and their demonstrable relation to recidivism in other jurisdictions. The extent of this relation in the Minnesota application is not known.

The Rand report (Greenwood, 1982) discussed in this section did involve normative prediction study and is purported to have implications for sentencing policy and practice. Predictive accuracy in retrospective construction samples is modest at best; no information concerning accuracy in cross-validation or prospective validation samples is available. Since to our knowledge no application of the model proposed has been achieved, it is not known whether the device would "work" in practice.

In the area of parole, we considered the federal guidelines model, particularly the predictive component of the model, the Salient Factor Score. The device was constructed in a very simple manner, makes use of few items, and has rather low predictive power; it does have about the same level of accuracy as is commonly observed for instruments of this type. Like the Rand instrument, it is constructed of items of the nature most often found to be predictive of recidivism. Although predictive power is low, the same level of power is observed in several validation samples; the relation observed apparently is stable. In application, the device is simplified further by collapsing it into four categories of risk.
These are combined, in a matrix format, with an offense-seriousness measure. We know of no evidence concerning the extent to which inclusion of the empirically derived risk dimension in the guidelines model has led to a reduction in recidivism.

The device developed in Iowa seems based on a sound principle: it appears that normative prediction models for homogeneous subgroups are combined to provide an overall expectancy table. Claims made for the power of the various versions of the model appear to be wrong and exaggerated, but it does appear that the model may be a bit more powerful than others. Still, predictive accuracy can only be described as modest, at best. Again, items used are similar to those discussed earlier in this paper. Although claims have been made for the utility of the model for decreasing recidivism among paroled populations, reports available to us do not provide sufficient information to enable us to assess the adequacy of those claims. Certainly, caution is to be advised in considering the application of the Iowa model in other jurisdictions.

In the area of bail and pretrial release, the Philadelphia experiment described does provide sound advice concerning the utility of an empirically derived, risk-assessment device applied in practice. The risk-assessment device was developed using sound methodological and statistical procedures, included commonly used variables, and had modest predictive validity. In the guidelines application, it is simplified and combined in a matrix format with an assessment of the seriousness of the offense. No effect for the guidelines model was observed with respect either to a failure-to-appear or a recidivism criterion.

DISCUSSION

Summaries are rather like statistical averages: rarely do they adequately describe the nature of the original data, and variability is of course ignored. The analogy could be carried forward cynically by noting that arithmetic averages often in fact take values (on the underlying distributions) that cannot naturally occur. Still, statistical averages are useful for many purposes, and so, we hope, may be this summary. To highlight, the evidence reviewed in this paper suggests the following:

· At present, researchers' ability to predict the decisions of criminal justice system functionaries or the behavior of offenders can most politely be called "modest." Generally, descriptive decision studies are more powerful than are normative decision studies; that is, we are better at predicting decisions made in practice than we are at predicting offender (or other) outcomes of interest. When normative prediction studies are considered, the proportion of criterion variance explained rarely exceeds .15 to .20; it often is lower. Considerable room for improvement clearly remains.

· Criminal justice decision makers appear to rely with regularity on a few common items of information regardless of the decisions being made. Likewise, there is considerable commonality among items found useful in normative prediction studies, again regardless of the decision-making arena and criterion variables studied. An exception may be in the area of prosecution, where evidentiary factors appear important.

· The descriptive and normative decision studies reviewed recommend rather different items of information as predictive. In particular, it may be noted that decision makers tend to focus heavily on offense seriousness, which generally is not found to be predictive of behavioral outcomes, while the normative studies focus on offense type, which generally is found to be predictive of offender behavioral outcomes.

· The best predictors of future criminal behavior appear to be measures of prior criminal behavior. Both the length of offenders' records and the age at which involvement with the criminal justice system began appear to be consistent and important indices.

· When decision-making aids that incorporate an empirically based predictive component are implemented in practice, there is little evidence that they reduce the prevalence of the criterion offender behaviors. It must be noted, however, that little empirical evidence concerning this important question is available.

· It does appear that when properly implemented, decision-making tools that incorporate a predictive component can provide advances relative to an equity criterion. With respect to the goal of changing the behavior of functionaries, the devices appear more successful.

Do Prediction Models Improve Criminal Justice Decisions?

As Cureton (1957) has shown, any valid continuous predictor can improve on the base rate, and, as we have observed, there appear to be several of these relative to the criteria considered in this paper. Validities are low, but equations and devices discussed do provide advantage over base-rate prediction. As we also have shown, statistical prediction devices typically outperform human judgments; what is true for other decision-making situations appears also to be true for criminal justice settings. Why, then, does no predictive advantage appear to accrue from use of these devices?

First, we stress again that advantages relative to offender behavioral outcomes are only one sort of advantage that may be sought through use of the device. There is growing evidence that when properly implemented and evaluated, attempts to provide structure for criminal justice decisions do result in increased equity. Often, this has been a principal goal for the introduction of the innovation.

Second, it must be noted that some of the models proposed for use do not attempt to provide an empirically derived normative risk assessment, even though they appear to.

Third, we know of no device in operational use that has not been constrained, perhaps severely, by policy considerations. Decision makers often change the coding of predictor or criterion information based on policy concerns. For example, the federal parole board chose to alter some predictor items, chose the criterion to be used based on policy concerns, and decided on weights to be applied to some items (D. M. Gottfredson, Cosgrove, et al., 1978). Each of these considerations may constrain the utility of the device constructed. In the Philadelphia experiment, the judges chose a criterion variable known not to be optimal for purposes of risk classification (Goldkamp and Gottfredson, 1981a,b). In both these examples the decision makers decided to give more weight to a concern for offense seriousness than to a concern for risk. Since offense seriousness is at best inconsistently related to risk of recidivism, this may have had important constraining consequences. Thus, the statistical risk assessment invariably is only part of the "guidance" provided by the decision-making models, and often, it is the lesser part.

It is appropriate that concerns other than risk be considered in criminal justice decision making.
It must be recognized, however, that consideration of these may work at cross-purposes relative to the risk dimension.

Can Predictive Accuracy Be Improved?

If statistical prediction tools can provide benefits to decision making in criminal justice system settings, we clearly must work to improve the accuracy of those tools. In this brief section we mention a variety of issues that, if addressed, may help to increase the validity of predictions in criminal justice.

Improved Reliabilities

The first effort, we believe, should be devoted to a consideration of improving both the predictor and criterion variables used. The reliability of many criminal justice data sources is notoriously poor (see M. R. Gottfredson and D. M. Gottfredson, 1980a, for an extended discussion of this issue). This often is recognized with respect to predictor variables, but forgotten with respect to the criterion variables used; greater attention also must be paid to the reliability of criterion information. Hindelang, Hirschi, and Weis (1981) consider the accuracy of a variety of means of obtaining outcome data. Case-specific data often are needed, and these typically are found only in case files available through parole and probation or correctional agencies. Although it has been observed that trained persons can code the data available in those files with respectable reliabilities (e.g., S. D. Gottfredson and D. M. Gottfredson, 1979), little is known about the reliability of those data in the first place. Commenting on Ohlin and Duncan's (1949) comparison of a number of prediction schemes, Vold (1949:452) lamented:

    The most discouraging thing about the whole field of prediction in criminology is the continued unreliability and general worthlessness of much of the so-called "information" in the original records. Opinions, hearsay, and haphazardly recorded judgments still constitute the bulk of any parole file. Statistics made of this can be no better than the original data.

From our experience, we can report that little appears to have changed in the past 35 years: these data must be regarded with considerable skepticism. (Actually, one thing has changed: apparently unreliable information is readily available in computerized form in many jurisdictions. In point of fact, this may be undesirable, since investigators not familiar with the nature of this information may accept it uncritically.) Sparks (1983:244) has suggested that we seek to increase the reliability of information by collecting it prospectively, rather than by relying on case records. This is attractive but would prove very expensive.

Improved Measurement

Improved measurement of both predictor and criterion variables is needed. Variously considered, prior record consistently proves of predictive value. Generally, however, this has been operationally defined in crude fashion. Improved scaling of this construct potentially could improve the accuracy of predictions based on it. Offense-seriousness scales have been developed but are not often used. We have experimented with seriousness scales considered as a criterion measure with demonstrable success (S. D. Gottfredson and Taylor, 1986). Similarly, perhaps we should seek to predict criteria of interest other than recidivism, considered as a dichotomy. For some purposes, the prediction of "time to failure" may prove advantageous (for illustration, see Schmidt and Witte, 1979; S. D. Gottfredson and Taylor, 1986). Finally, multiple criteria of failure should be explored.

Use of the Most Appropriate Analytic Methods

As we have noted, many prediction studies have not capitalized
on the potential power of sophisticated analytic methods, and some studies may in fact be subject to specification error resulting from inappropriate use of simple regression methods. When more appropriate methods are available, they should be used. However, little advantage is likely to result unless the measurement and reliability issues just raised are resolved; several studies cited earlier attest to this fact. If the measurement and reliabilities of both predictor and criterion variables are improved, the power of more sophisticated methods could well be realized.

Statistical Bootstrapping

As described earlier, models such as that apparently developed in Iowa potentially could do much to increase the utility of prediction in criminal justice settings. The basic procedure simply would require the identification of relatively homogeneous subgroups of offenders, the construction of statistical prediction equations for each, and the combination of these into an "expectancy table" for the full sample. Although not a new idea, it is a good one, and one that potentially holds considerable promise.

Theory-Driven Approaches to the Prediction Problem

Sparks (1983) correctly noted that theoretical considerations could be of substantial benefit to those working in the area of prediction but offered little in the way of advice concerning directions such theories might take. Generally, it appears that criminal justice prediction research has been rather atheoretical, although it seems to have been of some value in

theory construction. Recently, Monahan (1981; Monahan and Klassen, 1982) has proposed ways in which situational approaches may aid in the prediction problem. This clearly represents a theory-driven approach to increasing predictive accuracy (and understanding of the phenomena investigated). S. D. Gottfredson and Taylor (1986), following the person-environment integrity model of Olweus (1977), recently have demonstrated that recidivism predictions can be improved if person-environment interactions are included in the models developed. Further, the magnitude and nature of the effects observed varied depending on the criterion variable used and on the nature of the offender and environmental variables considered.

Statistical-Subjective Bootstrapping

We would argue that, just as decision makers may learn from statistically based information, the actuary may learn from the human decision maker. We already have noted that models of subjective decisions can have more predictive accuracy than the subjective decisions alone (e.g., Goldberg, 1970), and recent evidence suggests that subjective judgments may be more accurate than actuarial devices for some limited but important purposes (Holland et al., 1983). In general, this has become known as the "clinical versus statistical" problem, and debate concerning the relative value of the two general approaches continues. We believe this debate to be counterproductive. Although we tend to come down on the "statistical" side of the argument, we also agree with Horst (1941), DeGroot (1960), D. M. Gottfredson (1967), Underwood (1979), and Monahan (1981) that prediction may be improved through a combined use of methods. An iterative bootstrapping process in which successive normative and descriptive devices are used to inform and modify each other may well prove productive.
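The three-step procedure outlined above under "Statistical Bootstrapping" (identify subgroups, construct an equation for each, combine the results into an expectancy table) can be sketched in a few lines. This is an illustration under stated assumptions and not the method used in Iowa: it assumes a single numeric predictor, a simple least-squares line fitted within each subgroup, and two arbitrary cut points (.33 and .66) for forming risk classes. All names and data are hypothetical.

```python
from collections import defaultdict


def fit_line(xs, ys):
    """Least-squares fit of y = a + b*x for one predictor; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx if sxx else 0.0
    return mean_y - b * mean_x, b


def expectancy_table(cases, cuts=(0.33, 0.66)):
    """cases: iterable of (subgroup, predictor, failed), with failed 0 or 1.

    Returns {risk_class: (number of cases, observed failure rate)}.
    """
    cases = list(cases)

    # Step 1: split the sample into the (assumed) homogeneous subgroups.
    by_group = defaultdict(list)
    for group, x, failed in cases:
        by_group[group].append((x, failed))

    # Step 2: construct a separate prediction equation for each subgroup.
    models = {
        group: fit_line([x for x, _ in rows], [y for _, y in rows])
        for group, rows in by_group.items()
    }

    # Step 3: score every case with its own subgroup's equation and pool
    # the scores into common risk classes for the full sample.
    counts = defaultdict(lambda: [0, 0])   # class -> [cases, failures]
    for group, x, failed in cases:
        a, b = models[group]
        score = a + b * x
        risk_class = sum(score >= cut for cut in cuts)
        counts[risk_class][0] += 1
        counts[risk_class][1] += failed

    return {k: (n, f / n) for k, (n, f) in sorted(counts.items())}


# Hypothetical construction sample with two subgroups.
sample = [
    ("A", 0, 0), ("A", 0, 0), ("A", 1, 1), ("A", 1, 1),
    ("B", 0, 0), ("B", 0, 1), ("B", 2, 0), ("B", 2, 1),
]
table = expectancy_table(sample)
```

Note that the table here is evaluated on the same cases used to fit the equations. As stressed throughout this paper, rates obtained in that way overstate accuracy, and any such table would need to be checked in a separate validation sample.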
Attention to Ethical Concerns

Finally, it is clear that investigators must pay more sophisticated attention to ethical considerations involved in the construction of prediction devices intended for operational use (F. M. Fisher and Kadane, 1983). Ethical concerns can be addressed within complex statistical models (although ethical choices always must be made), but this has not often been done adequately. Comparisons of models constructed via an approach that attempts to suppress unwarranted effects and models constructed in a simpler fashion would be of interest.

Is Prediction Currently Accurate Enough to Be Useful?

The prediction literature that we have reviewed leads inescapably to the conclusion noted above: predictive accuracy is rather low. Devices used to structure criminal justice decisions appear to have little impact on offender behavioral outcomes, even when an empirically derived prediction instrument is part of the device used. (We already have noted several reasons why this may be so and have attempted to identify some ways in which weaknesses of currently available prediction studies may be improved and validities increased.)

Yet, prediction tools are being used in criminal justice settings, and calls for their use are increasing. There is no escaping the question, then, of whether prediction currently is accurate enough to justify its use in practice. (This section concentrates on the selection issue only. Prediction methods clearly are accurate enough to be useful for purposes of conducting quasi experiments and program evaluations and for other applications.)

There are those who argue against the use of prediction, whether statistically or subjectively based, on ethical grounds alone. A strict just desert argument, for example, would suggest that prediction properly is irrelevant to decisions made concerning criminal offenders; the ensuring of deserved punishment and resulting demonstrable equity are the desired ideals. (These too will be difficult to achieve, even if desired. Many complex issues of measurement remain before the goal of ensuring desert adequately could be met.) No statistical or pragmatic argument is likely to sway these critics, for those arguments would be seen as fundamentally irrelevant. Philosophical or ethical arguments may be persuasive, but it is not our intent to attempt them here. Only the strict desert orientation, however, rejects the concept of prediction as important to decisions made concerning offenders. Discussion here is directed to those who will, at least, allow the argument.

Other arguments against the use of statistically based prediction tools all reduce to considerations of their accuracy. The technically sophisticated arguments focus directly on the accuracy issue and cite low proportions of variance explained and resulting high error rates (focusing usually on false positives; false negatives may be equally, or even more, undesirable depending on the application). Others cite potential, or even demonstrable, misspecification of prediction models. Less technically sophisticated critics continue to complain of "reducing people to numbers" and observe that human behavior is too complex to allow judgmental decisions to be made on the basis of an "equation." This, too, essentially is a complaint concerning accuracy.

In an earlier section concerning the evaluation of innovations, we noted the need for comparative study. The point must be made here as well: accuracy must be assessed relative to something.
The most obvious comparison is with an ideal standard. Whatever that standard might be, it clearly is desirable that as few errors as possible be made in decision making. Unless prediction is perfect, however, errors will be made. Whether statistical or subjective, prediction falls short of an ideal standard.

Decisions will be made in criminal justice settings with or without the aid of statistical prediction tools. Those who make the decisions (the parole board members, the judges, the prosecutors, and others) typically receive no training with respect to the difficult decision problems confronting them. We have mentioned a variety of factors that combine to decrease the validity of subjective predictive judgments, and Monahan (1981) reviews several more. The literature very strongly suggests that in comparison even with trained decision makers, statistical tools are more accurate. On simply an accuracy consideration, their use would seem to be preferred. Einhorn and Schact (1975) have shown that the correlation between clinical judgments and any criterion is likely to be low to moderate under a wide variety of conditions, and that the only way to better the selection problem without trading off among false positives and false negatives is to increase that correlation. As we have argued, statistical methods can help do this.

Part of the answer to the question of whether statistical prediction tools are accurate enough to justify their use depends, we think, on the use to which it is proposed the tool be put. Summarizing a recent review of career criminal research, which to date is meager, Petersilia (1980:322) noted that "the data accumulated to date on criminal careers do not permit us, with acceptable confidence, to identify career criminals prospectively or to predict the crime reduction effects of alternative sentencing proposals." Similarly, J. Cohen (1983b:49) noted with respect to the Rand study that "for purposes of selective incapacitation, where predicted high-rate offenders will be subject to longer prison terms than all other offenders, much better discrimination of the high-rate offenders would seem to be required." We agree: proposals for dramatic change in sentencing and incarceration policies based on individual-level prediction studies are at best premature. Prediction of such low validity as demonstrated here cannot, we think, justify the policy changes proposed.

We do, however, think that prediction tools of comparable validity can be used appropriately for other purposes, and we will try to explicate this position below. We have attempted in this paper to concentrate on the question of accuracy. In so doing we intentionally have not addressed ethical questions in detail. There is no avoiding those questions entirely, however, and some will be raised in the following discussion. We describe concerns about the two types of errors to be made in any selection or prediction problem, and we focus on ethical considerations involved in the type of policy changes to be made by the proposed use of prediction tools.

Figure 2 summarizes an imaginary selection-decision problem that is based on prediction. For purposes of explication, we assume that both the criterion (Y) and the measurements on which selection will be based (X) are measured continuously. In the figure they are represented in standardized form. The correlation implied by the ellipse drawn is moderate (but any positive correlation, save unity, would suffice). Let Xc represent the cutting score, and Yc the criterion cutoff, that is, that point on the criterion distribution at or above which we assume the case a "failure" and below which we assume it a "success."
At or above Xc, we predict failure and select accordingly; below Xc we predict success. In Figure 2, Xc and Yc are set at the means of the distributions. For any value of r, positive and negative hits are equal, as are false positives and negatives (assuming a normal bivariate surface). In fact, of course, rarely does the practical situation seem to be as depicted in this figure. Usually one does not select based on the mean score, nor does one observe base rates equal to .50 (as represented on the ordinate). The symmetry observed in Figure 2 would not hold if one increased or decreased Xc from the mean (imagine Xc moving to either the right or the left along the abscissa). Neither would it hold if one increased or decreased Yc.

FIGURE 2 Hypothetical prediction-based selection decision problem. (Abscissa: predicted behavior, with cutting score Xc. Regions labeled: positive hits, negative hits, false positives, false negatives.)

Consider Figure 3 in light of a "selective incapacitation" proposal. The distribution shown is assumed to be of offenders to be sentenced either to incarceration or to longer than usual terms of incarceration, based on predicted future criminality. The proposal argues for a change in sentencing policies: persons are to be incarcerated (or incarcerated for longer terms) based on the predicted risk of repeated (high-rate) offending. Accordingly, it would seem that the cutting score probably would lie above the mean of the "risk" distribution (or else one is not selecting the high-risk cases) and that the criterion "cutting score" would lie above the mean of the distribution representing subsequent criminal behavior (or else one would be "selectively incapacitating" average or below-average offenders).

FIGURE 3 Hypothetical selective incapacitation scheme. (Abscissa: predicted behavior, with cutting score Xc. Regions labeled: positive hits, negative hits, false positives, false negatives.)

Figure 3 is based on these assumptions: as shown, false positives are reduced at the expense of false negatives. Either may be decreased, but always at the expense of the other; one has only to change the selection ratio. (We assume that the cutting score represents a "standard." The standard could, of course, be changed; this too could have consequences for the ratio of false positives and negatives.)

Neither error is desirable. False positives are not to be desired on ethical grounds (that is, persons are falsely imprisoned or falsely imprisoned for a longer term because of inaccuracy of prediction).
False negatives also are not desired (because of inaccurate prediction, persons who pose a risk to society are not incarcerated or not incarcerated for longer terms). Which error is more important is a question that society has neither sufficiently addressed nor answered, and it may well be that the costs of the two types of error are not equal. Moreover, concern about each type of error may be expressed on different ethical grounds.

Consider next Figure 4. Here, the population of interest has changed. In Figure 3 the distribution shown was of persons about whom an incarceration decision is to be made. In Figure 4 the distribution is of persons already incarcerated under present sentencing policy (whatever that is). We assume that in incarcerating these persons the sentencing judges held a variety of goals for the decisions made.

FIGURE 4 Hypothetical "emergency release" scheme. (Abscissa: predicted behavior, with cutting score Xc; regions: positive hits, negative hits, false positives, false negatives.)

Suppose that one is forced to decrease that population for some reason. Perhaps one wing of the prison burned down, or the courts have ordered population reductions due to prison crowding, or perhaps it simply has been decided that it costs too much money to imprison this many people. Selection criteria that might be considered in decisions about whom to release could be risk of recidivism or of high rates of offending. (Other criteria could of course be used. For example, one might choose to release those "least deserving" of punishment.) Here, the selection criterion lies below the mean of X (Xc less than mean X); that is, one wishes to select those inmates who appear to present the least risk of repeat (or repeated) offending. Since one seeks to identify the best risks, the cutting score for the criterion variable also likely would lie below the mean. Just as before, one can manipulate the trade-off of false positives and false negatives by moving Xc to the left or the right. For a given Yc, the value of Xc chosen will determine whether more false positive or false negative errors will be made.

The ethical consequences of errors made in the two scenarios are different. In the selective incapacitation scenario, the effect of a false positive is to deny liberty (or to deny it for a longer time) based on faulty prediction at the sentencing stage.
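The trade-off in the "emergency release" case, where Yc is held fixed below the mean and Xc is moved left or right, can be sketched in the same way. This is a rough illustration under assumed conditions (standardized bivariate normal scores, r = .4, Yc = -0.5); the function names and all numeric values are hypothetical and are not taken from the text.

```python
import math

def upper_tail(z):
    """P(Z >= z) for a standard normal Z."""
    return 0.5 * (1.0 - math.erf(z / math.sqrt(2.0)))

def error_rates(r, xc, yc, steps=4000):
    """Return (false_positives, false_negatives) as population proportions.

    Assumes standardized bivariate normal scores with correlation r (|r| < 1).
    False positive: X >= xc (retained as a poor risk) but Y < yc.
    False negative: X < xc (released as a good risk) but Y >= yc.
    """
    s = math.sqrt(1.0 - r * r)
    top = 8.0                      # integration limit standing in for infinity
    h = (top - xc) / steps
    both_high = 0.0                # P(X >= xc and Y >= yc), midpoint rule
    for i in range(steps):
        x = xc + (i + 0.5) * h
        density = math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)
        both_high += density * upper_tail((yc - r * x) / s) * h
    false_pos = upper_tail(xc) - both_high
    false_neg = upper_tail(yc) - both_high
    return false_pos, false_neg

R, YC = 0.4, -0.5                  # illustrative values only
sweep = [(xc, *error_rates(R, xc, YC)) for xc in (-1.5, -1.0, -0.5, 0.0)]
for xc, fp, fn in sweep:
    print(f"Xc={xc:+.1f}  false positives={fp:.3f}  false negatives={fn:.3f}")
```

Moving Xc to the right releases more inmates, so false positives fall while false negatives rise; moving it to the left does the reverse.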
Although some (e.g., Gordon, 1977) have argued that such a denial of liberty is acceptable, the argument requires justification based on a desert, rather than an incapacitative, principle. That is, it is argued that false positives, although perhaps not deserving of additional punishment based on actual risk, typically are so deserving based on desert principles. Since the predictions and resulting errors are based largely on past criminal conduct, the argument is that the false positive legitimately may be treated more harshly because of that past conduct. Extended confinement of false positives cannot, however, be justified on prediction-based utilitarian grounds. On these premises, it must be seen as undesirable. Note also that the selective incapacitation concept apparently seeks to minimize false negatives (that is, failure to select those who in fact pose a substantial risk of continued criminal behavior). Unless predictive accuracy can be increased, this can only be done at the expense of increasing false positives.

In the second scenario false positives will also be punished more harshly than will those selected for release based on the selection device. But they will not be punished more harshly than they would have been had the device (and prediction) not been used. This is a critical distinction. Rather than falsely treating some persons more harshly than "necessary," the proposal treats some persons less harshly than "necessary," treats some persons no more harshly than "necessary," and is agnostic with respect to the harshness of punishment received by others. The scheme implicitly assumes that punishment is imposed for a variety of reasons; thus, although release may be granted or denied based on risk considerations, those denied release, including those "falsely" denied the privilege, appropriately are confined for whatever purpose originally intended. (We do not claim that all original purposes necessarily are appropriate. We simply point out that the scheme appears to be atheoretical with respect to them.)

It must be remembered that the actual consequences of the two types of prediction errors probably are not equal. This likely will prove true whether one considers costs in social, economic, or ethical terms. Earlier, we provided a simple model whereby one could assign relative weights to the consequences of one or the other type of error, but so far as we know, this has not yet been attempted. We would urge that such modeling be considered.

The two scenarios also differ substantially with respect to policy changes proposed and the consequences of those changes. Selective incapacitation suggests clearly that there is a proper purpose for the sentencing of criminal offenders: removing them from normal society, thereby preventing them from engaging in normal criminal activity. An extreme position would suggest that this is the only proper purpose for the sentencing decision.43 The suggestion, then, is for a radical change in sentencing and imprisonment policy, and this proposal is based in large part on claims made for the accuracy of prediction.

The second scenario, which we have elsewhere called "selective deinstitutionalization" (S. D. Gottfredson, 1984), makes no such presumption. Indeed, sentencing decision policy is not directly affected through adoption of the scheme. Consequences relative to decisions made, of course, would result. Fundamentally, however, the scheme presumes that all purposes for sentencing currently practiced are equally valid. The scheme does propose that risk (and accordingly, an incapacitative purpose) should be a primary consideration in early-release decisions.

Thus, it may be noted that the selective incapacitation notion argues, based in part on considerations of the accuracy of prediction, that sentencing policies and practice should be changed. The selective deinstitutionalization concept makes no such argument. Indeed, in our example we were forced to make selections due to other considerations (e.g., prison crowding).

There is a fundamental difference between the two situations, and this difference requires some clarification of our original question: Is prediction currently accurate enough to be useful? When the question is stated this way, the answer can only be "yes and no." Prediction in criminal justice settings clearly is not sufficiently accurate to form the basis of social policy. Proposals for dramatic changes in policy and practice that rely on the accuracy of prediction are premature at best. Once social policy has been set, however, prediction clearly is sufficiently accurate to be useful, and decisions made will be made more accurately if statistically based prediction tools are used. Even when validity is very low, it has been demonstrated that selection devices provide significant improvements in accuracy (Dunnette, 1966).

We freely admit the judgmental nature of our preference for the selective deinstitutionalization proposal over the selective incapacitation proposal and note that the choice largely is an ethical one. It does appear, however, that consequences of the proposal we advocate are more benign than are consequences arising from a selective incapacitation proposal. And we believe that predictive accuracy, while in need of much improvement, is sufficient for the former but insufficient for the latter. If society should decide that selective incapacitation is the appropriate strategy for sentencing criminal offenders, it is clear that prediction tools should be used in the decision-making process. To decide the policy question on the basis of current predictive accuracy, however, would be foolish.

43 We do not argue that this position necessarily has been advanced by the proponents of the strategy.

REFERENCES

Adams, K.
1983 The effect of evidentiary factors on charge reduction. Journal of Criminal Justice 11:525-537.
American Bar Association
1968 Standards Relating to Pretrial Release. New York: Institute for Judicial Administration.
Anderson, N.
1968 A simple model for information integration. Pp. 731-743 in R. P. Abelson, Elliot Aronson, William J. McGuire, Theodore M. Newcomb, Milton J. Rosenberg, Percy H. Tannenbaum, eds., Theories of Cognitive Consistency: A Sourcebook. Chicago, Ill.: Rand McNally.
1974 Cognitive algebra: integration theory applied to social attribution. Pp. 1-101 in L. Berkowitz, ed., Advances in Experimental Social Psychology. New York: Academic Press.
1979 Algebraic rules in psychological measurement. American Scientist 67:555-563.
Angel, A., Green, E., Kaufman, H., and Van Loon, E.
1971 Preventive detention: an empirical analysis. Harvard Civil Rights-Civil Liberties Law Review 6:301-396.
Babst, D. V., Gottfredson, D. M., and Ballard, K. B.
1968 Comparison of multiple regression and configural analysis techniques for developing base expectancy tables. Journal of Research in Crime and Delinquency 5(1):72-80.
Babst, D. V., Inciardi, J. A., and Jaman, D. R.
1971 The uses of configural analysis in parole prediction research. Canadian Journal of Criminology and Corrections 13(3):200-208.
Babst, D. V., Koval, M., and Neithercutt, M. G.
1972 Relationship of time served to parole outcome for different classifications of burglars based on males paroled in fifty jurisdictions in 1968 and 1969. Journal of Research in Crime and Delinquency 9:99-116.
Baldwin, J.
1979 Ecological and areal studies in Great Britain and the United States. Pp. 29-66 in N. Morris and M. Tonry, eds., Crime and Justice: An Annual Review of Research. Chicago, Ill.: University of Chicago Press.
Ballard, K. B., Jr., and Gottfredson, D. M.
1963 Predictive Attribute Analysis in a Prison Sample and Prediction of Parole Performance. Institute for the Study of Crime and Delinquency. Vacaville, Calif.
Becker, G., and McClintock, C.
1967 Value: behavioral decision theory. Annual Review of Psychology 18:239-286.
Berkson, J.
1947 Cost utility as a measure of efficiency of a test. Journal of the American Statistical Association 42:246-255.
Bernstein, I., Kelly, W., and Doyle, P.
1977 Societal reaction to deviants: the case of criminal defendants. American Sociological Review 42(October):743-755.
Bernstein, I., Kick, E., Leung, J., and Schulz, B.
1977 Charge reduction: an intermediary stage in the process of labelling criminal defendants. Social Forces 56(2):362-384.

Blumstein, A., and Cohen, J.
1979 Estimation of individual crime rates from arrest records. Journal of Criminal Law and Criminology 70:561-585.
Blumstein, A., and Graddy, E.
1982 Prevalence and recidivism in index arrests: a feedback model. Law and Society Review 16:265-290.
Blumstein, A., Cohen, J., and Nagin, D., eds.
1978 Deterrence and Incapacitation: Estimating the Effects of Criminal Sanctions on Crime Rates. Washington, D.C.: National Academy of Sciences.
Blumstein, A., Cohen, J., Martin, S., and Tonry, M., eds.
1983 Research on Sentencing: The Search for Reform. Washington, D.C.: National Academy Press.
Bock, E. W., and Frazier, C. E.
1977 Official standards versus actual criteria in bond dispositions. Journal of Criminal Justice 5:321-328.
Borden, H. G.
1928 Factors for predicting parole success. American Institute of Criminal Law and Criminology 19(3):328-336.
Brosi, K.
1979 A Cross-City Comparison of Felony Case Processing. Washington, D.C.: INSLAW.
Brown, L. D.
1978 The development of a parolee classification system using discriminant analysis. Journal of Research in Crime and Delinquency 15:92-108.
Burgess, E. W.
1928 Factors determining success or failure on parole. In A. A. Bruce, E. W. Burgess, and A. J. Harno, eds., The Workings of the Indeterminate Sentence Law and the Parole System in Illinois. Springfield, Ill.: Illinois State Board of Parole.
Bynum, T.
1976 An Empirical Exploration of the Factors Influencing Release on Recognizance. Unpublished doctoral dissertation, Department of Criminology, Florida State University, Tallahassee.
Caldwell, M. G.
1951 Preview of a new type of probation study made in Alabama. Federal Probation 15(June):3-15.
Campbell, D. T., and Stanley, J. C.
1963 Experimental and Quasi-experimental Designs for Research. New York: Prentice-Hall.
Carroll, J. S.
1977 Judgments of recidivism risk: conflicts between clinical strategies and base-rate information. Law and Human Behavior 1(2):191-198.
1978a Causal attributions in expert parole decisions. Journal of Personality and Social Psychology 36:1501-1511.
1978b Causal theories of crime and their effect upon expert parole decisions. Law and Human Behavior 2(4):377-388.
Carroll, J. S., and Payne, J. W.
1976 The psychology of the parole decision process: a joint application of attribution theory and information processing. In J. Carroll and J. Payne, eds., Cognition and Social Behavior. Hillsdale, N.J.: Erlbaum.
1977a Crime seriousness, recidivism risk, and causal attribution in judgments of prison terms by students and experts. Journal of Applied Psychology 62:592-602.
1977b Judgments about crime and the criminal: a model and a method for investigating parole decisions. In B. Sales, ed., Perspectives in Law and Psychology, Vol. I: Criminal Justice System. New York: Plenum.
Carroll, J. S., Wiener, R. L., Coates, D., Galegher, J., and Alibrio, J. J.
1982 Evaluation, diagnosis, and prediction in parole decision making. Law and Society Review 17(1):199-228.
Chi, K. S.
1983 Offender risk assessment: the Iowa model. Innovations 1-12. Lexington, Ky.: Council of State Governments.
Clarke, S. H., Freeman, J. L., and Koch, G. G.
1976 The Effectiveness of Bail Systems: An Analysis of Failure to Appear in Court and Rearrest While on Bail. Institute of Government, University of North Carolina, Chapel Hill, N.C.
Cohen, J.
1983a Incapacitating criminals: recent research findings. NIJ Research in Brief. Washington, D.C.: National Institute of Justice (December).
1983b Incapacitation as a strategy for crime control: possibilities and pitfalls. Pp. 1-84 in M. Tonry and N. Morris, eds., Crime and Justice: An Annual Review of Research. Vol. V. Chicago, Ill.: University of Chicago Press.
Cohen, J., and Tonry, M.
1983 Sentencing reforms and their impacts. Pp. 305-459 in A. Blumstein, J. Cohen, S. Martin, and M. Tonry, eds., Research on Sentencing: The Search for Reform, Vol. II. Washington, D.C.: National Academy Press.
Cohen, L., and Kluegel, J.
1978 Determinants of juvenile court dispositions: ascriptive and achieved factors in two metropolitan courts. American Sociological Review 43(April):162-176.
Cole, G.
1970 The decision to prosecute. Law and Society Review 4:331-343.
Cook, T. D., and Campbell, D. T.
1979 Quasi-experimentation: Design and Analysis Issues for Field Settings. Chicago, Ill.: Rand McNally.
Cronbach, L. J.
1960 Essentials of Psychological Testing. New York: Harper.
Cronbach, L. J., and Gleser, G. C.
1957 Psychological Tests and Personnel Decisions. Urbana: University of Illinois Press.
Cureton, E. E.
1957 Recipe for a cookbook. Psychological Bulletin 54(6):494-497.
Daiger, D. C., Gottfredson, G. D., Stebbins, B., and Lipstein, D. J.
1978 Explorations of Parole Policy. Center for Social Organization of Schools, Johns Hopkins University, Baltimore, Md.
Dawes, R. M.
1975 Case by case versus rule-generated procedures for the allocation of scarce resources. Pp. 83-94 in M. Kaplan and S. Schwartz, eds., Human Judgment and Decision Processes in Applied Settings. New York: Academic Press.
1979 The robust beauty of improper linear models in decision making. American Psychologist 34(7):571-582.
Dawes, R. M., and Corrigan, B.
1974 Linear models in decision making. Psychological Bulletin 81(2):95-106.
Dawson, R.
1969 Sentencing: The Decision as to Type, Length, and Conditions of Sentencing. Chicago, Ill.: Little, Brown.
DeGroot, A. D.
1960 Via Clinical to Statistical Prediction. Invited address, presented at the meetings of the Western Psychological Association, San Jose, California (April).
Dershowitz, A.
1976 Fair and Certain Punishment. Report of the Twentieth Century Fund Task Force on Criminal Sentencing. New York: McGraw-Hill.
Duncan, O. D., Ohlin, L. E., Reiss, A. J., and Stanton, H. R.
1952 Formal devices for making selection decisions. American Journal of Sociology 58:573-584.
Dunnette, M. D.
1966 Personnel Selection and Placement. Belmont, Calif.: Brooks/Cole.
Ebbesen, E., and Konecni, K.
1975 Decision making and information integration in the courts: the setting of bail. Journal of Personality and Social Psychology 32(5):805-821.
1981 On the external validity of decision-making research: what do we know about decisions in the real world? Pp. 21-43 in T. Wallsten, ed., Cognitive Processes in Choice and Decision Behavior. Hillsdale, N.J.: Erlbaum.
Edwards, W.
1954 The theory of decision making. Psychological Bulletin 51(4):380-417.
1961 Behavioral decision theory. Annual Review of Psychology 12:473-498.
Ehrlich, I.
1973 Participation in illegitimate activities: a theoretical and empirical investigation. Journal of Political Economy 81(3):521-565.
1974 Participation in illegitimate activities: an economic analysis. Pp. 69-134 in G. S. Becker and W. M. Landes, eds., Essays in Economics of Crime and Punishment. New York: Columbia University Press.
Einhorn, H., and Hogarth, R. M.
1981 Behavioral decision theory: processes of judgment and choice. Annual Review of Psychology 32:53-88.
Einhorn, H., and Schacht, S.
1975 Decisions based on fallible clinical judgment. Pp. 126-144 in M. Kaplan and S. Schwartz, eds., Human Judgment and Decision Processes in Applied Settings. New York: Academic Press.
Elion, V., and Megargee, E. I.
1979 Racial identity, length of incarceration, and parole decision making. Journal of Research in Crime and Delinquency 16:232-245.
Ennis, B. J., and Litwack, T. R.
1974 Psychiatry and the presumption of expertise: flipping coins in the courtroom. California Law Review 62:693-752.
Farrington, D. P.
1978 The family background of aggressive youths. In L. Hersov, M. Berger, and D. Shaffer, eds., Aggression and Antisocial Behavior in Childhood and Adolescence. Oxford, England: Pergamon.
1979 Longitudinal research on crime and delinquency. Pp. 289-348 in N. Morris and M. Tonry, eds., Crime and Justice: An Annual Review of Research, Vol. 1. Chicago, Ill.: University of Chicago Press.
1982 Longitudinal analyses of criminal violence. Pp. 171-200 in M. E. Wolfgang and N. A. Weiner, eds., Criminal Violence. Beverly Hills, Calif.: Sage.

Feeley, M., and McNaughton, J.
1974 The Pretrial Process in the Sixth Circuit. Unpublished manuscript, University of California, School of Law, Berkeley.
Fergusson, D. M., Fifield, J. K., and Slater, S. W.
1977 Signal detectability theory and the evaluation of prediction tables. Journal of Research in Crime and Delinquency 14(2):237-246.
Fildes, R., and Gottfredson, D. M.
1968 Cluster analysis in a parolee sample. Journal of Research in Crime and Delinquency 5(1):2-11.
Fischer, D. R.
1981 Offender Risk Assessment: Implications for Sentencing and Parole Policy. Statistical Analysis Center, Iowa Office for Planning and Programming, 523 E. 12th St., Des Moines, Iowa.
1983 Better public protection with fewer inmates? Corrections Today (December):1~20.
1984 Risk Assessment: Sentencing Based on Probabilities. Statistical Analysis Center, Iowa Office for Planning and Programming, 523 E. 12th St., Des Moines, Iowa.
No date Selective Incapacitation of Potentially Violent Adult Offenders. Statistical Analysis Center, Iowa Office for Planning and Programming, 523 E. 12th St., Des Moines, Iowa.
Fisher, F. M., and Kadane, J. B.
1983 Empirically based sentencing guidelines and ethical considerations. Pp. 184-193 in A. Blumstein, J. Cohen, S. Martin, and M. Tonry, eds., Research on Sentencing: The Search for Reform, Vol. II. Washington, D.C.: National Academy of Sciences.
Fisher, J.
1959 The twisted pear and the prediction of behavior. Journal of Consulting Psychology 23:400-405.
Forst, B.
1976 Participation in illegitimate activities: further empirical findings. Policy Analysis 2(3):477-492.
1983 Selective incapacitation: an idea whose time has come? Federal Probation 46:1~23.
Forst, B., and Brosi, K.
1977 A theoretical and empirical analysis of the prosecutor. Journal of Legal Studies 6:177-191.
Forst, B., Lucianovic, J., and Cox, S.
1977 What Happens After Arrest? Washington, D.C.: INSLAW.
Fowler, L.
1983 Classification and prediction: improving on chance. Corrections Today (December):41 17.
Freed, D., and Wald, P.
1964 Bail in the United States: 1964. Washington, D.C.: U.S. Department of Justice and the Vera Institute of Justice.
Frey, E.
1951 Der fruhkriminelle Ruckfallsverbrecher. Criminalite precoce et recividisme, Schweizerische Criminalistische Stud., 4. Basel: Verlag fur Recht und Gesellschaft.
Galegher, J., and Carroll, J. S.
1983 Voluntary sentencing guidelines: prescription for justice or patent medicine? Law and Human Behavior 7(4):361-400.
Galton, F.
1895 (Nature, June). Cited in E. Banks, 1964, Reconviction of young offenders. Current Legal Problems 17:61-79.
Garber, S., Klepper, S., and Nagin, D.
1983 The role of extralegal factors in determining criminal case dispositions. Pp. 129-183 in A. Blumstein, J. Cohen, S. Martin, and M. Tonry, eds., Research on Sentencing: The Search for Reform, Vol. II. Washington, D.C.: National Academy Press.
Gaudet, F., Harris, G., and St. John, C.
1933 Individual differences in the sentencing of judges. Journal of Criminal Law and Criminology 23:811.
Gerecke
1939 Zur Frage der Ruckfallsprognose. Monatsschrift fur Kriminalbiologie und Strafrechtsreform 30:35-38.
Glaser, D.
1954 A reconsideration of some parole prediction factors. American Sociological Review 19:335-341.
1955 The efficacy of alternative approaches to parole prediction. American Sociological Review 20:283-287.
1962 Prediction tables as accounting devices for judges and parole boards. Crime and Delinquency 8(3):239-258.
1964 The Effectiveness of a Prison and Parole System. New York: Bobbs-Merrill.
Glaser, D., and O'Leary, V.
1966 Personal Characteristics and Parole Outcome. National Parole Institutes. Office of Juvenile Delinquency and Youth Development, Washington, D.C.: U.S. Department of Health, Education, and Welfare.
Glass, G. V.
1976 Primary, secondary, and meta-analysis of research. Educational Research 5:3-8.
Glass, G. V., McGaw, B., and Smith, M.
1981 Meta-analysis in Social Research. Beverly Hills, Calif.: Sage.
Glueck, S., and Glueck, E.
1950 Unraveling Juvenile Delinquency. New York: Commonwealth Fund.

Goldberg, L. R.
1965 Diagnosticians vs. diagnostic signs: the diagnosis of psychosis vs. neurosis from the MMPI. Psychological Monographs 79(9).
1968 Seer over sign: the first "good" example? Journal of Experimental Research in Personality 3:168-171.
1970 Man versus model of man: a rationale, plus some evidence for a method of improving on clinical inference. Psychological Bulletin 73:422-432.
Goldkamp, J.
1979 Two Classes of Accused: A Study of Bail and Detention in American Justice. Cambridge, Mass.: Ballinger.
Goldkamp, J., and Gottfredson, M. R.
1980 Bail decision making and pretrial detention: surfacing judicial policy. Law and Human Behavior 3(4):227-249.
1981a Bail Decisionmaking: A Study of Policy Guidelines. Washington, D.C.: National Institute of Corrections.
1981b Bail Decisionmaking: Appendices. Washington, D.C.: National Institute of Corrections.
1985 Policy Guidelines for Bail: An Experiment in Court Reform. Philadelphia, Pa.: Temple University Press.
Goodman, L. A.
1953a The use and validity of a prediction instrument. I. A reformulation of the use of a prediction instrument. American Journal of Sociology 58(5):503-510.
1953b The use and validity of a prediction instrument. II. The validation of prediction. American Journal of Sociology 58(5):510-512.
Gordon, R. A.
1977 A critique of the evaluation of the Patuxent Institution, with particular attention to the issues of dangerousness and recidivism. Bulletin of the American Academy of Psychiatry and the Law 5(2):210-255.
Gottfredson, D. M.
1961 Comparing and combining subjective and objective parole predictors. Research Newsletter 3(Sept.-Dec.). Vacaville: California Medical Facility.
1967 Assessment and prediction methods in crime and delinquency. Pp. 171-187 in Task Force Report: Juvenile Delinquency and Youth Crime. Task Force on Juvenile Delinquency, President's Commission on Law Enforcement and the Administration of Justice. Washington, D.C.: U.S. Government Printing Office.
1975 Decision-making in the Criminal Justice System: Reviews and Essays, ed. Center for Studies of Crime and Delinquency. Rockville, Md.: National Institute of Mental Health.
Gottfredson, D. M., and Ballard, K. B.
1964a Association Analysis, Predictive Attribute Analysis, and parole behavior. Paper presented at the Western Psychological Association meetings, Portland, Oregon (April).
1964b Estimating Sentences Under an Indeterminate Sentencing Law. Institute for the Study of Crime and Delinquency, Vacaville, Calif.
1965a Prison and Parole Decisions: A Strategy for Study. Final Report to the National Institute of Mental Health. Institute for the Study of Crime and Delinquency, Vacaville, Calif.
1965b The Validity of Two Parole Prediction Scales: An Eight Year Follow-up Study. Institute for the Study of Crime and Delinquency, Vacaville, Calif.
1966 Differences in parole decisions associated with decision-makers. Journal of Research in Crime and Delinquency 3:112-119.
Gottfredson, D. M., and Beverly, R. F.
1962 Development and operational use of prediction methods in correctional work. Proceedings of the Social Statistics Section. Washington, D.C.: American Statistical Association.
Gottfredson, D. M., and Bonds, J. A.
1961 A Manual for Intake Base Expectancy Scoring (Form CDC-BE 61A). Research Division. Sacramento, Calif.: California Department of Corrections.
Gottfredson, D. M., and Stecher, B.
1979 Sentencing Policy Models: An Action Research Program. Paper presented at the meetings of the American Psychological Association, Toronto, Ontario, Canada.
Gottfredson, D. M., Ballard, K. B., and Lane, L.
1963 Association Analysis in a Prison Sample and Prediction of Parole Performance. Institute for the Study of Crime and Delinquency, Vacaville, Calif.
Gottfredson, D. M., Wilkins, L. T., and Hoffman, P. B.
1978 Guidelines for Parole and Sentencing: A Policy Control Method. Lexington, Mass.: D. C. Heath.
Gottfredson, D. M., Hoffman, P. B., Sigler, M. H., and Wilkins, L. T.
1975 Making paroling policy explicit. Crime and Delinquency 21(1):31 14.
Gottfredson, D. M., Cosgrove, C. A., Wilkins, L. T., Wallenstein, J., and Rauh, C.
1978 Classification for Parole Decision Policy. Washington, D.C.: U.S. Government Printing Office.
Gottfredson, G. D., and Gottfredson, D. C.
1984 Victimization in Six Hundred Schools: An Analysis of the Roots of Disorder. New York: Plenum.
Gottfredson, M. R.
1974 An empirical analysis of pre-trial release decisions. Journal of Criminal Justice 2:287-304.
1979a Parole board decision making: a study of disparity reduction and the impact of institutional behavior. Journal of Criminal Law and Criminology 70(1):77-88.
1979b Treatment destruction techniques. Journal of Research in Crime and Delinquency 16:39.
Gottfredson, M. R., and Gottfredson, D. M.
1980a Data for criminal justice evaluation: some resources and pitfalls. Pp. 97-118 in M. Klein and K. Teilman, eds., Handbook of Criminal Justice Evaluation. Beverly Hills, Calif.: Sage.
1980b Decisionmaking in Criminal Justice: Toward the Rational Exercise of Discretion. Cambridge, Mass.: Ballinger.
1984 Guidelines for incarceration decisions: a partisan review. University of Illinois Law Review 2:291-317.
Gottfredson, S. D.
1984 Institutional responses to prison crowding. New York University Review of Law and Social Change 12(1):259-273.
Gottfredson, S. D., and Goodman, A. C.
1983 The Dimensions of Judged Offense Seriousness. Center for Metropolitan Planning and Research. Baltimore, Md.: Johns Hopkins University.
Gottfredson, S. D., and Gottfredson, D. M.
1979 Screening for Risk: A Comparison of Methods. Washington, D.C.: National Institute of Corrections.
1980 Screening for risk: a comparison of methods. Criminal Justice and Behavior 7(3):315-330.
1985 Screening for risk among parolees: policy, practice, and method. Pp. 54-77 in D. Farrington and R. Tarling, eds., Prediction in Criminology. Albany, N.Y.: SUNY Albany Press.
Gottfredson, S. D., and Taylor, R. B.
1983 The Crisis in Corrections: Prison Populations and Public Policy. Washington, D.C.: National Institute of Justice.
1986 Person-environment interactions in the prediction of recidivism. In J. Byrne and R. Sampson, eds., The Social Ecology of Crime. New York: Springer Verlag.
Gough, H. G.
1962 Clinical versus statistical prediction in psychology. Pp. 526-584 in L. Postman, ed., Psychology in the Making. New York: Knopf.
Green, B. F., Jr., and Hall, J. A.
1984 Quantitative methods for literature reviews. Annual Review of Psychology 33:37-53.
Green, D. M., and Swets, J. A.
1966 Signal Detection Theory and Psychophysics. New York: John Wiley & Sons.
Greenberg, D. F.
1975 The incapacitative effect of imprisonment: some estimates. Law and Society Review 9:541-580.
Greenwood, P. W., with Abrahamse, A.
1982 Selective Incapacitation. Report to the National Institute of Justice. Santa Monica, Calif.: Rand Corp.
Grossman, B. A., ed.
1980 New Directions in Sentencing. Toronto: Butterworth.
Guilford, J. P.
1965 Statistics for Psychology and Education. New York: Prentice-Hall.
Hagan, J.
1974 Extra-legal attributes and criminal sentencing: an assessment of a sociological viewpoint. Law and Society Review 8(Spring):357-383.
Hagan, J., and Bumiller, K.
1983 Making sense of sentencing: a review and critique of sentencing research. Pp. 1-54 in A. Blumstein, J. Cohen, S. Martin, and M. Tonry, eds., Research on Sentencing: The Search for Reform, Vol. II. Washington, D.C.: National Academy Press.
Hakeem, M.
1948 The validity of the Burgess method of parole prediction. American Journal of Sociology 53(5):376-386.
Hale, M. M.
1984 The Influence of Sentencing Goals on Judicial Decision-Making. Unpublished doctoral dissertation, Department of Psychology, Johns Hopkins University, Baltimore, Md.
Hamilton, W. A., and Work, C. R.
1973 The prosecutor's role in the urban court system: the case for management consciousness. Journal of Criminal Law and Criminology 64(2):183-189.
Harris, M. K.
1975 Disquisition on the need for a new criminal justice sanctioning system. West Virginia Law Review 77:263.
Hart, H.
1923 Predicting parole success. Journal of Criminal Law and Criminology 14:405-413.
Hart, H. L. A.
1968 Punishment and Responsibility: Essays in the Philosophy of Law. New York: Oxford University Press.
Hays, W. L.
1963 Statistics for Psychologists. New York: Holt, Rinehart, & Winston.

Hindelang, M., Hirschi, T., and Weis, J. 1981 Measuring Delinquency. Beverly Hills, Calif.: Sage.
Hirschi, T., and Selvin, H. 1967 Delinquency Research: An Appraisal of Analytic Methods. New York: Free Press.
Hoffman, P. B. 1983 Screening for risk: a revised Salient Factor Score (SFS 81). Journal of Criminal Justice 11(6):539-547.
Hoffman, P. B., and Adelberg, S. 1980 The Salient Factor Score: a non-technical overview. Federal Probation 44(1):44-57.
Hoffman, P. B., and Beck, J. L. 1974 Parole decision-making: a Salient Factor Score. Journal of Criminal Justice 2:195-206. 1976 Salient Factor Score validations: a 1972 release cohort. Journal of Criminal Justice 4:69-76.
Hoffman, P. B., Stone-Meierhoefer, B., and Beck, J. L. 1978 Salient Factor Score and release behavior: three validation samples. Law and Human Behavior 1:47-62.
Hogarth, J. 1971 Sentencing as a Human Process. Toronto, Ontario: University of Toronto Press.
Hogarth, R. M. 1980 Judgement and Choice: The Psychology of Decision. Chichester, England: John Wiley & Sons.
Holland, T. R., Holt, N., Levi, M., and Beckett, G. E. 1983 Comparison and combination of clinical and statistical predictions of recidivism among adult offenders. Journal of Applied Psychology 68(2):203-211.
Horst, P. 1941 The prediction of personal adjustment. Social Science Research Council Bulletin No. 48. New York: Social Science Research Council. 1963 The statewide testing program. Personnel and Guidance Journal XLI:394-402. 1966 Psychological Measurement and Prediction. Belmont, Calif.: Wadsworth.
Inciardi, J. A. 1971 The use of parole prediction with institutionalized narcotic addicts. Journal of Research in Crime and Delinquency 8(1):65-73.
Jacoby, J. E. 1977 The Prosecutor's Charging Decision: A Policy Perspective. Washington, D.C.: U.S. Government Printing Office.
Jacoby, J. E., Ratledge, E. C., and Turner, S. H. 1979 Research on Prosecutorial Decisionmaking: Phase I Final Report. Washington, D.C.: Bureau of Social Science Research.
Janus, M. G. 1984 Selective Incapacitation: Have We Tried It? Does it Work? Federal Prison System. Washington, D.C.: U.S. Department of Justice.
John, H. 1963 Prediction Improvement Using the Split-Sample Technique and Criterion-Scaled Independent Variables. Unpublished master's thesis, University of Illinois.
Kaplan, J. 1965 The prosecutorial discretion: a comment. Northwestern University Law Review 60:174.
Kassenbaum, G., Ward, D., and Wilner, D. 1971 Prison Treatment and Parole Survival. New York: John Wiley & Sons.
Kastenmeier, R. W., and Eglit, H. C. 1973 Parole release decision-making: rehabilitation, expertise, and the demise of mythology. American University Law Review 22(3):477-525.
Kirby, B. C. 1954 Parole prediction using multiple correlation. American Journal of Sociology 59(6):539-550.
Klepper, S., Nagin, D., and Tierney, L. 1983 Discrimination in the criminal justice system: a critical appraisal of the literature. Pp. 55-128 in A. Blumstein, J. Cohen, S. Martin, and M. Tonry, eds., Research on Sentencing: The Search for Reform, Vol. II. Washington, D.C.: National Academy Press.
Klienig, J. 1973 Punishment and Desert. The Hague, Netherlands: Martinus-Nijhoff.
Kohnle, E. F. 1938 Die Kriminalitat entlassener Fursorgezoglinge und die Moglichkeit einer Erfolgsprognose. Leipzig, East Germany: Kriminalistische Abhandlungen, No. 33.
Lagoy, S. P., Senna, J. J., and Siegel, L. J. 1976 An empirical study on information usage for prosecutorial decision making in plea negotiations. American Criminal Law Review 13:435-471.
Lancucki, L., and Tarling, R. 1978 The relationship between mean cost rating and Kendall's rank correlation coefficient tau. Social Science Research 7(1):81-87.
Lee, W. 1971 Decision Theory and Human Behavior. New York: John Wiley & Sons.
Levy, K. J. 1978 Predicting the time at which a specified number of subjects will achieve "success." Educational and Psychological Measurement 38:939-942.

Lipton, D., Martinson, R., and Wilks, J. 1975 The Effectiveness of Correctional Treatment. New York: Praeger.
Lloyd, M. R., and Joe, G. W. 1979 Recidivism comparisons across groups: methods of estimation and tests of significance for recidivism rates and asymptotes. Evaluation Quarterly 3(1):105-117.
Locke, J., Penn, R., Rock, R., Bunten, E., and Hare, G. 1970 Compilation and Use of Criminal Court Data in Relation to Pretrial Release of Defendants: A Pilot Study. Washington, D.C.: U.S. Government Printing Office.
Loeber, R., and Dishion, T. 1985 Early predictors of male delinquency: a review. Psychological Bulletin 94(1):68-99.
Luce, R. D., and Raiffa, H. 1957 Games and Decisions. New York: John Wiley & Sons.
Maltz, M. D., and McCleary, R. 1977 The mathematics of behavioral change: recidivism and construct validity. Evaluation Quarterly 1(3):421-438.
Maltz, M. D., McCleary, R., and Pollock, S. P. 1979 Recidivism and likelihood functions: a reply to Stollmack. Evaluation Quarterly 3(1):124-131.
Mannheim, H., and Wilkins, L. T. 1955 Prediction Methods in Relation to Borstal Training. London, England: Her Majesty's Stationery Office.
Manski, C. F. 1978 Prospects for inference on deterrence through empirical analysis of individual criminal behavior. Pp. 400-424 in A. Blumstein, J. Cohen, and D. Nagin, eds., Deterrence and Incapacitation: Estimating the Effects of Criminal Sanctions on Crime Rates. Washington, D.C.: National Academy of Sciences.
Martin, S. E. 1983 The politics of sentencing reform: sentencing guidelines in Pennsylvania and Minnesota. Pp. 265-304 in A. Blumstein, J. Cohen, S. Martin, and M. Tonry, eds., Research on Sentencing: The Search for Reform, Vol. II. Washington, D.C.: National Academy Press.
McCord, J. 1980 Patterns of deviance. In S. Sells, R. Crandell, M. Roff, J. Strauss, and W. Pollin, eds., Human Functioning in Longitudinal Perspective. Baltimore, Md.: Williams and Wilkins.
Meehl, P. E. 1954 Clinical versus Statistical Prediction. Minneapolis: University of Minnesota Press. 1965 Seer over sign: the first good example. Journal of Experimental Research in Personality 1:27-32.
Meehl, P. E., and Rosen, A. 1955 Antecedent probability and the efficiency of psychometric signs, patterns, or cutting scores. Psychological Bulletin 52(3):194-216.
Meywerk, W. 1938 Beitrag zur Bestimmung der sozialen Prognose an Ruckfallsverbrechern. Monatsschrift 29.
Miller, F. 1970 Prosecution: The Decision to Charge a Suspect with a Crime. Boston: Little, Brown.
Minium, E. W. 1970 Statistical Reasoning in Psychology and Education. New York: John Wiley & Sons.
Minnesota Sentencing Guidelines Commission 1982 Preliminary Report on the Development and Impact of the Minnesota Sentencing Guidelines. Report prepared by the Minnesota Sentencing Guidelines Commission, Suite 284 Metro Sq. Bldg., 7th and Robert Sts., St. Paul, Minn.
Monachesi, E. D. 1932 Prediction Factors in Probation. Hanover, N.H.: Sociological Press.
Monahan, J. 1978 The prediction of violent criminal behavior: a methodological critique and prospectus. Pp. 244-269 in A. Blumstein, J. Cohen, and D. Nagin, eds., Deterrence and Incapacitation: Estimating the Effects of Criminal Sanctions on Crime Rates. Washington, D.C.: National Academy of Sciences. 1981 Predicting Violent Behavior: An Assessment of Clinical Techniques. Beverly Hills, Calif.: Sage.
Monahan, J., and Klassen, D. 1982 Situational approaches to understanding and predicting individual violent behavior. Pp. 292-319 in M. E. Wolfgang and N. A. Weiner, eds., Criminal Violence. Beverly Hills, Calif.: Sage.
Morris, N. 1974 The Future of Imprisonment. Chicago, Ill.: University of Chicago Press.
Mueller, G. 1977 Sentencing: Process and Purpose. Springfield, Ill.: Charles C Thomas.
National Advisory Commission on Criminal Justice Standards and Goals 1973 Corrections. Washington, D.C.: U.S. Government Printing Office.
New York Times 1982a Cutting crime tied to jailing of busiest criminals. October 6. 1982b Making punishment fit future crimes. November 14:E-9.

Newsweek 1982 To catch a career criminal. November 15:77.
NIJ Reports 1984 Selective incapacitation: two views of a compelling concept. January 4~. Washington, D.C.: National Institute of Justice.
Ohlin, L. E. 1951 Selection for Parole. New York: Russell Sage.
Ohlin, L. E., and Duncan, O. D. 1949 The efficiency of prediction in criminology. American Journal of Sociology 54:441-451.
O'Leary, V., and Hall, J. No date Frames of Reference in Parole. National Parole Institutes training document. Hackensack, N.J.: National Council on Crime and Delinquency Training Center. Ca. 1976.
O'Leary, V., Gottfredson, M. R., and Gelman, A. 1975 Contemporary sentencing proposals. Criminal Law Bulletin 11:55.
Olweus, D. 1977 A critical analysis of the "modern" interactionist position. Pp. 221-234 in D. Magnusson and N. Endler, eds., Personality at the Crossroads: Current Issues in Interactional Psychology. Hillsdale, N.J.: Erlbaum.
Palmer, J., and Carlson, P. 1976 Problems with the use of regression analysis in prediction studies. Journal of Research in Crime and Delinquency 13(1):64-81.
Petersilia, J. 1980 Criminal career research: a review of recent evidence. Pp. 321-379 in N. Morris and M. Tonry, eds., Crime and Justice: An Annual Review of Research. Vol. 2. Chicago, Ill.: University of Chicago Press.
Pitz, G. F., and Sachs, N. J. 1984 Judgment and decision: theory and application. Annual Review of Psychology 35:139-164.
Pope, C. 1976 The influence of social and legal factors on sentence dispositions: a preliminary analysis of offender-based transaction statistics. Journal of Criminal Justice 4(3):203-221. 1978 Sentence dispositions accorded assault and burglary offenders. Journal of Criminal Justice 6:151.
Porebski, O. R. 1966 On the interrelated nature of the multivariate statistics used in discriminatory analysis. British Journal of Mathematical and Statistical Psychology 19(2):197-214.
President's Commission on Law Enforcement and the Administration of Justice 1967 The Challenge of Crime in a Free Society. Washington, D.C.: U.S. Government Printing Office.
Rapoport, A., and Wallsten, T. 1972 Individualized decision behavior. Annual Review of Psychology 23:131-175.
Reiss, A. J. 1951a The accuracy, efficiency, and validity of a prediction instrument. American Journal of Sociology 61:552-561. 1951b Delinquency as the failure of personal and social controls. American Sociological Review 16(2):196-207. 1951c Unraveling juvenile delinquency. II: An appraisal of the research methods. American Journal of Sociology 57:115-120.
Rhodes, W. M. 1978 Plea Bargaining: Who Gains? Who Loses? PROMIS Research Publication No. 14. Washington, D.C.: INSLAW.
Rich, W. D., Sutton, L. P., Clear, T. R., and Saks, M. 1982 Sentencing by Mathematics: An Evaluation of the Early Attempts to Develop and Implement Sentencing Guidelines. Williamsburg, Va.: National Center for State Courts.
Richardson, M. W. 1950 Effectiveness of selection devices. In D. H. Fryer and E. R. Henry, eds., Handbook of Applied Psychology. Vol. I. New York: Rinehart.
Rorer, L. G., Hoffman, P., and Hsieh, K. 1965 Utilities as base rate multipliers in the determination of optimum cutting scores for the discrimination of groups of unequal size and variance. Journal of Applied Psychology 50:364-368.
Rose, G. 1966 Trends in the use of prediction. Howard Journal of Penology and Crime Prevention 12(1):26-33.
Rossi, P., Waite, E., Bose, C., and Berk, R. 1974 The seriousness of crime: normative structure and individual differences. American Sociological Review 39:224-237.
Roth, J., and Wice, P. 1978 Pretrial Release and Misconduct in the District of Columbia. PROMIS Research Project Publication No. 16. Washington, D.C.: INSLAW.
Sarbin, T. 1943 Contributions to the study of actuarial and individual methods of prediction. American Journal of Sociology 48:593-602.
Savitz, L. D. 1965 Prediction studies in criminology. International Bibliography on Crime and Delinquency. National Clearinghouse for Mental Health Information. Chevy Chase, Md.: National Institute of Mental Health.

Sawyer, J. 1966 Measurement and prediction, clinical and statistical. Psychological Bulletin 66:178-200.
Schmidt, P., and White, A. D. 1979 Models of criminal recidivism and an illustration of their use in evaluating correctional programs. Pp. 210-224 in L. Sechrest, S. White, and E. Brown, eds., The Rehabilitation of Criminal Offenders: Problems and Prospects. Washington, D.C.: National Academy of Sciences.
Schuessler, K. F. 1954 Parole prediction: its history and status. Journal of Criminal Law and Criminology 45(November):425-431.
Scott, J. E. 1974 The use of discretion in determining the severity of punishment for incarcerated offenders. Journal of Criminal Law and Criminology 65(2):214-224.
Sechrest, L., White, S., and Brown, E., eds. 1979 The Rehabilitation of Criminal Offenders: Problems and Prospects. Washington, D.C.: National Academy of Sciences.
Sellin, T., and Wolfgang, M. 1964 The Measurement of Delinquency. New York: John Wiley & Sons.
Shiedt, R. 1936 Ein Beitrag zum Problem der Ruckfallsprognose. Munich.
Simon, F. H. 1971 Prediction Methods in Criminology, Including a Prediction Study of Young Men on Probation. London, England: Her Majesty's Stationery Office. 1972 Statistical methods of making prediction instruments. Journal of Research in Crime and Delinquency 9(1):46-53.
Slovic, P., Fischoff, B., and Lichtenstein, S. 1977 Behavioral decision theory. Annual Review of Psychology 28:1-39.
Solomon, H. 1976 Parole outcome: a multidimensional contingency table analysis. Journal of Research in Crime and Delinquency 13:107-126.
Sparks, R. F. 1983 The construction of sentencing guidelines: a methodological critique. Pp. 194-264 in A. Blumstein, J. Cohen, S. Martin, and M. Tonry, eds., Research on Sentencing: The Search for Reform, Vol. II. Washington, D.C.: National Academy Press.
Statistical Analysis Center 1980 The Iowa Offender Risk Assessment Scoring System: Volume I: System Overview and Coding Procedures. Iowa Office for Planning and Programming, 523 E. 12th St., Des Moines, Iowa. 1983 The Impact of Objective Parole Criteria on Parole Release Rates and Public Protection. Final Report to the General Assembly of Iowa. Iowa Office for Planning and Programming, 523 E. 12th St., Des Moines, Iowa. 1984 Offender Risk Assessment: The Iowa Model Validation Results First Draft. Iowa Office for Planning and Programming, 523 E. 12th St., Des Moines, Iowa.
Stollmack, S., and Harris, C. M. 1974 Failure-rate analysis applied to recidivism data. Journal of the Operations Research Society of America 22:1192-1205.
Sutton, L. 1978 Federal Sentencing Patterns. Washington, D.C.: National Criminal Justice Information and Statistics Service.
Thomas, W. H., Jr. 1976 Bail Reform in America. Berkeley: University of California Press.
Thurstone, L. L. 1927 The method of paired comparisons for social values. Journal of Abnormal and Social Psychology 21(4):384-400.
Tibbets, C. 1931 Success and failure in parole can be predicted. Journal of Criminal Law, Criminology, and Police Science 22:11-50.
Trunck, H. 1937 Soziale Prognosen an Strafgefangenen. 28 Monatsschrift fur Kriminalbiologie und Strafrechtsreform.
Underwood, B. D. 1979 Law and the crystal ball: predicting behavior with statistical inference and individualized judgment. Yale Law Journal 88(6):1408-1448.
U.S. News and World Report 1982 Key to criminals' futures: their pasts. October.
van Alstyne, D. J., and Gottfredson, M. R. 1978 A multidimensional contingency table analysis of parole outcome: new methods and old problems in criminological prediction. Journal of Research in Crime and Delinquency 15:172-193.
Vandaele, W. 1978 Participation in illegitimate activities: Ehrlich revisited. Pp. 270-335 in A. Blumstein, J. Cohen, and D. Nagin, eds., Deterrence and Incapacitation: Estimating the Effects of Criminal Sanctions on Crime Rates. Washington, D.C.: National Academy of Sciences.

Vold, G. B. 1931 Prediction Methods and Parole: A Study of Factors Involved in the Violation or Non-violation of Parole in a Group of Minnesota Adult Males. Minneapolis, Minn.: Sociological Press. 1949 Comment on "The efficiency of prediction in criminology." American Journal of Sociology 54:451-452.
von Hirsch, A. 1976 Doing Justice: The Choice of Punishments. New York: Hill-Wang.
von Hirsch, A., and Gottfredson, D. M. 1984 Selective incapacitation: some queries about research design and equity. New York University Review of Law and Social Change 12(1):11-51.
von Neumann, J., and Morgenstern, O. 1947 Theory of Games and Economic Behavior. Princeton, N.J.: Princeton University Press.
Wainer, H. 1976 Estimating coefficients in linear models: it don't make no nevermind. Psychological Bulletin 83(2):213-217.
Warner, F. B. 1923 Factors determining parole from the Massachusetts Reformatory. Journal of Criminal Law and Criminology 14:172-207.
Wice, P. 1973 Bail and Its Reform: A National Survey. Washington, D.C.: U.S. Government Printing Office.
Wiggins, J. 1973 Personality and Prediction: Principles of Personality Assessment. Reading, Mass.: Addison-Wesley.
Wilbanks, W., and Hindelang, M. 1972 The comparative efficiency of three prediction methods. Appendix B in D. Gottfredson, L. Wilkins, and P. Hoffman, Summarizing Experience for Parole Decision-Making. Davis, Calif.: National Council on Crime and Delinquency Research Center.
Wilkins, L. T., and Chandler, A. 1965 Confidence and competence in decision-making. British Journal of Criminology 5(January):1.
Wilkins, L. T., and MacNaughton-Smith, P. 1964 New prediction and classification methods in criminology. Journal of Research in Crime and Delinquency 1(1):19-32.
Williams, K. 1978 The Role of the Victim in the Prosecution of Violent Crimes. PROMIS Research Publication No. 12. Washington, D.C.: INSLAW.
Wolfgang, M. E., Figlio, R. M., and Sellin, T. 1972 Delinquency in a Birth Cohort. Chicago, Ill.: University of Chicago Press.
Wright, K., Clear, T., and Dickson, P. 1984 Universal applicability of probation risk-assessment instruments: a critique. Criminology 22(1):113-134.

Volume II takes an in-depth look at the various aspects of criminal careers, including the relationship of alcohol and drug abuse to criminal careers, co-offending influences on criminal careers, issues in the measurement of criminal careers, accuracy of prediction models, and ethical issues in the use of criminal career information in making decisions about offenders.
