Read "Criminal Careers and "Career Criminals,": Volume II" at NAP.edu

Suggested Citation:"10. Random Parameter Stochastic-Process Models of Criminal Careers." National Research Council. 1986. Criminal Careers and "Career Criminals,": Volume II. Washington, DC: The National Academies Press. doi: 10.17226/928.

10

Random Parameter Stochastic Process Models of Criminal Careers

John P. Lehoczky

John P. Lehoczky is professor and head, Department of Statistics, Carnegie-Mellon University. I wish to express my thanks to Alfred Blumstein and Jacqueline Cohen for their major contributions to the paper. The approaches developed in this paper are the outgrowth of a long series of discussions concerning appropriate models for criminal behavior and the empirical evidence supporting those models. In addition, I wish to thank Donald Gaver for his many discussions concerning hierarchical models and corrections to biases in criminal justice data sets. My thanks also to Arthur Goldberger, Jan Chaiken, Chul Woo Ahn, and Mark Schervish for their many comments on earlier drafts of this paper.

INTRODUCTION

Background

In the past decade there has been great growth in the development of quantitative methodologies to deal with criminal justice problems. This has included extensive data gathering and analysis and some modeling of offender behavior. As this data analysis proceeds, one can gain clearer insights into the nature of offender behavior and these should be incorporated into increasingly detailed models. As the models increase in accuracy, one can begin to use them as policy tools to analyze the impact of various approaches to crime control, such as selective incapacitation.

Unfortunately, it seems that the quantitative models of offender behavior that have been developed to date do not capture the recent insights about offender behavior found in major data analysis projects, such as the Rand prisoner self-report study. Indeed, the stochastic modeling approach began in 1973 with the work of Avi-Itzhak and Shinnar. This work, described below, treats individual-offender recidivism as a Poisson process.
A great deal of subsequent modeling has been done, but most of the models are simple extensions of the Poisson-process model, namely renewal-process models. This class of models assumes that recidivism times are independent and have the same distribution. Such models may fit data better than a Poisson-process model, but they do not incorporate the current improved understanding of offender behavior. This paper represents an attempt to develop a stochastic model that is in better accord with this understanding.

Three major aspects of offender behavior have been observed with sufficient frequency to merit incorporation into analytic models:

· Crime-commission propensities change as a function of age.
· Offender populations are markedly heterogeneous.
· Offenders often are thought to commit crimes in spurts and then to have periods with little or no activity.

The age effect is very pronounced. It is widely recognized that offender behavior is at its peak during the late teens and early 20s and then drops significantly during the 30s. Any stochastic model must address this age effect. Standard renewal-process models do not incorporate such effects, because they assume that the times between arrests are independent and identically distributed (i.i.d.) and hence stationary. There is need to develop models that account for age effects but that simultaneously offer analytic tractability. Such models will be presented in this paper.

It is evident from recidivism data as well as self-report data that there is great heterogeneity in the offender population. This heterogeneity refers to differences between offenders, including markedly differing offense rates, career lengths, and types of crimes engaged in. This variation goes beyond differences that can reasonably be observed from independent replications of a single stochastic process. Most models do not take this heterogeneity into account. The few exceptions arose from work at the Rand Corporation, including Chaiken and Rolph (1980) and Rolph, Chaiken, and Houchens (1981). In this paper, I argue for the use of hierarchical models to represent heterogeneity. In such models each individual's criminal career is regarded as a stochastic process governed by parameters. Those parameters are themselves treated as random variables drawn from a parent distribution (superpopulation).
The parent distribution captures the heterogeneity of the population of offenders or the variation between individuals. One wishes to estimate the parameters of the parent distribution to gain insight into the population of offenders. In addition, one wishes to estimate the rate-influencing parameters of individuals to understand the behavior of each of the offenders. Hierarchical models form the basis of the analysis in this paper. They are formally described, applied, and estimated in the discussions that follow.

There is another aspect to criminal careers that has generally not been incorporated into stochastic models. This is the occurrence of quiescent periods in the course of the career. Self-report data reveal that criminal behavior often occurs in spurts and is followed by lulls in activity. This is not surprising if, for example, the offender was attempting to gain sufficient money through a series of crimes and then, having reached that goal, stopped for a period. The typical renewal-process models do not incorporate such behavior. A new class of models that includes this behavior is developed below.

Several other aspects of stochastic modeling of criminal careers are dealt with in this paper. One of the most interesting is the use of the hierarchical modeling approach to correct for natural biases in data sets. Generally, criminal justice data sets do not provide random samples from the offender population. Rather, individuals are part of a sample because they meet a specific criterion that may be directly or indirectly related to their parameter values. For example, one might gather data on prisoners. This group of offenders is, however, not representative of the offender population because it typically consists of individuals with high offense rates, more serious offenses, or longer careers. Similarly, if one took a sample of arrestees in some time period, such a sample would overrepresent high-rate offenders, since they have a greater probability of falling into such a sample. As illustrated below, the hierarchical modeling approach can help to overcome this problem. It offers the opportunity to develop a correction for nonrandom sampling, so that one can make more nearly correct inferences about the offender population from inherently biased data sets.

Overview

This paper introduces a hierarchical (superpopulation) model for criminal careers within a population of offenders or potential offenders. There are two levels to the hierarchy. The top level is used to explain variation between individuals in the population, that is, to explain the heterogeneity of the population. At the low level of the hierarchy, individuals engage in criminal careers that are treated as independently evolving stochastic processes governed by certain distributions. These distributions contain parameters with values at the top level. The low level thus uses a stochastic-process model to help explain differences within careers governed by the same parameter values.

Covariates can be introduced at both levels of the hierarchical model. Covariates are of two types: "historical" covariates, which are fixed at the start of the career, and "dynamic" covariates, which can change during the evolution of the career. For an analysis of adult offending careers, historical covariates could include juvenile record or the age at the time of the first juvenile arrest. Relevant dynamic covariates might include employment status or drug use. The historical covariates can influence the choice of parameters for each individual at the highest level of the hierarchy. Since these parameters are selected and fixed at the beginning of the career, dynamic covariates cannot be used.
All covariates are allowed to influence the evolution of the career of any particular offender, the lowest level of the hierarchy.

A new family of stochastic models is introduced in the paper. The models are characterized by two states, one of which corresponds to a high rate of crime commission and the other of which represents a low rate of activity (which is taken to be zero). Parameters are included for the time spent in each state, state-switching probabilities, arrest probabilities, crime-type termination probabilities, and the times between crimes. For multiple crime types, a competing-risks formulation is used. The models offer tractability, can include covariates, provide periods of high and low activity, and introduce some behavioral parameters.

Methods are also developed to assess and correct the biases that occur in many criminal justice data sets. Three specific issues are addressed:

1. If a data set is gathered by taking individuals who were arrested during a certain window of time, the data set will overrepresent individuals with high crime rates among those at liberty and underrepresent those who are in prison for part of the window period.
2. If a data set comes from self-reports of prisoners, it is not representative of the population in general, since prisoners tend to be high-rate offenders and to commit more violent crimes.
3. Even at the level of the individual parameters, the individual crime or arrest rates that are estimated from among arrestees or prisoners will be biased upward. This is because individuals are more likely to be caught in a period of high activity (even if their parameter values may be low) and hence empirically show a high arrest rate.
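
The first of these biases can be made concrete with a small simulation. The sketch below is an illustration and not part of the paper: it assumes that individual arrest rates follow a gamma parent distribution and that each offender's arrests during the sampling window form a Poisson process. The function name `simulate_window_sample` and all parameter values are hypothetical choices.

```python
import math
import random

def simulate_window_sample(n_offenders, shape, scale, window, seed=0):
    """Draw arrest rates from a gamma parent distribution (an assumption),
    then keep only offenders with at least one arrest in the window."""
    rng = random.Random(seed)
    all_rates, sampled_rates = [], []
    for _ in range(n_offenders):
        rate = rng.gammavariate(shape, scale)  # individual arrest rate per year
        all_rates.append(rate)
        # For a Poisson arrest process,
        # P(at least one arrest in the window) = 1 - exp(-rate * window).
        if rng.random() < 1.0 - math.exp(-rate * window):
            sampled_rates.append(rate)
    return all_rates, sampled_rates

all_rates, sampled_rates = simulate_window_sample(
    n_offenders=50_000, shape=2.0, scale=0.25, window=1.0)
mean_all = sum(all_rates) / len(all_rates)
mean_sampled = sum(sampled_rates) / len(sampled_rates)
print(f"mean rate, whole population: {mean_all:.3f}")
print(f"mean rate, arrestee sample:  {mean_sampled:.3f}")
```

Under these assumptions the expected mean rate is 0.5 in the population and about 0.68 among those arrested in the window, so the arrestee sample overstates typical offending unless the selection mechanism is modeled.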

The hierarchical model developed in this paper can help to assess and correct these biases.

Finally, a class of "phase distributions" is introduced. This class is very versatile in that it can approximate arbitrarily closely the distribution of any nonnegative random variable. In addition, the class is closed under a number of operations that are useful for the models presented in this paper. The closure properties include convolution and mixtures, as well as maxima and minima of random variables drawn from this class.

HIERARCHICAL MODELS

This section describes the use of hierarchical stochastic models for studying criminal careers. There are several reasons why this class of models is especially useful and of great conceptual value. First, it has frequently been observed that criminal behavior varies widely between individuals. This is especially true for crime rates, as measured by self-report data; some individuals report committing crimes at a very high rate, while others report they commit crimes rarely. Even allowing for biases in these data and deliberate falsification, it is clear that there is great variation between individuals. It is, therefore, appropriate to use a model that can represent this great variation.

A second benefit of using a hierarchical model is that it can help to improve parameter estimates for each individual. Suppose one treated each individual in isolation and attempted to estimate parameter values for each individual using only his data (for example, arrest record and the values of covariates). One would find that these estimators have a large variance. With a hierarchical model, however, the data for other individuals can be used to help estimate parameter values for a single individual. This follows because the parameter values for all individuals are related, since they are modeled as coming from a common parent distribution.
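
How data from other individuals can sharpen an individual estimate can be sketched numerically. The example below is an illustration under assumptions that are not in the paper: arrest rates are drawn from a gamma parent distribution whose parameters are taken as known, arrests form a Poisson process, and every individual is observed for the same exposure time. It compares each individual's raw rate estimate (arrests divided by exposure) with the posterior mean, which pulls the raw estimate toward the population average. All names and values are hypothetical.

```python
import random

def poisson_count(rng, rate, exposure):
    """Number of events of a Poisson process with the given rate in [0, exposure]."""
    count, elapsed = 0, rng.expovariate(rate)
    while elapsed <= exposure:
        count += 1
        elapsed += rng.expovariate(rate)
    return count

def compare_estimators(n_offenders=20_000, shape=2.0, prior_rate=4.0,
                       exposure=3.0, seed=1):
    """Mean squared error of the raw and the pooled estimate of each rate."""
    rng = random.Random(seed)
    prior_mean = shape / prior_rate              # population average rate
    weight = exposure / (exposure + prior_rate)  # weight given to own data
    raw_sq_err = shrunk_sq_err = 0.0
    for _ in range(n_offenders):
        true_rate = rng.gammavariate(shape, 1.0 / prior_rate)
        arrests = poisson_count(rng, true_rate, exposure)
        raw = arrests / exposure
        # Posterior mean under the gamma parent:
        # (shape + arrests) / (prior_rate + exposure),
        # a weighted average of the raw estimate and the population mean.
        shrunk = weight * raw + (1.0 - weight) * prior_mean
        raw_sq_err += (raw - true_rate) ** 2
        shrunk_sq_err += (shrunk - true_rate) ** 2
    return raw_sq_err / n_offenders, shrunk_sq_err / n_offenders

raw_mse, shrunk_mse = compare_estimators()
print(f"MSE of raw estimates:    {raw_mse:.4f}")
print(f"MSE of pooled estimates: {shrunk_mse:.4f}")
```

In practice the parent parameters would themselves be estimated from the pooled data; they are fixed here only to keep the sketch short.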
Situations of this kind have been observed and exploited with increasing frequency in statistical studies. This statistical formulation leads to "shrinkage" estimators. This type of estimator was first introduced by James and Stein (1961). This topic has been receiving substantial recent attention in the statistics literature (see the review by Morris, 1983). Methods based on maximum likelihood, empirical Bayes, and Bayes procedures have been developed. These approaches will be discussed in the section on parameter estimation; however, a recent example provided by Dempster, Rubin, and Tsutakawa (1981) may help to explain the benefits of the methodology. These authors present several examples, along with a theoretical treatment of likelihood methods. One example deals with estimating the first-year performance of law school students using several explanatory variables. For a single law school, the estimates of regression coefficients are highly variable. It is possible to improve the estimates for a single law school markedly by simultaneously carrying out the analysis for many (82 in this case) law schools. One believes there is a reasonable similarity among law schools, and so the data for other schools are pertinent for any individual school. The estimates for one school gain precision by considering many similar schools simultaneously. Other examples of this type are cited in Morris (1983).

The situation is analogous to Model II analysis of variance or random-effects models. One can distinguish the variation within a particular career and the variation between careers. Criminal careers are modeled using stochastic models. This is appropriate because any career has many random elements that control its evolution. If two individuals have the same stochastic mechanism (i.e., the same parameter values), the two resulting careers may nevertheless be quite different. The offenders have possibly different criminal opportunities, possibly different arrest realizations, possibly different sentences, and so on. If an individual were allowed a second realization of his career, it would differ from the first. This is variation within a career. Variation between careers arises when individuals have different stochastic mechanisms (parameter values) governing their careers.

Once the individuals have been linked through a superpopulation or hierarchical model, the data for all individuals can be used in addition to the data for a single individual. The individual parameter estimates will be drawn toward the average of the population. This is known as shrinkage. The amount of shrinkage will depend on the size of the variation within careers versus the variation between careers. If there is relatively small variation between individuals, the shrinkage can be great.

The formal idea of the hierarchical model (presented in more detail in the next section) is that one has a family of parameters, $\theta$, that controls the evolution of an individual career, which is denoted $\{X_\theta(t),\ t \ge 0\}$. The parameters $\theta$ are treated as random variables with some distribution $\pi(\theta)$, which itself may contain some unknown parameters. One then wishes to use a data set to estimate individual $\theta$ values and/or $\pi$ or the unknown parameters of $\pi$.

Given the above formulation, there is a third advantage to using hierarchical models. This is the possibility of assessing and correcting for sampling biases in a data set. Generally, criminal justice data sets are not randomly sampled from the population of offenders. More typically, one generates a data set by selecting from individuals having a particular attribute, such as an arrest in a certain time period, or who are in prison at a particular time.
Each of these sampling mechanisms yields a sample that is not random from $\pi$. If one is, therefore, to make inferences concerning the offender population, one must assess and correct those biases. The hierarchical formulation is useful in carrying out this process, as will be illustrated below. (Cohort samples can overcome the biasing problem; however, they generally yield too small a sample of criminal activity to be of great utility.)

A NEW STOCHASTIC MODEL

This section introduces a new family of stochastic models of the crime process and arrest process associated with a single criminal career. These new models are intended to encompass more of the salient aspects of criminal behavior than has been possible with previous models. The basic model presented is itself still oversimplified but should serve as an introduction to a set of ideas and tools that future researchers will find useful. Only the most tractable versions of this family of models are presented in detail. This section is organized in a sequential manner. First, some of the most familiar, early stochastic models of criminal careers are summarized, including their good and bad points. Next, a model is presented that overcomes some of the objections to previous models. Finally, a class of flexible models is presented that seems to improve previous efforts considerably. Of course, further changes are to be anticipated as understanding of the underlying processes increases.

Poisson Crime Processes

A frequently used model for the process of crimes committed by a single individual is that crimes form a Poisson process (see Karlin and Taylor, 1975) during the times the individual is not in prison (see, for example, Avi-Itzhak and Shinnar, 1973; Rolph, Chaiken, and Houchens, 1981). The times between crimes (after removing time in prison) are independent random variables with an exponential distribution having some mean, say $1/\lambda$. Associated with a crime process is an arrest process. This arrest process is a thinned version of the crime process in that only a subset of the crimes results in arrest[1] (and only a subset of the arrests results in imprisonment). A common assumption in all work is that an arrest is determined at random for each crime. This means that there is an arrest probability $q$ and that a crime event yields an arrest event with probability $q$ independent of anything else. With this type of thinning to construct the arrest process, if the crime process is Poisson ($\lambda$), the arrest process is Poisson ($\lambda q$).

[1] It is assumed there is no problem of "false arrest."

This simple model of the crime process has several attractive features:

1. The Poisson process is well understood and very tractable. In addition, given the assumption of random thinning, both the crime and arrest processes are Poisson. If the crime process is a renewal process and one thins it at random, the arrest process will be approximately Poisson even if the crime process is not.
2. The Poisson process has a single parameter, and the statistical inference for it is well understood.

On the other hand, several drawbacks to the model must be addressed:

1. The Poisson model, as such, does not account for population heterogeneity.
2. The Poisson model for the arrest process may not fit recidivism data (see Holden, 1983, for a discussion). That is, the times between arrests (after eliminating time in prison) are not exponentially distributed.
3. It has been observed, especially from prisoner self-report data, that arrests and crimes appear to be more clustered than would be suggested by a Poisson model. Moreover, this model does not allow for any sort of aging effects.
It has been widely noted that the frequency of arrests varies with time and at some point drops to essentially zero, suggesting an effective end to the career.[2]
4. A major drawback of the simple Poisson model is that the individual exerts no control over the career other than picking $\lambda$. There are no decision points built into such models at which, for example, the individual could decide to stop, or change, $\lambda$. The events of the past career do not influence the future. Clearly, one would suspect that past events can have an important effect on the future and so one would like to broaden the class of models to allow for this.

Some of the previous issues can be overcome in a straightforward way. For example, one could introduce a random lifetime. Each individual has a career length (often assumed to be exponentially distributed, to enhance tractability). When the length is exceeded, the individual no longer engages in crime (and so presumably is no longer arrested). One problem with this approach is that the career length, if determined at the start of the career, is not influenced by any factors in the career. (Overcoming the poor fit offered by the exponential distribution is discussed in the next section.)

[2] The concept of a finite career length is somewhat controversial (see, for example, Holden, 1983:26). It is argued that there can be no logical point at which a criminal career can end, except death. Any former criminal could be presented with an opportunity such that he would again commit a crime. While one may respect this point of view, it should be realized that no single stochastic model can be expected to represent an exact truth. Rather, one strives to construct models that are approximately true, that account for important effects, and that offer a tractable analysis. It may be that some criminals whose careers are said to have "ended" may, in fact, commit a few additional crimes. In such a case, one would expect the frequency of those crimes to be very low. When coupled with the fact that the arrest probability is generally very small, one expects that the arrest processes may be no different.

One can introduce population heterogeneity in two ways:

1. One can allow $\lambda$ to depend on covariates, such as juvenile record or age at first juvenile arrest.[3]
2. One can allow $\lambda$ to be random (see, for example, Rolph, Chaiken, and Houchens, 1981). In this way, $\lambda$ can represent the heterogeneity of a population of offenders.

[3] See, for example, Stollmack and Harris (1974), Barton and Turnbull (1981), and Holden (1983).

Renewal-Process Models

Some improvement in fit can be achieved by replacing the Poisson-process model for crimes with the more general renewal-process model. Recall that a renewal process is a point process in which the times between points (crimes) are independent, identically distributed random variables having a cumulative distribution function $F$, which is not necessarily exponential. Arrests are then commonly considered to be a randomly thinned version of crimes. Arrests will also form a renewal process with distribution $G$. One can determine $G$ in terms of $F$ and $q$; however, the relation is most easily expressed in terms of the Laplace-Stieltjes transform (Neuts, 1981), $\mathcal{L}_F(s) = E[\exp(-sT)]$, where $T$ represents a generic random time between crimes. We find

$$\mathcal{L}_G(s) = \frac{q\,\mathcal{L}_F(s)}{1 - (1 - q)\,\mathcal{L}_F(s)},$$

where $G$ is the distribution of times between arrests.

Many authors have used renewal-process models for the crime process (see, for example, Holden, 1983). Many distributions have been used for $F$. These include the exponential, Weibull, lognormal, gamma, mixtures of various distributions, and defective distributions. In addition, logistic regression methods have also been used. In each case a particular distribution was used to fit a particular data set.
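
The thinning relation can be checked numerically. In the Poisson special case, $\mathcal{L}_F(s) = \lambda/(\lambda + s)$ and the relation gives $\mathcal{L}_G(s) = \lambda q/(\lambda q + s)$, the transform of an exponential distribution with rate $\lambda q$, in agreement with the Poisson result. The sketch below is illustrative and not from the paper: it uses gamma-distributed times between crimes (an assumed choice), thins crimes with probability $q$, and compares the empirical transform of the simulated times between arrests with the formula. The function name `inter_arrest_times` and all numerical values are hypothetical.

```python
import math
import random

def inter_arrest_times(n_arrests, q, draw_crime_gap, seed=2):
    """Simulate times between arrests when each crime leads to an arrest
    independently with probability q (random thinning)."""
    rng = random.Random(seed)
    gaps = []
    for _ in range(n_arrests):
        elapsed = draw_crime_gap(rng)       # time to the next crime
        while rng.random() >= q:            # this crime did not lead to arrest
            elapsed += draw_crime_gap(rng)  # wait for another crime
        gaps.append(elapsed)
    return gaps

q = 0.2
shape, scale = 2.0, 0.5  # gamma times between crimes, mean 1.0 (illustrative)
gaps = inter_arrest_times(40_000, q, lambda rng: rng.gammavariate(shape, scale))

s = 0.7
lf = (1.0 + scale * s) ** (-shape)            # transform of the gamma distribution
lg_formula = q * lf / (1.0 - (1.0 - q) * lf)  # thinning relation
lg_empirical = sum(math.exp(-s * g) for g in gaps) / len(gaps)
mean_gap = sum(gaps) / len(gaps)

print(f"transform from formula:    {lg_formula:.4f}")
print(f"transform from simulation: {lg_empirical:.4f}")
print(f"mean time between arrests: {mean_gap:.3f}")
```

The mean time between arrests should also be close to the mean time between crimes divided by $q$ (5.0 with these values), which follows from differentiating the relation at $s = 0$.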
No universally appropriate family of distributions seems to fit recidivism data. However, a family of continuous distributions, called phase distributions (Ph-distributions), seems particularly useful for several reasons:

1. The family is dense in the set of all positive distributions, that is, any positive distribution can be approximated arbitrarily closely by a phase distribution.
2. Commonly used distributions, such as the exponential, some gamma, and their mixtures, are phase distributions (lognormal and Weibull are not but they can be closely approximated).
3. These distributions have a Markovian structure (see Neuts, 1981) and so are useful in stochastic model building, because of their tractability.
4. The distributions are closed under such operations as mixing and convolution.

The renewal-process approach to modeling the crime process helps in that recidivism data can be better fit; however, the other difficulties cited earlier remain. The principal problems are the following:

1. The general renewal-process model still assumes independent, identically distributed arrest times and does not offer the clustering of crimes or arrests usually reported or observed.
2. The model does not allow the individual to make decisions concerning behavior.

3. The model does not account for the great amount of population variability that has been observed in criminal justice data sets.
4. The model does not build in interactions between individuals and the criminal justice system.

A New Class of Models

In this section, I present a new model designed to overcome some of the difficulties with earlier models. The new model makes use of a pair of states between which the individual moves.[4] When the individual is in one of the states (the "high" state), he commits crimes at a high intensity. When in the other state (the "low" state), crimes are committed at a low intensity (which is taken to be zero). In addition, switching between states allows the individual some decision-making latitude. I begin with a single crime type, generalize to multiple crime types, and then develop a hierarchical formulation.

[4] A two-state model of this sort was studied by Maltz and Pollack (1980). It is also mentioned in Rolph, Chaiken, and Houchens (1981:37).

The following parameters are used in the model:

$T$: number of distinct crime types.
$A$: initial crime types for an individual, $A \subset \{1, 2, \ldots, T\}$.
$F_t$: the cdf of the time between crimes of type $t$, $1 \le t \le T$.
$q_t$: the arrest probability for crime type $t$, $1 \le t \le T$.
$\beta_t$: the probability of terminating crime type $t$, $1 \le t \le T$.
$\alpha$: the probability of switching from the high-rate state to the low-rate state.
$G$: the cdf of the time in the low-rate state.

For a single crime type, let us define

· a crime process that is a renewal process with the time between crimes being a phase distribution $F$ with mean $1/\lambda$;
· an arrest probability $q$;
· a state-switching probability $\alpha$;
· a phase-type distribution $G$ governing the amount of time the individual spends in the low state, during which no crimes are committed; and
· a probability $\beta$ giving the probability that the career ends with the start of the current low period.
We can describe the process intuitively, as follows. A cycle begins with the individual in the high state. Crimes are committed according to a renewal process with distribution $F$. After each crime, two issues must be resolved:

1. with probability $q$, the individual is arrested, and
2. with probability $\alpha$, the individual switches to the low state.

If the individual switches to the low state, he may terminate his career with probability $\beta$. With the complementary probability, he stays in this state for a period determined by the cumulative distribution function $G$. While the individual is in the low state, no crimes are committed.

The arrest, state-switching, and termination probabilities are applied at random, that is, without any dependence on the realization of the process to that point. Indeed, it is interesting to consider a generalized model in which the process up to that time can influence these transitions. For example, one can introduce a reinforcement effect. If any individual commits a crime and is not arrested, that might provide reinforcement to stay active. Two unarrested crimes provide further reinforcement, and so on. One could simply introduce a sequence $\{\alpha_n\}$ in which $\alpha_n$ represents the probability the individual offender moves to the low state after $n$ consecutive unarrested crimes. The sequence could be chosen so that it strictly decreases to a positive limit.

Any arrest may result in a conviction and a prison sentence. We are only interested in the behavior during free time; consequently, the model is applicable only while the person is not in prison. This brings up the issue of how the processes should be initiated at time 0 (which we take to be age 18 for adult offending) and how they should be restarted after release from prison, if applicable. It is mathematically convenient to keep the process in equilibrium as much as possible. (The advantage of this will be seen below in the discussion of corrections for sampling biases.) This suggests that we should take the initial time to the first crime to be given by the equilibrium forward recurrence time distribution. Once the first crime occurs, the renewal-process model begins.

Under the assumptions presented earlier, the crime process is a renewal process (if prison time is deleted). After each crime, a coin is flipped, and depending on the result, the next crime comes according to $F$ with probability $(1 - \alpha)$ or according to $F * G$ with probability $\alpha(1 - \beta)$, or no further crimes occur with probability $\alpha\beta$. (Here $*$ denotes convolution, since the time to the next crime is the sum of the length of the low period and the time to the first crime in the next high period.) This gives us a mixture of two phase distributions, which will also be a phase distribution, and the resulting distribution is defective in that it allows a positive probability of the value $\infty$. Let us refer to this distribution of times between crimes by $H$, and let its Laplace-Stieltjes transform be given by $\mathcal{L}_H$.
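
The verbal description can be turned into a short simulation. From the mixture just described, the transform of $H$ is $\mathcal{L}_H(s) = (1 - \alpha)\,\mathcal{L}_F(s) + \alpha(1 - \beta)\,\mathcal{L}_F(s)\,\mathcal{L}_G(s)$, so each crime is followed by another with probability $1 - \alpha\beta$ and the number of crimes in a career is geometric with mean $1/(\alpha\beta)$. The sketch below is illustrative and not from the paper: it assumes exponential $F$ and $G$ (the simplest phase distributions), draws the first crime time from $F$ rather than from the equilibrium forward recurrence time distribution, and ignores prison time. The function name `simulate_career` and all parameter values are hypothetical.

```python
import random

def simulate_career(rng, alpha, beta, q, draw_f, draw_g):
    """One career under the two-state model, measured in free (non-prison) time.
    Returns the crime times and a flag per crime saying whether it led to arrest."""
    t = draw_f(rng)  # first crime; drawn from F for simplicity (an assumption)
    crime_times, arrested = [], []
    while True:
        crime_times.append(t)
        arrested.append(rng.random() < q)  # arrest with probability q
        if rng.random() < alpha:           # switch to the low state
            if rng.random() < beta:        # career ends at the start of the low period
                return crime_times, arrested
            t += draw_g(rng)               # quiescent period with no crimes
        t += draw_f(rng)                   # time to the next crime in the high state

rng = random.Random(3)
alpha, beta, q = 0.3, 0.2, 0.1

def draw_f(r):
    return r.expovariate(10.0)  # mean 0.1 year between crimes in the high state

def draw_g(r):
    return r.expovariate(2.0)   # mean 0.5 year spent in the low state

careers = [simulate_career(rng, alpha, beta, q, draw_f, draw_g)
           for _ in range(20_000)]
n_crimes = [len(times) for times, _ in careers]
mean_crimes = sum(n_crimes) / len(n_crimes)
arrest_share = sum(sum(flags) for _, flags in careers) / sum(n_crimes)

print(f"mean crimes per career: {mean_crimes:.2f} (theory {1 / (alpha * beta):.2f})")
print(f"share of crimes leading to arrest: {arrest_share:.3f}")
```

Replacing the fixed `alpha` with a value that depends on the number of consecutive unarrested crimes would give the reinforcement variant described above.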
The arrest CRIMINAL CAREERS AND CAREER CRIMINALS process is also a terminating renewal process. This is easily seen, since each crime has an inclependent arrest proba- bility. The time between arrests is thus a random sum of independent random vari- ables having distribution H. If we let the distribution be K, then Kit ~ 1 - ~ ~ - q)* ~H(S) It should be noted that . H(O) = P(time between crimes < x' = 1-~h and ~K(O) = P(time between arrests ~ ok = . q*(l- ah) 1 - (1 - q)41 - ah) This model has several attractive fea tures: 1. The model is, in reality, a terminat- ing renewal process for both crimes and arrests. In this instance though, the model explicitly builds in parameters to repre- sent ways in which the individual can control activities and does so in a realistic way. After each crime, the individual con- trols whether to continue or change states and, if he or she continues, whether to terminate the crime type or not. Rather than merely fitting K, the arrest-process distribution, the model shows how it is composed of more fundamental behav- ioral parameters. Thus, the model over- comes the objection to the lack of individ- ual control over the career while retaining the simplicity of a renewal proc- ess. 2. The model introduces important be- havioral effects in a tractable way and allows for the generality provided by phase distributions. It should be noted that the model can be further generalized in several ways. As

mentioned before, the single $\alpha$ parameter can be replaced by a sequence of parameters to represent a reinforcement effect. In addition, one can increase the number of states (from "high" and "low"). This allows for more complex behavior. Despite this added generality, I have retained the two-state model with no crimes in the low state. This reduces the number of parameters in the model. Currently available data sets lack the size or detail needed to estimate a more complex model successfully. As better data sets become available, they can be used to extend the model. In fact, if the duration of the low state is sufficiently short, the two-state model is little different from a one-state model. The two-state model is beneficial when low periods are of at least moderate duration.

Multiple Crime Types

The previous model can be generalized to allow for multiple crime types. Such a generalization can be carried out in several ways. In this section I explore several possibilities and arrive at a final version of the model. Throughout this discussion, $T$ denotes the total number of crime types, which in turn are indexed by $t$.

Crime-Switch Models

The simplest approach is to consider crime types as having no influence on the stochastic structure described above. The distributions $F$ and $G$, as well as the parameters $q$, $\alpha$, and $\beta$, are unchanged. Rather, crime type serves only to label the crimes. This can be done by assuming $T$ distinct types and introducing a $T \times T$ Markov crime-switch-transition matrix $C = (c_{ij})$, where $c_{ij}$ is the probability that the offender, having last committed a crime of type $i$, will next commit a crime of type $j$.

The Markov crime-switch approach is much used in stochastic models of the crime process. Moreover, it can be made more general by allowing arrest probabilities to depend on crime type. I do not pursue this approach any further in this paper, however, for two reasons.
First, one adds $T(T - 1)$ parameters to the model through $C$ while gaining only a little more explanatory power. Second, I prefer to pursue an alternate approach that enables one to deal better with the age effects, which are clear in crime data but are not yet incorporated in the model.

Competing Risk Model

An alternate approach to constructing a multiple-crime-type model is to introduce a set of distributions of phase type $F_t$, $1 \le t \le T$, where $T$ is the number of crime types. Suppose that the individual commits a crime and stays in the high state or that the individual leaves the low state and enters the high state. We need to define the time until the next crime occurs. This can be done using a "competing risks" formulation. In this formulation we imagine random variables $X_t$ being drawn independently with distribution $F_t$, $1 \le t \le T$. The time until the next crime is given by $X = \min_{1 \le t \le T} X_t$, the crime with the shortest time to its occurrence. The type of crime is given by the index $t$ that attains this minimum. The family of phase distributions is well suited to this approach, since if each of the $X_t$ has a phase distribution, then $X$ will also have a phase distribution (see below). This version of the multiple-crime-type model is therefore essentially equivalent to the single-crime-type model with the exception that the distribution $F$ belongs to a special subset of the phase distributions, those that arise as minimums of other phase distributions.
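The competing-risks draw can be sketched in a few lines. The example below assumes exponential distributions for each crime type, which is only the simplest special case of a phase distribution; the crime-type labels and rate values are hypothetical. In this special case the minimum is again exponential with rate equal to the sum of the rates, and type t occurs first with probability proportional to its rate, which gives a convenient check.

```python
import random


def next_crime(rates, rng):
    """Return (time, crime_type) for one competing-risks draw.

    rates maps crime type -> exponential rate (an assumed special case of F_t).
    """
    draws = {t: rng.expovariate(lam) for t, lam in sorted(rates.items())}
    crime_type = min(draws, key=draws.get)   # index attaining the minimum
    return draws[crime_type], crime_type


def simulate(rates, n, seed=1):
    """Average time to next crime and the share of crimes of each type."""
    rng = random.Random(seed)
    total_time = 0.0
    counts = {t: 0 for t in rates}
    for _ in range(n):
        x, t = next_crime(rates, rng)
        total_time += x
        counts[t] += 1
    return total_time / n, {t: c / n for t, c in counts.items()}
```

With rates 1 and 3 for two hypothetical types, the mean time to the next crime should be near 1/4 and the second type should account for roughly three quarters of the crimes.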

The Final Version of the Model

The competing-risk version of the multiple-crime-type model leaves one issue unaddressed. In many criminal justice data sets pertaining to individual offending, there is a pronounced age effect. Individuals seem to have high crime commission rates as older juveniles or young adults, and those rates sharply diminish at older ages. None of the models presented thus far addresses this issue. Indeed, a renewal-process model of crimes or arrests would not allow for such an age effect.

Fortunately, there appears to be a straightforward way to introduce such effects, and this approach is supported somewhat by empirical evidence. Studies by Peterson and Braiker (1981) indicate that for a single crime type, there is no age effect. Rather, an individual commits a particular crime type in a time-stationary way (although in a clustered fashion, like that given by the two-state model); then at some time point in the career the individual essentially stops committing that crime type altogether. This process goes on independently for the $T$ crime types (although some offenders specialize in a subset of these types). The individual has a set of active crime types. The time until the next crime (when the individual is in the high state) is taken to be the minimum of the times for the active crime types. As time progresses, the portion of the offender's career involving crime type $t$ will end, and so the set of active crime types decreases in size. As more crime types are eliminated, the time between crimes will naturally increase. This will, in turn, result in an age effect.

The multiple-crime-type model can be summarized. It consists of the following:

· a set of phase-type distributions, $F_t$, $1 \le t \le T$,
· an initial set of active crime types $A \subset \{1, \ldots, T\}$,
· an arrest probability for crime type $t$, $q_t$, $1 \le t \le T$,
· a state-switching probability $\alpha$,
· a crime type $t$ termination probability, $\beta_t$, $1 \le t \le T$, and
· a phase-type distribution $G$, denoting the length of the low period.

Note that $\alpha$ and $G$ could be allowed to be crime-type dependent, as well.

An intuitive description of the criminal career is as follows. The individual begins his or her career with a set of active crime types $A$. Each crime type has a distribution associated with it. Let $X_t$, $1 \le t \le T$, represent random variables, where $X_t$ has distribution $F_t$. The time until the next crime is given by $\min_{t \in A} X_t$, and the type of that crime is given by the index associated with the minimum. When a crime of type $t$ is committed, with probability $\beta_t$, that crime type is removed from the active set $A$. With probability $1 - \beta_t$, this crime type is kept in $A$. With probability $q_t$ the individual is arrested, and with probability $\alpha$ the individual switches to the low state. After a period having distribution $G$, he moves back to the high state. The process continues in the same fashion, except that over time the active set $A$ will be reduced in size. When $A$ becomes empty, the career is ended. As $A$ becomes smaller, the times between crimes (and hence arrests) will increase. This will produce an age effect. (One could allow $A$ to increase in size, i.e., new crime types to be added. I do not consider such a possibility in this paper.)

In the next section, I discuss the hierarchical version of the model and the addition of covariates. It is clear that, at any point in time, the decision to terminate a crime type, to drop the low state, to adjust the arrest probability, and so on will be influenced by covariates, such as drug use or employment status. The basic model described above can be enhanced to allow such considerations.
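The career mechanism described in this section can be sketched as a small simulation. The sketch below is illustrative only: it assumes exponential distributions for the crime-type times and for the low period, omits arrests, and uses hypothetical names and parameter values throughout. Under these assumptions the number of crimes of a given type is geometric with mean equal to the reciprocal of that type's termination probability, which provides a simple sanity check.

```python
import random


def simulate_career(rates, beta, alpha, low_rate, rng):
    """Simulate crime times for one offender until the active set is empty.

    rates:    crime type -> exponential rate of F_t (assumed exponential)
    beta:     crime type -> termination probability beta_t
    alpha:    probability of switching to the low state after a crime
    low_rate: rate of the exponential low-period length G (assumed)
    Returns a list of (time, crime_type) events; arrests are omitted.
    """
    active = set(rates)
    clock = 0.0
    events = []
    while active:
        # Competing risks over the crime types that are still active.
        draws = {t: rng.expovariate(rates[t]) for t in sorted(active)}
        t = min(draws, key=draws.get)
        clock += draws[t]
        events.append((clock, t))
        if rng.random() < beta[t]:       # crime type t is dropped
            active.discard(t)
        if active and rng.random() < alpha:
            clock += rng.expovariate(low_rate)   # time spent in the low state
    return events
```

Tabulating the simulated events by time since the start of the career, over many offenders, shows the declining crime rate that the shrinking active set is meant to produce.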

Hierarchical Versions of the New Model

The model developed above does not build in any population heterogeneity. It is important to include this, and it can be done in two ways:

1. by allowing the model to depend on covariates and
2. by using a hierarchical model that assumes that the distributions governing the individual crime and arrest processes contain random parameters.

Both approaches will be used here. It will be assumed that each of the parameters is random and that they are jointly sampled from a joint distribution. Moreover, this joint distribution can depend on covariates.

A large number of parameters have been introduced into this model of the criminal career. These include $F_t$, $q_t$, $\beta_t$, $1 \le t \le T$, $\alpha$, $G$, and $A$. The distributions $F_t$ and $G$ are of phase type and so have parametric representations $(\gamma_t, R_t)$ and $(\delta, S)$. One can consider drawing $(\gamma_t, R_t)$, $(\delta, S)$, $q_t$, $\beta_t$, $\alpha$, and $A$ jointly from some superpopulation. This would allow for an arbitrarily large amount of dependency among the parameters. For example, the distributions associated with the times between burglary and robbery might be highly positively correlated. Nevertheless, given the data sets currently available, it seems most reasonable to simplify the model as much as possible; it can be expanded when more detailed data sets are available. One simplification would be to consider $F_t$ and $G$ to have generalized gamma distributions. This would replace $(\gamma_t, R_t)$ by $T$ sets of parameters $(n_t, r_{t1}, \ldots, r_{t n_t})$ corresponding to the parameters of the exponentials that will be convolved to form the generalized gamma distribution. A similar reduction could be made for $G$. Indeed, it seems reasonable to reduce even further to a gamma distribution. One could consider $F_t$ to be a gamma $(\gamma_t, \lambda_t)$ and $G$ to be gamma $(\delta, \sigma)$. The number of parameters would be dramatically reduced.
If this family does not fit sufficiently, the models could be easily expanded to enhance the fit.

One will still want to model some dependencies among the parameters. One will expect the random vectors $(\gamma_1, \lambda_1), \ldots, (\gamma_T, \lambda_T)$ to be correlated. Indeed, the parameter set $A$ (which is a set denoting the initial crime types) offers many intriguing possibilities. One can create specialist offenders or generalists or both. Such offenders may have in their initial set $A$ only property crimes, only violent crimes, or some offense mixture. Moreover, the initial active set may well be correlated with the parameters that govern $F_t$. The hierarchical approach thus offers a natural way to build in dependencies among the fundamental parameters while still retaining a relatively simplified individual career structure. As more data are gathered, these dependencies can be explored more fully.

It is useful to include covariates in the superpopulation distribution. Two broad classes of covariates should be distinguished. First are the historical covariates, i.e., those that are fixed at the start of the career and remain unchanged throughout the career. For adult offenders, these might include variables such as sex, race, juvenile record, and age of first juvenile offense. A second class of covariates are "dynamic," i.e., variables that can change with time. These might include employment status, drug use, and arrest record. It is very desirable to include both classes of covariates in the model; however, some care is required. Under the current formulation, the random parameters are selected independently by each individual once at the start of the career and are then fixed forever. Consequently, the group of historical covariates could influence the choice of

those parameters. The group of dynamic covariates, however, cannot be used in this fashion. This second group can be entered only by modifying the stochastic structure of the model.

It seems that $F_t$, $G$, $A$, $\alpha$, and $\beta_t$ are the most important parameters to allow to have a dependence on covariates. Furthermore, $A$ is chosen at the start of the career and can be influenced only by historical covariates. There are surely factors in an individual's past that influence whether he will ever engage in a particular crime type. It follows that any distribution for $A$ should involve some historical covariates. Marginally, $A$ will be chosen from a probability distribution over the subsets of $\{1, \ldots, T\}$ that depends on $X$, a vector of covariates.

The times between crimes of any particular type, the length of time in the low period, and the probability of eliminating a particular crime type are much more likely to depend on dynamic covariates than historical ones. For example, if the individual is currently "on drugs," one would expect frequent crimes and shorter low periods. If, on the other hand, the individual is employed, one should expect longer times between crimes. Other covariates may well also be influential.

Let us consider a very simple example of how covariates might be included in the dynamic model. Suppose $Z(a)$ is a vector of covariates that includes both historical and dynamic covariates for an individual of age $a$. We assume that $F_t$ has a gamma $(\gamma_t, \lambda_t e^{Z(a) b_1})$, $1 \le t \le T$, distribution, where $Z$ and $b_1$ conform, and $a$ is the age at which the current crime was committed. In this model the unknown coefficients $b_1$ are the same for each crime type. We assume that a crime of type $t$, committed by a person of age $a$, leads to an arrest with probability $q_t e^{Z(a) b_2}$.
The probability that an individual who commits a crime of type $t$ at age $a$ will drop this crime from his active set is given by $\beta_t e^{Z(a) b_3}$. The probability that such an individual switches to a low period is given by $\alpha e^{Z(a) b_4}$. The low-period duration has a gamma $(\delta, \sigma e^{Z(a) b_5})$ distribution. The parameters $\gamma_t$, $\lambda_t$, $A$, $\beta_t$, $\alpha$, $\delta$, and $\sigma$ are random and are drawn jointly from a superpopulation. The vectors $b_1, \ldots, b_5$ are unknown and must be estimated from data.

It should be noted that the model is appropriate when crimes (as opposed to arrests) are observable. In this case the likelihood function can be written and the parameters estimated. Prisoner self-report data are an approximation to such a data set. If, however, one has access only to arrest data for individuals, one cannot construct the likelihood function. In this instance one should start with a different, reduced model. Recall that the times between arrests will still be of phase type; however, rather than computing this in terms of the various parameters, it is simplest to model it directly as being of phase type with its own set of parameters. This will eliminate $\alpha$ and $G$ from the model and cause $F_t$ to be redefined as a time to arrest for crime type $t$ rather than a time to the next crime of a given type.5

5 With only arrest data, the parameters in the full model are not fully identified. This reduction helps to identify the model. The notion of an active crime set and dropout probability is still present.

The approach to modeling criminal careers in this paper is somewhat related to the work of Flinn and Heckman (1982a, b, 1983). In the context of a criminal career as opposed to a more traditional labor career, this approach would define the time between crimes (or perhaps arrests) in terms of the hazard function. Suppose, for example, that the last crime occurred at time $\tau$, and we want to construct the distribution of the time until the next crime. Let $X$ represent this interevent random variable. The hazard function is defined to be

$$h(t) = \lim_{\Delta t \to 0} \frac{P(t \le X \le t + \Delta t \mid X \ge t)}{\Delta t}.$$

Knowledge of the hazard function is equivalent to knowledge of the distribution. Flinn and Heckman allow the hazard function to have a general parametric form that depends on covariates and random quantities. This form is

$$h(t) = \exp\left[Z(t + \tau)\beta + \gamma_1 \frac{t^{\lambda_1} - 1}{\lambda_1} + \gamma_2 \frac{t^{\lambda_2} - 1}{\lambda_2} + V(t + \tau)\right],$$

where $\lambda_2 > \lambda_1 \ge 0$, $Z(t + \tau)$ is a vector of covariates corresponding to time $\tau + t$, $\beta$ is a vector of coefficients, and $V(t + \tau)$ is restricted to being stationary, i.e., $V(t + \tau) = V$. This formulation can allow for the $i$th individual to have a hazard function $h_i(t)$ with associated covariates $Z_i$ and unobserved variables $V_i$. In addition, the form can be generalized to a multistate, multispell formulation.

There may be some advantage to parameterizing the hazard function rather than the distribution if the covariates are rapidly changing. In the case of criminal careers, the time between crimes is relatively short compared with the time over which covariates change, and so there is little gain in modeling the hazard function. Moreover, the class of phase distributions used in this paper is very versatile and capable of modeling any positive distribution. Nevertheless, the two approaches naturally complement each other and perhaps can be successfully combined.

PARAMETER ESTIMATION

This section addresses the problem of parameter estimation for the hierarchical models developed in this paper. The statistical literature on the estimation of hierarchical models and the empirical Bayes approach is quite large and rapidly growing. For reviews of this literature, see Deely and Lindley (1981), Dempster, Rubin, and Tsutakawa (1981), Copas (1983), and Morris (1983). Only the basic approach is described here.

The Formulation

Suppose there are $n$ individual offenders. Each individual selects a vector of parameters $\theta$ from a distribution parameterized by an unknown parameter $\phi$, $g_\phi$.
Conditional on the value of $\theta_i$, the $i$th individual undertakes a criminal career, $\{X_i(s), s \ge 0\}$. We assume that $X_i$ is defined in such a way that it is observable. We can therefore observe $\{X_i, 1 \le i \le n\}$, and we seek to estimate $\theta_i$, $1 \le i \le n$, and $\phi$, the parameter of the superpopulation distribution. The parameter $\phi$ is important, because it characterizes the offender population. The individual $\theta_i$ parameters characterize the behavior of the $i$th individual and can be used to help predict future behavior of that individual. If $\phi$ were known, only $X_i$ would be useful in estimating $\theta_i$. Because $\phi$ is unknown, the data from all individuals bear on each $\theta_i$ through the estimate of $\phi$. This leads to the shrinkage estimators mentioned above.

The problem as stated fits comfortably in the empirical Bayes framework. There are a number of approaches to the estimation problem. The focus here is on only likelihood-based methods, although other approaches, such as methods of moments, could well be used. Three basic approaches are considered:

1. a full Bayesian approach,
2. an empirical Bayes approach, and
3. a simultaneous-likelihood approach.

The Full Bayesian Approach

The full Bayesian program is straightforward. One treats $\phi$ as an unknown and hence as having a probability distribution, the prior distribution $f(\phi)$. The prior joint distribution of $\phi$ and $\theta_i$, $1 \le i \le n$, is given by $f(\phi)\prod_{i=1}^{n} g_\phi(\theta_i)$, since $\theta_1, \ldots, \theta_n$ are conditionally independent given $\phi$.

One must calculate the posterior joint distribution of $\phi$ and $\theta_i$, $1 \le i \le n$, given $X_i$, $1 \le i \le n$. This calculation may involve significant numerical integration. Once the posterior distribution has been calculated, it can be used to estimate any of the parameters or to predict future values of $X_i$, $1 \le i \le n$. The reader should consult Deely and Lindley (1981) for details and examples.

It seems generally difficult to determine $f(\phi)$ accurately, say by elicitation. Fortunately, for many criminal justice data sets, $n$ can be very large. If this is the case, the prior distribution of $\phi$, $f(\phi)$, will have very little influence on the estimates. It will be dominated by the data. One can, therefore, select a prior distribution that maximizes computational convenience, say by picking a conjugate prior distribution if one exists. Far more care is required if $n$ is small.

The Empirical Bayes Approach

Morris (1983) discusses the empirical Bayes approach and includes many citations for the use of this methodology. The general approach in this instance is to proceed in two steps. First, one integrates out the conditional distribution of $\theta$ given $\phi$, to find the conditional distribution of each $X_i$ given $\phi$. The data $X_i$, $1 \le i \le n$, are then used to estimate $\phi$. This is typically done using maximum-likelihood estimation, although one could follow Deely and Lindley (1981) in using Bayesian estimation. In addition, the method of moments is often very convenient; however, its small sample behavior is unclear. Notice that all the data are used to estimate $\phi$, and this will result in an estimate $\hat\phi$. It remains to estimate the individual $\theta_i$ parameters. This is done using likelihood methods by assuming $\theta_i$ has distribution $g_{\hat\phi}$ and using $X_i$ as data. Again a choice of methods is possible, but the Bayes approach using the posterior distribution of $\theta_i$ given $\hat\phi$ and $X_i$ is preferable.
The Simultaneous-Likelihood Approach

A third approach is the simultaneous-likelihood approach. This approach is very unreliable and should be avoided because it can produce inconsistent estimators. It entails writing the joint likelihood of $\theta_i$ and $X_i$. This function is then simultaneously maximized over $\phi$ and $\theta_i$, $1 \le i \le n$. The method is unreliable, in part because the number of parameters grows with the number of observations. This situation is one in which the maximum-likelihood method may perform in an undesirable fashion. Such behavior is shown in the examples below.

Some Simple Examples

The following example is designed to illustrate the ideas developed in the previous three sections. The example is based on the simplest model of a criminal career, given in the discussion of the Poisson process above.

Assume that each of $n$ individual offenders is arrested according to a Poisson process with parameter $\theta_i$. In addition, the $n$ values of $\theta_i$ are drawn independently from a gamma $(\alpha, \beta)$ distribution. The shape parameter $\alpha$ is known, but the scale parameter $\beta$ is unknown. The individual Poisson processes are observed over an interval of length $L$. By sufficiency and the memoryless property, we merely need to consider the total number of arrests over this time interval. We denote this quantity by $X_i$. Conditional on $\theta_i$, $X_i$ has a Poisson distribution with mean $\theta_i L$.

We can calculate the conditional distribution of $X_i$ given $\beta$ by integrating out the parameter $\theta_i$. This results in

$$P(X_i = x_i \mid \beta) = \frac{\Gamma(\alpha + x_i)}{\Gamma(\alpha)\,\Gamma(x_i + 1)} \left(\frac{\beta}{\beta + L}\right)^{\alpha} \left(\frac{L}{\beta + L}\right)^{x_i}, \quad x_i = 0, 1, 2, \ldots,$$

a negative binomial distribution. Again $\alpha$ and $L$ are known.

The Full Bayesian Approach

In this approach a prior distribution for $\beta$ is introduced. The most convenient choice is to introduce the conjugate prior distribution for the negative binomial distribution, the beta distribution. We let $\beta/(\beta + L)$ have a beta $(a, b)$ prior distribution. The full hierarchical model then becomes:

· $\beta/(\beta + L)$ has a beta $(a, b)$ distribution,
· $(\theta_1, \ldots, \theta_n) \mid \beta$ are independent with gamma $(\alpha, \beta)$ distribution,
· $X_1, \ldots, X_n \mid \theta, \beta$ are independent and $X_i$ has a Poisson $(\theta_i L)$ distribution.

One now wishes to find the posterior joint distribution of $\beta$ and $\theta_1, \ldots, \theta_n$ given $X_1, \ldots, X_n$. This can be easily carried out, and we find

· $\beta/(\beta + L) \mid X_1, \ldots, X_n$ has a beta $(a + n\alpha,\ b + \sum_{i=1}^{n} x_i)$ distribution, and
· $\theta_1, \ldots, \theta_n \mid \beta, X_1, \ldots, X_n$ are conditionally independent with gamma $(\alpha + X_i, \beta + L)$ distribution.

One can now construct Bayes estimates of the parameters. This requires the introduction of a loss function. For simplicity, consider nonsimultaneous estimation of the parameters based on the conditional mean. This would result in

$$\hat\beta = \frac{L(a + n\alpha)}{b + \sum_{i=1}^{n} x_i - 1}, \qquad \hat\theta_i = \frac{\alpha + x_i}{L} \left( \frac{b + \sum_{i=1}^{n} x_i}{a + n\alpha + b + \sum_{i=1}^{n} x_i} \right).$$

For large $n$, these estimates are given approximately by

$$\hat\beta = \frac{\alpha L}{\bar x}, \qquad \hat\theta_i = \frac{(\alpha + x_i)\,\bar x}{L(\alpha + \bar x)}.$$

Note that the estimate of $\theta_i$ involves all the data. In addition, for large $n$, the choice of prior parameters $(a, b)$ becomes immaterial.

Empirical Bayes Approach

The first step of this approach is to estimate $\beta$ after integrating out $\theta$. We treat $X_1, \ldots, X_n \mid \beta$ as being i.i.d. with negative binomial distribution. The maximum likelihood estimate is $\hat\beta = \alpha L/\bar x$, the same as the limiting version of the Bayes estimate of $\beta$. To find the estimate of each $\theta_i$, we treat the following problem:

· $\theta_i$ has gamma $(\alpha, \alpha L/\bar x)$ distribution, and
· $X_i \mid \theta_i$ has Poisson $(\theta_i L)$ distribution.
One can then find the posterior distribution of $\theta_i \mid X_i$ to be a gamma $(\alpha + X_i,\ L + \alpha L/\bar x)$ distribution. The Bayes estimator would then be given by

$$\hat\theta_i = \frac{(\alpha + x_i)\,\bar x}{L(\alpha + \bar x)},$$

which is again identical to the limiting form of the Bayes estimate.
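The agreement between the full Bayes and empirical Bayes estimates for large n can be checked numerically. The following sketch simulates the gamma-Poisson example and computes both sets of estimates from the conditional-mean formulas for this example. The function names, the prior values a = b = 1, and the simulation settings are assumptions chosen for illustration; the second gamma parameter is treated as a rate, which is what the posterior gamma with second parameter beta + L implies.

```python
import random


def poisson(mean, rng):
    """Draw a Poisson variate by counting unit-rate exponential arrivals."""
    count, t = 0, rng.expovariate(1.0)
    while t < mean:
        count += 1
        t += rng.expovariate(1.0)
    return count


def simulate_counts(n, alpha, beta, L, rng):
    """Arrest counts over a window of length L for n simulated offenders."""
    # random.gammavariate takes (shape, scale); beta acts as a rate here.
    thetas = [rng.gammavariate(alpha, 1.0 / beta) for _ in range(n)]
    return [poisson(th * L, rng) for th in thetas]


def empirical_bayes(counts, alpha, L):
    """Return (beta_hat, theta_hats) from the two-step empirical Bayes method."""
    xbar = sum(counts) / len(counts)
    beta_hat = alpha * L / xbar
    theta_hat = [(alpha + x) / (beta_hat + L) for x in counts]
    return beta_hat, theta_hat


def full_bayes(counts, alpha, L, a, b):
    """Posterior-mean estimates under a beta(a, b) prior on beta / (beta + L)."""
    n, s = len(counts), sum(counts)
    beta_hat = L * (a + n * alpha) / (b + s - 1)
    shrink = (b + s) / (a + n * alpha + b + s)
    theta_hat = [(alpha + x) / L * shrink for x in counts]
    return beta_hat, theta_hat
```

With a few thousand simulated offenders the two estimates of the superpopulation parameter are nearly indistinguishable, and each individual estimate is pulled from the raw rate x_i / L toward the population mean.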

The Simultaneous-Likelihood Approach

This method involves writing a simultaneous likelihood, including that of the $\theta_i$ and $X_i$, $1 \le i \le n$. One then maximizes over $\theta_i$ and $\beta$. The log-likelihood is given by

$$\text{constant} + n\alpha \log \beta + \sum_{i=1}^{n} \left[ (\alpha + x_i - 1) \log \theta_i - (\beta + L)\,\theta_i \right].$$

The likelihood equations are

$$\frac{\alpha + x_i - 1}{\theta_i} = \beta + L, \quad 1 \le i \le n,$$

and

$$\sum_{i=1}^{n} \theta_i = n\alpha/\beta.$$

These can be simultaneously solved to give

$$\hat\beta = \frac{\alpha L}{\bar x - 1}$$

and

$$\hat\theta_i = \frac{\alpha + x_i - 1}{\hat\beta + L}.$$

This estimate of $\beta$ can be negative and even if positive is inconsistent. The point of this example is to give a simple illustration of a situation in which the method gives an unreasonable estimate. In other cases the method of maximum likelihood may not even provide an answer because infinite likelihood can be generated at some boundary of the parameter space.

In summary, either the full Bayes or empirical Bayes method should be used, if possible. The simultaneous-likelihood method should be avoided.

PHASE DISTRIBUTIONS

This section is intended to introduce the class of phase distributions and to list several properties of this class. These distributions have recently been rediscovered and extensively developed by Neuts (1981). The reader should consult this text for a complete treatment.

Only continuous time phase distributions are considered here. These distributions arise naturally in the context of continuous time Markov chains. A phase distribution arises as the amount of time it takes such a Markov chain to first reach a designated state in its state space. Consider a continuous time Markov chain with state space $\{1, 2, \ldots, m + 1\}$ and infinitesimal generator

$$Q = (q_{ij}) \text{ with } q_{ii} < 0 \text{ for } i \le m, \quad q_{ij} \ge 0 \text{ if } i \ne j, \quad \sum_j q_{ij} = 0, \quad \text{and } q_{m+1,i} = 0,\ i \le m + 1.$$

For the given assumptions about the $q_{ij}$s, it follows that state $m + 1$ is an absorbing state. For any other state $i$, the chain is held in the state for an exponential period of time with mean $1/(-q_{ii})$. One must also introduce an initial distribution, $p = (p_1, \ldots, p_{m+1})$. The chain is started in a state selected at random from the distribution $p$. Once the initial state is selected, the chain evolves according to $Q$. Eventually, the chain will reach state $m + 1$, and the time at which this happens is called the hitting time of state $m + 1$. This hitting time has a phase distribution with representation $(p, Q_0)$. Given this description, one can see that the $(m + 1) \times (m + 1)$ matrix $Q$ has a block form given by

$$Q = \begin{pmatrix} Q_0 & Q_1 \\ 0 & 0 \end{pmatrix},$$

where $Q_0$ is $m \times m$.

One problem with any particular phase distribution may be that the $(p, Q_0)$ representation is not unique. This point is addressed in Neuts (1981).

The following facts about phase distributions are useful.

1. A phase distribution puts mass $p_{m+1}$ on 0 and has density $p_0 \exp(xQ_0) Q_1$ on $(0, \infty)$, where $p = (p_0, p_{m+1})$.

2. The Laplace-Stieltjes transform of the distribution is given by

$$\phi(s) = p_{m+1} + p_0 (sI - Q_0)^{-1} Q_1 \quad \text{for } \mathrm{Re}(s) \ge 0.$$

3. The $n$th moment of the distribution is given by

$$\mu_n = (-1)^n\, n!\, \left(p_0 Q_0^{-n} e_m^{\top}\right),$$

where $e_m = (1, \ldots, 1)$ and is $1 \times m$.

4. Suppose $F$ and $G$ are phase distributions with orders $m$ and $n$ and representations $(p, Q_0)$ and $(r, S)$, respectively. The convolution $F * G$ is also a phase distribution with representation

$$\left( (p_0,\ p_{m+1} r),\ \begin{pmatrix} Q_0 & Q_1 r \\ 0 & S \end{pmatrix} \right),$$

where $Q_1 r$ is the $m \times n$ matrix with elements $Q_{1,i}\, r_j$, for $1 \le i \le m$ and $1 \le j \le n$.

5. If one considers a renewal process with phase distribution $F$ governing the times between events, the equilibrium forward and backward recurrence time distributions are also phase distributions with modified initial vector.6 This property is useful for correcting biases in data sets in which sampling is not random but rather is length biased (see next section).

6. The family of phase distributions is also closed under the operations of "maximum" and "minimum." Suppose $X$ and $Y$ are independent random variables with phase distributions given by $F$ and $G$. Then the distribution $H_1(t) = F(t)G(t)$ corresponding to $\max(X, Y)$ and $H_2(t) = 1 - [1 - F(t)][1 - G(t)]$ corresponding to $\min(X, Y)$ are both of phase type (see Neuts, 1981:60). This property is useful for constructing a competing-risk model of the times between crimes for multiple crime types, as was used above.

6 See Neuts (1981:52) for the exact representations and p. 63 for a discussion of special properties of renewal processes governed by phase distributions.

The class of phase distributions is very large and explicitly contains a number of important parametric families. In particular, the exponential, gamma, and generalized gamma distributions are of phase type. They can be obtained by setting $q_{i,i+1} = -q_{ii}$ and $p_1 = 1$. The distribution is then a sum of $m$ exponential random variables with possibly different parameter values. The hyperexponential can be obtained by setting $q_{i,m+1} = -q_{ii}$, and more complex mixtures can be obtained similarly. The class of phase distributions can be used to approximate any nonnegative continuous distribution. A construction is given by Kelly (1979). Indeed the class of generalized gamma densities alone is dense in the family of nonnegative continuous distributions. This result is useful, since the class is smaller and easier to handle than others.

Finally, note that the class of phase distributions is ideal for stochastic modeling. If one models some time (such as a recidivism time) as having a phase distribution, by augmenting the state space with a single variable (which denotes the current phase) the model will retain a Markov structure, if it had one originally. This allows one to stay within a tractable family of models, while introducing the flexibility of being able to approximate any nonnegative probability distribution.

CORRECTING BIASES IN SAMPLES

This section addresses the problem of biases in data sets that arise from

398 nonranclom sampling from the offender population.7 The hierarchical-modeT ap- proach can be used to understand quan- titatively the nature of the bias and there- fore to correct for it. Several specific situations are consiclerec] in this section: 1. biases arising from restricting atten- tion to offenders with at least one offense ~ . . ,, in a wins low perloc i, 2. biases that arise in self-report data in which the sample is restricted to a prison population, and 3. biases that arise from estimating an incTiviclual crime rate from an individual record of a person who is caught in the midst of a period of high activity, so the estimates are biased upward. Window Arrest Data Sets CRIMINAL CAREERS AND CAREER CRIMINALS ration. The sampling plan thus is biased in favor of offenders with higher crime anct arrest rates. If no adjustment is macle, we will generate overestimates of crime rates, arrest rates, and the parameters of the superpopulation. It is, however, straightforward to correct the likelihood function to account for the bias in the win(low-arrest sampling proceclure. We begin by calculating the likelihood of a criterion arrest in It + b]. Let us take, first, the standard renewal theoretic case, where F = G. Define putt = P Renewal event occurs in (t,t+ b)],t' O. The function puts gives the probability an offender has a criterion arrest in the spec- ifiecl window It + b]. By conditioning on the time of the first event, we can write an integral equation for p~t),8 Consider a general, clelayed-renewal process with initial distribution G and general distribution F. Consequently, starting at time 0, the first event in the process occurs according to the clistribu tion G. while all subsequent interevent distributions occur according to F. In the setting of a hierarchical moclel, we allow F and G to clepenc] on parameters, and these parameters have some distribution given by calf H and density h. 
ptt) = F(t + ~ - F(t) Suppose we select an inctiviclual at ran dom from among the population of indi vicluals who have an arrest in ft,t + hi. That is, we restrict our sampling to indi viduals having this "window arrest" property. An offender satisfying this win dow-arrest criterion will typically have more arrests than an individual randomly selected from the general offender popu 7Professor A. Goldberger has pointed out to me that there is an extensive literature on correcting biases in samples in the educational psychology, economics, and evaluation research literature. This is treated under the rubric of selectivity bias, nonequivalent groups, and quasi-experiments. None of these, however, addresses the stochastic process aspects dealt with in this paper. p(t) = F(t + ~-F(t) J+ p(t- x~dF(x). n This equation can be solved (see Karlin and Taylor, 1975:184) to find + J [F(s + b) - F(s)]dM(s), o where M(t) = IFS n)(t) is the renewal function. The quantity p(t) is thus deter- mined completely by the cdf F. The expression is somewhat difficult to interpret, because there are two vari- ables, t and 6, in addition to F. Some insight can be gained by considering the behavior of p(t) for large t. As t ~ or, one can apply the key renewal theorem to find p(t) >p,where . 8Note that pit) is also a function of ~ but that it is ignored in the notation.

$$p = \frac{1}{m}\int_0^{\infty} [F(t + \delta) - F(t)]\,dt$$

and

$$m = \int_0^{\infty} x\,dF(x) = \int_0^{\infty} [1 - F(x)]\,dx$$

is the mean time between events. Some simple algebra allows us to compute

$$p = \int_0^{\delta} \frac{1 - F(t)}{m}\,dt.$$

The integrand is itself a density function. It represents the equilibrium backward or forward recurrence time distributions associated with $F$. The factor $p$, when treated as a function of $\delta$, is a cdf. It begins at 0 and increases monotonically to 1 as $\delta$ increases to $\infty$. For very small values of $\delta$, $p$ is approximately given by $\delta/m$, while for large values it is nearly 1. This is quite reasonable, since as the window size $\delta$ is increased, more and more individuals in the population can be included, and the window effect is reduced.

We are interested in the behavior of $p(t)$ for all values of $t$, not just the asymptotic behavior. The reason that $p(t)$ varies with $t$ is that we have an initial condition, namely, that an event occurs at time 0. It takes some time for the effect of this condition to wear off and for equilibrium to be approached. If we begin with the renewal process in equilibrium, then $p(t)$ will no longer depend on $t$.

We can also achieve equilibrium by using a delayed renewal process formulation with $G'(t) = [1 - F(t)]/m$. There are two relevant integral equations. Let $p_D(t)$ be the probability of a criterion arrest in the window $[t, t + \delta]$ for the $(F, G)$ delayed formulation, while $p(t)$ is the same quantity for the standard $(F, F)$ formulation. Conditioning on the time of the first event gives

$$p_D(t) = G(t + \delta) - G(t) + \int_0^t p(t - x)\,dG(x),$$

and

$$p(t) = F(t + \delta) - F(t) + \int_0^t p(t - x)\,dF(x).$$

The second equation was solved earlier, and the resulting $p(t)$ can be substituted into the first to find $p_D(t)$. We take $G'(t) = [1 - F(t)]/m$ and do extensive algebra to find

$$p_D(t) = \frac{1}{m}\int_0^{\delta} [1 - F(u)]\,du, \quad t \ge 0,$$

which is independent of $t$.

The expression for $p_D(t)$ can be used as a correction factor for the likelihood function.
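The equilibrium window probability above lends itself to a quick numerical check. The following Python sketch is not part of the original paper; it evaluates $p_D = \frac{1}{m}\int_0^{\delta}[1 - F(u)]\,du$ with a midpoint rule for two illustrative inter-arrest distributions. The function name `window_probability`, the rate values, and the choice of exponential and Erlang-2 distributions are assumptions made only for illustration.

```python
import math


def window_probability(survival, mean_gap, delta, steps=10_000):
    """Approximate (1/m) * integral_0^delta [1 - F(u)] du by the midpoint rule.

    survival(u) must return 1 - F(u); mean_gap is m, the mean time between events.
    """
    h = delta / steps
    area = sum(survival((i + 0.5) * h) for i in range(steps)) * h
    return area / mean_gap


# Illustrative (assumed) inter-arrest distributions, both with mean m = 0.5.
LAM = 2.0   # exponential rate, so m = 1 / LAM


def exp_survival(u):
    """1 - F(u) for exponential inter-arrest times."""
    return math.exp(-LAM * u)


RATE = 4.0  # Erlang-2 stage rate, so m = 2 / RATE


def erlang2_survival(u):
    """1 - F(u) for Erlang-2 (gamma with shape 2) inter-arrest times."""
    return math.exp(-RATE * u) * (1.0 + RATE * u)


if __name__ == "__main__":
    for delta in (0.01, 0.1, 0.5, 2.0, 10.0):
        p_exp = window_probability(exp_survival, 1.0 / LAM, delta)
        p_erl = window_probability(erlang2_survival, 2.0 / RATE, delta)
        print(f"delta={delta:5.2f}  delta/m={delta / 0.5:6.2f}  "
              f"exponential={p_exp:.4f}  Erlang-2={p_erl:.4f}")
```

For the exponential case the integral has the closed form $1 - e^{-\lambda\delta}$, which gives a convenient check on the numerical routine; for both distributions the printed values should rise from roughly $\delta/m$ toward 1 as the window widens.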
A given individual will have a criminal record that provides an enumeration of arrests that occurred prior to the window $[t, t + \delta]$, as well as those that occurred within the window. There will, of course, be at least one arrest within the window. The likelihood function will be constructed by multiplying the densities for the observed inter-event times; however, it must be modified to account for the presence of at least one event in $[t, t + \delta]$. This entails a division of the likelihood by $p_D(t)$.

We can consider the effect of this factor $p_D(t)$ on the posterior distribution of the parameters of the superpopulation. The posterior distribution will be proportional to $h(\theta)L/p_D(t)$, where $h(\theta)$ is the prior density of the superpopulation and $L$ represents the likelihood function.

An informative special case occurs when $\delta$ is small so that $p_D(t)$ is approximately $\delta/m$. The posterior distribution of $\theta$ is then proportional to $h(\theta)mL/\delta$, or $h(\theta)mL$. The extra factor $m$ accounts for the sampling bias and weights the posterior toward parameter values with a larger mean inter-event time $m$, that is, toward lower event rates.

An example will help to illustrate the utility of this calculation. Suppose the arrest process is Poisson with parameter $\lambda$, and $\lambda$ is treated as a random variable with density $h$. If we restrict attention to individuals with an arrest in $[t, t + \delta]$ for small $\delta$, the posterior distribution of $\lambda$ when corrected for this sampling plan will contain an extra factor of $1/\lambda$ (since $m = 1/\lambda$). This will tend to reduce the weight on large $\lambda$ and counteracts the artificially inflated likelihood. For example, suppose the prior were to have a gamma $(\alpha, \beta)$ distribution. The correction is equivalent to replacing the prior by a gamma $(\alpha - 1, \beta)$ distribution, which is then used with the likelihood function that has been inflated by the required window arrests. This correction is closely related to the length-biased sampling phenomenon of renewal theory. This posterior representation allows one to correct for the biases introduced by using only individuals with an arrest in the particular time window.

As $\delta$ increases, the size of the biasing effect is reduced; assuming an equilibrium formulation, it is measured by $\int_0^{\delta} [1 - F(x)]\,dx / m$ for any $\delta$. For large $\delta$, this factor is near 1.

Biases in Samples of Prisoners

Two other biases can arise in sampling and analyzing criminal justice data sets involving prisoners. First, individuals are generally sentenced to prison as a result of a high frequency of offenses. Individuals with a high observed offense rate are much more likely to be imprisoned than comparable individuals with a lesser observed offense rate. Since individuals with high propensities to commit crimes will in general have high empirical offense rates, this group can be expected to be overrepresented in prison populations.
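The overrepresentation of high-rate offenders under event-conditioned sampling can be illustrated with a small simulation. The sketch below is an illustration rather than part of the original analysis: it uses the Poisson window-arrest setting discussed earlier in this section instead of a prison sample, draws arrest rates from a gamma distribution with assumed shape `ALPHA` and rate `BETA`, and keeps only individuals with at least one arrest in a window of assumed length `DELTA`. Reweighting each sampled individual by $1/p_D$, with $p_D = 1 - e^{-\lambda\delta}$ for a Poisson process, is a sampling analog of dividing the likelihood by $p_D$; it relies on each individual's true rate, which would be unknown in practice, and is shown only to make the size of the bias visible.

```python
import math
import random

ALPHA, BETA = 2.0, 1.0   # assumed gamma shape and rate of the superpopulation
DELTA = 0.1              # assumed window length
N = 200_000              # number of simulated offenders


def simulate(seed=0):
    """Return (rate, window probability) pairs for offenders with a window arrest."""
    rng = random.Random(seed)
    selected = []
    for _ in range(N):
        # random.gammavariate takes (shape, scale); scale = 1 / rate.
        lam = rng.gammavariate(ALPHA, 1.0 / BETA)
        # Probability of at least one Poisson arrest in a window of length DELTA.
        p_window = 1.0 - math.exp(-lam * DELTA)
        if rng.random() < p_window:
            selected.append((lam, p_window))
    return selected


def naive_mean(selected):
    """Unweighted mean rate in the window-arrest sample (biased upward)."""
    return sum(lam for lam, _ in selected) / len(selected)


def reweighted_mean(selected):
    """Mean rate after weighting each sampled offender by 1 / p_window."""
    weights = [1.0 / p for _, p in selected]
    total = sum(w * lam for w, (lam, _) in zip(weights, selected))
    return total / sum(weights)


if __name__ == "__main__":
    sample = simulate()
    print("population mean rate:", ALPHA / BETA)
    print("naive sample mean   :", round(naive_mean(sample), 3))
    print("reweighted mean     :", round(reweighted_mean(sample), 3))
```

As the window shrinks, the rates in the selected sample behave like draws from a gamma $(\alpha + 1, \beta)$ distribution, which mirrors the extra factor of $1/\lambda$ needed to undo the selection.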
Data drawn from prisoners are, therefore, not representative of the offender population. A hierarchical model can, however, help to understand and correct for this bias.

A second issue concerns the stochastic nature of the crime process. Imagine two individuals with the same crime-committing propensity but different sample paths. The individual with the higher empirical frequency of offenses is more likely to be caught and sentenced. One may infer a higher crime rate for this individual than is actually appropriate, since individuals tend to be caught after a spurt of activity.

This second type of bias has been the subject of a recent lively debate. The controversy has been fueled by a paper of Maltz and Pollack (1980), which challenges the results of Murray and Cox (1979). The controversy centers on the evaluation of certain treatment programs for juveniles. It was noted empirically that juveniles selected for certain treatment programs exhibited a steep rise in the rate of police contact per unit time before admission. Surprisingly, these juveniles then exhibited a substantially diminished contact rate after admission to the program. The strong drop in contact rate after admission to the program has been called a "suppression effect" and was attributed by Murray and Cox solely to the success of the program.

This positive interpretation has been challenged by Maltz and Pollack (1980). They argue that the results could have been an artifact of a decision rule used by judges. Specifically, Maltz and Pollack assume that all individuals have the same value of $\lambda$. They posit a selection rule whereby an individual is placed in a treatment program at a time $t$ provided he experiences a contact at $t$ and has at least $k$ other contacts in the last $\tau$ time

units. At the time an individual is placed in a program, he will exhibit a contact rate significantly higher than $\lambda$ (of course, this depends on $\lambda$ and $k$). If $\tau$ is taken to be random with some appropriate distribution, the theoretical contact-rate curve matches the data very well. This is done under the assumption of a common $\lambda$. The judicial decision rule produces the effect, not the treatment program. Once individuals are placed in the program, the rate returns to its normal, lower value.

The impact of the work of Maltz and Pollack was reduced by Tierney (1983), who pointed out an error in their analysis. They had, in fact, not correctly calculated the theoretical contact rate prior to assignment to the program. No firm conclusions have been reached by any of the authors. A recent paper by Pollack and Farrell (1984) added some limited insight into the analysis but did not help to interpret these data.

It is very reasonable to assume in both a juvenile and an adult context that sentencing is based on the type of crime committed, the number of crimes committed, and the recent crime-committing behavior. If an individual has committed three crimes, he might or might not be sentenced to prison. If the three crimes were bunched near each other, commitment to prison is much more likely than if the previous crimes were committed over a long period. The decision rule articulated by Maltz and Pollack (1980) is quite reasonable. However, Maltz and Pollack and Tierney should have paid much closer attention than they did to the time at which the crimes were committed. Suppose we assume an individual begins the crime process at age 12 and is sentenced according to the Maltz and Pollack rule for some values of $\tau$ and $k$. For a given value of $\lambda$, we can compute the distribution of the time (age) $T$ at which the individual will first be sentenced. Clearly a large value of $\lambda$ tends to result in small $T$,
since the offender commits crimes at a high rate. Conversely, if we observe $T$ and attempt to infer $\lambda$, a large $T$ tends to be associated with a small $\lambda$. There is information about $\lambda$ in $T$; however, Maltz and Pollack and Tierney ignore this information by putting all individuals on a common time scale, with time 0 representing the time of admission to the treatment program regardless of the actual age of the individual.

The inclusion of the time random variable can help to correct biases. Let us assume an individual begins a contact process at time 0. This might be age 12 for juveniles or age 18 for adults. Assume that contacts occur according to a Poisson process. We imagine that sentencing occurs at the time of the first contact having the property that there are also $k$ other contacts within $\tau$ units of time. We now compute the density function associated with the time of the sentencing event.

We assume the Poisson process has parameter $c$. This refers to the contact process in the juvenile context or to the parameter of the exponential times between convictions in an adult case. We wish to make inferences about $c$. For clarity, we adopt a simple assumption concerning the sentencing rule. We assume that sentencing

• never occurs on the first contact,
• occurs on the second contact only if the first was within the prior $\tau$ time units, and
• always occurs on the third contact, if not before.

This rule is reasonable, is in the spirit of Maltz and Pollack, and can be extended to more complex versions. Unlike the problems encountered in Pollack and Farrell (1984), calculations can be carried out for more complicated versions of this rule. First, we assume that individuals have drawn their rate parameter $c$ at random from a parent distribution $h(c)$. We

focus on the group of individuals (in either the juvenile or adult context) who receive a first sentence and note the value of $T$ for each. Notice that our data set is restricted to those who receive a sentence. This means that we will have a much greater likelihood of selecting a high-$c$ individual than a low-$c$ individual. Interestingly, the presence of the value of $T$ will have a modifying influence. If $T$ is small, the value of $c$ is even more likely to be large, since the offender was sentenced at the beginning of the career. If $T$ is moderate to large, we should find that $c$ is only slightly elevated. The individual was sentenced, but it took a long time to meet the criteria. If $T$ is large, $c$ should be small, since only low-rate offenders can avoid sentencing for a prolonged time period under this sentencing rule. We can quantify these heuristic comments by computing the density of $T$ given $c$, $f_T(t)$.

There are two cases to consider. In the first case, $t \le \tau$, sentencing would occur on the second offense in $(0, \tau]$. The time to the second offense has a gamma $(2, c)$ density, so the density has the form

$$f_T(t) = c^2 t e^{-ct}, \quad 0 < t \le \tau.$$

After $\tau$, sentencing can occur in two ways. First, if $T_1$ is the time of the first arrest and $T_2$ is the time of the second arrest, incarceration will occur if $T_2 - T_1 \le \tau$, i.e., if the individual falls into the window. Here $T_2 = t$. Second, if $T_2 - T_1 > \tau$, the individual does not fall in the window, and sentencing will occur on the third arrest. Suppose that arrests form a renewal process with interevent density $f$ (in this example, we assume $f$ is exponential with parameter $c$). For $t > \tau$, the time $T$ at which the contact or conviction leading to sentencing occurs has density

$$f_T(t) = \int_{t-\tau}^{t} f(s) f(t - s)\,ds + \int_0^{t-\tau} f(s) \int_{s+\tau}^{t} f(u - s) f(t - u)\,du\,ds.$$

For $f(s) = ce^{-cs}$, we find

$$f_T(t) = c^2 \tau e^{-ct} + \frac{c^3 (t - \tau)^2 e^{-ct}}{2}, \quad t > \tau.$$

One can see that the heuristics mentioned earlier do indeed hold.
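The density just derived can be checked numerically. The following Python sketch, which is not from the original paper, codes $f_T(t)$ for exponential inter-contact times and simulates the three-part sentencing rule directly. The values of `C` and `TAU`, the function names, and the closed-form mean $E(T) = (2 + e^{-c\tau})/c$ used as a check (it follows from $T = T_2$ when $T_2 - T_1 \le \tau$ and $T = T_3$ otherwise) are assumptions or derivations introduced here for illustration.

```python
import math
import random

C, TAU = 1.0, 0.5  # assumed contact rate and window length


def sentencing_density(t, c=C, tau=TAU):
    """Density of the sentencing time T under the simplified three-contact rule."""
    if t <= 0.0:
        return 0.0
    if t <= tau:
        # Second contact at t; the first contact is necessarily within tau of it.
        return c * c * t * math.exp(-c * t)
    # Second contact inside the window, or third contact after a gap longer than tau.
    return (c * c * tau + 0.5 * c ** 3 * (t - tau) ** 2) * math.exp(-c * t)


def simulate_sentencing_time(rng, c=C, tau=TAU):
    """Draw one sentencing time from a Poisson contact process with rate c."""
    t1 = rng.expovariate(c)
    t2 = t1 + rng.expovariate(c)
    if t2 - t1 <= tau:
        return t2                      # sentenced on the second contact
    return t2 + rng.expovariate(c)     # otherwise sentenced on the third contact


def integrate(func, lower, upper, steps=100_000):
    """Midpoint-rule integral of func over [lower, upper]."""
    h = (upper - lower) / steps
    return sum(func(lower + (i + 0.5) * h) for i in range(steps)) * h


if __name__ == "__main__":
    rng = random.Random(0)
    draws = [simulate_sentencing_time(rng) for _ in range(100_000)]
    total = integrate(sentencing_density, 0.0, 60.0)
    mean_density = integrate(lambda t: t * sentencing_density(t), 0.0, 60.0)
    print("total probability   :", round(total, 4))
    print("mean from density   :", round(mean_density, 4))
    print("mean from simulation:", round(sum(draws) / len(draws), 4))
    print("(2 + exp(-c*tau))/c :", round((2.0 + math.exp(-C * TAU)) / C, 4))
```

Agreement between the simulated mean and the mean computed from the density suggests that the two branches of $f_T(t)$ together account for every way in which the rule can trigger a sentence.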
For example, if $c$ has a prior gamma $(\alpha, \beta)$ distribution, the posterior distribution of $c$ after observing $T = t$ would be a gamma $(\alpha + 2, \beta + t)$ distribution if $t \le \tau$. If $t > \tau$, $c$ has a posterior distribution given by a mixture of gamma distributions: with probability

$$p = \frac{\tau}{\tau + (\alpha + 2)(t - \tau)^2 / [2(\beta + t)]}$$

it is gamma $(\alpha + 2, \beta + t)$, and with complementary probability $1 - p$ it is gamma $(\alpha + 3, \beta + t)$. The posterior mean, $E(c \mid T = t)$, is given by

$$E(c \mid T = t) = \begin{cases} \dfrac{\alpha + 2}{\beta + t}, & t \le \tau, \\[2ex] p\,\dfrac{\alpha + 2}{\beta + t} + (1 - p)\,\dfrac{\alpha + 3}{\beta + t}, & t > \tau. \end{cases}$$

The conditional mean is thus larger than $E(c) = \alpha/\beta$ for small $T = t$, but smaller for large $t$. This shows that the individuals should not be placed on a common time scale but analyzed separately using this hierarchical approach. One can update the prior distribution on $c$, $h$, and estimate all the individual $c$'s. These can then be compared with the empirical arrest records subsequent to intervention to gain some insight into program effectiveness.

SUMMARY AND SUGGESTIONS FOR FURTHER RESEARCH

This paper has introduced two innovations to the quantitative modeling of criminal justice problems: a general structure of hierarchical models and a new stochastic model of a criminal career. These models allow one to distinguish

variation between individual offenders and variations within an individual career. This is important given the very large variability in the offender population. In addition, the hierarchical approach allows one to correct for natural biases in a data set. Biases can arise because the sampling is not at random from the offender population but rather is conditional on some event. For example, one might consider a set of arrestees or prisoners. This group tends to contain higher rate offenders than would be seen in the general population.

The new stochastic model of a criminal career offers three advantages over the standard renewal-process models in common use. First, it introduces a two-state approach in which there are periods of high and low activity. Second, the "active crime set" approach results in a natural age effect in which an offender's average crime-commission rate diminishes over time. Third, the model allows for some decision making on the part of the offender.

There are a number of ways in which one could consider extending the stochastic-process model. The major need is to include imprisonment and behavioral changes that arise from imprisonment. Indeed, this author would like to include explicitly a parameter or parameters that allow for a change in the Fir and active-crime-set distributions depending on the fact of and length of sentencing.

It appears that the next step is to fit this new class of models with data, after first correcting natural biases in the data. This should enable us to learn about the explanatory power in this class of models and to determine to what class of phase distributions the interevent (crime or arrest) times can be restricted.
Data are also needed to begin to determine an appropriate class of superpopulation distributions, not only for the parameters that determine the phase distributions but also for the other behavioral parameters and the initial active-crime set. Finally, the possibility of combining the hazard-function approach of Flinn and Heckman with the phase-distribution approach given in this paper should be explored.

REFERENCES AND BIBLIOGRAPHY

Avi-Itzhak, B., and Shinnar, R. 1973. Quantitative models in crime control. Journal of Criminal Justice 1:185-217.

Barton, R. R., and Turnbull, B. W. 1981. A failure rate regression model for the study of recidivism. Pp. 81-101 in J. A. Fox, ed., Models in Quantitative Criminology. New York: Academic Press.

Chaiken, J. M., and Rolph, J. E. 1980. Selective incapacitation strategies based on estimated crime rates. Operations Research 28:1259-1274.

Copas, J. B. 1983. Regression, prediction and shrinkage. Journal of the Royal Statistical Society B 45:311-354.

Deely, J. J., and Lindley, D. V. 1981. Bayes, empirical Bayes. Journal of the American Statistical Association 76:833-841.

Dempster, A. P., Rubin, D. B., and Tsutakawa, R. K. 1981. Estimation in covariance components models. Journal of the American Statistical Association 76:341-353.

Flinn, C. J., and Heckman, J. J. 1982a. Models for the analysis of labor force dynamics. Advances in Econometrics 1:35-95.

Flinn, C. J., and Heckman, J. J. 1982b. New methods for analyzing individual event histories. Pp. 99-140 in S. Leinhardt, ed., Sociological Methodology. San Francisco: Jossey-Bass.

Flinn, C. J., and Heckman, J. J. 1983. The Likelihood Function for the Multistate-multiepisode Model in "Models for the Analysis of Labor Force Dynamics." Discussion Paper Series 83-10, Economic Research Center. Chicago, Ill.: National Opinion Research Center.

Harris, C. M., and Moitra, S. D. 1979. Improved statistical techniques for the measurement of recidivism. Journal of Research in Crime and Delinquency 16:194-213.

Harris, C. M., Kaylan, A. R., and Maltz, M. D. 1981. Recent Advances in the Statistics of Recidivism Measurement. Pp. 61-79 in J. A. Fox, ed., Models in Quantitative Criminology. New York: Academic Press.

Harville, D. A. 1977. Maximum likelihood approaches to variance components estimation and to related problems. Journal of the American Statistical Association 72:320-340.

Holden, R. T. 1983. Failure Time Models for Criminal Behavior. Department of Sociology, Yale University.

James, W., and Stein, C. 1961. Estimation with quadratic loss. Pp. 361-379 in Proceedings of the Fourth Berkeley Symposium. Vol. 1. Berkeley: University of California Press.

Karlin, S., and Taylor, H. M. 1975. A First Course in Stochastic Processes. 2nd ed. New York: Academic Press.

Kelly, F. P. 1979. Reversibility and Stochastic Networks. New York: John Wiley & Sons.

Maltz, M. D., and McCleary, R. 1977. The mathematics of behavioral change: recidivism and construct validity. Evaluation Quarterly 1:421-438.

Maltz, M. D., and Pollack, S. M. 1980. Artificial inflation of a delinquency rate by a selection artifact. Operations Research 28:547-559.

Morris, C. N. 1983. Parametric empirical Bayes inference: theory and applications. Journal of the American Statistical Association 78:47-55.

Murray, C. A., and Cox, L. A., Jr. 1979. Beyond Probation. Vol. 94. Sage Library of Social Research. Beverly Hills, Calif.: Sage Publications.

Neuts, M. F. 1981. Matrix Geometric Solutions in Stochastic Models: An Algorithmic Approach. Baltimore, Md.: Johns Hopkins University Press.

Peterson, M. A., and Braiker, H. B., with Polich, S. M. 1981. Who Commits Crimes: A Survey of Prison Inmates. Cambridge, Mass.: Oelgeschlager, Gunn, and Hain.

Pollack, S. M., and Farrell, R. L. 1984. Past intensity of a terminated Poisson process. Operations Research Letters 2:261-263.

Rolph, J. E., Chaiken, J. M., and Houchens, R. E. 1981. Methods for Estimating Crime Rates of Individuals. Report R-2730-NIJ. Santa Monica, Calif.: Rand Corporation.

Stollmack, S., and Harris, C. M. 1974. Failure-rate analysis applied to recidivism data. Operations Research 22:1192-1205.

Tierney, L. 1983. A selection artifact in delinquency data revisited. Operations Research 31:852-865.

