10
Random Parameter Stochastic Process
Models of Criminal Careers
John P. Lehoczky
INTRODUCTION
Background
In the past decade there has been great growth in the development of
quantitative methodologies to deal with criminal justice problems. This
has included extensive data gathering and analysis and some modeling of
offender behavior. As this data analysis proceeds, one can gain clearer
insights into the nature of offender behavior, and these should be
incorporated into increasingly detailed models. As the models increase
in accuracy, one can begin to use them as policy tools to analyze the
impact of various approaches to crime control, such as selective
incapacitation.
John P. Lehoczky is professor and head, Department of Statistics,
Carnegie-Mellon University. I wish to express my thanks to Alfred
Blumstein and Jacqueline Cohen for their major contributions to the
paper. The approaches developed in this paper are the outgrowth of a
long series of discussions concerning appropriate models for criminal
behavior and the empirical evidence supporting those models. In
addition, I wish to thank Donald Gaver for his many discussions
concerning hierarchical models and corrections to biases in criminal
justice data sets. My thanks also to Arthur Goldberger, Jan Chaiken,
Chul Woo Ahn, and Mark Schervish for their many comments on earlier
drafts of this paper.
Unfortunately, it seems that the quantitative models of offender
behavior that have been developed to date do not capture the recent
insights about offender behavior found in major data analysis projects,
such as the Rand prisoner self-report study. Indeed, the stochastic
modeling approach began in 1973 with the work of Avi-Itzhak and Shinnar.
This work, described below, treats individual offender recidivism as a
Poisson process. A great deal of subsequent modeling has been done, but
most of the models are simple extensions of the Poisson-process model,
namely renewal-process models. This class of models assumes that
recidivism times are independent and have the same distribution. Such
models may fit data better than a Poisson-process model, but they do not
incorporate the current improved understanding of offender behavior.
This paper represents an attempt to develop a stochastic model that is
in better accord with this understanding.
RANDOM PARAMETER STOCHASTICPROCESS MODELS
Three major aspects of offender behavior have been observed with
sufficient frequency to merit incorporation into analytic models:
· Crime-commission propensities change as a function of age.
· Offender populations are markedly heterogeneous.
· Offenders often are thought to commit crimes in spurts and then to
have periods with little or no activity.
The age effect is very pronounced. It is widely recognized that offender
behavior is at its peak during the late teens and early 20s and then
drops significantly during the 30s. Any stochastic model must address
this age effect. Standard renewal-process models do not incorporate such
effects, because they assume that the times between arrests are
independent and identically distributed (i.i.d.) and hence stationary.
There is need to develop models that account for age effects but that
simultaneously offer analytic tractability. Such models will be
presented in this paper.
It is evident from recidivism data as well as self-report data that
there is great heterogeneity in the offender population. This
heterogeneity refers to differences between offenders, including
markedly differing offense rates, career lengths, and types of crimes
engaged in. This variation goes beyond differences that can reasonably
be observed from independent replications of a single stochastic
process. Most models do not take this heterogeneity into account. The
few exceptions arose from work at the Rand Corporation, including
Chaiken and Rolph (1980) and Rolph, Chaiken, and Houchens (1981). In
this paper, I argue for the use of hierarchical models to represent
heterogeneity. In such models each individual's criminal career is
regarded as a stochastic process governed by parameters. Those
parameters are themselves treated as random variables drawn from a
parent distribution (superpopulation). The parent distribution captures
the heterogeneity of the population of offenders or the variation
between individuals. One wishes to estimate the parameters of the parent
distribution to gain insight into the population of offenders. In
addition, one wishes to estimate the rate-influencing parameters of
individuals to understand the behavior of each of the offenders.
Hierarchical models form the basis of the analysis in this paper. They
are formally described, applied, and estimated in the discussions that
follow.
There is another aspect to criminal careers that has generally not been
incorporated into stochastic models. This is the occurrence of quiescent
periods in the course of the career. Self-report data reveal that
criminal behavior often occurs in spurts and is followed by lulls in
activity. This is not surprising if, for example, the offender was
attempting to gain sufficient money through a series of crimes and then,
having reached that goal, stopped for a period. The typical
renewal-process models do not incorporate such behavior. A new class of
models that includes this behavior is developed below.
Several other aspects of stochastic modeling of criminal careers are
dealt with in this paper. One of the most interesting is the use of the
hierarchical modeling approach to correct for natural biases in data
sets. Generally, criminal justice data sets do not provide random
samples from the offender population. Rather, individuals are part of a
sample because they meet a specific criterion that may be directly or
indirectly related to their parameter values. For example, one might
gather data on prisoners. This group of offenders is, however, not
representative of the offender population because it typically consists
of individuals with high offense rates, more serious offenses, or longer
careers. Similarly, if one took a sample of arrestees in some time
period, such a sample would overrepresent high-rate offenders, since
they have a greater probability of falling into such a sample. As
illustrated below, the hierarchical modeling approach can help to
overcome this problem. It offers the opportunity to develop a correction
for nonrandom sampling, so that one can make more nearly correct
inferences about the offender population from inherently biased data
sets.
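The overrepresentation of high-rate offenders among arrestees can be
illustrated with a small simulation. The following is a minimal sketch,
not the correction developed in this paper: it assumes a gamma parent
distribution for individual arrest rates, Poisson arrests, and a
one-year sampling window, and every name and numerical value in it is
hypothetical. It compares the population mean arrest rate with the mean
among sampled arrestees, and then reweights each sampled individual by
the inverse of his inclusion probability (computed here from the true
rate, which would be unknown in practice).

```python
import math
import random


def simulate_window_sample(n=50_000, shape=2.0, scale=0.5, window=1.0, seed=0):
    """Draw arrest rates from a gamma parent distribution (an assumption)
    and keep only individuals with at least one arrest in the window."""
    rng = random.Random(seed)
    population, sample = [], []
    for _ in range(n):
        mu = rng.gammavariate(shape, scale)   # individual arrest rate per year
        population.append(mu)
        # P(at least one arrest in the window) for a Poisson arrest process
        p_in = 1.0 - math.exp(-mu * window)
        if rng.random() < p_in:
            sample.append((mu, p_in))
    return population, sample


population, sample = simulate_window_sample()
pop_mean = sum(population) / len(population)
naive_mean = sum(mu for mu, _ in sample) / len(sample)

# Weight each sampled individual by 1 / P(inclusion) to undo the selection.
weights = [1.0 / p_in for _, p_in in sample]
corrected_mean = (
    sum(w * mu for w, (mu, _) in zip(weights, sample)) / sum(weights)
)

print(f"population mean rate:      {pop_mean:.3f}")
print(f"mean rate among arrestees: {naive_mean:.3f}")
print(f"reweighted mean rate:      {corrected_mean:.3f}")
```

Under these assumptions the mean rate among arrestees is noticeably
higher than the population mean, while the reweighted mean is close to
it.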
Overview
This paper introduces a hierarchical (superpopulation) model for
criminal careers within a population of offenders or potential
offenders. There are two levels to the hierarchy. The top level is used
to explain variation between individuals in the population, that is, to
explain the heterogeneity of the population. At the low level of the
hierarchy, individuals engage in criminal careers that are treated as
independently evolving stochastic processes governed by certain
distributions. These distributions contain parameters whose values are
drawn at the top level. The low level thus uses a stochastic-process
model to help explain differences within careers governed by the same
parameter values.
Covariates can be introduced at both levels of the hierarchical model.
Covariates are of two types: "historical" covariates, which are fixed at
the start of the career, and "dynamic" covariates, which can change
during the evolution of the career. For an analysis of adult offending
careers, historical covariates could include juvenile record or the age
at the time of the first juvenile arrest. Relevant dynamic covariates
might include employment status or drug use. The historical covariates
can influence the choice of parameters for each individual at the
highest level of the hierarchy. Since these parameters are selected and
CRIMINAL CAREERS AND CAREER CRIMINALS
fixed at the beginning of the career, dynamic covariates cannot be used.
All covariates are allowed to influence the evolution of the career of
any particular offender, the lowest level of the hierarchy.
A new family of stochastic models is introduced in the paper. The models
are characterized by two states, one of which corresponds to a high rate
of crime commission and the other of which represents a low rate of
activity (which is taken to be zero). Parameters are included for the
time spent in each state, state-switching probabilities, arrest
probabilities, crime-type termination probabilities, and the times
between crimes. For multiple crime types, a competing-risks formulation
is used. The models offer tractability, can include covariates, provide
periods of high and low activity, and introduce some behavioral
parameters.
Methods are also developed to assess and correct the biases that occur
in many criminal justice data sets. Three specific issues are addressed:
1. If a data set is gathered by taking individuals who were arrested
during a certain window of time, the data set will overrepresent
individuals with high crime rates among those at liberty and
underrepresent those who are in prison for part of the window period.
2. If a data set comes from self-reports of prisoners, it is not
representative of the population in general, since prisoners tend to be
high-rate offenders and to commit more violent crimes.
3. Even at the level of the individual parameters, the individual crime
or arrest rates that are estimated from among arrestees or prisoners
will be biased upward. This is because individuals are more likely to be
caught in a period of high activity (even if their parameter values may
be low) and hence empirically show a high arrest rate.
The hierarchical model developed in this paper can help to assess and
correct these biases.
Finally, a class of "phase distributions" is introduced. This class is
very versatile in that it can approximate arbitrarily closely the
distribution of any nonnegative random variable. In addition, the class
is closed under a number of operations that are useful for the models
presented in this paper. The closure properties include convolution and
mixtures, as well as maxima and minima of random variables drawn from
this class.
HIERARCHICAL MODELS
This section describes the use of hierarchical stochastic models for
studying criminal careers. There are several reasons why this class of
models is especially useful and of great conceptual value. First, it has
frequently been observed that criminal behavior varies widely between
individuals. This is especially true for crime rates, as measured by
self-report data; some individuals report committing crimes at a very
high rate, while others report they commit crimes rarely. Even allowing
for biases in these data and deliberate falsification, it is clear that
there is great variation between individuals. It is, therefore,
appropriate to use a model that can represent this great variation.
A second benefit of using a hierarchical model is that it can help to
improve parameter estimates for each individual. Suppose one treated
each individual in isolation and attempted to estimate parameter values
for each individual using only his data (for example, arrest record and
the values of covariates). One would find that these estimators have a
large variance. With a hierarchical model, however, the data for other
individuals can be used to help estimate parameter values for a single
individual. This follows because the parameter values for all
individuals are related, since they are modeled as coming from a common
parent distribution. This situation has been observed and exploited with
increasing frequency in statistical studies. This statistical
formulation leads to "shrinkage" estimators. This type of estimator was
first introduced by James and Stein (1961). This topic has been
receiving substantial recent attention in the statistics literature (see
the review by Morris, 1983). Methods based on maximum likelihood,
empirical Bayes, and Bayes procedures have been developed. These
approaches will be discussed in the section on parameter estimation;
however, a recent example provided by Dempster, Rubin, and Tsutakawa
(1981) may help to explain the benefits of the methodology. These
authors present several examples, along with a theoretical treatment of
likelihood methods. One example deals with estimating the first-year
performance of law school students using several explanatory variables.
For a single law school, the estimates of regression coefficients are
highly variable. It is possible to improve the estimates for a single
law school markedly by simultaneously carrying out the analysis for many
(82 in this case) law schools. One believes there is a reasonable
similarity among law schools, and so the data for other schools are
pertinent for any individual school. The estimates for one school gain
precision by considering many similar schools simultaneously. Other
examples of this type are cited in Morris (1983).
The situation is analogous to Model II analysis of variance or
random-effects models. One can distinguish the variation within a
particular career and the variation between careers. Criminal careers
are modeled using stochastic models. This is appropriate because any
career has many random elements that control its evolution. If two
individuals have the same stochastic mechanism (i.e., the same parameter
values), the two resulting careers may nevertheless be quite different.
The offenders have possibly different criminal opportunities, possibly
different arrest realizations, possibly different sentences, and so on.
If an individual were allowed a second realization of his career, it
would differ from the first. This is variation within a career.
Variation between careers arises when individuals have different
stochastic mechanisms (parameter values) governing their careers.
Once the individuals have been linked through a superpopulation or
hierarchical model, the data for all individuals can be used in addition
to the data for a single individual. The individual parameter estimates
will be drawn toward the average of the population. This is known as
shrinkage. The amount of shrinkage will depend on the size of the
variation within careers versus the variation between careers. If there
is relatively small variation between individuals, the shrinkage can be
great. The formal idea of the hierarchical model (presented in more
detail in the next section) is that one has a family of parameters,
$\theta$, that controls the evolution of an individual career, which is
denoted $\{X_\theta(t), t \ge 0\}$. The parameters $\theta$ are treated
as random variables with some distribution $\pi(\theta)$, which itself
may contain some unknown parameters. One then wishes to use a data set
to estimate individual $\theta$ values and/or the unknown parameters of
$\pi$.
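The shrinkage idea can be sketched numerically. The example below is an
illustration under simplifying assumptions that are not part of this
paper's model: each individual's parameter is drawn from a normal parent
distribution, a single noisy normal measurement of it is observed, and
the within-individual standard deviation is treated as known. All names
and values are hypothetical.

```python
import random
import statistics


def shrinkage_demo(n=2000, m=5.0, tau=1.0, sigma=2.0, seed=1):
    """Compare raw and shrunken estimates in a normal-normal hierarchy."""
    rng = random.Random(seed)
    # Top level: parameters vary between individuals (standard deviation tau).
    theta = [rng.gauss(m, tau) for _ in range(n)]
    # Low level: one noisy observation per individual (standard deviation sigma).
    x = [rng.gauss(t, sigma) for t in theta]

    grand_mean = statistics.fmean(x)
    total_var = statistics.variance(x)      # estimates tau**2 + sigma**2
    # Fraction of each individual's deviation from the mean that is kept;
    # it is small when between-individual variation is small.
    keep = max(0.0, 1.0 - sigma ** 2 / total_var)
    shrunk = [grand_mean + keep * (xi - grand_mean) for xi in x]

    mse_raw = statistics.fmean((xi - t) ** 2 for xi, t in zip(x, theta))
    mse_shrunk = statistics.fmean((s - t) ** 2 for s, t in zip(shrunk, theta))
    return keep, mse_raw, mse_shrunk


keep, mse_raw, mse_shrunk = shrinkage_demo()
print(f"fraction kept: {keep:.2f}")
print(f"mean squared error, raw estimates:      {mse_raw:.2f}")
print(f"mean squared error, shrunken estimates: {mse_shrunk:.2f}")
```

With these values the between-individual variance is small relative to
the within-individual variance, so the estimates are pulled strongly
toward the population average and their mean squared error falls well
below that of the raw estimates.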
Given the above formulation, there is a third advantage to using
hierarchical models. This is the possibility of assessing and correcting
for sampling biases in a data set. Generally, criminal justice data sets
are not randomly sampled from the population of offenders. More
typically, one generates a data set by selecting from individuals having
a particular attribute, such as an arrest in a certain time period, or
who are in prison at a particular time. Each of these sampling
mechanisms yields a sample that is not random from $\pi$. If one is,
therefore, to make inferences concerning the offender population, one
must assess and correct those biases. The hierarchical formulation is
useful in carrying out this process, as will be illustrated below.
(Cohort samples can overcome the biasing problem; however, they
generally yield too small a sample of criminal activity to be of great
utility.)
A NEW STOCHASTIC MODEL
This section introduces a new family of stochastic models of the crime
process and arrest process associated with a single criminal career.
These new models are intended to encompass more of the salient aspects
of criminal behavior than has been possible with previous models. The
basic model presented is itself still oversimplified but should serve as
an introduction to a set of ideas and tools that future researchers will
find useful. Only the most tractable versions of this family of models
are presented in detail. This section is organized in a sequential
manner. First, some of the most familiar, early stochastic models of
criminal careers are summarized, including their good and bad points.
Next, a model is presented that overcomes some of the objections to
previous models. Finally, a class of flexible models is presented that
seems to improve previous efforts considerably. Of course, further
changes are to be anticipated as understanding of the underlying
processes increases.
Poisson Crime Processes
A frequently used model for the process of crimes committed by a single
individual is that crimes form a Poisson process (see Karlin and Taylor,
1975) during the times the individual is not in prison (see, for
example, Avi-Itzhak and Shinnar, 1973; Rolph, Chaiken, and Houchens,
1981). The times between crimes (after removing time in prison) are
independent random variables with an exponential distribution having
some mean, say $1/\lambda$. Associated with a crime process is an arrest
process. This arrest process is a thinned version of the crime process
in that only a subset of the crimes results in arrest¹ (and only a
subset of the arrests results in imprisonment). A common assumption in
all work is that an arrest is determined at random for each crime. This
means that there is an arrest probability $q$ and that a crime event
yields an arrest event with probability $q$ independent of anything
else. With this type of thinning to construct the arrest process, if the
crime process is Poisson ($\lambda$), the arrest process is Poisson
($\lambda q$).
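The thinning argument can be checked by simulation. The sketch below
assumes illustrative values (10 crimes per year and an arrest
probability of 0.1) that are not taken from any data set, and the
function name is hypothetical. It generates a Poisson crime process,
marks each crime as an arrest independently with probability q, and
compares the empirical arrest rate and mean time between arrests with
the values implied by a Poisson process of rate lam * q.

```python
import random


def simulate_arrest_times(lam, q, horizon, rng):
    """Poisson crime process of rate lam on [0, horizon]; each crime leads
    to an arrest independently with probability q."""
    t, crimes, arrests = 0.0, 0, []
    while True:
        t += rng.expovariate(lam)     # exponential time to the next crime
        if t > horizon:
            break
        crimes += 1
        if rng.random() < q:          # random thinning
            arrests.append(t)
    return crimes, arrests


rng = random.Random(42)
lam, q, horizon = 10.0, 0.1, 5000.0   # hypothetical values, time in years
crimes, arrests = simulate_arrest_times(lam, q, horizon, rng)

crime_rate = crimes / horizon
arrest_rate = len(arrests) / horizon
gaps = [b - a for a, b in zip(arrests, arrests[1:])]
mean_gap = sum(gaps) / len(gaps)

print(f"crime rate:  {crime_rate:.3f} (model value {lam})")
print(f"arrest rate: {arrest_rate:.3f} (model value {lam * q})")
print(f"mean time between arrests: {mean_gap:.3f} (model value {1 / (lam * q)})")
```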
This simple model of the crime process has several attractive features:
1. The Poisson process is well understood and very tractable. In
addition, given the assumption of random thinning, both the crime and
arrest processes are Poisson. If the crime process is a renewal process
and one thins it at random, the arrest process will be approximately
Poisson even if the crime process is not (provided the arrest
probability $q$ is small).
2. The Poisson process has a single parameter, and the statistical
inference for it is well understood.
On the other hand, several drawbacks to the model must be addressed:
1. The Poisson model, as such, does not account for population
heterogeneity.
2. The Poisson model for the arrest process may not fit recidivism data
(see Holden, 1983, for a discussion). That is, the times between arrests
(after eliminating time in prison) are not exponentially distributed.
3. It has been observed, especially from prisoner self-report data, that
arrests and crimes appear to be more clustered than would be suggested
by a Poisson model. Moreover, this model does not allow for any sort of
aging effects. It has been widely noted that the frequency of arrests
varies with time and at some point drops to essentially zero, suggesting
an effective end to the career.²
4. A major drawback of the simple Poisson model is that the individual
exerts no control over the career other than picking $\lambda$. There
are no decision points built into such models at which, for example, the
individual could decide to stop, or change, $\lambda$. The events of the
past career do not influence the future. Clearly, one would suspect that
past events can have an important effect on the future and so one would
like to broaden the class of models to allow for this.
¹It is assumed there is no problem of "false arrest."
Some of the previous issues can be overcome in a straightforward way.
For example, one could introduce a random lifetime. Each individual has
a career length (often assumed to be exponentially distributed, to
enhance tractability). When the length is exceeded, the individual no
longer engages in crime (and so presumably is no longer arrested). One
problem with this approach is that the career length, if determined at
the start of the career, is not influenced by any factors in the career.
(Overcoming the poor fit offered by the exponential distribution is
discussed in the next section.)
²The concept of a finite career length is somewhat controversial (see,
for example, Holden, 1983:26). It is argued that there can be no logical
point at which a criminal career can end, except death. Any former
criminal could be presented with an opportunity such that he would again
commit a crime. While one may respect this point of view, it should be
realized that no single stochastic model can be expected to represent an
exact truth. Rather, one strives to construct models that are
approximately true, that account for important effects, and that offer a
tractable analysis. It may be that some criminals whose careers are said
to have "ended" may, in fact, commit a few additional crimes. In such a
case, one would expect the frequency of those crimes to be very low.
When coupled with the fact that the arrest probability is generally very
small, one expects that the arrest processes may be no different.
One can introduce population heterogeneity in two ways:
1. One can allow $\lambda$ to depend on covariates, such as juvenile
record or age at first juvenile arrest.³
2. One can allow $\lambda$ to be random (see, for example, Rolph,
Chaiken, and Houchens, 1981). In this way, $\lambda$ can represent the
heterogeneity of a population of offenders.
Renewal-Process Models
Some improvement in fit can be achieved by replacing the Poisson-process
model for crimes with the more general renewal-process model. Recall
that a renewal process is a point process in which the times between
points (crimes) are independent, identically distributed random
variables having a cumulative distribution function $F$, which is not
necessarily exponential. Arrests are then commonly considered to be a
randomly thinned version of crimes. Arrests will also form a renewal
process with distribution $G$. One can determine $G$ in terms of $F$ and
$q$; however, the relation is most easily expressed in terms of the
Laplace-Stieltjes transform (Neuts, 1981),
$\mathcal{L}_F(s) = E[\exp(-sT)]$, where $T$ represents a generic random
time between crimes. We find
$$\mathcal{L}_G(s) = \frac{q\,\mathcal{L}_F(s)}{1 - (1 - q)\,\mathcal{L}_F(s)},$$
where $G$ is the distribution of times between arrests.
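The relation between the two transforms can be obtained by conditioning
on the number of crimes up to and including the next arrest. The
following is a sketch of that argument; the symbols $\mathcal{L}_F$ and
$\mathcal{L}_G$ (the Laplace-Stieltjes transforms of the times between
crimes and between arrests) and $N$ (the number of crimes needed to
produce one arrest) are notation adopted for this sketch.

```latex
% Each crime is an arrest independently with probability q, so N is geometric:
\begin{align*}
P(N = n) &= q(1 - q)^{n-1}, \qquad n = 1, 2, \ldots \\
% The time between arrests is the sum of N i.i.d. times between crimes:
\mathcal{L}_G(s) &= \sum_{n=1}^{\infty} q(1 - q)^{n-1}\,[\mathcal{L}_F(s)]^{n}
  = \frac{q\,\mathcal{L}_F(s)}{1 - (1 - q)\,\mathcal{L}_F(s)} .
\end{align*}
% Check with exponential times between crimes, L_F(s) = \lambda / (\lambda + s):
\[
\mathcal{L}_G(s) = \frac{q\lambda/(\lambda + s)}{1 - (1 - q)\lambda/(\lambda + s)}
  = \frac{q\lambda}{q\lambda + s} .
\]
```

The last expression is the transform of an exponential distribution with
rate $q\lambda$, in agreement with the statement that random thinning of
a Poisson ($\lambda$) crime process gives a Poisson ($\lambda q$) arrest
process.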
³See, for example, Stollmack and Harris (1974), Barton and Turnbull
(1981), and Holden (1983).
Many authors have used renewal-process models for the crime process
(see, for example, Holden, 1983). Many distributions have been used for
$F$. These include the exponential, Weibull, lognormal, gamma, mixtures
of various distributions, and defective distributions. In addition,
logistic regression methods have also been used. In each case a
particular distribution was used to fit a particular data set. One can
readily see that no universally appropriate family seems to fit
recidivism data. However, a family of continuous distributions, called
phase distributions (PH-distributions), seems particularly useful for
several reasons:
1. The family is dense in the set of all positive distributions, that
is, any positive distribution can be approximated arbitrarily closely by
a phase distribution.
2. Commonly used distributions, such as the exponential, some gamma, and
their mixtures, are phase distributions (lognormal and Weibull are not
but they can be closely approximated).
3. These distributions have a Markovian structure (see Neuts, 1981) and
so are useful in stochastic model building, because of their
tractability.
4. The distributions are closed under such operations as mixing and
convolution.
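A phase distribution can be viewed as the time until absorption of a
finite-state continuous-time Markov chain, which is the Markovian
structure referred to above. The following is a minimal sketch of
sampling from such a distribution. The representation used (an initial
probability vector, one exit rate per phase, and jump probabilities
whose missing mass means absorption) and the two example distributions
are illustrative choices, and all names are hypothetical.

```python
import random


def sample_phase_type(init, rates, jump, rng):
    """Draw one value from a phase distribution.

    init[i]    : probability of starting in phase i
    rates[i]   : exit rate of phase i (holding time is exponential)
    jump[i][j] : probability of moving to phase j on leaving phase i;
                 any remaining probability mass means absorption
    """
    phase = rng.choices(range(len(init)), weights=init)[0]
    total = 0.0
    while phase is not None:
        total += rng.expovariate(rates[phase])
        u, acc, nxt = rng.random(), 0.0, None
        for j, p in enumerate(jump[phase]):
            acc += p
            if u < acc:
                nxt = j
                break
        phase = nxt                   # None means the chain was absorbed
    return total


rng = random.Random(7)
n = 20_000

# Convolution of two exponentials of rate 2 (a gamma with shape 2): mean 1.0
erlang2 = dict(init=[1.0, 0.0], rates=[2.0, 2.0],
               jump=[[0.0, 1.0], [0.0, 0.0]])
# Equal mixture of exponentials with rates 1 and 4: mean 0.625
hyperexp = dict(init=[0.5, 0.5], rates=[1.0, 4.0],
                jump=[[0.0, 0.0], [0.0, 0.0]])

mean_erlang2 = sum(sample_phase_type(rng=rng, **erlang2) for _ in range(n)) / n
mean_hyperexp = sum(sample_phase_type(rng=rng, **hyperexp) for _ in range(n)) / n
print(f"convolution example mean: {mean_erlang2:.3f}")
print(f"mixture example mean:     {mean_hyperexp:.3f}")
```

The two examples correspond to the closure properties in item 4: the
first is a convolution of exponentials and the second is a mixture, and
both are again phase distributions.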
The renewal-process approach to modeling the crime process helps in that
recidivism data can be better fit; however, the other difficulties cited
earlier remain. The principal problems are the following:
1. The general renewal-process model still assumes independent,
identically distributed arrest times and does not offer the clustering
of crimes or arrests usually reported or observed.
2. The model does not allow the individual to make decisions concerning
behavior.
3. The model does not account for the great amount of population
variability that has been observed in criminal justice data sets.
4. The model does not build in interactions between individuals and the
criminal justice system.
A New Class of Models
In this section, I present a new model designed to overcome some of the
difficulties with earlier models. The new model makes use of a pair of
states between which the individual moves.⁴ When the individual is in
one of the states (the "high" state), he commits crimes at a high
intensity. When in the other state (the "low" state), crimes are
committed at a low intensity (which is taken to be zero). In addition,
switching between states allows the individual some decision-making
latitude. I begin with a single crime type, generalize to multiple crime
types, and then develop a hierarchical formulation.
The following parameters are used in the model:
$T$: number of distinct crime types.
$A$: initial crime types for an individual, $A \subset \{1, 2, \ldots, T\}$.
$F_t$: the cdf of the time between crimes of type $t$, $1 \le t \le T$.
$q_t$: the arrest probability for crime type $t$, $1 \le t \le T$.
$\beta_t$: the probability of terminating crime type $t$, $1 \le t \le T$.
$\alpha$: the probability of switching from the high-rate state to the
low-rate state.
$G$: the cdf of the time in the low-rate state.
⁴A two-state model of this sort was studied by Maltz and Pollack (1980).
It is also mentioned in Rolph, Chaiken, and Houchens (1981:37).
Let us define
· a crime process that is a renewal process with the time between crimes
being a phase distribution $F$ with mean $1/\lambda$;
· an arrest probability $q$;
· a state-switching probability $\alpha$;
· a phase-type distribution $G$ governing the amount of time the
individual spends in the low state, during which no crimes are
committed; and
· a probability $\beta$ giving the probability that the career ends with
the start of the current low period.
We can describe the process intuitively, as follows. A cycle begins with
the individual in the high state. Crimes are committed according to a
renewal process with distribution $F$. After each crime, two issues must
be resolved:
1. with probability $q$, the individual is arrested, and
2. with probability $\alpha$, the individual switches to the low state.
If the individual switches to the low state, he may terminate his career
with probability $\beta$. With the complementary probability, he stays
in this state for a period determined by the cumulative distribution
function $G$. While the individual is in the low state, no crimes are
committed.
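The cycle just described can be sketched as a short simulation. The code
below is an illustration under assumptions made only for this example:
the time between crimes and the time in the low state are both taken to
be exponential (the simplest phase distributions), prison time is
ignored, and all names and parameter values are hypothetical.

```python
import random


def simulate_career(lam, q, alpha, beta, mean_low, rng):
    """Simulate one career in free time; returns (crime_times, arrest_times).

    Assumptions of this sketch: the time between crimes is exponential
    with rate lam, the time in the low state is exponential with mean
    mean_low, and prison time is ignored.
    """
    t = 0.0
    crimes, arrests = [], []
    while True:
        t += rng.expovariate(lam)          # wait for the next crime (high state)
        crimes.append(t)
        if rng.random() < q:               # arrest, independently of the past
            arrests.append(t)
        if rng.random() < alpha:           # switch to the low state
            if rng.random() < beta:        # career ends at start of low period
                return crimes, arrests
            t += rng.expovariate(1.0 / mean_low)   # quiescent period, no crimes


rng = random.Random(3)
lam, q, alpha, beta, mean_low = 12.0, 0.1, 0.2, 0.25, 0.5   # hypothetical values
careers = [simulate_career(lam, q, alpha, beta, mean_low, rng)
           for _ in range(5000)]

mean_crimes = sum(len(c) for c, _ in careers) / len(careers)
mean_arrests = sum(len(a) for _, a in careers) / len(careers)
print(f"crimes per career:  {mean_crimes:.2f} (expected {1 / (alpha * beta):.2f})")
print(f"arrests per career: {mean_arrests:.2f} (expected {q / (alpha * beta):.2f})")
```

Because each crime is followed by termination with probability
$\alpha\beta$, the number of crimes in a simulated career is geometric
with mean $1/(\alpha\beta)$ under these assumptions, and the expected
number of arrests is $q/(\alpha\beta)$.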
The arrest, state-switching, and termination probabilities are applied
at random, that is, without any dependence on the realization of the
process to that point. Indeed, it is interesting to consider a
generalized model in which the process up to that time can influence
these transitions. For example, one can introduce a reinforcement
effect. If any individual commits a crime and is not arrested, that
might provide reinforcement to stay active. Two unarrested crimes
provide further reinforcement, and so on. One could simply introduce a
sequence $\{\alpha_n\}$ in which $\alpha_n$ represents the probability
the individual offender moves to the low state after $n$ consecutive
unarrested crimes. The sequence could be chosen so that it strictly
decreases to a positive limit.
Any arrest may result in a conviction and a prison sentence. We are only
interested in the behavior during free time; consequently, the model is
applicable only while the person is not in prison. This brings up the
issue of how the processes should be initiated at time 0 (which we take
to be age 18 for adult offending) and how they should be restarted after
release from prison, if applicable. It is mathematically convenient to
keep the process in equilibrium as much as possible. (The advantage of
this will be seen below in the discussion of corrections for sampling
biases.) This suggests that we should take the initial time to the first
crime to be given by the equilibrium forward recurrence time
distribution. Once the first crime occurs, the renewal-process model
begins.
Under the assumptions presented earlier, the crime process is a renewal
process (if prison time is deleted). After each crime, a coin is
flipped, and depending on the result, the next crime comes according to
$F$ with probability $(1 - \alpha)$ or according to $F * G$ with
probability $\alpha(1 - \beta)$, or no further crimes occur with
probability $\alpha\beta$. (Here $*$ denotes convolution, since the time
to the next crime is the sum of the length of the low period and the
time to the first crime in the next high period.) This gives us a
mixture of two phase distributions, which will also be a phase
distribution, and the resulting distribution is defective in that it
allows a positive probability of the value $\infty$. Let us refer to
this distribution of times between crimes by $H$, and let its
Laplace-Stieltjes transform be given by $\mathcal{L}_H$. The arrest
process is also a terminating renewal process. This is easily seen,
since each crime has an independent arrest probability. The time between
arrests is thus a random sum of independent random variables having
distribution $H$. If we let the distribution be $K$, then
$$\mathcal{L}_K(s) = \frac{q\,\mathcal{L}_H(s)}{1 - (1 - q)\,\mathcal{L}_H(s)}.$$
It should be noted that
$$\mathcal{L}_H(0) = P(\text{time between crimes} < \infty) = 1 - \alpha\beta$$
and
$$\mathcal{L}_K(0) = P(\text{time between arrests} < \infty) = \frac{q(1 - \alpha\beta)}{1 - (1 - q)(1 - \alpha\beta)}.$$
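The probability that another arrest ever occurs can be checked
numerically. The sketch below evaluates
$q(1 - \alpha\beta)/[1 - (1 - q)(1 - \alpha\beta)]$, where $q$ is the
arrest probability, $\alpha$ the state-switching probability, and
$\beta$ the termination probability, and compares it with a direct
simulation of the sequence of crimes following an arrest. Function names
and parameter values are hypothetical.

```python
import random


def prob_another_arrest(q, alpha, beta):
    """Closed form: probability that the time to the next arrest is finite."""
    survive = 1.0 - alpha * beta           # P(a further crime occurs)
    return q * survive / (1.0 - (1.0 - q) * survive)


def simulate_another_arrest(q, alpha, beta, rng):
    """Follow the career after an arrest until the next arrest or termination."""
    while True:
        if rng.random() < alpha * beta:    # career terminates before next crime
            return False
        if rng.random() < q:               # next crime occurs and leads to arrest
            return True


rng = random.Random(11)
q, alpha, beta = 0.1, 0.2, 0.25            # hypothetical values
n = 100_000
exact = prob_another_arrest(q, alpha, beta)
estimate = sum(simulate_another_arrest(q, alpha, beta, rng) for _ in range(n)) / n
print(f"closed form: {exact:.4f}")
print(f"simulation:  {estimate:.4f}")
```

The closed form is the sum over $n \ge 1$ of
$(1 - \alpha\beta)^n (1 - q)^{n-1} q$, the probability that $n$ further
crimes occur and only the last of them results in arrest.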
This model has several attractive features:
1. The model is, in reality, a terminating renewal process for both
crimes and arrests. In this instance though, the model explicitly builds
in parameters to represent ways in which the individual can control
activities and does so in a realistic way. After each crime, the
individual controls whether to continue or change states and, if he or
she continues, whether to terminate the crime type or not. Rather than
merely fitting $K$, the arrest-process distribution, the model shows how
it is composed of more fundamental behavioral parameters. Thus, the
model overcomes the objection to the lack of individual control over the
career while retaining the simplicity of a renewal process.
2. The model introduces important behavioral effects in a tractable way
and allows for the generality provided by phase distributions.
It should be noted that the model can be further generalized in several
ways. As mentioned before, the single $\alpha$ parameter can be replaced
by a sequence of parameters to represent a reinforcement effect. In
addition, one can increase the number of states (from "high" and "low").
This allows for more complex behavior. Despite this added generality, I
have retained the two-state model with no crimes in the low state. This
reduces the number of parameters in the model. Currently available data
sets lack the size or detail needed to estimate a more complex model
successfully. As better data sets become available, they can be used to
extend the model. In fact, if the duration of the low state is
sufficiently short, the two-state model is little different from a
one-state model. The two-state model is beneficial when low periods are
of at least moderate duration.
Multiple Crime Types
The previous model can be generalized to allow for multiple crime types. Such a generalization can be carried out in several ways. In this section I explore several possibilities and arrive at a final version of the model. Throughout this discussion, $T$ denotes the total number of crime types, which in turn are indexed by $t$.
Crime-switch Models
The simplest approach is to consider crime types as having no influence on the stochastic structure described above. The distributions $F$ and $G$, as well as the parameters $q$, $\alpha$, and $\beta$, are unchanged. Rather, crime type serves only to label the crimes. This can be done by assuming $T$ distinct types and introducing a $T \times T$ Markov crime-switch-transition matrix $C = (c_{ij})$, where $c_{ij}$ is the probability that the offender, having last committed a crime of type $i$, will next commit a crime of type $j$.
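As an illustration of how a crime-switch matrix only labels crimes without altering their timing, the sketch below attaches types to a sequence of crimes. The matrix entries and function names are hypothetical and chosen only for the example.

```python
import random


def free_parameters(num_types: int) -> int:
    """Each row of C sums to 1, so a T x T matrix has T(T - 1) free entries."""
    return num_types * (num_types - 1)


def label_crimes(n_crimes, first_type, C, seed=0):
    """Attach crime-type labels to successive crimes using the matrix C.

    C[i][j] is the probability that a crime of type i is followed by type j.
    The timing of the crimes is untouched; only the labels are generated.
    """
    rng = random.Random(seed)
    labels = [first_type]
    while len(labels) < n_crimes:
        row = C[labels[-1]]  # switch probabilities given the last type
        labels.append(rng.choices(range(len(row)), weights=row)[0])
    return labels


# Hypothetical 3-type matrix (rows: last type i, columns: next type j).
C_EXAMPLE = [
    [0.6, 0.3, 0.1],
    [0.2, 0.7, 0.1],
    [0.3, 0.3, 0.4],
]

if __name__ == "__main__":
    print(free_parameters(3), label_crimes(10, 0, C_EXAMPLE))
```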
The Markov crime-switch approach is much used in stochastic models of the crime process. Moreover, it can be made more general by allowing arrest probabilities to depend on crime type. I do not pursue this approach any further in this paper, however, for two reasons. First, one adds $T(T - 1)$ parameters to the model through $C$ while gaining only a little more explanatory power. Second, I prefer to pursue an alternate approach that enables one to deal better with the age effects, which are clear in crime data but are not yet incorporated in the model.
Competing Risk Model
An alternate approach to constructing a multiple-crime-type model is to introduce a set of distributions of phase type $F_t$, $1 \le t \le T$, where $T$ is the number of crime types. Suppose that the individual commits a crime and stays in the high state or that the individual leaves the low state and enters the high state. We need to define the time until the next crime occurs. This can be done using a "competing risks" formulation. In this formulation we imagine random variables $X_t$ being drawn independently with distribution $F_t$, $1 \le t \le T$. The time until the next crime is given by $X = \min_{1 \le t \le T} X_t$, the crime with the shortest time to its occurrence. The type of crime is given by the index $t$ for which $X_t = X$. The family of phase distributions is well suited to this approach, since if each of the $X_t$ has a phase distribution, then $X$ will also have a phase distribution (see below). This version of the multiple-crime-type model is therefore essentially equivalent to the single-crime-type model with the exception that the distribution $F$ belongs to a special subset of the phase distributions, those that arise as minimums of other phase distributions.
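A minimal sketch of the competing-risks draw follows, under the simplifying assumption (not made in the text) that every $F_t$ is exponential, the simplest phase distribution. In that special case the minimum is again exponential with the summed rate, and type $t$ wins with probability proportional to its rate. The crime-type names and rates are hypothetical.

```python
import random


def min_of_exponentials(rates):
    """Closed form for exponential risks: overall rate and winning probabilities."""
    total = sum(rates.values())
    type_probs = {t: r / total for t, r in rates.items()}
    return total, type_probs


def draw_next_crime(rates, rng):
    """Draw one X_t per crime type and return (time to next crime, its type)."""
    draws = {t: rng.expovariate(r) for t, r in rates.items()}
    crime_type = min(draws, key=draws.get)  # index achieving the minimum
    return draws[crime_type], crime_type


if __name__ == "__main__":
    example_rates = {"burglary": 1.0, "robbery": 0.5, "auto theft": 0.5}
    print(min_of_exponentials(example_rates))
    print(draw_next_crime(example_rates, random.Random(1)))
```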
The Final Version of the Model
The competing-risk version of the multiple-crime-type model leaves one issue unaddressed. In many criminal justice data sets pertaining to individual offending, there is a pronounced age effect. Individuals seem to have high crime commission rates as older juveniles or young adults, and those rates sharply diminish at older ages. None of the models presented thus far addresses this issue. Indeed, a renewal-process model of crimes or arrests would not allow for such an age effect. Fortunately, there appears to be a straightforward way to introduce such effects, and this approach is supported somewhat by empirical evidence. Studies by Peterson and Braiker (1981) indicate that for a single crime type, there is no age effect. Rather, an individual commits a particular crime type in a time-stationary way (although in a clustered fashion, like that given by the two-state model), then at some time point in the career the individual essentially stops committing that crime type altogether. This process goes on independently for the $T$ crime types (although some offenders specialize in a subset of these types). The individual has a set of active crime types. The time until the next crime (when the individual is in the high state) is taken to be the minimum of the times for the active crime types. As time progresses, the portion of the offender's career involving crime type $t$ will end, and so the set of active crime types decreases in size. As more crime types are eliminated, the time between crimes will naturally increase. This will, in turn, result in an age effect.
The multiple-crime-type model can be summarized. It consists of the following:
· a set of phase-type distributions, $F_t$, $1 \le t \le T$,
· an initial set of active crime types $A \subseteq \{1, \ldots, T\}$,
· an arrest probability for crime type $t$, $q_t$, $1 \le t \le T$,
· a state-switching probability $\alpha$,
· a crime type $t$ termination probability, $\beta_t$, $1 \le t \le T$, and
· a phase-type distribution $G$, denoting the length of the low period.
Note that $\alpha$ and $G$ could be allowed to be crime-type dependent, as well.
An intuitive description of the criminal career is as follows. The individual begins his or her career with a set of active crime types $A$. Each crime type has a distribution associated with it. Let $X_t$, $1 \le t \le T$, represent random variables, where $X_t$ has distribution $F_t$. The time until the next crime is given by $\min_{t \in A} X_t$, and the type of that crime is given by the index associated with the minimum. When a crime of type $t$ is committed, with probability $\beta_t$, that crime type is removed from the active set $A$. With probability $1 - \beta_t$, this crime type is kept in $A$. With probability $q_t$ the individual is arrested, and with probability $\alpha$ the individual switches to the low state. After a period having distribution $G$, he moves back to the high state. The process continues in the same fashion, except that over time the active set $A$ will be reduced in size. When $A$ becomes empty, the career is ended. As $A$ becomes smaller, the times between crimes (and hence arrests) will increase. This will produce an age effect. (One could allow $A$ to increase in size, i.e., new crime types to be added. I do not consider such a possibility in this paper.)
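The career just described can be simulated directly. The sketch below is illustrative only: it assumes exponential $F_t$ and $G$ (special cases of phase distributions), starts with every crime type active, and uses hypothetical names and parameter values throughout.

```python
import random


def simulate_career(rates, q, beta, alpha, low_rate, seed=0):
    """Simulate one career; returns a list of (time, crime type, arrested).

    rates[t]  : rate of the exponential F_t (assumption: exponential phases)
    q[t]      : arrest probability for crime type t
    beta[t]   : probability that type t is dropped after a type-t crime
    alpha     : probability of switching to the low state after a crime
    low_rate  : rate of the exponential low-period distribution G
    """
    rng = random.Random(seed)
    active = set(rates)  # initial active set A (here: all types)
    clock, events = 0.0, []
    while active:
        # Competing risks over the currently active crime types.
        draws = {t: rng.expovariate(rates[t]) for t in sorted(active)}
        t_min = min(draws, key=draws.get)
        clock += draws[t_min]
        arrested = rng.random() < q[t_min]
        events.append((clock, t_min, arrested))
        if rng.random() < beta[t_min]:
            active.discard(t_min)  # this part of the career ends
        if active and rng.random() < alpha:
            clock += rng.expovariate(low_rate)  # sojourn in the low state
    return events


if __name__ == "__main__":
    rates = {"burglary": 2.0, "robbery": 1.0, "auto theft": 0.5}
    q = {t: 0.2 for t in rates}
    beta = {t: 0.1 for t in rates}
    for event in simulate_career(rates, q, beta, alpha=0.3, low_rate=1.0, seed=3):
        print(event)
```

Because the active set only shrinks, the gaps between crimes tend to lengthen late in a simulated career, which is the age effect described above.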
In the next section, I discuss the hierarchical version of the model and the addition of covariates. It is clear that, at any point in time, the decision to terminate a crime type, to drop to the low state, to adjust the arrest probability, and so on will be influenced by covariates, such as drug use or employment status. The basic model described above can be enhanced to allow such considerations.
One must calculate the posterior joint distribution of $\phi$ and $\theta_i$, $1 \le i \le n$, given $x_i$, $1 \le i \le n$. This calculation may involve significant numerical integration. Once the posterior distribution has been calculated, it can be used to estimate any of the parameters or to predict future values of $x_i$, $1 \le i \le n$. The reader should consult Deely and Lindley (1981) for details and examples.
It seems generally difficult to determine $f(\phi)$ accurately, say by elicitation. Fortunately, for many criminal justice data sets, $n$ can be very large. If this is the case, the prior distribution of $\phi$, $f(\phi)$, will have very little influence on the estimates. It will be dominated by the data. One can, therefore, select a prior distribution that maximizes computational convenience, say by picking a conjugate prior distribution if one exists. Far more care is required if $n$ is small.
The Empirical Bayes Approach
Morris (1983) discusses the empirical Bayes approach and includes many citations for the use of this methodology. The general approach in this instance is to proceed in two steps. First, one integrates out the conditional distribution of $\theta$ given $\phi$, to find the conditional distribution of each $x_i$ given $\phi$. The data $x_i$, $1 \le i \le n$, are then used to estimate $\phi$. This is typically done using maximum-likelihood estimation, although one could follow Deely and Lindley (1981) in using Bayesian estimation. In addition, the method of moments is often very convenient; however, its small sample behavior is unclear. Notice that all the data are used to estimate $\phi$, and this will result in an estimate $\hat{\phi}$. It remains to estimate the individual $\theta_i$ parameters. This is done using likelihood methods by assuming $\theta_i$ has distribution $g(\theta \mid \hat{\phi})$ and using $x_i$ as data. Again a choice of methods is possible, but the Bayes approach using the posterior distribution of $\theta_i$ given $\hat{\phi}$ and $x_i$ is preferable.
The Simultaneous-Likelihood Approach
A third approach is the simultaneous-likelihood approach. This approach is very unreliable and should be avoided because it can produce inconsistent estimators. It entails writing the joint likelihood of $\theta_i$ and $x_i$ given $\phi$. This function is then simultaneously maximized over $\phi$ and $\theta_i$, $1 \le i \le n$. The method is unreliable, in part because the number of parameters grows with the number of observations. This situation is one in which the maximum-likelihood method may perform in an undesirable fashion. Such behavior is shown in the examples below.
Some Simple Examples
The following example is designed to illustrate the ideas developed in the previous three sections. The example is based on the simplest model of a criminal career, given in the discussion of the Poisson process above.
Assume that each of $n$ individual offenders is arrested according to a Poisson process with parameter $\theta_i$. In addition, the $n$ values of $\theta_i$ are drawn independently from a gamma $(\alpha, \beta)$ distribution. The shape parameter $\alpha$ is known, but the scale parameter $\beta$ is unknown. The individual Poisson processes are observed over an interval of length $L$. By sufficiency and the memoryless property, we merely need to consider the total number of arrests over this time interval. We denote this quantity by $X_i$. Conditional on $\theta_i$, $X_i$ has a Poisson distribution with mean $\theta_i L$.
We can calculate the conditional distribution of $X_i$ given $\beta$ by integrating out the parameter $\theta_i$. This results in
$$P(X_i = x_i \mid \beta) = \frac{\Gamma(\alpha + x_i)}{\Gamma(\alpha)\,\Gamma(x_i + 1)} \left(\frac{\beta}{\beta + L}\right)^{\alpha} \left(\frac{L}{\beta + L}\right)^{x_i}, \quad x_i = 0, 1, 2, \ldots,$$
a negative binomial distribution. Again $\alpha$ and $L$ are known.
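The negative binomial form can be sanity-checked numerically. In this sketch (function name and parameter values are hypothetical) the probabilities are computed on the log scale with `math.lgamma`, and the mean is compared with $\alpha L / \beta$, the value obtained by averaging the Poisson mean $\theta_i L$ over the gamma distribution of $\theta_i$.

```python
import math


def neg_binom_pmf(x: int, alpha: float, beta: float, L: float) -> float:
    """P(X_i = x | beta) after integrating theta_i out of the Poisson model."""
    w = beta / (beta + L)
    log_p = (
        math.lgamma(alpha + x) - math.lgamma(alpha) - math.lgamma(x + 1)
        + alpha * math.log(w) + x * math.log(1.0 - w)
    )
    return math.exp(log_p)


if __name__ == "__main__":
    alpha, beta, L = 2.5, 4.0, 3.0  # illustrative values only
    probs = [neg_binom_pmf(x, alpha, beta, L) for x in range(500)]
    print(sum(probs))                                  # close to 1
    print(sum(x * p for x, p in enumerate(probs)))     # close to alpha * L / beta
```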
The Full Bayesian Approach
In this approach a prior distribution for $\beta$ is introduced. The most convenient choice is to introduce the conjugate prior distribution for the negative binomial distribution, the beta distribution. We let $\beta/(\beta + L)$ have a beta $(a, b)$ prior distribution. The full hierarchical model then becomes:
· $\beta/(\beta + L)$ has a beta $(a, b)$ distribution,
· $(\theta_1, \ldots, \theta_n) \mid \beta$ are independent with gamma $(\alpha, \beta)$ distribution,
· $X_1, \ldots, X_n \mid \theta, \beta$ are independent and $X_i$ has a Poisson $(\theta_i L)$ distribution.
One now wishes to find the posterior joint distribution of $\beta$ and $\theta_1, \ldots, \theta_n$ given $X_1, \ldots, X_n$. This can be easily carried out, and we find
· $\beta/(\beta + L) \mid X_1, \ldots, X_n$ has a beta $(a + n\alpha,\; b + \sum_{i=1}^{n} x_i)$ distribution, and
· $\theta_1, \ldots, \theta_n \mid \beta, X_1, \ldots, X_n$ are conditionally independent with gamma $(\alpha + x_i, \beta + L)$ distribution.
One can now construct Bayes estimates of the parameters. This requires the introduction of a loss function. For simplicity, consider nonsimultaneous estimation of the parameters based on the conditional mean. This would result in
$$\hat{\beta} = \frac{L(a + n\alpha)}{b + \sum_{i=1}^{n} x_i - 1}, \qquad \hat{\theta}_i = \frac{\alpha + x_i}{L} \cdot \frac{b + \sum_{i=1}^{n} x_i}{a + n\alpha + b + \sum_{i=1}^{n} x_i}.$$
For large $n$, these estimates are given approximately by
$$\hat{\beta} = \frac{\alpha L}{\bar{x}}, \qquad \hat{\theta}_i = \frac{(\alpha + x_i)\,\bar{x}}{L(\alpha + \bar{x})}.$$
Note that the estimate of $\theta_i$ involves all the data. In addition, for large $n$, the choice of prior parameters $(a, b)$ becomes immaterial.
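The claim that the prior parameters $(a, b)$ wash out for large $n$ can be illustrated with a short computation. This is a sketch under stated assumptions: the posterior means are those of the beta and gamma posteriors given above (the mean of $\beta$ requires $b + \sum x_i > 1$), and the function names and data are hypothetical.

```python
def full_bayes_estimates(counts, alpha, L, a, b):
    """Posterior means of beta and of each theta_i under the beta(a, b) prior."""
    n, total = len(counts), sum(counts)
    # beta = L * w / (1 - w) with w ~ beta(a + n*alpha, b + total).
    beta_hat = L * (a + n * alpha) / (b + total - 1)
    # E[1 / (beta + L)] = E[1 - w] / L.
    shrink = (b + total) / (a + n * alpha + b + total)
    theta_hat = [(alpha + x) / L * shrink for x in counts]
    return beta_hat, theta_hat


def large_n_estimates(counts, alpha, L):
    """Limiting forms that no longer involve the prior parameters (a, b)."""
    xbar = sum(counts) / len(counts)
    beta_hat = alpha * L / xbar
    theta_hat = [(alpha + x) * xbar / (L * (alpha + xbar)) for x in counts]
    return beta_hat, theta_hat


if __name__ == "__main__":
    counts = [0, 1, 2, 3] * 2500  # artificial arrest counts for 10,000 offenders
    print(full_bayes_estimates(counts, alpha=2.0, L=5.0, a=1.0, b=1.0)[0])
    print(large_n_estimates(counts, alpha=2.0, L=5.0)[0])
```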
Empirical Bayes Approach
The first step of this approach is to estimate $\beta$ after integrating out $\theta$. We treat $X_1, \ldots, X_n \mid \beta$ as being i.i.d. with negative binomial distribution. The maximum likelihood estimate is $\hat{\beta} = \alpha L / \bar{x}$, the same as the limiting version of the Bayes estimate of $\beta$.
To find the estimate of each $\theta_i$, we treat the following problem:
· $\theta_i$ has gamma (
The Simultaneous-Likelihood Approach
This method involves writing a simultaneous likelihood, including that of the $\theta_i$ and $X_i$, $1 \le i \le n$. One then maximizes over $\theta_i$ and $\beta$. The log-likelihood is given by
$$\text{constant} + n\alpha \log \beta + \sum_{i=1}^{n} (\alpha + x_i - 1) \log \theta_i - (\beta + L) \sum_{i=1}^{n} \theta_i.$$
The likelihood equations are
$$\frac{\alpha + x_i - 1}{\theta_i} = \beta + L, \quad 1 \le i \le n,$$
and
$$\sum_{i=1}^{n} \theta_i = \frac{n\alpha}{\beta}.$$
These can be simultaneously solved to give
$$\hat{\beta} = \frac{\alpha L}{\bar{x} - 1}$$
and
$$\hat{\theta}_i = \frac{\alpha + x_i - 1}{\hat{\beta} + L}.$$
This estimate of $\beta$ can be negative and even if positive is inconsistent. The point of this example is to give a simple illustration of a situation in which the method gives an unreasonable estimate. In other cases the method of maximum likelihood may not even provide an answer because infinite likelihood can be generated at some boundary of the parameter space.
In summary, either the full Bayes or empirical Bayes method should be used, if possible. The simultaneous-likelihood method should be avoided.
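The two failure modes of the simultaneous-likelihood estimate of $\beta$ are easy to exhibit. In this sketch (hypothetical function names and data) the estimate $\alpha L/(\bar{x} - 1)$ is compared with the empirical Bayes estimate $\alpha L/\bar{x}$: it is negative when the average count is below 1, and it misses the true value even when $\bar{x}$ equals the population mean $\alpha L/\beta$ exactly.

```python
def empirical_bayes_beta(counts, alpha, L):
    """Maximum-likelihood estimate from the negative binomial marginal."""
    xbar = sum(counts) / len(counts)
    return alpha * L / xbar


def simultaneous_beta(counts, alpha, L):
    """Solution of the simultaneous likelihood equations (undefined at xbar = 1)."""
    xbar = sum(counts) / len(counts)
    return alpha * L / (xbar - 1.0)


if __name__ == "__main__":
    alpha, L = 2.0, 5.0
    # Average count below 1: the simultaneous estimate is negative.
    print(simultaneous_beta([0, 1, 0, 1], alpha, L))
    # Average count 2.5 = alpha * L / beta for beta = 4.
    print(empirical_bayes_beta([2, 3], alpha, L), simultaneous_beta([2, 3], alpha, L))
```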
PHASE DISTRIBUTIONS
This section is intended to introduce the class of phase distributions and to list several properties of this class. These distributions have recently been rediscovered and extensively developed by Neuts (1981). The reader should consult this text for a complete treatment.
Only continuous time phase distributions are considered here. These distributions arise naturally in the context of continuous time Markov chains. A phase distribution arises as the amount of time it takes such a Markov chain to first reach a designated state in its state space. Consider a continuous time Markov chain with state space $\{1, 2, \ldots, m + 1\}$ and infinitesimal generator:
$$Q = (q_{ij}) \text{ with } q_{ii} < 0, \quad q_{ij} \ge 0 \text{ if } i \ne j, \quad \sum_j q_{ij} = 0,$$
and
$$q_{m+1,i} = 0, \quad i \le m + 1.$$
For the given assumptions about the $q_{ij}$s, it follows that state $m + 1$ is an absorbing state. For any other state $i$, the chain is held in the state for an exponential period of time with mean $1/(-q_{ii})$. One must also introduce an initial distribution, $p = (p_1, \ldots, p_{m+1})$. The chain is started in a state selected at random from the distribution $p$. Once the initial state is selected, the chain evolves according to $Q$. Eventually, the chain will reach state $m + 1$, and this is called the hitting time of state $m + 1$. This hitting time has a phase distribution with representation $(p, Q_0)$. Given this description, one can see that the $(m + 1) \times (m + 1)$ matrix $Q$ has a block form given by
$$Q = \begin{pmatrix} Q_0 & Q_0^{\circ} \\ 0 & 0 \end{pmatrix},$$
where $Q_0$ is $m \times m$.
One problem with any particular phase distribution may be that the $(p, Q_0)$ representation is not unique. This point is addressed in Neuts (1981).
The following facts about phase distributions are useful.
1. A phase distribution puts mass $p_{m+1}$ on 0 and has density $p_0 \exp(xQ_0)\,Q_0^{\circ}$ on $(0, \infty)$, where $p = (p_0, p_{m+1})$.
2. The Laplace-Stieltjes transform of the distribution is given by
$$\phi(s) = p_{m+1} + p_0 (sI - Q_0)^{-1} Q_0^{\circ}$$
for $\mathrm{Re}(s) \ge 0$.
3. The $n$th moment of the distribution is given by
$$\mu_n = (-1)^n\, n!\, (p_0 Q_0^{-n} e_m),$$
where $e_m = (1, \ldots, 1)$ and is $1 \times m$.
4. Suppose $F$ and $G$ are phase distributions with orders $m$ and $n$ and representations $(p, Q_0)$ and $(r, S)$, respectively. The convolution $F * G$ is also a phase distribution with representation
$$\left( (p,\; p_{m+1} r),\; \begin{pmatrix} Q_0 & Q_0^{\circ} R^{\circ} \\ 0 & S \end{pmatrix} \right),$$
where $Q_0^{\circ} R^{\circ}$ is the $m \times n$ matrix with elements $Q_i^{\circ} r_j$, for $1 \le i \le m$ and $1 \le j \le n$.
5. If one considers a renewal process with phase distribution $F$ governing the times between events, the equilibrium forward and backward recurrence time distributions are also phase distributions with modified initial vector.6 This property is useful for correcting biases in data sets in which sampling is not random but rather is length biased (see next section).
6. The family of phase distributions is also closed under the operations of "maximum" and "minimum." Suppose $X$ and $Y$ are independent random variables with phase distributions given by $F$ and $G$. Then the distribution $H_1(t) = F(t)G(t)$ corresponding to $\max(X, Y)$ and $H_2(t) = 1 - [1 - F(t)][1 - G(t)]$ corresponding to $\min(X, Y)$ are both of phase type (see Neuts, 1981:60). This property is useful for constructing a competing-risk model of the times between crimes for multiple crime types, as was used above.
The class of phase distributions is very large and explicitly contains a number of important parametric families. In particular, the exponential, gamma, and generalized gamma distributions are of phase type. They can be obtained by setting $q_{i,i+1} = -q_{ii}$ and $p_1 = 1$. The distribution is then a sum of $m$ exponential random variables with possibly different parameter values. The hyperexponential can be obtained by setting $q_{i,m+1} = -q_{ii}$, and more complex mixtures can be obtained similarly. The class of phase distributions can be used to approximate any nonnegative continuous distribution. A construction is given by Kelly (1979). Indeed the class of generalized gamma densities alone is dense in the family of nonnegative continuous distributions. This result is useful, since the class is smaller and easier to handle than others.
Finally, note that the class of phase distributions is ideal for stochastic modeling. If one models some time (such as a recidivism time) as having a phase distribution, by augmenting the state space with a single variable (which denotes the current phase) the model will retain a Markov structure, if it had one originally. This allows one to stay within a tractable family of models, while introducing the flexibility of being able to approximate any nonnegative probability distribution.
6 See Neuts (1981:52) for the exact representations and p. 63 for a discussion of special properties of renewal processes governed by phase distributions.
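The moment formula for phase distributions can be tried on a case with a known answer. The sketch below is illustrative: it uses a small hand-written linear solver so that it needs no external libraries, treats the vector of ones as a column, and checks the formula on a two-phase chain with $q_{12} = -q_{11}$ and $p_1 = 1$, which is a sum of two exponentials. All function names are hypothetical.

```python
import math


def solve(A, b):
    """Solve A y = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    y = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][j] * y[j] for j in range(i + 1, n))
        y[i] = (M[i][n] - tail) / M[i][i]
    return y


def phase_moment(p0, Q0, order):
    """Moment of the given order: (-1)^n n! p0 Q0^{-n} e, with e a vector of ones."""
    v = [1.0] * len(p0)
    for _ in range(order):
        v = solve(Q0, v)  # applies Q0^{-1} once per pass
    dot = sum(pi * vi for pi, vi in zip(p0, v))
    return (-1) ** order * math.factorial(order) * dot


if __name__ == "__main__":
    rate = 3.0
    Q0 = [[-rate, rate], [0.0, -rate]]  # two exponential phases in series
    p0 = [1.0, 0.0]                     # always start in phase 1
    print(phase_moment(p0, Q0, 1), phase_moment(p0, Q0, 2))
```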
CORRECTING BIASES IN SAMPLES
This section addresses the problem of biases in data sets that arise from nonrandom sampling from the offender population.7 The hierarchical-model approach can be used to understand quantitatively the nature of the bias and therefore to correct for it. Several specific situations are considered in this section:
1. biases arising from restricting attention to offenders with at least one offense in a "window period,"
2. biases that arise in self-report data in which the sample is restricted to a prison population, and
3. biases that arise from estimating an individual crime rate from an individual record of a person who is caught in the midst of a period of high activity, so the estimates are biased upward.
7 Professor A. Goldberger has pointed out to me that there is an extensive literature on correcting biases in samples in the educational psychology, economics, and evaluation research literature. This is treated under the rubric of selectivity bias, nonequivalent groups, and quasi-experiments. None of these, however, addresses the stochastic process aspects dealt with in this paper.
Window Arrest Data Sets
Consider a general, delayed-renewal process with initial distribution $G$ and general distribution $F$. Consequently, starting at time 0, the first event in the process occurs according to the distribution $G$, while all subsequent interevent distributions occur according to $F$. In the setting of a hierarchical model, we allow $F$ and $G$ to depend on parameters, and these parameters have some distribution given by cdf $H$ and density $h$.
Suppose we select an individual at random from among the population of individuals who have an arrest in $[t, t + \delta]$. That is, we restrict our sampling to individuals having this "window arrest" property. An offender satisfying this window-arrest criterion will typically have more arrests than an individual randomly selected from the general offender population. The sampling plan thus is biased in favor of offenders with higher crime and arrest rates. If no adjustment is made, we will generate overestimates of crime rates, arrest rates, and the parameters of the superpopulation. It is, however, straightforward to correct the likelihood function to account for the bias in the window-arrest sampling procedure. We begin by calculating the likelihood of a criterion arrest in $[t, t + \delta]$.
Let us take, first, the standard renewal theoretic case, where $F = G$. Define
$$p(t) = P[\text{renewal event occurs in } (t, t + \delta)], \quad t \ge 0.$$
The function $p(t)$ gives the probability an offender has a criterion arrest in the specified window $[t, t + \delta]$.
By conditioning on the time of the first event, we can write an integral equation for $p(t)$,8
$$p(t) = F(t + \delta) - F(t) + \int_0^t p(t - x)\, dF(x).$$
This equation can be solved (see Karlin and Taylor, 1975:184) to find
$$p(t) = F(t + \delta) - F(t) + \int_0^t [F(t - s + \delta) - F(t - s)]\, dM(s),$$
where $M(t) = \sum_{n \ge 1} F^{(n)}(t)$ is the renewal function. The quantity $p(t)$ is thus determined completely by the cdf $F$.
8 Note that $p(t)$ is also a function of $\delta$ but that it is ignored in the notation.
The expression is somewhat difficult to interpret, because there are two variables, $t$ and $\delta$, in addition to $F$. Some insight can be gained by considering the behavior of $p(t)$ for large $t$. As $t \to \infty$, one can apply the key renewal theorem to find $p(t) \to p$, where
$$p = \frac{1}{m} \int_0^{\infty} [F(t + \delta) - F(t)]\, dt$$
and
$$m = \int_0^{\infty} x\, dF(x) = \int_0^{\infty} [1 - F(x)]\, dx$$
is the mean time between events. Some simple algebra allows us to compute
$$p = \int_0^{\delta} \frac{1 - F(t)}{m}\, dt.$$
The integrand is itself a density function. It represents the equilibrium backward or forward recurrence time distributions associated with $F$. The factor $p$, when treated as a function of $\delta$, is a cdf. It begins at 0 and increases monotonically to 1 as $\delta$ increases to $\infty$. For very small values of $\delta$, $p$ is approximately given by $\delta/m$, while for large values it is nearly 1. This is quite reasonable, since as the window size $\delta$ is increased, more and more individuals in the population can be included, and the window effect is reduced.
We are interested in the behavior of $p(t)$ for all values of $t$, not just the asymptotic behavior. The reason that $p(t)$ varies with $t$ is that we have an initial condition, namely, that an event occurs at time 0. It takes some time for the effect of this condition to wear off and for equilibrium to be approached. If we begin with the renewal process in equilibrium, then $p(t)$ will no longer depend on $t$.
We can also achieve equilibrium by using a delayed renewal process formulation with $G'(t) = [1 - F(t)]/m$. There are two relevant integral equations. Let $p_D(t)$ be the probability of a criterion arrest in the window $[t, t + \delta]$ for the $(F, G)$ delayed formulation, while $p(t)$ is the same quantity for the standard $(F, F)$ formulation. Conditioning on the time of the first event gives
$$p_D(t) = G(t + \delta) - G(t) + \int_0^t p(t - x)\, dG(x),$$
$$p(t) = F(t + \delta) - F(t) + \int_0^t p(t - x)\, dF(x).$$
The second equation was solved earlier, and the resulting $p(t)$ can be substituted into the first to find $p_D(t)$. We take $G'(t) = [1 - F(t)]/m$ and do extensive algebra to find
$$p_D(t) = \frac{1}{m} \int_0^{\delta} [1 - F(u)]\, du, \quad t \ge 0,$$
which is independent of $t$.
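The equilibrium window probability is a one-dimensional integral and is simple to evaluate. The sketch below uses a midpoint rule and checks it against the exponential case, where the arrest process is Poisson with rate $\lambda$, $m = 1/\lambda$, and the integral reduces to $1 - e^{-\lambda\delta}$. The function name and the numerical values are assumptions made for illustration.

```python
import math


def window_probability(survival, mean_gap, delta, steps=10_000):
    """(1/m) * integral from 0 to delta of [1 - F(u)] du, by the midpoint rule.

    survival(u) must return 1 - F(u); mean_gap is m, the mean time between events.
    """
    h = delta / steps
    area = sum(survival((i + 0.5) * h) for i in range(steps)) * h
    return area / mean_gap


if __name__ == "__main__":
    lam = 2.0  # illustrative Poisson arrest rate
    for delta in (0.01, 0.5, 5.0):
        p_d = window_probability(lambda u: math.exp(-lam * u), 1.0 / lam, delta)
        # Small windows give roughly delta / m; large windows approach 1.
        print(delta, p_d, delta * lam)
```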
The expression for $p_D(t)$ can be used as a correction factor for the likelihood function. A given individual will have a criminal record that provides an enumeration of arrests that occurred prior to the window $[t, t + \delta]$, as well as those that occurred within the window. There will, of course, be at least one arrest within the window. The likelihood function will be constructed by multiplying the densities for the observed interevent times; however, it must be modified to account for the presence of at least one event in $[t, t + \delta]$. This entails a division of the likelihood by $p_D(t)$.
We can consider the effect of this factor $p_D(t)$ on the posterior distribution of the parameters of the superpopulation. The posterior distribution will be proportional to $h(\theta) L / p_D(t)$, where $h(\theta)$ is the prior density of the superpopulation and $L$ represents the likelihood function.
An informative special case occurs when $\delta$ is small so that $p_D(t)$ is approximately $\delta/m$. The posterior distribution of $\theta$ is proportional to $h(\theta) m L / \delta$ or $h(\theta) m L$. The extra factor $m$ accounts for the sampling bias and weights the distribution more in favor of parameter values with a larger mean time $m$ between events, that is, lower event rates.
An example will help to illustrate the utility of this calculation. Suppose the arrest process is Poisson with parameter $\lambda$, and $\lambda$ is treated as a random variable with distribution $h$. If we restrict attention to individuals with an arrest in $[t, t + \delta]$ for small $\delta$, the posterior distribution of $\lambda$ when corrected for this sampling plan will contain an extra factor of $1/\lambda$ (since $m = 1/\lambda$). This will tend to reduce the weight on large $\lambda$ and counteracts the artificially inflated likelihood. For example, suppose the prior were to have a gamma $(\alpha, \beta)$ distribution. The prior would, in effect, be corrected to a gamma $(\alpha - 1, \beta)$ distribution and then used with the likelihood function, which has been inflated by the required window arrests. This correction is closely related to the length-biased sampling phenomenon of renewal theory. This posterior representation allows one to correct for the biases introduced by using only individuals with an arrest in the particular time window.
As $\delta$ increases, the size of the biasing effect is reduced and, assuming an equilibrium formulation, is measured by $\left\{\int_0^{\delta} [1 - F(x)]\, dx\right\}/m$ for any $\delta$. For large $\delta$, this factor is near 1.
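The gamma example can be verified pointwise: multiplying a gamma $(\alpha, \beta)$ density by $1/\lambda$ and renormalizing gives a gamma $(\alpha - 1, \beta)$ density. The sketch below assumes $\alpha > 1$ and that $\beta$ enters the density as a rate, $\beta^{\alpha} \lambda^{\alpha - 1} e^{-\beta\lambda}/\Gamma(\alpha)$, which is the form implied by the gamma $(\alpha + x_i, \beta + L)$ posterior earlier in the chapter. Function names are hypothetical.

```python
import math


def gamma_pdf(x: float, shape: float, rate: float) -> float:
    """Gamma density with the second parameter entering as a rate."""
    log_pdf = (
        shape * math.log(rate) + (shape - 1.0) * math.log(x)
        - rate * x - math.lgamma(shape)
    )
    return math.exp(log_pdf)


def window_corrected_prior(lam: float, alpha: float, beta: float) -> float:
    """Gamma(alpha, beta) prior reweighted by 1/lam and renormalized (alpha > 1)."""
    normalizer = beta / (alpha - 1.0)  # E[1 / lambda] under gamma(alpha, beta)
    return gamma_pdf(lam, alpha, beta) / lam / normalizer


if __name__ == "__main__":
    for lam in (0.2, 1.0, 3.5):
        print(window_corrected_prior(lam, 3.0, 2.0), gamma_pdf(lam, 2.0, 2.0))
```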
Biases in Samples of Prisoners
Two other biases can arise in sampling and analyzing criminal justice data sets involving prisoners. First, individuals are generally sentenced to prison as a result of a high frequency of offenses. Individuals with a high observed offense rate are much more likely to be imprisoned than comparable individuals with a lesser observed offense rate. Since individuals with high propensities to commit crimes will in general have high empirical offense rates, this group can be expected to be overrepresented in prison populations. Data drawn from prisoners are, therefore, not representative of the offender population. A hierarchical model can, however, help to understand and correct for this bias.
A second issue concerns the stochastic nature of the crime process. Imagine two individuals with the same crime-committing propensity but different sample paths. The individual with the higher empirical frequency of offenses is more likely to be caught and sentenced. One may infer a higher crime rate for this individual than is actually appropriate, since individuals tend to be caught after a spurt of activity.
This second type of bias has been the subject of a recent lively debate. The controversy has been fueled by a paper of Maltz and Pollack (1980), which challenges the results of Murray and Cox (1979). The controversy centers on the evaluation of certain treatment programs for juveniles. It was noted empirically that juveniles selected for certain treatment programs exhibited a steep rise in the rate of police contact per unit time before admission. Surprisingly, these juveniles then exhibited a substantially diminished contact rate after admission to the program. The strong drop in contact rate after admission to the program has been called a "suppression effect" and was attributed by Murray and Cox solely to the success of the program.
This positive interpretation has been challenged by Maltz and Pollack (1980). They argue that the results could have been an artifact of a decision rule used by judges. Specifically, Maltz and Pollack assume that all individuals have the same value of $\lambda$. They posit a selection rule whereby an individual is placed in a treatment program at a time $t$ provided he experiences a contact at $t$ and has at least $k$ other contacts in the last $\tau$ time units. At the time an individual is placed in a program, he will exhibit a contact rate significantly higher than $\lambda$ (of course, this depends on $\lambda$ and $k$). If $\tau$ is taken to be random with some appropriate distribution, the theoretical contact-rate curve matches the data very well. This is done under the assumption of a common $\lambda$. The judicial decision rule produces the effect, not the treatment program. Once individuals are placed in the program, the rate returns to its normal, lower value.
The impact of the work of Maltz and Pollack was reduced by Tierney (1983), who pointed out an error in their analysis. They had, in fact, not correctly calculated the theoretical contact rate prior to assignment to the program. No firm conclusions have been reached by any of the authors. A recent paper by Pollack and Farrell (1984) added some limited insight into the analysis but did not help to interpret these data.
It is very reasonable to assume in both a juvenile and an adult context that sentencing is based on the type of crime committed, the number of crimes committed, and the recent crime-committing behavior. If an individual has committed three crimes, he might or might not be sentenced to prison. If the three crimes were bunched near each other, commitment to prison is much more likely than if the previous crimes were committed over a long period. The decision rule articulated by Maltz and Pollack (1980) is quite reasonable. However, Maltz and Pollack and Tierney should have paid much closer attention than they did to the time at which the crimes were committed. Suppose we assume an individual begins the crime process at age 12 and is sentenced according to the Maltz and Pollack rule for some values of $\tau$ and $k$. For a given value of $\lambda$, we can compute the distribution of the time (age) $T$ at which the individual will first be sentenced. Clearly a large value of $\lambda$ tends to result in small $T$, since the offender commits crimes at a high rate. Conversely, if we observe $T$ and attempt to infer $\lambda$, a large $T$ tends to be associated with a small $\lambda$. There is information about $\lambda$ in $T$; however, Maltz and Pollack and Tierney ignore this information by putting all individuals on a common time scale with time 0 representing the time of admission to the treatment program regardless of the actual age of the individual.
The inclusion of the time random variable can help to correct biases. Let us assume an individual begins a contact process at time 0. This might be age 12 for juveniles or age 18 for adults. Assume that contacts occur according to a Poisson process. We imagine that sentencing occurs at the time of the first contact having the property that there are also $k$ other contacts within $\tau$ units of time. We now compute the density function associated with the time of the sentencing event.
We assume the Poisson process has parameter $c$. This refers to the contact process in the juvenile context or the parameter of the exponential times between convictions in an adult case. We wish to make inferences about $c$.
For clarity, we adopt a simple assumption concerning the sentencing rule. We assume that sentencing
· never occurs on the first contact,
· occurs on the second contact only if the first was within the prior $\tau$ time units, and
· always occurs on the third contact, if not before.
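The three-part rule above can be written as a small function and applied to simulated Poisson contact histories. This is a sketch with hypothetical names; only the first three contacts matter, because sentencing always occurs by the third.

```python
import random


def sentencing_time(contact_times, tau):
    """Time of sentencing under the rule above, or None if not yet sentenced."""
    times = sorted(contact_times)
    if len(times) >= 2 and times[1] - times[0] <= tau:
        return times[1]   # second contact within tau of the first
    if len(times) >= 3:
        return times[2]   # otherwise the third contact
    return None           # never on the first contact alone


def simulate_sentencing_time(c, tau, rng):
    """Draw the first three contacts of a Poisson process with rate c."""
    t1 = rng.expovariate(c)
    t2 = t1 + rng.expovariate(c)
    t3 = t2 + rng.expovariate(c)
    return sentencing_time([t1, t2, t3], tau)


if __name__ == "__main__":
    rng = random.Random(0)
    print([round(simulate_sentencing_time(1.0, 1.0, rng), 3) for _ in range(5)])
```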
This rule is reasonable, is in the spirit of Maltz and Pollack, and can be extended to more complex versions. Unlike the problems encountered in Pollack and Farrell (1984), calculations can be carried out for more complicated versions of this rule. First, we assume that individuals have drawn their rate parameter $c$ at random from a parent distribution $h(c)$. We focus on the group of individuals (in either the juvenile or adult context) who receive a first sentence and note the value of $T$ for each. Notice that our data set is restricted to those who receive a sentence. This means that we will have a much greater likelihood of selecting a high-$c$ individual than a low-$c$ individual. Interestingly, the presence of the value of $T$ will have a modifying influence. If $T$ is small, the value of $c$ is even more likely to be large, since the offender was sentenced at the beginning of the career. If $T$ is moderate to large, we should find that $c$ is only slightly elevated. The individual was sentenced, but it took a long time to meet the criteria. If $T$ is large, $c$ should be small, since only low-rate offenders can avoid sentencing for a prolonged time period under this sentencing rule. We can quantify these heuristic comments by computing the density of $T$ given $c$, $f_T(t)$. There are two cases to consider.
In the first case, $t \le \tau$, sentencing would occur on the second offense in $(0, \tau]$. The time to the second offense has a gamma (2, c) density, so the density has the form
$$f_T(t) = c^2 t e^{-ct}, \qquad 0 < t \le \tau .$$
After $\tau$, sentencing can occur in two ways. First, if $T_1$ is the time of the first arrest and $T_2$ is the time of the second arrest, incarceration will occur if $T_2 - T_1 \le \tau$, i.e., if the individual falls into the window. Here $T_2 = t$. Second, if $T_2 - T_1 > \tau$, the individual does not fall in the window, and sentencing will occur on the third arrest.
Suppose that arrests form a renewal process with interevent density f (in this example, we assume f is exponential (c)). For $t > \tau$, the density of the time T at which the contact or conviction leading to sentencing occurs is
$$f_T(t) = \int_{t-\tau}^{t} f(s)\, f(t-s)\, ds + \int_{0}^{t-\tau} \int_{s+\tau}^{t} f(s)\, f(u-s)\, f(t-u)\, du\, ds .$$
For $f(s) = c e^{-cs}$, we find
$$f_T(t) = c^2 \tau e^{-ct} + \tfrac{1}{2} c^3 (t-\tau)^2 e^{-ct}, \qquad t > \tau .$$
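As a numerical sanity check on the two-case density, the sketch below codes $f_T(t)$ for exponential interevent times, that is, $c^2 t e^{-ct}$ for $t \le \tau$ and $c^2 \tau e^{-ct} + \tfrac{1}{2} c^3 (t-\tau)^2 e^{-ct}$ for $t > \tau$, and integrates it with a plain trapezoidal rule. The function names and the integration helper are assumptions made for illustration; they do not come from the paper.

```python
import math


def sentencing_density(t, c, tau):
    """Density of the sentencing time T given rate c and window tau."""
    if t <= 0:
        return 0.0
    if t <= tau:
        # Sentenced on the second contact: gamma(2, c) density.
        return c ** 2 * t * math.exp(-c * t)
    # Either the second contact fell in the window, or the third contact
    # triggered sentencing.
    return (c ** 2 * tau + 0.5 * c ** 3 * (t - tau) ** 2) * math.exp(-c * t)


def trapezoid(f, a, b, n=100_000):
    """Composite trapezoidal rule on [a, b] with n subintervals."""
    h = (b - a) / n
    total = 0.5 * (f(a) + f(b))
    for i in range(1, n):
        total += f(a + i * h)
    return total * h


def total_mass(c, tau, upper=60.0):
    """Integral of the density over (0, upper); should be close to 1."""
    return trapezoid(lambda t: sentencing_density(t, c, tau), 0.0, upper)


def mean_time(c, tau, upper=60.0):
    """Numerical mean of T; compare with (2 + exp(-c * tau)) / c."""
    return trapezoid(lambda t: t * sentencing_density(t, c, tau), 0.0, upper)
```

For moderate values such as c = 1 and tau = 1 the truncation at `upper=60.0` is harmless; for very small c the upper limit would need to be raised.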
One can see that the heuristics mentioned earlier do indeed hold. For example, if c has a prior gamma $(\alpha, \beta)$ distribution, the posterior distribution of c after observing T = t would be a gamma $(\alpha + 2, \beta + t)$ distribution if $t \le \tau$. If $t > \tau$, c has a posterior distribution given by a mixture of gamma distributions; specifically, with probability
$$p = \frac{\tau}{\tau + \dfrac{(\alpha + 2)(t - \tau)^2}{2(\beta + t)}}$$
it is gamma $(\alpha + 2, \beta + t)$, and with complementary probability $1 - p$ it is gamma $(\alpha + 3, \beta + t)$. The posterior mean, $E(c \mid T = t)$, is given by
$$E(c \mid T = t) = \begin{cases} \dfrac{\alpha + 2}{\beta + t}, & t \le \tau, \\[1ex] p\,\dfrac{\alpha + 2}{\beta + t} + (1 - p)\,\dfrac{\alpha + 3}{\beta + t}, & t > \tau. \end{cases}$$
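The mixture weight p can be recovered in a few lines. The following is a sketch of that step, assuming the gamma $(\alpha, \beta)$ prior is parameterized with density proportional to $c^{\alpha-1} e^{-\beta c}$ (shape $\alpha$, rate $\beta$), which is the parameterization consistent with the posterior stated in the text.

```latex
% Posterior of c for t > \tau: prior kernel times the sentencing density.
\begin{align*}
h(c \mid T = t)
  &\propto c^{\alpha-1} e^{-\beta c}
     \left[ c^{2}\tau + \tfrac{1}{2} c^{3} (t-\tau)^{2} \right] e^{-ct} \\
  &= \tau\, c^{\alpha+1} e^{-(\beta+t)c}
     + \tfrac{1}{2}(t-\tau)^{2}\, c^{\alpha+2} e^{-(\beta+t)c}.
\end{align*}
% Each term is an unnormalized gamma kernel, and
% \int_0^\infty c^{a-1} e^{-bc}\,dc = \Gamma(a)/b^{a}, so the two
% components carry total masses
\begin{align*}
w_{1} &= \tau\,\frac{\Gamma(\alpha+2)}{(\beta+t)^{\alpha+2}}, &
w_{2} &= \frac{(t-\tau)^{2}}{2}\,\frac{\Gamma(\alpha+3)}{(\beta+t)^{\alpha+3}}.
\end{align*}
% Using \Gamma(\alpha+3) = (\alpha+2)\,\Gamma(\alpha+2):
\begin{equation*}
p = \frac{w_{1}}{w_{1}+w_{2}}
  = \frac{\tau}{\tau + \dfrac{(\alpha+2)(t-\tau)^{2}}{2(\beta+t)}}.
\end{equation*}
```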
The conditional mean is thus larger than $E(c) = \alpha/\beta$ for small T = t, but smaller for large t. This shows that the individuals should not be placed on a common time scale but analyzed separately using this hierarchical approach. One can update the prior distribution on c, h, and estimate all the individual c's. These can then be compared with the empirical arrest records subsequent to intervention to gain some insight into program effectiveness.
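The posterior mean of c under the gamma prior is easy to evaluate directly. The sketch below is a minimal illustration, assuming the shape-rate parameterization of the gamma $(\alpha, \beta)$ prior (prior mean $\alpha/\beta$); the function name and the example values are hypothetical.

```python
def posterior_mean(t, tau, alpha, beta):
    """Posterior mean E(c | T = t) for a gamma(alpha, beta) prior on c.

    For t <= tau the posterior is gamma(alpha + 2, beta + t). For t > tau
    it is a two-component gamma mixture with weight p on the
    gamma(alpha + 2, beta + t) component.
    """
    if t <= tau:
        return (alpha + 2) / (beta + t)
    p = tau / (tau + (alpha + 2) * (t - tau) ** 2 / (2 * (beta + t)))
    return (p * (alpha + 2) + (1 - p) * (alpha + 3)) / (beta + t)


def prior_mean(alpha, beta):
    """Prior mean E(c) = alpha / beta."""
    return alpha / beta
```

With alpha = 2, beta = 2, and tau = 1, the prior mean is 1. An individual sentenced at t = 0.5 has posterior mean 4/2.5 = 1.6, above the prior mean, while one sentenced at t = 20 has a posterior mean well below 1, in line with the heuristic that a long wait before sentencing points to a low rate.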
SUMMARY AND SUGGESTIONS FOR
FURTHER RESEARCH
This paper has introduced two innovations to the quantitative modeling of criminal justice problems: a general structure of hierarchical models and a new stochastic model of a criminal career. These models allow one to distinguish
variation between individual offenders and variations within an individual career. This is important given the very large variability in the offender population. In addition, the hierarchical approach allows one to correct for natural biases in a data set. Biases can arise because the sampling is not at random from the offender population but rather is conditional on some event. For example, one might consider a set of arrestees or prisoners. This group tends to contain higher rate offenders than would be seen in the general population. The new stochastic model of a criminal career offers three advantages over the standard renewal process models in common use. First, it introduces a two-state approach in which there are periods of high and low activity. Second, the "active crime set" approach results in a natural age effect in which an offender's average crime-commission rate diminishes over time. Third, the model allows for some decision making on the part of the offender.
There are a number of ways in which one could consider extending the stochastic-process model. The major need is to include imprisonment and behavioral changes that arise from imprisonment. Indeed, this author would like to include explicitly a parameter or parameters that allow for a change in the Fir and active crime-set distributions depending on the fact of and length of sentencing.
It appears that the next step is to fit this new class of models with data, after first correcting natural biases in the data. This should enable us to learn about the explanatory power in this class of models and to determine to what class of phase distributions the interevent (crime or arrest) times can be restricted. Data are also needed to begin to determine an appropriate class of superpopulation distributions, not only for the parameters that determine the phase distributions but also for the other behavioral parameters and the initial active-crime set. Finally,
the possibility of combining the hazard function approach of Flinn and Heckman with the phase-distribution approach given in this paper should be explored.
REFERENCES AND BIBLIOGRAPHY
Avi-Itzhak, B., and Shinnar, R.
1973 Quantitative models in crime control. Journal of Criminal Justice 1:185–217.
Barton, R. R., and Turnbull, B. W.
1981 A failure rate regression model for the study of recidivism. Pp. 81–101 in J. A. Fox, ed., Models in Quantitative Criminology. New York: Academic Press.
Chaiken, J. M., and Rolph, J. E.
1980 Selective incapacitation strategies based on estimated crime rates. Operations Research 28:1259–1274.
Copas, J. B.
1983 Regression, prediction and shrinkage. Journal of the Royal Statistical Society B 45:311–354.
Deely, J. J., and Lindley, D. V.
1981 Bayes, empirical Bayes. Journal of the American Statistical Association 76:833–841.
Dempster, A. P., Rubin, D. B., and Tsutakawa, R. K.
1981 Estimation in covariance components models. Journal of the American Statistical Association 76:341–353.
Flinn, C. J., and Heckman, J. J.
1982a Models for the analysis of labor force dynamics. Advances in Econometrics 1:35–95.
1982b New methods for analyzing individual event histories. Pp. 99–140 in S. Leinhardt, ed., Sociological Methodology. San Francisco: Jossey-Bass.
1983 The Likelihood Function for the Multistate-multiepisode Model in "Models for the Analysis of Labor Force Dynamics." Discussion Paper Series 83-10, Economic Research Center. Chicago, Ill.: National Opinion Research Center.
Harris, C. M., and Moitra, S. D.
1979 Improved statistical techniques for the measurement of recidivism. Journal of Research in Crime and Delinquency 16:194–213.
Harris, C. M., Kaylan, A. R., and Maltz, M. D.
1981 Recent Advances in the Statistics of Recidivism Measurement. Pp. 61–79 in J. A. Fox, ed., Models in Quantitative Criminology. New York: Academic Press.
Harville, D. A.
1977 Maximum likelihood approaches to variance components estimation and to related problems. Journal of the American Statistical Association 72:320–340.
Holden, R. T.
1983 Failure Time Models for Criminal Behavior. Department of Sociology, Yale University.
James, W., and Stein, C.
1961 Estimation with quadratic loss. Pp. 361–379 in Proceedings of the Fourth Berkeley Symposium. Vol. 1. Berkeley: University of California Press.
Karlin, S., and Taylor, H. M.
1975 A First Course in Stochastic Processes. 2nd ed. New York: Academic Press.
Kelly, F. P.
1979 Reversibility and Stochastic Networks. New York: John Wiley & Sons.
Maltz, M. D., and McCleary, R.
1977 The mathematics of behavioral change: recidivism and construct validity. Evaluation Quarterly 1:421–438.
Maltz, M. D., and Pollack, S. M.
1980 Artificial inflation of a delinquency rate by a selection artifact. Operations Research 28:547–559.
Morris, C. N.
1983 Parametric empirical Bayes inference: theory and applications. Journal of the American Statistical Association 78:47–55.
Murray, C. A., and Cox, L. A., Jr.
1979 Beyond Probation. Vol. 94. Sage Library of Social Research. Beverly Hills, Calif.: Sage Publications.
Neuts, M. F.
1981 Matrix-Geometric Solutions in Stochastic Models: An Algorithmic Approach. Baltimore, Md.: Johns Hopkins University Press.
Peterson, M. A., and Braiker, H. B., with Polich, S. M.
1981 Who Commits Crimes: A Survey of Prison Inmates. Cambridge, Mass.: Oelgeschlager, Gunn, and Hain.
Pollack, S. M., and Farrell, R. L.
1984 Past intensity of a terminated Poisson process. Operations Research Letters 2:261–263.
Rolph, J. E., Chaiken, J. M., and Houchens, R. E.
1981 Methods for Estimating Crime Rates of Individuals. Report R-2730-NIJ. Santa Monica, Calif.: Rand Corporation.
Stollmack, S., and Harris, C. M.
1974 Failure-rate analysis applied to recidivism data. Operations Research 22:1192–1205.
Tierney, L.
1983 A selection artifact in delinquency data revisited. Operations Research 31:852–865.