10
Random Parameter Stochastic Process
Models of Criminal Careers
John P. Lehoczky
INTRODUCTION
Background
In the past decade there has been great growth in the development of
quantitative methodologies to deal with criminal justice problems. This has
included extensive data gathering and analysis and some modeling of offender
behavior. As this data analysis proceeds, one can gain clearer insights into
the nature of offender behavior, and these should be incorporated into
increasingly detailed models. As the models increase in accuracy, one can
begin to use them as policy tools to analyze the impact of various approaches
to crime control, such as selective incapacitation.
John P. Lehoczky is professor and head, Department of Statistics,
Carnegie-Mellon University. I wish to express my thanks to Alfred Blumstein
and Jacqueline Cohen for their major contributions to the paper. The
approaches developed in this paper are the outgrowth of a long series of
discussions concerning appropriate models for criminal behavior and the
empirical evidence supporting those models. In addition, I wish to thank
Donald Gaver for his many discussions concerning hierarchical models and
corrections to biases in criminal justice data sets. My thanks also to Arthur
Goldberger, Jan Chaiken, Chul Woo Ahn, and Mark Schervish for their many
comments on earlier drafts of this paper.
Unfortunately, it seems that the quantitative models of offender behavior
that have been developed to date do not capture the recent insights about
offender behavior found in major data analysis projects, such as the Rand
prisoner self-report study. Indeed, the stochastic modeling approach began in
1973 with the work of Avi-Itzhak and Shinnar. This work, described below,
treats individual-offender recidivism as a Poisson process. A great deal of
subsequent modeling has been done, but most of the models are simple
extensions of the Poisson-process model, namely renewal-process models. This
class of models assumes that recidivism times are independent and have the
same distribution. Such models may fit data better than a Poisson-process
model, but they do not incorporate the current improved understanding of
offender behavior. This paper represents an attempt to develop a stochastic
model that is in better accord with this understanding.
RANDOM PARAMETER STOCHASTIC-PROCESS MODELS
Three major aspects of offender behavior have been observed with sufficient
frequency to merit incorporation into analytic models:
· Crime-commission propensities change as a function of age.
· Offender populations are markedly heterogeneous.
· Offenders often are thought to commit crimes in spurts and then to have
periods with little or no activity.
The age effect is very pronounced. It is widely recognized that offender
behavior is at its peak during the late teens and early 20s and then drops
significantly during the 30s. Any stochastic model must address this age
effect. Standard renewal-process models do not incorporate such effects,
because they assume that the times between arrests are independent and
identically distributed (i.i.d.) and hence stationary. There is need to
develop models that account for age effects but that simultaneously offer
analytic tractability. Such models will be presented in this paper.
It is evident from recidivism data as well as self-report data that there is
great heterogeneity in the offender population. This heterogeneity refers to
differences between offenders, including markedly differing offense rates,
career lengths, and types of crimes engaged in. This variation goes beyond
differences that can reasonably be observed from independent replications of
a single stochastic process. Most models do not take this heterogeneity into
account. The few exceptions arose from work at the Rand Corporation,
including Chaiken and Rolph (1980) and Rolph, Chaiken, and Houchens (1981).
In this paper, I argue for the use of hierarchical models to represent
heterogeneity. In such models each individual's criminal career is regarded
as a stochastic process governed by parameters. Those parameters are
themselves treated as random variables drawn from a parent distribution
(superpopulation). The parent distribution captures the heterogeneity of the
population of offenders or the variation between individuals. One wishes to
estimate the parameters of the parent distribution to gain insight into the
population of offenders. In addition, one wishes to estimate the
rate-influencing parameters of individuals to understand the behavior of each
of the offenders. Hierarchical models form the basis of the analysis in this
paper. They are formally described, applied, and estimated in the discussions
that follow.
There is another aspect to criminal careers that has generally not been
incorporated into stochastic models. This is the occurrence of quiescent
periods in the course of the career. Self-report data reveal that criminal
behavior often occurs in spurts and is followed by lulls in activity. This is
not surprising if, for example, the offender was attempting to gain
sufficient money through a series of crimes and then, having reached that
goal, stopped for a period. The typical renewal-process models do not
incorporate such behavior. A new class of models that includes this behavior
is developed below.
Several other aspects of stochastic modeling of criminal careers are dealt
with in this paper. One of the most interesting is the use of the
hierarchical modeling approach to correct for natural biases in data sets.
Generally, criminal justice data sets do not provide random samples from the
offender population. Rather, individuals are part of a sample because they
meet a specific criterion that may be directly or indirectly related to their
parameter values. For example, one might gather data on prisoners. This group
of offenders is, however, not representative of the offender population
because it typically consists of individuals with high offense rates, more
serious offenses, or longer careers. Similarly, if one took a sample of
arrestees in some time period, such a sample would overrepresent high-rate
offenders, since they have a greater probability of falling into such a
sample. As illustrated below, the hierarchical modeling approach can help to
overcome this problem. It offers the opportunity to develop a correction for
nonrandom sampling, so that one can make more nearly correct inferences about
the offender population from inherently biased data sets.
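A small numerical sketch may make the size of the arrestee-sample bias
concrete. It assumes, purely for illustration, a population made up of two
offender groups whose arrests follow Poisson processes and a one-year sampling
window. The group shares, the arrest rates, and all function names are
hypothetical, and the reweighting step is simple inverse-probability
weighting rather than the hierarchical correction developed in this paper.

```python
import math

def inclusion_probability(arrest_rate, window_years):
    """P(at least one arrest in the window) under a Poisson arrest process."""
    return 1.0 - math.exp(-arrest_rate * window_years)

def arrestee_sample_shares(groups, window_years):
    """groups: list of (population_share, arrests_per_year) pairs."""
    weights = [share * inclusion_probability(rate, window_years)
               for share, rate in groups]
    total = sum(weights)
    return [w / total for w in weights]

def reweighted_shares(sample_shares, groups, window_years):
    """Undo the selection effect by weighting by 1 / inclusion probability."""
    weights = [s / inclusion_probability(rate, window_years)
               for s, (_, rate) in zip(sample_shares, groups)]
    total = sum(weights)
    return [w / total for w in weights]

# Hypothetical population: 90% low-rate and 10% high-rate offenders.
groups = [(0.9, 0.2), (0.1, 2.0)]
sample = arrestee_sample_shares(groups, window_years=1.0)
recovered = reweighted_shares(sample, groups, window_years=1.0)
print(sample, recovered)
```

Under these assumed values the high-rate group is one-tenth of the population
but roughly one-third of the arrestee sample, and the reweighting recovers the
population shares only because the inclusion probabilities are taken as known.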
Overview
This paper introduces a hierarchical (superpopulation) model for criminal
careers within a population of offenders or potential offenders. There are
two levels to the hierarchy. The top level is used to explain variation
between individuals in the population, that is, to explain the heterogeneity
of the population. At the low level of the hierarchy, individuals engage in
criminal careers that are treated as independently evolving stochastic
processes governed by certain distributions. These distributions contain
parameters with values at the top level. The low level thus uses a
stochastic-process model to help explain differences within careers governed
by the same parameter values.
Covariates can be introduced at both levels of the hierarchical model.
Covariates are of two types: "historical" covariates, which are fixed at the
start of the career, and "dynamic" covariates, which can change during the
evolution of the career. For an analysis of adult offending careers,
historical covariates could include juvenile record or the age at the time of
the first juvenile arrest. Relevant dynamic covariates might include
employment status or drug use.
CRIMINAL CAREERS AND CAREER CRIMINALS
The historical covariates can influence the choice of parameters for each
individual at the highest level of the hierarchy. Since these parameters are
selected and fixed at the beginning of the career, dynamic covariates cannot
be used. All covariates are allowed to influence the evolution of the career
of any particular offender, the lowest level of the hierarchy.
A new family of stochastic models is introduced in the paper. The models are
characterized by two states, one of which corresponds to a high rate of crime
commission and the other of which represents a low rate of activity (which is
taken to be zero). Parameters are included for the time spent in each state,
state-switching probabilities, arrest probabilities, crime-type termination
probabilities, and the times between crimes. For multiple crime types, a
competing-risks formulation is used. The models offer tractability, can
include covariates, provide periods of high and low activity, and introduce
some behavioral parameters.
Methods are also developed to assess and correct the biases that occur in
many criminal justice data sets. Three specific issues are addressed:
1. If a data set is gathered by taking individuals who were arrested during a
certain window of time, the data set will overrepresent individuals with high
crime rates among those at liberty and underrepresent those who are in prison
for part of the window period.
2. If a data set comes from self-reports of prisoners, it is not
representative of the population in general, since prisoners tend to be
high-rate offenders and to commit more violent crimes.
3. Even at the level of the individual parameters, the individual crime or
arrest rates that are estimated from among arrestees or prisoners will be
biased upward. This is because individuals are more likely to be caught in a
period of high activity (even if their parameter values may be low) and hence
empirically show a high arrest rate.
The hierarchical model developed in this paper can help to assess and correct
these biases.
Finally, a class of "phase distributions" is introduced. This class is very
versatile in that it can approximate arbitrarily closely the distribution of
any nonnegative random variable. In addition, the class is closed under a
number of operations that are useful for the models presented in this paper.
The closure properties include convolution and mixtures, as well as maxima
and minima of random variables drawn from this class.
HIERARCHICAL MODELS
This section describes the use of hierarchical stochastic models for studying
criminal careers. There are several reasons why this class of models is
especially useful and of great conceptual value. First, it has frequently
been observed that criminal behavior varies widely between individuals. This
is especially true for crime rates, as measured by self-report data; some
individuals report committing crimes at a very high rate, while others report
they commit crimes rarely. Even allowing for biases in these data and
deliberate falsification, it is clear that there is great variation between
individuals. It is, therefore, appropriate to use a model that can represent
this great variation.
A second benefit of using a hierarchical model is that it can help to improve
parameter estimates for each individual. Suppose one treated each individual
in isolation and attempted to estimate parameter values for each individual
using only his data (for example, arrest record and the values of
covariates). One would find that these estimators have a large variance. With
a hierarchical model, however, the data for other individuals can be used to
help estimate parameter values for a single individual. This follows because
the parameter values for all individuals are related, since they are modeled
as coming from a common parent distribution. This situation has been observed
and exploited with increasing frequency in statistical studies. This
statistical formulation leads to "shrinkage" estimators. This type of
estimator was first introduced by James and Stein (1961). This topic has been
receiving substantial recent attention in the statistics literature (see the
review by Morris, 1983). Methods based on maximum likelihood, empirical
Bayes, and Bayes procedures have been developed. These approaches will be
discussed in the section on parameter estimation; however, a recent example
provided by Dempster, Rubin, and Tsutakawa (1981) may help to explain the
benefits of the methodology. These authors present several examples, along
with a theoretical treatment of likelihood methods. One example deals with
estimating the first-year performance of law school students using several
explanatory variables. For a single law school, the estimates of regression
coefficients are highly variable. It is possible to improve the estimates for
a single law school markedly by simultaneously carrying out the analysis for
many (82 in this case) law schools. One believes there is a reasonable
similarity among law schools, and so the data for other schools are pertinent
for any individual school. The estimates for one school gain precision by
considering many similar schools simultaneously. Other examples of this type
are cited in Morris (1983).
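The mechanics of shrinkage can be sketched in a few lines. The example below
is a method-of-moments empirical Bayes rule for normal means with a known,
common sampling variance. It is offered only as an illustration of the idea,
not as the James-Stein estimator itself or the estimator used later in this
paper, and the function name and input values are hypothetical.

```python
from statistics import mean, pvariance

def shrink_toward_mean(estimates, sampling_var):
    """Pull each raw estimate toward the grand mean.

    Assumes every estimate has the same known sampling variance
    (within-unit noise). The between-unit variance is estimated by moments.
    """
    grand_mean = mean(estimates)
    total_var = pvariance(estimates)                 # between + within
    between_var = max(total_var - sampling_var, 0.0)
    # Weight on the unit's own data: small when units differ little.
    weight = between_var / (between_var + sampling_var)
    return [grand_mean + weight * (x - grand_mean) for x in estimates]

# Hypothetical raw estimates for four units, each with sampling variance 1.
raw = [2.0, 4.0, 6.0, 8.0]
shrunk = shrink_toward_mean(raw, sampling_var=1.0)
print(shrunk)
```

When the estimated between-unit variance is small relative to the sampling
variance, the weight approaches zero and every estimate is pulled almost all
the way to the grand mean, which matches the qualitative behavior described in
the text.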
The situation is analogous to Model II analysis of variance or random-effects
models. One can distinguish the variation within a particular career and the
variation between careers. Criminal careers are modeled using stochastic
models. This is appropriate because any career has many random elements that
control its evolution. If two individuals have the same stochastic mechanism
(i.e., the same parameter values), the two resulting careers may nevertheless
be quite different. The offenders have possibly different criminal
opportunities, possibly different arrest realizations, possibly different
sentences, and so on. If an individual were allowed a second realization of
his career, it would differ from the first. This is variation within a
career. Variation between careers arises when individuals have different
stochastic mechanisms (parameter values) governing their careers. Once the
individuals have been linked through a superpopulation or hierarchical model,
the data for all individuals can be used in addition to the data for a single
individual. The individual parameter estimates will be drawn toward the
average of the population. This is known as shrinkage. The amount of
shrinkage will depend on the size of the variation within careers versus the
variation between careers. If there is relatively small variation between
individuals, the shrinkage can be great. The formal idea of the hierarchical
model (presented in more detail in the next section) is that one has a family
of parameters, $\theta$, that controls the evolution of an individual career,
which is denoted $\{X_\theta(t),\ t \ge 0\}$. The parameters $\theta$ are
treated as random variables with some parent distribution, which itself may
contain some unknown parameters. One then wishes to use a data set to
estimate individual $\theta$ values and/or the unknown parameters of the
parent distribution.
Given the above formulation, there is a third advantage to using hierarchical
models. This is the possibility of assessing and correcting for sampling
biases in a data set. Generally, criminal justice data sets are not randomly
sampled from the population of offenders. More typically, one generates a
data set by selecting from individuals having a particular attribute, such as
an arrest in a certain time period, or who are in prison at a particular
time. Each of these sampling mechanisms yields a sample that is not random
from the parent distribution. If one is, therefore, to make inferences
concerning the offender population, one must assess and correct those biases.
The hierarchical formulation is useful in carrying out this process, as will
be illustrated below. (Cohort samples can overcome the biasing problem;
however, they generally yield too small a sample of criminal activity to be
of great utility.)
A NEW STOCHASTIC MODEL
This section introduces a new family of stochastic models of the crime
process and arrest process associated with a single criminal career. These
new models are intended to encompass more of the salient aspects of criminal
behavior than has been possible with previous models. The basic model
presented is itself still oversimplified but should serve as an introduction
to a set of ideas and tools that future researchers will find useful. Only
the most tractable versions of this family of models are presented in detail.
This section is organized in a sequential manner. First, some of the most
familiar, early stochastic models of criminal careers are summarized,
including their good and bad points. Next, a model is presented that
overcomes some of the objections to previous models. Finally, a class of
flexible models is presented that seems to improve previous efforts
considerably. Of course, further changes are to be anticipated as
understanding of the underlying processes increases.
Poisson Crime Processes
A frequently used model for the process of crimes committed by a single
individual is that crimes form a Poisson process (see Karlin and Taylor,
1975) during the times the individual is not in prison (see, for example,
Avi-Itzhak and Shinnar, 1973; Rolph, Chaiken, and Houchens, 1981). The times
between crimes (after removing time in prison) are independent random
variables with an exponential distribution having some mean, say
$1/\lambda$. Associated with a crime process is an arrest process. This
arrest process is a thinned version of the crime process in that only a
subset of the crimes results in arrest¹ (and only a subset of the arrests
results in imprisonment). A common assumption in all work is that an arrest
is determined at random for each crime. This means that there is an arrest
probability $q$ and that a crime event yields an arrest event with
probability $q$ independent of anything else. With this type of thinning to
construct the arrest process, if the crime process is Poisson ($\lambda$),
the arrest process is Poisson ($\lambda q$).
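The thinning property can be checked by simulation. The sketch below is a
minimal illustration that assumes no prison time, a crime rate of 10 per year,
and an arrest probability of 0.1. These values, the seed, and the function
name are hypothetical choices made only for this example.

```python
import random

def simulate_arrest_times(crime_rate, arrest_prob, horizon, rng):
    """Poisson crime process on [0, horizon], thinned independently."""
    t = 0.0
    arrests = []
    while True:
        t += rng.expovariate(crime_rate)   # exponential gap, mean 1/crime_rate
        if t > horizon:
            return arrests
        if rng.random() < arrest_prob:     # each crime leads to arrest w.p. q
            arrests.append(t)

rng = random.Random(0)
horizon = 5000.0
arrests = simulate_arrest_times(10.0, 0.1, horizon, rng)
empirical_arrest_rate = len(arrests) / horizon
print(empirical_arrest_rate)  # should be near crime_rate * arrest_prob = 1.0
```

Over a long horizon the empirical arrest rate should settle near the product
of the crime rate and the arrest probability, as the thinning argument
predicts.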
This simple model of the crime process has several attractive features:
1. The Poisson process is well understood and very tractable. In addition,
given the assumption of random thinning, both the crime and arrest processes
are Poisson. If the crime process is a renewal process and one thins it at
random, the arrest process will be approximately Poisson even if the crime
process is not.
2. The Poisson process has a single parameter, and the statistical inference
for it is well understood.
On the other hand, several drawbacks to the model must be addressed:
1. The Poisson model, as such, does not account for population heterogeneity.
2. The Poisson model for the arrest process may not fit recidivism data (see
Holden, 1983, for a discussion). That is, the times between arrests (after
eliminating time in prison) are not exponentially distributed.
3. It has been observed, especially from prisoner self-report data, that
arrests and crimes appear to be more clustered than would be suggested by a
Poisson model. Moreover, this model does not allow for any sort of aging
effects. It has been widely noted that the frequency of arrests varies with
time and at some point drops to essentially zero, suggesting an effective end
to the career.²
4. A major drawback of the simple Poisson model is that the individual exerts
no control over the career other than picking $\lambda$. There are no
decision points built into such models at which, for example, the individual
could decide to stop, or change, $\lambda$. The events of the past career do
not influence the future. Clearly, one would suspect that past events can
have an important effect on the future and so one would like to broaden the
class of models to allow for this.
¹It is assumed there is no problem of "false arrest."
Some of the previous issues can be overcome in a straightforward way. For
example, one could introduce a random lifetime. Each individual has a career
length (often assumed to be exponentially distributed, to enhance
tractability). When the length is exceeded, the individual no longer engages
in crime (and so presumably is no longer arrested). One problem with this
approach is that the career length, if determined at the start of the career,
is not influenced by any factors in the career. (Overcoming the poor fit
offered by the exponential distribution is discussed in the next section.)
²The concept of a finite career length is somewhat controversial (see, for
example, Holden, 1983:26). It is argued that there can be no logical point at
which a criminal career can end, except death. Any former criminal could be
presented with an opportunity such that he would again commit a crime. While
one may respect this point of view, it should be realized that no single
stochastic model can be expected to represent an exact truth. Rather, one
strives to construct models that are approximately true, that account for
important effects, and that offer a tractable analysis. It may be that some
criminals whose careers are said to have "ended" may, in fact, commit a few
additional crimes. In such a case, one would expect the frequency of those
crimes to be very low. When coupled with the fact that the arrest probability
is generally very small, one expects that the arrest processes may be no
different.
One can introduce population heterogeneity in two ways:
1. One can allow $\lambda$ to depend on covariates, such as juvenile record
or age at first juvenile arrest.³
2. One can allow $\lambda$ to be random (see, for example, Rolph, Chaiken,
and Houchens, 1981). In this way, $\lambda$ can represent the heterogeneity
of a population of offenders.
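One consequence of a random rate is overdispersion: counts pooled across
offenders vary more than a single Poisson process would allow. The sketch
below assumes, for illustration only, that individual rates follow a gamma
distribution with shape 2 and scale 1 and that each offender is observed for
one year free of prison time. The gamma choice, the parameter values, and the
function name are assumptions of this example, not the paper's specification.

```python
import random
from statistics import mean, pvariance

def simulate_crime_counts(n_offenders, shape, scale, years, rng):
    """Crime counts for offenders whose Poisson rates are gamma distributed."""
    counts = []
    for _ in range(n_offenders):
        rate = rng.gammavariate(shape, scale)   # this offender's crimes/year
        t, n = rng.expovariate(rate), 0
        while t <= years:                       # count gaps inside the window
            n += 1
            t += rng.expovariate(rate)
        counts.append(n)
    return counts

rng = random.Random(2)
counts = simulate_crime_counts(20000, shape=2.0, scale=1.0, years=1.0, rng=rng)
count_mean = mean(counts)
count_variance = pvariance(counts)
print(count_mean, count_variance)
```

For a homogeneous Poisson population the variance of the counts would equal
the mean. Under the assumed gamma mixing the theoretical variance is twice the
mean, and the simulated counts should show a comparable excess.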
Renewal-Process Models
Some improvement in fit can be achieved by replacing the Poisson-process
model for crimes with the more general renewal-process model. Recall that a
renewal process is a point process in which the times between points (crimes)
are independent, identically distributed random variables having a cumulative
distribution function $F$, which is not necessarily exponential. Arrests are
then commonly considered to be a randomly thinned version of crimes. Arrests
will also form a renewal process with distribution $G$. One can determine $G$
in terms of $F$ and $q$; however, the relation is most easily expressed in
terms of the Laplace-Stieltjes transform (Neuts, 1981),
$\mathcal{L}_F(s) = E[\exp(-sT)]$, where $T$ represents a generic random time
between crimes. We find
$$\mathcal{L}_G(s) = \frac{q\,\mathcal{L}_F(s)}{1 - (1 - q)\,\mathcal{L}_F(s)},$$
where $G$ is the distribution of times between arrests.
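As a consistency check on the thinning relation, one can work it through for
the exponential case. The derivation below writes $\mathcal{L}_F(s) =
E[\exp(-sT)]$ for the transform of $F$, assumes $F$ is exponential with rate
$\lambda$, and uses the fact that the number of crimes up to and including the
next arrest is geometric with success probability $q$. It is a worked example
under those assumptions, not part of the original derivation.

```latex
% Time between arrests = sum of N i.i.d. inter-crime times,
% with P(N = n) = q (1 - q)^{n-1}, n = 1, 2, ...
\mathcal{L}_G(s)
  = \sum_{n=1}^{\infty} q (1 - q)^{n-1} \, \mathcal{L}_F(s)^{n}
  = \frac{q \, \mathcal{L}_F(s)}{1 - (1 - q) \, \mathcal{L}_F(s)} .

% Exponential special case: \mathcal{L}_F(s) = \lambda / (\lambda + s).
\mathcal{L}_G(s)
  = \frac{q \lambda / (\lambda + s)}{1 - (1 - q) \lambda / (\lambda + s)}
  = \frac{q \lambda}{s + q \lambda} .
```

The final expression is the transform of an exponential distribution with rate
$\lambda q$, which agrees with the statement that thinning a Poisson
($\lambda$) crime process yields a Poisson ($\lambda q$) arrest process.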
³See, for example, Stollmack and Harris (1974), Barton and Turnbull (1981),
and Holden (1983).
Many authors have used renewal-process models for the crime process (see, for
example, Holden, 1983). Many distributions have been used for $F$. These
include the exponential, Weibull, lognormal, gamma, mixtures of various
distributions, and defective distributions. In addition, logistic regression
methods have also been used. In each case a particular distribution was used
to fit a particular data set. One can readily see that no universally
appropriate family seems to fit recidivism data. However, a family of
continuous distributions, called phase distributions (Ph-distributions),
seems particularly useful for several reasons:
1. The family is dense in the set of all positive distributions, that is, any
positive distribution can be approximated arbitrarily closely by a phase
distribution.
2. Commonly used distributions, such as the exponential, some gamma, and
their mixtures, are phase distributions (lognormal and Weibull are not, but
they can be closely approximated).
3. These distributions have a Markovian structure (see Neuts, 1981) and so
are useful in stochastic model building, because of their tractability.
4. The distributions are closed under such operations as mixing and
convolution.
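The Markovian structure mentioned in item 3 suggests a direct way to draw from
a phase distribution: run a small continuous-time Markov chain until it is
absorbed and record the elapsed time. The sketch below is one hypothetical
way to code this. The argument layout (initial probabilities, per-phase exit
rates, and a matrix of phase-to-phase jump probabilities whose missing row
mass means absorption) is an assumption of this example rather than notation
from Neuts (1981).

```python
import random

def sample_phase_type(initial_probs, rates, transitions, rng):
    """Draw one absorption time from a phase-type distribution.

    transitions[i][j] is the probability of jumping from phase i to phase j
    when phase i is left; any remaining probability mass means absorption.
    """
    phase = rng.choices(range(len(rates)), weights=initial_probs)[0]
    total = 0.0
    while phase is not None:
        total += rng.expovariate(rates[phase])   # exponential sojourn time
        u = rng.random()
        next_phase, cumulative = None, 0.0
        for j, p in enumerate(transitions[phase]):
            cumulative += p
            if u < cumulative:
                next_phase = j
                break
        phase = next_phase                        # None means absorbed
    return total

# Two exponential phases in series (a gamma with integer shape 2), rate 3.
rng = random.Random(3)
draws = [sample_phase_type([1.0, 0.0], [3.0, 3.0],
                           [[0.0, 1.0], [0.0, 0.0]], rng)
         for _ in range(20000)]
sample_mean = sum(draws) / len(draws)
print(sample_mean)  # theoretical mean is 2/3
```

Mixtures correspond to spreading the initial probabilities over several
phases, and convolutions correspond to chaining phases in series, which is why
the closure properties in item 4 come cheaply in this representation.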
The renewal-process approach to modeling the crime process helps in that
recidivism data can be better fit; however, the other difficulties cited
earlier remain. The principal problems are the following:
1. The general renewal-process model still assumes independent, identically
distributed arrest times and does not offer the clustering of crimes or
arrests usually reported or observed.
2. The model does not allow the individual to make decisions concerning
behavior.
3. The model does not account for the great amount of population variability
that has been observed in criminal justice data sets.
4. The model does not build in interactions between individuals and the
criminal justice system.
A New Class of Models
In this section, I present a new model designed to overcome some of the
difficulties with earlier models. The new model makes use of a pair of states
between which the individual moves.⁴ When the individual is in one of the
states (the "high" state), he commits crimes at a high intensity. When in the
other state (the "low" state), crimes are committed at a low intensity (which
is taken to be zero). In addition, switching between states allows the
individual some decision-making latitude. I begin with a single crime type,
generalize to multiple crime types, and then develop a hierarchical
formulation.
The following parameters are used in the model:
$T$: number of distinct crime types.
$A$: initial crime types for an individual, $A \subset \{1, 2, \ldots, T\}$.
$F_t$: the cdf of the time between crimes of type $t$, $1 \le t \le T$.
$q_t$: the arrest probability for crime type $t$, $1 \le t \le T$.
$\beta_t$: the probability of terminating crime type $t$, $1 \le t \le T$.
$\alpha$: the probability of switching from the high-rate state to the
low-rate state.
$G$: the cdf of the time in the low-rate state.
⁴A two-state model of this sort was studied by Maltz and Pollack (1980). It
is also mentioned in Rolph, Chaiken, and Houchens (1981:37).
Let us define
· a crime process that is a renewal process with the time between crimes
being a phase distribution $F$ with mean $1/\lambda$;
· an arrest probability $q$;
· a state-switching probability $\alpha$;
· a phase-type distribution $G$ governing the amount of time the individual
spends in the low state, during which no crimes are committed; and
· a probability $\beta$ giving the probability that the career ends with the
start of the current low period.
We can describe the process intuitively, as follows. A cycle begins with the
individual in the high state. Crimes are committed according to a renewal
process with distribution $F$. After each crime, two issues must be resolved:
1. with probability $q$, the individual is arrested, and
2. with probability $\alpha$, the individual switches to the low state.
If the individual switches to the low state, he may terminate his career with
probability $\beta$. With the complementary probability, he stays in this
state for a period determined by the cumulative distribution function $G$.
While the individual is in the low state, no crimes are committed.
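The cycle just described translates almost line for line into a simulation.
The sketch below is a simplified illustration: it takes both $F$ and $G$ to be
exponential rather than general phase distributions, ignores prison time, and
uses hypothetical parameter values and names throughout.

```python
import random

def simulate_career(crime_rate, q, alpha, beta, mean_low_time, rng):
    """One career under the two-state model; returns (crimes, arrests)."""
    t = 0.0
    crimes, arrests = [], []
    while True:
        t += rng.expovariate(crime_rate)    # next crime in the high state (F)
        crimes.append(t)
        if rng.random() < q:                # arrest decided per crime
            arrests.append(t)
        if rng.random() < alpha:            # switch to the low state
            if rng.random() < beta:         # career ends as low period starts
                return crimes, arrests
            t += rng.expovariate(1.0 / mean_low_time)   # quiescent period (G)

rng = random.Random(1)
careers = [simulate_career(12.0, 0.05, 0.2, 0.25, 0.5, rng)
           for _ in range(5000)]
mean_crimes = sum(len(c) for c, _ in careers) / len(careers)
print(mean_crimes)  # a career ends after each crime w.p. alpha * beta
```

Because each crime is followed by termination with probability
$\alpha\beta$, the number of crimes in a career is geometric with mean
$1/(\alpha\beta)$, which is 20 for the values assumed here. The simulated
average gives a quick check that the code follows the verbal description.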
The arrest, state-switching, and termination probabilities are applied at
random, that is, without any dependence on the realization of the process to
that point. Indeed, it is interesting to consider a generalized model in
which the process up to that time can influence these transitions. For
example, one can introduce a reinforcement effect. If any individual commits
a crime and is not arrested, that might provide reinforcement to stay active.
Two unarrested crimes provide further reinforcement, and so on. One could
simply introduce a sequence $\{\alpha_n\}$ in which $\alpha_n$ represents the
probability the individual offender moves to the low state after $n$
consecutive unarrested crimes. The sequence could be chosen so that it
strictly decreases to a positive limit.
Any arrest may result in a conviction and a prison sentence. We are only
interested in the behavior during free time; consequently, the model is
applicable only while the person is not in prison. This brings up the issue
of how the processes should be initiated at time 0 (which we take to be age
18 for adult offending) and how they should be restarted after release from
prison, if applicable. It is mathematically convenient to keep the process in
equilibrium as much as possible. (The advantage of this will be seen below in
the discussion of corrections for sampling biases.) This suggests that we
should take the initial time to the first crime to be given by the
equilibrium forward recurrence time distribution. Once the first crime
occurs, the renewal-process model begins.
Under the assumptions presented earlier, the crime process is a renewal
process (if prison time is deleted). After each crime, a coin is flipped, and
depending on the result, the next crime comes according to $F$ with
probability $(1 - \alpha)$ or according to $F * G$ with probability
$\alpha(1 - \beta)$, or no further crimes occur with probability
$\alpha\beta$. (Here $*$ denotes convolution, since the time to the next
crime is the sum of the length of the low period and the time to the first
crime in the next high period.) This gives us a mixture of two phase
distributions, which will also be a phase distribution, and the resulting
distribution is defective in that it allows a positive probability of the
value $\infty$. Let us refer to this distribution of times between crimes by
$H$, and let its Laplace-Stieltjes transform be given by $\mathcal{L}_H$. The
arrest process is also a terminating renewal process. This is easily seen,
since each crime has an independent arrest probability. The time between
arrests is thus a random sum of independent random variables having
distribution $H$. If we let the distribution be $K$, then
$$\mathcal{L}_K(s) = \frac{q\,\mathcal{L}_H(s)}{1 - (1 - q)\,\mathcal{L}_H(s)}.$$
It should be noted that
$$\mathcal{L}_H(0) = P(\text{time between crimes} < \infty) = 1 - \alpha\beta$$
and
$$\mathcal{L}_K(0) = P(\text{time between arrests} < \infty) = \frac{q(1 - \alpha\beta)}{1 - (1 - q)(1 - \alpha\beta)}.$$
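The two limiting values, $1 - \alpha\beta$ for crimes and
$q(1 - \alpha\beta)/(1 - (1 - q)(1 - \alpha\beta))$ for arrests, can be
cross-checked numerically. The sketch below compares the closed form for
arrests with a direct sum over the number of crimes needed to reach the next
arrest. The parameter values and function names are hypothetical.

```python
def prob_next_crime(alpha, beta):
    """P(another crime ever occurs) = 1 - alpha * beta."""
    return 1.0 - alpha * beta

def prob_next_arrest(q, alpha, beta):
    """Closed form for P(another arrest ever occurs)."""
    h = prob_next_crime(alpha, beta)
    return q * h / (1.0 - (1.0 - q) * h)

def prob_next_arrest_series(q, alpha, beta, terms=10000):
    """Same quantity as a sum over n: the next n crimes all occur, the first
    n - 1 go unarrested, and the n-th leads to arrest."""
    h = prob_next_crime(alpha, beta)
    return sum(q * (1.0 - q) ** (n - 1) * h ** n for n in range(1, terms + 1))

closed_form = prob_next_arrest(0.05, 0.2, 0.25)
series_value = prob_next_arrest_series(0.05, 0.2, 0.25)
print(closed_form, series_value)
```

With these assumed values a further crime is very likely after any given
crime, yet a further arrest after any given arrest occurs only about half the
time, which illustrates how a low arrest probability makes the arrest process
terminate much sooner than the crime process.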
This model has several attractive features:
1. The model is, in reality, a terminat-
ing renewal process for both crimes and
arrests. In this instance though, the model
explicitly builds in parameters to repre-
sent ways in which the individual can
control activities and does so in a realistic
way. After each crime, the individual con-
trols whether to continue or change states
and, if he or she continues, whether to
terminate the crime type or not. Rather
than merely fitting K, the arrest-process
distribution, the model shows how it is
composed of more fundamental behav-
ioral parameters. Thus, the model over-
comes the objection to the lack of individ-
ual control over the career while
retaining the simplicity of a renewal proc-
ess.
2. The model introduces important be-
havioral effects in a tractable way and
allows for the generality provided by
phase distributions.
It should be noted that the model can be further generalized in several ways.
As mentioned before, the single $\alpha$ parameter can be replaced by a
sequence of parameters to represent a reinforcement effect. In addition, one
can increase the number of states (from "high" and "low"). This allows for
more complex behavior. Despite this added generality, I have retained the
two-state model with no crimes in the low state. This reduces the number of
parameters in the model. Currently available data sets lack the size or
detail needed to estimate a more complex model successfully. As better data
sets become available, they can be used to extend the model. In fact, if the
duration of the low state is sufficiently short, the two-state model is
little different from a one-state model. The two-state model is beneficial
when low periods are of at least moderate duration.
Multiple Crime Types
The previous model can be generalized to allow for multiple crime types. Such a generalization can be carried out in several ways. In this section I explore several possibilities and arrive at a final version of the model. Throughout this discussion, T denotes the total number of crime types, which in turn are indexed by t.
Crime-switch Models
The simplest approach is to consider crime types as having no influence on the stochastic structure described above. The distributions F and G, as well as the parameters q, α, and β, are unchanged. Rather, crime type serves only to label the crimes. This can be done by assuming T distinct types and introducing a T × T Markov crime-switch-transition matrix $C = (c_{ij})$, where $c_{ij}$ is the probability that the offender, having last committed a crime of type i, will next commit a crime of type j.
The Markov crime-switch approach is much used in stochastic models of the crime process. Moreover, it can be made more general by allowing arrest probabilities to depend on crime type. I do not pursue this approach any further in this paper, however, for two reasons. First, one adds T(T − 1) parameters to the model through C while gaining only a little more explanatory power. Second, I prefer to pursue an alternate approach that enables one to deal better with the age effects, which are clear in crime data but are not yet incorporated in the model.
Competing Risk Model
An alternate approach to constructing a multiple-crime-type model is to introduce a set of distributions of phase type $F_t$, $1 \le t \le T$, where T is the number of crime types. Suppose that the individual commits a crime and stays in the high state or that the individual leaves the low state and enters the high state. We need to define the time until the next crime occurs. This can be done using a "competing risks" formulation. In this formulation we imagine random variables $X_t$ being drawn independently with distribution $F_t$, $1 \le t \le T$. The time until the next crime is given by $X = \min_{1 \le t \le T} X_t$, the crime with the shortest time to its occurrence. The type of crime is given by the index t for which $X_t = X$. The family of phase distributions is well suited to this approach, since if each of the $X_t$ has a phase distribution, then X will also have a phase distribution (see below). This version of the multiple-crime-type model is therefore essentially equivalent to the single-crime-type model with the exception that the distribution F belongs to a special subset of the phase distributions, those that arise as minimums of other phase distributions.
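The competing-risks draw can be sketched in a few lines. This is only an illustration under a simplifying assumption: each $F_t$ is taken to be exponential (the simplest phase distribution), so the minimum is again exponential with rate equal to the sum of the rates, and type t occurs with probability proportional to its rate. The crime-type labels and rates are hypothetical.

```python
import random


def next_crime(rates, rng):
    """Draw one latent time per crime type; return (time, type) of the earliest."""
    draws = {t: rng.expovariate(rate) for t, rate in rates.items()}
    crime_type = min(draws, key=draws.get)  # index attaining the minimum
    return draws[crime_type], crime_type


def estimate_type_shares(rates, n_draws=100_000, seed=0):
    """Estimate the share of each crime type and the mean time to the next crime."""
    rng = random.Random(seed)
    counts = {t: 0 for t in rates}
    total_time = 0.0
    for _ in range(n_draws):
        x, t = next_crime(rates, rng)
        counts[t] += 1
        total_time += x
    shares = {t: counts[t] / n_draws for t in rates}
    return shares, total_time / n_draws
```

With rates 1.0, 0.5 and 0.5, the first type should account for about half of the crimes, and the mean time to the next crime should be about 1/(1.0 + 0.5 + 0.5) = 0.5.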
The Final Version of the Model

The competing-risk version of the multiple-crime-type model leaves one issue unaddressed. In many criminal justice data sets pertaining to individual offending, there is a pronounced age effect. Individuals seem to have high crime commission rates as older juveniles or young adults, and those rates sharply diminish at older ages. None of the models presented thus far addresses this issue. Indeed, a renewal-process model of crimes or arrests would not allow for such an age effect. Fortunately, there appears to be a straightforward way to introduce such effects, and this approach is supported somewhat by empirical evidence. Studies by Peterson and Braiker (1981) indicate that for a single crime type, there is no age effect. Rather, an individual commits a particular crime type in a time-stationary way (although in a clustered fashion, like that given by the two-state model), then at some time point in the career the individual essentially stops committing that crime type altogether. This process goes on independently for the T crime types (although some offenders specialize in a subset of these types). The individual has a set of active crime types. The time until the next crime (when the individual is in the high state) is taken to be the minimum of the times for the active crime types. As time progresses, the portion of the offender's career involving crime type t will end, and so the set of active crime types decreases in size. As more crime types are eliminated, the time between crimes will naturally increase. This will, in turn, result in an age effect.

The multiple-crime-type model can be summarized. It consists of the following:

· a set of phase-type distributions, $F_t$, $1 \le t \le T$,
· an initial set of active crime types $A \subset \{1, \ldots, T\}$,
· an arrest probability for crime type t, $q_t$, $1 \le t \le T$,
· a state-switching probability α,
· a crime type t termination probability, $\beta_t$, $1 \le t \le T$, and
· a phase-type distribution G, denoting the length of the low period.

Note that α and G could be allowed to be crime-type dependent, as well.

An intuitive description of the criminal career is as follows. The individual begins his or her career with a set of active crime types A. Each crime type has a distribution associated with it. Let $X_t$, $1 \le t \le T$, represent random variables, where $X_t$ has distribution $F_t$. The time until the next crime is given by $\min_{t \in A} X_t$, and the type of that crime is given by the index associated with the minimum. When a crime of type t is committed, with probability $\beta_t$, that crime type is removed from the active set A. With probability $1 - \beta_t$, this crime type is kept in A. With probability $q_t$ the individual is arrested, and with probability α the individual switches to the low state. After a period having distribution G, he moves back to the high state. The process continues in the same fashion, except that over time the active set A will be reduced in size. When A becomes empty, the career is ended. As A becomes smaller, the times between crimes (and hence arrests) will increase. This will produce an age effect. (One could allow A to increase in size, i.e., new crime types to be added. I do not consider such a possibility in this paper.)
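The career described above can be simulated directly. The sketch below is illustrative only: it assumes exponential distributions for every $F_t$ and for G (the model allows general phase distributions), starts with all crime types active, and uses hypothetical parameter names.

```python
import random


def simulate_career(rates, q, beta, alpha, low_rate, seed=0):
    """Simulate one career; returns a list of (time, crime_type, arrested) tuples.

    rates[t]  exponential rate standing in for F_t (assumption)
    q[t]      arrest probability for crime type t
    beta[t]   probability that type t is dropped after a crime of type t
    alpha     probability of switching to the low state after a crime
    low_rate  exponential rate standing in for G (assumption)
    """
    rng = random.Random(seed)
    active = set(rates)  # initial active set A: every crime type
    t, events = 0.0, []
    while active:
        # Competing risks over the active types only (sorted for reproducibility).
        draws = {k: rng.expovariate(rates[k]) for k in sorted(active)}
        crime = min(draws, key=draws.get)
        t += draws[crime]
        arrested = rng.random() < q[crime]
        events.append((t, crime, arrested))
        if rng.random() < beta[crime]:
            active.discard(crime)  # this crime type leaves the career for good
        if active and rng.random() < alpha:
            t += rng.expovariate(low_rate)  # sojourn in the low (crime-free) state
    return events
```

Because the active set only shrinks, the total crime rate in the high state falls over the course of a simulated career, which is the mechanism behind the age effect.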
In the next section, I discuss the hierarchical version of the model and the addition of covariates. It is clear that, at any point in time, the decision to terminate a crime type, to drop to the low state, to adjust the arrest probability, and so on will be influenced by covariates, such as drug use or employment status. The basic model described above can be enhanced to allow such considerations.
One must calculate the posterior joint distribution of φ and $\theta_i$, $1 \le i \le n$, given $x_i$, $1 \le i \le n$. This calculation may involve significant numerical integration. Once the posterior distribution has been calculated, it can be used to estimate any of the parameters or to predict future values of $x_i$, $1 \le i \le n$. The reader should consult Deely and Lindley (1981) for details and examples.

It seems generally difficult to determine f(φ) accurately, say by elicitation. Fortunately, for many criminal justice data sets, n can be very large. If this is the case, the prior distribution of φ, f(φ), will have very little influence on the estimates. It will be dominated by the data. One can, therefore, select a prior distribution that maximizes computational convenience, say by picking a conjugate prior distribution if one exists. Far more care is required if n is small.
The Empirical Bayes Approach
Morris (1983) discusses the empirical Bayes approach and includes many citations for the use of this methodology. The general approach in this instance is to proceed in two steps. First, one integrates out the conditional distribution of $\theta_i$ given φ, to find the conditional distribution of each $x_i$ given φ. The data $x_i$, $1 \le i \le n$, are then used to estimate φ. This is typically done using maximum-likelihood estimation, although one could follow Deely and Lindley (1981) in using Bayesian estimation. In addition, the method of moments is often very convenient; however, its small sample behavior is unclear. Notice that all the data are used to estimate φ, and this will result in an estimate $\hat{\varphi}$. It remains to estimate the individual $\theta_i$ parameters. This is done using likelihood methods by assuming $\theta_i$ has distribution $g(\theta \mid \hat{\varphi})$ and using $x_i$ as data. Again a choice of methods is possible, but the Bayes approach using the posterior distribution of $\theta_i$ given $\hat{\varphi}$ and $x_i$ is preferable.
The Simultaneous-Likelihood
Approach
A third approach is the simultaneous-likelihood approach. This approach is very unreliable and should be avoided because it can produce inconsistent estimators. It entails writing the joint likelihood of $\theta_i$ and $x_i$. This function is then simultaneously maximized over φ and $\theta_i$, $1 \le i \le n$. The method is unreliable, in part because the number of parameters grows with the number of observations. This situation is one in which the maximum-likelihood method may perform in an undesirable fashion. Such behavior is shown in the examples below.
Some Simple Examples
The following example is designed to illustrate the ideas developed in the previous three sections. The example is based on the simplest model of a criminal career, given in the discussion of the Poisson process above.

Assume that each of n individual offenders is arrested according to a Poisson process with parameter $\theta_i$. In addition, the n values of $\theta_i$ are drawn independently from a gamma (α, β) distribution. The shape parameter α is known, but the scale parameter β is unknown (β enters the density as a rate, so that the mean of $\theta_i$ is α/β). The individual Poisson processes are observed over an interval of length L. By sufficiency and the memoryless property, we merely need to consider the total number of arrests over this time interval. We denote this quantity by $X_i$. Conditional on $\theta_i$, $X_i$ has a Poisson distribution with mean $\theta_i L$.
We can calculate the conditional distribution of $X_i$ given β by integrating out the parameter $\theta_i$. This results in

\[ P(X_i = x_i \mid \beta) = \frac{\Gamma(\alpha + x_i)}{\Gamma(\alpha)\,\Gamma(x_i + 1)} \left(\frac{\beta}{\beta + L}\right)^{\alpha} \left(\frac{L}{\beta + L}\right)^{x_i}, \quad x_i = 0, 1, 2, \ldots, \]

a negative binomial distribution. Again α and L are known.
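The gamma-Poisson mixture can be checked by simulation. This is a minimal sketch, assuming that β acts as a rate in the gamma density (Python's `random.gammavariate` takes a shape and a scale, so the scale passed is 1/β). The function names are hypothetical.

```python
import math
import random


def neg_binomial_pmf(x, alpha, beta, L):
    """P(X = x | beta) for the gamma-Poisson mixture, computed on the log scale."""
    log_p = (math.lgamma(alpha + x) - math.lgamma(alpha) - math.lgamma(x + 1)
             + alpha * math.log(beta / (beta + L))
             + x * math.log(L / (beta + L)))
    return math.exp(log_p)


def simulate_counts(n, alpha, beta, L, seed=0):
    """Draw theta_i ~ gamma(alpha, rate beta), then count Poisson arrests on [0, L]."""
    rng = random.Random(seed)
    counts = []
    for _ in range(n):
        theta = rng.gammavariate(alpha, 1.0 / beta)  # second argument is a scale
        t, k = rng.expovariate(theta), 0
        while t <= L:  # accumulate exponential gaps until the window ends
            k += 1
            t += rng.expovariate(theta)
        counts.append(k)
    return counts
```

For α = 2, β = 1, L = 1, the probability of zero arrests is (β/(β + L))^α = 0.25, and the simulated frequency of zeros should be close to that value.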
The Full Bayesian Approach

In this approach a prior distribution for β is introduced. The most convenient choice is to introduce the conjugate prior distribution for the negative binomial distribution, the beta distribution. We let β/(β + L) have a beta (a, b) prior distribution. The full hierarchical model then becomes:

· β/(β + L) has a beta (a, b) distribution,
· $(\theta_1, \ldots, \theta_n) \mid \beta$ are independent with gamma (α, β) distribution,
· $X_1, \ldots, X_n \mid \theta, \beta$ are independent and $X_i$ has a Poisson ($\theta_i L$) distribution.

One now wishes to find the posterior joint distribution of β and $\theta_1, \ldots, \theta_n$ given $x_1, \ldots, x_n$. This can be easily carried out, and we find

· $\beta/(\beta + L) \mid x_1, \ldots, x_n$ has a beta $(a + n\alpha,\ b + \sum_{i=1}^{n} x_i)$ distribution, and
· $\theta_1, \ldots, \theta_n \mid \beta, x_1, \ldots, x_n$ are conditionally independent with gamma ($\alpha + x_i$, β + L) distribution.
One can now construct Bayes estimates of the parameters. This requires the introduction of a loss function. For simplicity, consider nonsimultaneous estimation of the parameters based on the conditional mean. This would result in

\[ \hat{\beta} = \frac{L(a + n\alpha)}{b + \sum_{j=1}^{n} x_j - 1} \]

and

\[ \hat{\theta}_i = \frac{(\alpha + x_i)\left(b + \sum_{j=1}^{n} x_j\right)}{L\left(a + n\alpha + b + \sum_{j=1}^{n} x_j\right)}. \]
For large n, these estimates are given approximately by

\[ \hat{\beta} \approx \frac{L\alpha}{\bar{x}}, \qquad \hat{\theta}_i \approx \frac{(\alpha + x_i)\,\bar{x}}{L(\alpha + \bar{x})}. \]

Note that the estimate of $\theta_i$ involves all the data. In addition, for large n, the choice of prior parameters (a, b) becomes immaterial.
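A short sketch can make the large-n behavior concrete. It assumes the conditional-mean estimates for this example take the form coded below: β = Lp/(1 − p) with p having a beta (a + nα, b + Σx) posterior, and θ_i having conditional mean (α + x_i)/(β + L). The function names are hypothetical.

```python
def bayes_estimates(x, alpha, L, a, b):
    """Conditional-mean estimates of beta and each theta_i (assumed forms, see lead-in)."""
    n, s = len(x), sum(x)
    beta_hat = L * (a + n * alpha) / (b + s - 1)  # requires b + s > 1
    scale = (b + s) / (L * (a + n * alpha + b + s))
    theta_hat = [(alpha + xi) * scale for xi in x]
    return beta_hat, theta_hat


def large_n_estimates(x, alpha, L):
    """Large-n approximations, which no longer depend on the prior parameters (a, b)."""
    xbar = sum(x) / len(x)
    beta_hat = L * alpha / xbar
    theta_hat = [(alpha + xi) * xbar / (L * (alpha + xbar)) for xi in x]
    return beta_hat, theta_hat
```

Each individual estimate shrinks the raw rate $x_i/L$ toward a pooled value determined by all n records, which is why the estimate of $\theta_i$ involves all the data.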
Empirical Bayes Approach

The first step of this approach is to estimate β after integrating out θ. We treat $X_1, \ldots, X_n \mid \beta$ as being i.i.d. with negative binomial distribution. The maximum likelihood estimate is $\hat{\beta} = \alpha L/\bar{x}$, the same as the limiting version of the Bayes estimate of β.

To find the estimate of each $\theta_i$, we treat the following problem:

· $\theta_i$ has gamma (
The Simultaneous-Likelihood Approach

This method involves writing a simultaneous likelihood, including that of the $\theta_i$ and $x_i$, $1 \le i \le n$. One then maximizes over $\theta_i$ and β. The log-likelihood is given by

\[ \text{constant} + n\alpha \log \beta + \sum_{i=1}^{n} \left[ (\alpha + x_i - 1) \log \theta_i - (\beta + L)\theta_i \right]. \]

The likelihood equations are

\[ \theta_i = \frac{\alpha + x_i - 1}{\beta + L} \]

and

\[ \sum_{i=1}^{n} \theta_i = \frac{n\alpha}{\beta}. \]

These can be simultaneously solved to give

\[ \hat{\beta} = \frac{\alpha L}{\bar{x} - 1} \]

and

\[ \hat{\theta}_i = \frac{\alpha + x_i - 1}{\hat{\beta} + L}. \]

This estimate of β can be negative and even if positive is inconsistent. The point of this example is to give a simple illustration of a situation in which the method gives an unreasonable estimate. In other cases the method of maximum likelihood may not even provide an answer because infinite likelihood can be generated at some boundary of the parameter space.

In summary, either the full Bayes or empirical Bayes method should be used, if possible. The simultaneous-likelihood method should be avoided.
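The inconsistency is easy to see in a simulation. The sketch below assumes the gamma-Poisson example with β entering as a rate, and compares the empirical Bayes estimate αL/x̄ with the simultaneous-likelihood estimate αL/(x̄ − 1). The function names are hypothetical.

```python
import random


def simulate_arrest_counts(n, alpha, beta, L, seed=0):
    """Arrest counts on [0, L] for n offenders with theta_i ~ gamma(alpha, rate beta)."""
    rng = random.Random(seed)
    counts = []
    for _ in range(n):
        theta = rng.gammavariate(alpha, 1.0 / beta)  # gammavariate takes a scale
        t, k = rng.expovariate(theta), 0
        while t <= L:
            k += 1
            t += rng.expovariate(theta)
        counts.append(k)
    return counts


def beta_empirical_bayes(x, alpha, L):
    """Maximum-likelihood estimate of beta from the negative binomial marginal."""
    return alpha * L / (sum(x) / len(x))


def beta_simultaneous(x, alpha, L):
    """Simultaneous-likelihood estimate; negative whenever the sample mean is below 1."""
    return alpha * L / (sum(x) / len(x) - 1.0)
```

With α = 2, β = 1 and L = 1 the mean count is αL/β = 2, so for large n the first estimate settles near the true value 1 while the second settles near 2.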
PHASE DISTRIBUTIONS

This section is intended to introduce the class of phase distributions and to list several properties of this class. These distributions have recently been rediscovered and extensively developed by Neuts (1981). The reader should consult this text for a complete treatment.

Only continuous time phase distributions are considered here. These distributions arise naturally in the context of continuous time Markov chains. A phase distribution arises as the amount of time it takes such a Markov chain to first reach a designated state in its state space. Consider a continuous time Markov chain with state space {1, 2, . . ., m + 1} and infinitesimal generator:

\[ Q = (q_{ij}) \text{ with } q_{ii} < 0 \text{ for } i \le m,\quad q_{ij} \ge 0 \text{ if } i \ne j,\quad \sum_j q_{ij} = 0, \]

and

\[ q_{m+1,i} = 0,\quad i \le m + 1. \]

For the given assumptions about the $q_{ij}$'s, it follows that state m + 1 is an absorbing state. For any other state i, the chain is held in the state for an exponential period of time with mean $1/(-q_{ii})$. One must also introduce an initial distribution, $p = (p_1, \ldots, p_{m+1})$. The chain is started in a state selected at random from the distribution p. Once the initial state is selected, the chain evolves according to Q. Eventually, the chain will reach state m + 1, and the time at which this occurs is called the hitting time of state m + 1. This hitting time has a phase distribution with representation $(p, Q_0)$. Given this description, one can see that the (m + 1) × (m + 1) matrix Q has a block form given by

\[ Q = \begin{pmatrix} Q_0 & Q^0 \\ 0 & 0 \end{pmatrix}, \]

where $Q_0$ is m × m and $Q^0$ is the m × 1 column of rates into state m + 1.

One problem with any particular phase distribution may be that the $(p, Q_0)$ representation is not unique. This point is addressed in Neuts (1981).
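The definition of a phase distribution as a hitting time can be sketched by direct simulation of the chain. This is an illustration only: states are indexed from 0 rather than 1, the last state is the absorbing one, and the function name is hypothetical.

```python
import random


def sample_phase_time(p, Q, rng):
    """Sample the hitting time of the absorbing (last) state.

    p  initial distribution over all m + 1 states
    Q  (m + 1) x (m + 1) generator as a list of rows; the last row is all zeros
    """
    m = len(Q) - 1
    state = rng.choices(range(m + 1), weights=p)[0]
    t = 0.0
    while state != m:
        rate = -Q[state][state]  # total rate of leaving the current phase
        t += rng.expovariate(rate)  # exponential holding time with mean 1 / rate
        weights = [Q[state][j] if j != state else 0.0 for j in range(m + 1)]
        state = rng.choices(range(m + 1), weights=weights)[0]
    return t
```

For example, with `Q = [[-1, 1, 0], [0, -2, 2], [0, 0, 0]]` and `p = [1, 0, 0]` the chain passes through two phases with rates 1 and 2, so the hitting time is a sum of two exponentials with mean 1 + 1/2 = 1.5.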
The following facts about phase distributions are useful.

1. A phase distribution puts mass $p_{m+1}$ on 0 and has density $p_0 \exp(xQ_0)\,Q^0$ on (0, ∞), where $p = (p_0, p_{m+1})$ and $Q^0$ denotes the m × 1 upper-right block of Q.

2. The Laplace-Stieltjes transform of the distribution is given by

\[ \tilde{F}(s) = p_{m+1} + p_0 (sI - Q_0)^{-1} Q^0 \]

for Re(s) ≥ 0.

3. The nth moment of the distribution is given by

\[ a_n = (-1)^n\, n!\, \left(p_0 Q_0^{-n} e_m\right), \]

where $e_m = (1, \ldots, 1)$ and is 1 × m (applied here as a column vector).

4. Suppose F and G are phase distributions with orders m and n and representations $(p, Q_0)$ and (r, S), respectively. The convolution F * G is also a phase distribution with representation

\[ \left( (p_0,\ p_{m+1} r_0),\ \begin{pmatrix} Q_0 & Q^0 r_0 \\ 0 & S \end{pmatrix} \right), \]

where $r_0$ denotes the first n components of r and $Q^0 r_0$ is the m × n matrix with elements $Q^0_i r_j$, for $1 \le i \le m$ and $1 \le j \le n$.

5. If one considers a renewal process with phase distribution F governing the times between events, the equilibrium forward and backward recurrence time distributions are also phase distributions with modified initial vector.6 This property is useful for correcting biases in data sets in which sampling is not random but rather is length biased (see next section).

6. The family of phase distributions is also closed under the operations of "maximum" and "minimum." Suppose X and Y are independent random variables with phase distributions given by F and G. Then the distribution $H_1(t) = F(t)G(t)$ corresponding to max (X, Y) and $H_2(t) = 1 - [1 - F(t)][1 - G(t)]$ corresponding to min (X, Y) are both of phase type (see Neuts, 1981:60). This property is useful for constructing a competing-risk model of the times between crimes for multiple crime types, as was used above.

The class of phase distributions is very large and explicitly contains a number of important parametric families. In particular, the exponential, gamma, and generalized gamma distributions are of phase type. They can be obtained by setting $q_{i,i+1} = -q_{ii}$ and $p_1 = 1$. The distribution is then a sum of m exponential random variables with possibly different parameter values. The hyperexponential can be obtained by setting $q_{i,m+1} = -q_{ii}$, and more complex mixtures can be obtained similarly. The class of phase distributions can be used to approximate any nonnegative continuous distribution. A construction is given by Kelly (1979). Indeed the class of generalized gamma densities alone is dense in the family of nonnegative continuous distributions. This result is useful, since the class is smaller and easier to handle than others.

Finally, note that the class of phase distributions is ideal for stochastic modeling. If one models some time (such as a recidivism time) as having a phase distribution, by augmenting the state space with a single variable (which denotes the current phase) the model will retain a Markov structure, if it had one originally. This allows one to stay within a tractable family of models, while introducing the flexibility of being able to approximate any nonnegative probability distribution.

6 See Neuts (1981:52) for the exact representations and p. 63 for a discussion of special properties of renewal processes governed by phase distributions.
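The moment formula for phase distributions, $a_n = (-1)^n n!\, p_0 Q_0^{-n} e$, can be checked on a small case. The sketch below is restricted to two transient phases so that the matrix inverse can be written by hand; the helper names are hypothetical.

```python
from math import factorial


def mat_mul(A, B):
    """Product of two 2 x 2 matrices given as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def inv_2x2(M):
    """Inverse of a 2 x 2 matrix via the adjugate formula."""
    (a, b), (c, d) = M
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def phase_moment(p0, Q0, n):
    """nth moment (-1)^n n! p0 Q0^{-n} e for a two-phase representation."""
    Q_inv = inv_2x2(Q0)
    power = [[1.0, 0.0], [0.0, 1.0]]
    for _ in range(n):
        power = mat_mul(power, Q_inv)  # builds Q0^{-n}
    row = [sum(p0[i] * power[i][j] for i in range(2)) for j in range(2)]
    return (-1) ** n * factorial(n) * sum(row)  # multiplying by e sums the row
```

For phases with rates 1 and 2 visited in sequence, `p0 = [1, 0]` and `Q0 = [[-1, 1], [0, -2]]`, the first two moments should equal 1.5 and 3.5, matching a mean of 1 + 1/2 and a variance of 1 + 1/4.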
CORRECTING BIASES IN SAMPLES
This section addresses the problem of biases in data sets that arise from nonrandom sampling from the offender population.7 The hierarchical-model approach can be used to understand quantitatively the nature of the bias and therefore to correct for it. Several specific situations are considered in this section:

1. biases arising from restricting attention to offenders with at least one offense in a "window period,"
2. biases that arise in self-report data in which the sample is restricted to a prison population, and
3. biases that arise from estimating an individual crime rate from an individual record of a person who is caught in the midst of a period of high activity, so the estimates are biased upward.
Window Arrest Data Sets
Consider a general, delayed-renewal process with initial distribution G and general distribution F. Consequently, starting at time 0, the first event in the process occurs according to the distribution G, while all subsequent interevent times are distributed according to F. In the setting of a hierarchical model, we allow F and G to depend on parameters, and these parameters have some distribution given by cdf H and density h.

Suppose we select an individual at random from among the population of individuals who have an arrest in [t, t + δ]. That is, we restrict our sampling to individuals having this "window arrest" property. An offender satisfying this window-arrest criterion will typically have more arrests than an individual randomly selected from the general offender population. The sampling plan thus is biased in favor of offenders with higher crime and arrest rates. If no adjustment is made, we will generate overestimates of crime rates, arrest rates, and the parameters of the superpopulation. It is, however, straightforward to correct the likelihood function to account for the bias in the window-arrest sampling procedure. We begin by calculating the likelihood of a criterion arrest in [t, t + δ].

Let us take, first, the standard renewal theoretic case, where F = G. Define

\[ p(t) = P[\text{renewal event occurs in } (t, t + \delta)], \quad t \ge 0. \]

The function p(t) gives the probability an offender has a criterion arrest in the specified window [t, t + δ].

By conditioning on the time of the first event, we can write an integral equation for p(t),8

\[ p(t) = F(t + \delta) - F(t) + \int_0^t p(t - x)\, dF(x). \]

This equation can be solved (see Karlin and Taylor, 1975:184) to find

\[ p(t) = F(t + \delta) - F(t) + \int_0^t [F(t - s + \delta) - F(t - s)]\, dM(s), \]

where $M(t) = \sum_{n \ge 1} F^{(n)}(t)$ is the renewal function. The quantity p(t) is thus determined completely by the cdf F.

7 Professor A. Goldberger has pointed out to me that there is an extensive literature on correcting biases in samples in the educational psychology, economics, and evaluation research literature. This is treated under the rubric of selectivity bias, nonequivalent groups, and quasi-experiments. None of these, however, addresses the stochastic process aspects dealt with in this paper.
The expression is somewhat difficult to interpret, because there are two variables, t and δ, in addition to F. Some insight can be gained by considering the behavior of p(t) for large t. As t → ∞, one can apply the key renewal theorem to find p(t) → p, where

\[ p = \frac{\int_0^{\infty} [F(t + \delta) - F(t)]\, dt}{m} \]

and

\[ m = \int_0^{\infty} x\, dF(x) = \int_0^{\infty} [1 - F(x)]\, dx \]

is the mean time between events.

8 Note that p(t) is also a function of δ but that it is ignored in the notation.

Some simple algebra allows us to compute

\[ p = \int_0^{\delta} \frac{1 - F(t)}{m}\, dt. \]

The integrand is itself a density function. It represents the equilibrium backward or forward recurrence time distributions associated with F. The factor p, when treated as a function of δ, is a cdf. It begins at 0 and increases monotonically to 1 as δ increases to ∞. For very small values of δ, p is approximately given by δ/m, while for large values it is nearly 1. This is quite reasonable, since as the window size δ is increased, more and more individuals in the population can be included, and the window effect is reduced.

We are interested in the behavior of p(t) for all values of t, not just the asymptotic behavior. The reason that p(t) varies with t is that we have an initial condition, namely, that an event occurs at time 0. It takes some time for the effect of this condition to wear off and for equilibrium to be approached. If we begin with the renewal process in equilibrium, then p(t) will no longer depend on t.

We can also achieve equilibrium by using a delayed renewal process formulation with $G'(t) = [1 - F(t)]/m$. There are two relevant integral equations. Let $p_D(t)$ be the probability of a criterion arrest in the window [t, t + δ] for the (F, G) delayed formulation, while p(t) is the same quantity for the standard (F, F) formulation. Conditioning on the time of the first event gives

\[ p_D(t) = G(t + \delta) - G(t) + \int_0^t p(t - x)\, dG(x), \]

\[ p(t) = F(t + \delta) - F(t) + \int_0^t p(t - x)\, dF(x). \]

The second equation was solved earlier, and the resulting p(t) can be substituted into the first to find $p_D(t)$. We take $G'(t) = [1 - F(t)]/m$ and do extensive algebra to find

\[ p_D(t) = \frac{\int_0^{\delta} [1 - F(u)]\, du}{m}, \quad t \ge 0, \]

which is independent of t.
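The equilibrium window probability is simple to evaluate numerically. The sketch below integrates the survival function 1 − F with the trapezoid rule; the function name and the choice of quadrature are assumptions made for illustration.

```python
import math


def window_arrest_probability(survival, mean_gap, delta, steps=10_000):
    """p_D = (1/m) * integral from 0 to delta of [1 - F(u)] du (trapezoid rule).

    survival  function u -> 1 - F(u)
    mean_gap  m, the mean time between events under F
    delta     length of the sampling window
    """
    h = delta / steps
    total = 0.5 * (survival(0.0) + survival(delta))
    total += sum(survival(k * h) for k in range(1, steps))
    return total * h / mean_gap
```

For exponential gaps with rate λ, 1 − F(u) = exp(−λu) and m = 1/λ, so the integral reduces to 1 − exp(−λδ), the probability of at least one Poisson event in a window of length δ. For small δ the value is close to δ/m, and for large δ it approaches 1.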
The expression for $p_D(t)$ can be used as a correction factor for the likelihood function. A given individual will have a criminal record that provides an enumeration of arrests that occurred prior to the window [t, t + δ], as well as those that occurred within the window. There will, of course, be at least one arrest within the window. The likelihood function will be constructed by multiplying the densities for the observed inter-event times; however, it must be modified to account for the presence of at least one event in [t, t + δ]. This entails a division of the likelihood by $p_D(t)$.

We can consider the effect of this factor $p_D(t)$ on the posterior distribution of the parameters of the superpopulation. The posterior distribution will be proportional to $h(\theta) L / p_D(t)$, where h(θ) is the prior density of the superpopulation and L represents the likelihood function.

An informative special case occurs when δ is small so that $p_D(t)$ is approximately δ/m. The posterior distribution of θ is proportional to h(θ)mL/δ or h(θ)mL. The extra factor m accounts for the sampling bias and weights the distribution more in favor of parameter values with larger m, that is, longer mean times between events.
An example will help to illustrate the utility of this calculation. Suppose the arrest process is Poisson with parameter λ, and λ is treated as a random variable with distribution h. If we restrict attention to individuals with an arrest in [t, t + δ] for small δ, the posterior distribution of λ when corrected for this sampling plan will contain an extra factor of 1/λ (since m = 1/λ). This will tend to reduce the weight on large λ and counteracts the artificially inflated likelihood. For example, suppose the prior were to have a gamma (α, β) distribution. The prior would in effect be corrected to a gamma (α − 1, β) distribution and then used with the likelihood function, which has been inflated by the required window arrests. This correction is closely related to the length-biased sampling phenomenon of renewal theory. This posterior representation allows one to correct for the biases introduced by using only individuals with an arrest in the particular time window.

As δ increases, the size of the biasing effect is reduced and, assuming an equilibrium formulation, is measured by $\{\int_0^{\delta} [1 - F(x)]\, dx\}/m$ for any δ. For large δ, this factor is near 1.
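The effect of the small-window correction on a gamma prior can be sketched as follows. The code assumes, purely for illustration, that the uncorrected likelihood of a record with a given number of arrests over a given exposure time is proportional to λ^n exp(−λ × exposure), so the posterior stays in the gamma family. The function names and arguments are hypothetical.

```python
def posterior_gamma(alpha, beta, n_arrests, exposure, window_corrected):
    """Return (shape, rate) of the gamma posterior for a Poisson arrest rate.

    With window_corrected=True the extra factor 1/lambda is applied, which
    lowers the shape by one (requires alpha + n_arrests > 1).
    """
    shape = alpha + n_arrests - (1 if window_corrected else 0)
    rate = beta + exposure
    return shape, rate


def posterior_mean(shape, rate):
    """Mean of a gamma distribution parameterized by shape and rate."""
    return shape / rate
```

For a gamma (2, 1) prior and 3 arrests over 5 time units, the uncorrected posterior mean is 5/6 while the corrected one is 4/6, showing how the correction pulls the estimated rate downward.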
Biases in Samples of Prisoners
Two other biases can arise in sampling
and analyzing criminal justice data sets
involving prisoners. First, individuals are
generally sentenced to prison as a result
of a high frequency of offenses. Individu-
als with a high observed offense rate are
much more likely to be imprisoned than
comparable individuals with a lesser ob-
served offense rate. Since individuals
with high propensities to commit crimes will in general have high empirical offense rates, this group can be expected to be overrepresented in prison populations. Data drawn from prisoners are,
therefore, not representative of the of-
fender population. A hierarchical model
can, however, help to understand and
correct for this bias.
A second issue concerns the stochastic
nature of the crime process. Imagine two
individuals with the same crime-com-
mitting propensity but different sample
paths. The individual with the higher
empirical frequency of offenses is more
likely to be caught and sentenced. One
may infer a higher crime rate for this
individual than is actually appropriate,
since individuals tend to be caught after a
spurt of activity.
This second type of bias has been the
subject of a recent lively debate. The
controversy has been fueled by a paper of Maltz and Pollack (1980), which challenges the results of Murray and Cox (1979). The controversy centers on the
evaluation of certain treatment programs
for juveniles. It was noted empirically
that juveniles selected for certain treat-
ment programs exhibited a steep rise in
the rate of police contact per unit time
before admission. Surprisingly, these ju-
veniles then exhibited a substantially di-
minished contact rate after admission to
the program. The strong drop in contact
rate after admission to the program has
been called a "suppression effect" and
was attributed by Murray and Cox solely
to the success of the program.
This positive interpretation has been challenged by Maltz and Pollack (1980). They argue that the results could have been an artifact of a decision rule used by judges. Specifically, Maltz and Pollack assume that all individuals have the same value of λ. They posit a selection rule whereby an individual is placed in a treatment program at a time t provided he experiences a contact at t and has at least k other contacts in the last τ time units. At the time an individual is placed in a program, he will exhibit a contact rate significantly higher than λ (of course, this depends on λ and k). If τ is taken to be random with some appropriate distribution, the theoretical contact-rate curve matches the data very well. This is done under the assumption of a common λ. The judicial decision rule produces the effect, not the treatment program. Once individuals are placed in the program, the rate returns to its normal, lower value.

The impact of the work of Maltz and Pollack was reduced by Tierney (1983), who pointed out an error in their analysis. They had, in fact, not correctly calculated the theoretical contact rate prior to assignment to the program. No firm conclusions have been reached by any of the authors. A recent paper by Pollack and Farrell (1984) added some limited insight into the analysis but did not help to interpret these data.
It is very reasonable to assume in both a juvenile and an adult context that sentencing is based on the type of crime committed, the number of crimes committed, and the recent crime-committing behavior. If an individual has committed three crimes, he might or might not be sentenced to prison. If the three crimes were bunched near each other, commitment to prison is much more likely than if the previous crimes were committed over a long period. The decision rule articulated by Maltz and Pollack (1980) is quite reasonable. However, Maltz and Pollack and Tierney should have paid much closer attention than they did to the time at which the crimes were committed. Suppose we assume an individual begins the crime process at age 12 and is sentenced according to the Maltz and Pollack rule for some values of τ and k. For a given value of λ, we can compute the distribution of the time (age) T at which the individual will first be sentenced. Clearly a large value of λ tends to result in small T, since the offender commits crimes at a high rate. Conversely, if we observe T and attempt to infer λ, a large T tends to be associated with a small λ. There is information about λ in T; however, Maltz and Pollack and Tierney ignore this information by putting all individuals on a common time scale with time 0 representing the time of admission to the treatment program regardless of the actual age of the individual.
The inclusion of the time random variable can help to correct biases. Let us assume an individual begins a contact process at time 0. This might be age 12 for juveniles or age 18 for adults. Assume that contacts occur according to a Poisson process. We imagine that sentencing occurs at the time of the first contact having the property that there are also k other contacts within τ units of time. We now compute the density function associated with the time of the sentencing event.

We assume the Poisson process has parameter c. This refers to the contact process in the juvenile context or the parameter of the exponential times between convictions in an adult case. We wish to make inferences about c.

For clarity, we adopt a simple assumption concerning the sentencing rule. We assume that sentencing

· never occurs on the first contact,
· occurs on the second contact only if the first was within the prior τ time units, and
· always occurs on the third contact, if not before.
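The three-part sentencing rule above can be simulated directly. This is a minimal sketch for a Poisson contact process with rate c; the function names are hypothetical. Under the rule, sentencing by time τ can only happen on the second contact, so P(T ≤ τ) equals the gamma (2, c) probability 1 − exp(−cτ)(1 + cτ), which gives a simple check.

```python
import math
import random


def sentencing_time(c, tau, rng):
    """Time of first sentencing under the simplified three-part rule."""
    t1 = rng.expovariate(c)  # first contact: never leads to sentencing
    t2 = t1 + rng.expovariate(c)  # second contact
    if t2 - t1 <= tau:  # first contact fell within the prior tau time units
        return t2
    return t2 + rng.expovariate(c)  # otherwise sentencing occurs on the third contact


def prob_sentenced_by_tau(c, tau):
    """P(T <= tau): the second contact occurs by time tau (gamma(2, c) cdf)."""
    return 1.0 - math.exp(-c * tau) * (1.0 + c * tau)


def simulate_sentencing_times(c, tau, n=100_000, seed=0):
    """Draw n independent sentencing times for offenders sharing the same rate c."""
    rng = random.Random(seed)
    return [sentencing_time(c, tau, rng) for _ in range(n)]
```

The mean sentencing time is (2 + exp(−cτ))/c, since a third contact is needed exactly when the gap between the first two contacts exceeds τ. Small observed values of T are therefore much more likely under a high rate c than under a low one.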
This rule is reasonable, is in the spirit
of Waltz and PolIack, and can be ex-
tendec] to more complex versions. Unlike
the problems encountered in PolIack and
Farrell (1984), calculations can be carried
out for more complicated versions of this
rule. First, we assume that individuals
have drawn their rate parameter c at random
from a parent distribution h(c). We
focus on the group of individuals (in ei-
ther the juvenile or adult context) who
receive a first sentence and note the value
of T for each. Notice that our data set is
restricted to those who receive a sentence.
This means that we will have a
much greater likelihood of selecting a
high-c individual than a low-c individual.
Interestingly, the presence of the value of
T will have a modifying influence. If T is
small, the value of c is even more likely to
be large, since the offender was sentenced
at the beginning of the career. If T
is moderate to large, we should find that c
is only slightly elevated. The individual
was sentenced, but it took a long time to
meet the criteria. If T is large, c should be
small, since only low-rate offenders can
avoid sentencing for a prolonged time
period under this sentencing rule. We
can quantify these heuristic comments by
computing the density of T given c, f_T(t).
There are two cases to consider.
In the first case, t ≤ τ, sentencing would
occur on the second offense in (0, τ]. The
time to the second offense has a gamma
(2, c) density, so the density has the form
$$f_T(t) = c^2 t e^{-ct}, \qquad 0 < t \le \tau.$$
After τ, sentencing can occur in two
ways. First, if T1 is the time of the first arrest
and T2 is the time of the second arrest,
incarceration will occur if T2 - T1 ≤ τ, i.e.,
if the individual falls into the window.
Here T2 = t. Second, if T2 - T1 > τ, the
individual does not fall in the window, and
sentencing will occur on the third arrest.
Suppose that arrests form a renewal
process with interevent density f (in this
example, we assume f is exponential (c)).
For t > τ, the time T at
which the contact or conviction leading to
sentencing occurs has density
$$f_T(t) = \int_{t-\tau}^{t} f(s)\, f(t-s)\, ds + \int_{0}^{t-\tau} f(s) \int_{s+\tau}^{t} f(u-s)\, f(t-u)\, du\, ds.$$
For f(s) = c e^{-cs}, we find
$$f_T(t) = c^2 \tau e^{-ct} + \frac{c^3}{2}\,(t-\tau)^2 e^{-ct}, \qquad t > \tau.$$
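The two-case density can be checked numerically. The sketch below is an illustration under the exponential-interevent assumption, not the author's code; `sentence_time_density` and `integrate` are hypothetical helper names, and the values c = 1 and tau = 1 are arbitrary. It evaluates the density on both sides of τ and confirms that it integrates to approximately one.

```python
import math

def sentence_time_density(t, c, tau):
    """Density of the first-sentencing time T given rate c and window tau."""
    if t <= 0:
        return 0.0
    if t <= tau:
        # Sentenced on the second contact: gamma(2, c) density.
        return c**2 * t * math.exp(-c * t)
    # Second contact inside the window ending at t ...
    in_window = c**2 * tau * math.exp(-c * t)
    # ... or the window was missed and the third contact occurs at t.
    third_contact = 0.5 * c**3 * (t - tau) ** 2 * math.exp(-c * t)
    return in_window + third_contact

def integrate(f, a, b, n=20000):
    """Midpoint-rule approximation to the integral of f over [a, b]."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

# Illustrative parameter values; the upper limit 60 truncates a negligible tail.
total = integrate(lambda t: sentence_time_density(t, c=1.0, tau=1.0), 0.0, 60.0)
```

The two branches agree at t = τ, where the third-contact term vanishes, so the density is continuous across the boundary.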
One can see that the heuristics mentioned
earlier do indeed hold. For example,
if c has a prior gamma (α, β) distribution,
the posterior distribution of c after
observing T = t would be a gamma (α +
2, β + t) distribution if t ≤ τ. If t > τ, c has
a posterior distribution given by a mixture
of gamma distributions, specifically
with probability
$$p = \frac{\tau}{\tau + (\alpha+2)(t-\tau)^2 / [2(\beta+t)]}$$
it is gamma (α + 2, β + t),
and with complementary probability 1 -
p it is gamma (α + 3, β + t). The posterior
mean, E(c | T = t), is given by
$$E(c \mid T = t) = \begin{cases} \dfrac{\alpha+2}{\beta+t}, & t \le \tau, \\[1ex] p\,\dfrac{\alpha+2}{\beta+t} + (1-p)\,\dfrac{\alpha+3}{\beta+t}, & t > \tau. \end{cases}$$
The conditional mean is thus larger than
E(c) = α/β for small T = t, but smaller for
large t. This shows that the individuals
should not be placed on a common time
scale but analyzed separately using this
hierarchical approach. One can update
the prior distribution on c, h, and estimate
all the individual c's. These can
then be compared with the empirical ar-
rest records subsequent to intervention to
gain some insight into program effective-
ness.
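The behavior of the posterior mean can be illustrated with a short calculation. This is a sketch under the gamma-prior assumption described above; the function name `posterior_mean_rate` and the prior values alpha = 2, beta = 2, tau = 1 are illustrative assumptions, not values from the paper.

```python
def posterior_mean_rate(t, tau, alpha, beta):
    """Posterior mean E(c | T = t) under a gamma(alpha, beta) prior on c."""
    if t <= tau:
        # Single gamma(alpha + 2, beta + t) posterior.
        return (alpha + 2) / (beta + t)
    # Mixture weight on the gamma(alpha + 2, beta + t) component.
    p = tau / (tau + (alpha + 2) * (t - tau) ** 2 / (2 * (beta + t)))
    return (p * (alpha + 2) + (1 - p) * (alpha + 3)) / (beta + t)

prior_mean = 2.0 / 2.0  # alpha / beta for the illustrative prior
early = posterior_mean_rate(t=0.5, tau=1.0, alpha=2.0, beta=2.0)
late = posterior_mean_rate(t=10.0, tau=1.0, alpha=2.0, beta=2.0)
```

With these illustrative values, an individual sentenced early (t = 0.5) receives a posterior mean above the prior mean, while one sentenced late (t = 10) receives a posterior mean below it, in line with the heuristics discussed earlier.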
SUMMARY AND SUGGESTIONS FOR
FURTHER RESEARCH
This paper has introduced two innova-
tions to the quantitative modeling of
criminal justice problems: a general
structure of hierarchical models and a
new stochastic model of a criminal career.
These models allow one to distinguish
variation between individual offenders
and variations within an individual ca-
reer. This is important given the very
large variability in the offender population.
In addition, the hierarchical approach
allows one to correct for natural
biases in a data set. Biases can arise be-
cause the sampling is not at random from
the offender population but rather is con-
ditional on some event. For example, one
might consider a set of arrestees or pris-
oners. This group tends to contain higher
rate offenders than would be seen in the
general population. The new stochastic
model of a criminal career offers three
advantages over the standard renewal-
process models in common use. First, it
introduces a two-state approach in which
there are periods of high and low activity.
Second, the "active crime set" approach
results in a natural age effect in which an
offender's average crime-commission rate
diminishes over time. Third, the model
allows for some decision making on the
part of the offender.
There are a number of ways in which
one could consider extending the stochastic-process
model. The major need is
to include imprisonment and behavioral
changes that arise from imprisonment.
Indeed, this author would like to include
explicitly a parameter or parameters that
allow for a change in the PH and active-crime-set
distributions depending on the
fact of and length of sentencing.
It appears that the next step is to fit this
new class of models with data, after first
correcting natural biases in the data. This
should enable us to learn about the explanatory
power in this class of models
and to determine to what class of phase
distributions the interevent (crime or ar-
rest) times can be restricted. Data are also
needed to begin to determine an appropriate
class of superpopulation distributions,
not only for the parameters that
determine the phase distributions but
also for the other behavioral parameters
and the initial active-crime set. Finally,
the possibility of combining the hazard-function
approach of Flinn and Heckman
with the phase-distribution approach
given in this paper should be explored.
REFERENCES AND BIBLIOGRAPHY
Avi-Itzhak, B., and Shinnar, R.
1973 Quantitative models in crime control. Journal
of Criminal Justice 1:185-217.
Barton, R. R., and Turnbull, B. W.
1981 A failure rate regression model for the study
of recidivism. Pp. 81-101 in J. A. Fox, ed.,
Models in Quantitative Criminology. New
York: Academic Press.
Chaiken, J. M., and Rolph, J. E.
1980 Selective incapacitation strategies based on
estimated crime rates. Operations Research
28: 1259-1274.
Copas, J. B.
1983 Regression, prediction and shrinkage. Jour-
nal of the Royal Statistical Society B
45:311-354.
Deely, J. J., and Lindley, D. V.
1981 Bayes, empirical Bayes. Journal of the American
Statistical Association 76:833-841.
Dempster, A. P., Rubin, D. B., and Tsutakawa, R. K.
1981 Estimation in covariance components mod-
els. Journal of the American Statistical As-
sociation 76:341-353.
Flinn, C. J., and Heckman, J. J.
1982a Models for the analysis of labor force dynam-
ics. Advances in Econometrics 1:35-95.
1982b New methods for analyzing individual event
histories. Pp. 99-140 in S. Leinhardt, ed.,
Sociological Methodology. San Francisco:
Jossey-Bass.
1983 The Likelihood Function for the Multistate-
multiepisode Model in "Models for the
Analysis of Labor Force Dynamics." Discus-
sion Paper Series 83-10, Economic Research
Center. Chicago, Ill.: National Opinion Re-
search Center.
Harris, C. M., and Moitra, S. D.
1979 Improved statistical techniques for the measurement
of recidivism. Journal of Research in
Crime and Delinquency 16:194-213.
Harris, C. M., Kaylan, A. R., and Maltz, M. D.
1981 Recent Advances in the Statistics of Recidivism
Measurement. Pp. 61-79 in J. A. Fox,
ed., Models in Quantitative Criminology.
New York: Academic Press.
Harville, D. A.
1977 Maximum likelihood approaches to variance
components estimation and to related prob-
lems. Journal of the American Statistical
Association 72:320-340.
Holden, R. T.
1983 Failure Time Models for Criminal Behav-
ior. Department of Sociology, Yale Univer-
sity.
James, W., and Stein, C.
1961 Estimation with quadratic loss. Pp. 361-379
in Proceedings of the Fourth Berkeley Symposium.
Vol. 1. Berkeley: University of California
Press.
Karlin, S., and Taylor, H. M.
1975 A First Course in Stochastic Processes. 2nd
ed. New York: Academic Press.
Kelly, F. P.
1979 Reversibility and Stochastic Networks. New
York: John Wiley & Sons.
Maltz, M. D., and McCleary, R.
1977 The mathematics of behavioral change: re-
cidivism and construct validity. Evaluation
Quarterly 1:421-438.
Maltz, M. D., and Pollack, S. M.
1980 Artificial inflation of a delinquency rate by a
selection artifact. Operations Research 28:
547-559.
Morris, C. N.
1983 Parametric empirical Bayes inference: theory
and applications. Journal of the American
Statistical Association 78:47-55.
Murray, C. A., and Cox, L. A., Jr.
1979 Beyond Probation. Vol. 94. Sage Library of
Social Research. Beverly Hills, Calif.: Sage
Publications.
Neuts, M. F.
1981 Matrix Geometric Solutions in Stochastic
Models: An Algorithmic Approach. Balti-
more, Md.: Johns Hopkins University Press.
Peterson, M. A., and Braiker, H. B., with Polich,
S. M.
1981 Who Commits Crimes: A Survey of Prison
Inmates. Cambridge, Mass.: Oelgeschlager,
Gunn, and Hain.
Pollack, S. M., and Farrell, R. L.
1984 Past intensity of a terminated Poisson proc-
ess. Operations Research Letters 2:261-
263.
Rolph, J. E., Chaiken, J. M., and Houchens, R. E.
1981 Methods for Estimating Crime Rates of Indi-
viduals. Report R-2730-NIJ. Santa Monica,
Calif.: Rand Corporation.
Stollmack, S., and Harris, C. M.
1974 Failure-rate analysis applied to recidivism
data. Operations Research 22:1192-1205.
Tierney, L.
1983 A selection artifact in delinquency data revisited.
Operations Research 31:852-865.