6
Forecasting Crime: A City-Level Analysis

John V. Pepper

It’s tough to make predictions, especially about the future.

Yogi Berra

INTRODUCTION

Over the past three decades, a handful of criminologists have tried unsuccessfully to forecast aggregate crime rates. Long-run forecasts have been notoriously poor. Crime rates have risen when forecasted to fall (e.g., the mid-1980s) and have fallen when predicted to rise (e.g., the 1990s).1 Despite the need these difficulties suggest, there is little relevant research to guide future forecasting efforts. Without a developed body of methodological and applied research in forecasting crime rates, errors of the past are likely to be repeated.

In this light, I explore some of the practical issues involved in forecasting city-level crime rates using a common panel dataset. In particular, I focus on the problem of predicting future crime rates from observed data, not the problem of predicting how different policy levers impact crime. Although clearly important, causal questions are distinct from the forecasting question considered in this chapter. Research on cause and effect must address the fundamental identification problem that arises when trying to predict outcomes under some hypothetical regime, say new sentencing or policing practices. My more modest objective is to examine whether historical time-series data can be used to provide accurate forecasts of future crime rates.

To do this, I analyze forecasts from a number of basic and parsimoniously specified mean regression models. While the problem of effectively

1 Land and McCall (2001) and Levitt (2004) review and critique the crime forecasting literature.



The National Academies | 500 Fifth St. N.W. | Washington, D.C. 20001
Copyright © National Academy of Sciences. All rights reserved.

UNDERSTANDING CRIME TRENDS

forecasting crime may ultimately require more complex models, there is ample precedent for applying simple alternatives (Baltagi, 2006; Diebold, 1998).2 I thus focus on basic linear models that do not allow for structural breaks in the time-series process, do not incorporate cross-state or cross-crime interactions, and include only a small number of observed covariates. Finally, I focus on point rather than interval forecasts. Sampling variability plays a key role in forecasting, but a natural starting point is to examine the sensitivity of point forecasts to different modeling assumptions. Thus, my focus is on forecasting variability across different models. Adding confidence intervals would only increase the uncertainty associated with these forecasts.

I begin by considering the problem of forecasting the national homicide rate. This homicide series lies at the center of much of the controversy surrounding the few earlier forecasting exercises that have proven so futile. Using annual data on homicide rates, I estimate a basic autoregressive model that captures some important features of the time-series variation in homicide rates and does reasonably well at shorter-run forecasts. As for the longer-run forecasts, the statistical models clearly predict a sharp drop in crime during the 1990s, but they fail to forecast the steep rise in crime during the late 1980s.

After illustrating the basic approach using the national homicide series, I then focus on the problem of forecasting city-level crime rates. Using panel data on annual city-level crime rates for the period 1980-2000, I again estimate a series of autoregressive lag models for four different crimes: homicide, robbery, burglary, and motor vehicle theft (MVT). Data for 2001-2004 are used for out-of-sample analyses. The key objective is to compare the performance of various city-level forecasting models. First, I examine basic panel data models with and without covariates and with and without autoregressive lags. Most importantly, I contrast the homogeneous panel data model with heterogeneous models in which the process can vary arbitrarily across cities. I also consider two naïve models: one in which the forecast simply equals the city-level mean or fixed effect (the best constant forecast) and the other in which the forecast equals the last observed rate (a random walk forecast). In addition to considering the basic plausibility of the various model estimates, I examine differences in prediction accuracy and bias over 1-, 2-, 4-, and 10-year forecast horizons.

2 Diebold refers to this idea as the parsimony principle: all else equal, simple models are preferable to complex models. Certainly, imposing correct restrictions on a model should improve forecasting performance, but even incorrect restrictions may be useful in finite samples. Simple models can be more precisely estimated and may lessen the likelihood of overfitting the observed data at the expense of effective forecasting of unrealized outcomes. Finally, empirical evidence from other settings reveals that simpler models can do at least as well as, and possibly better than, more complex alternatives at forecasting.

I found considerable variability in the parameters and forecasting performance across models, cities, crimes, and horizons. While there is evidence of heterogeneity across cities, heterogeneous models do not perform notably better than the homogeneous alternatives. A naïve random walk forecasting model performs quite well for shorter-run forecast horizons, but the regression models are superior for longer-horizon forecasts. Finally, I use the basic homogeneous panel data models to provide point forecasts for city-level crime rates in 2005, 2006, and 2009. This out-of-sample forecasting exercise reveals that the predictions are sensitive to the covariate specification. All models generally indicate modest changes in city-level crime rates over the next several years. However, forecasts from one model imply that city-level crime rates will tend to increase over the remainder of the decade, whereas forecasts from another model imply that crime rates will fall.

In closing, I draw conclusions about the limitations of forecasting in general and the specific problems associated with forecasting crime. Forecasting city-level crime rates appears to be a volatile exercise, with few generalizable lessons for how best to proceed.

NATIONAL HOMICIDE RATE TRENDS

While my primary interest is to forecast city-level crime rates, I begin by considering the national homicide time series. Some of the basic issues involved in forecasting crime can be illustrated effectively by considering this single national series. Attempts to forecast it in the 1980s and 1990s were notoriously inaccurate.

Using data on annual homicide rates per 100,000 persons from the National Center for Health Statistics, I display the annual time series of the log rate for 1935-2002 in Figure 6-1.3 The series appears to be quite persistent over time, with some periods of fluctuation and notable turns. From 1935 until around 1960, the homicide rate trended downward and then began rising sharply, reaching a peak of just over 10 homicides per 100,000 (log rate of 2.31) in 1974. Over the next 15 years, homicide rates fluctuated roughly between 8 and 10 per 100,000 (log rates between 2.13 and 2.33) and then unexpectedly began to fall sharply and steadily in the 1990s. By the end of the century, the homicide rate hit a 34-year low of 6.1 per 100,000 (log rate of 1.81).

3 Data come from the National Center for Health Statistics and were downloaded in January 2007 from the Bureau of Justice Statistics Historical Crime Data Series at http://www.ojp.usdoj.gov/bjs/glance/tables/hmrttab.htm. The victims of the terrorist attacks of September 11, 2001, are not included in this analysis. Some concerns have been raised about the reliability of the annual time-series data on crime prior to 1960, but the effect on homicide trends is thought to be minimal. For further discussion of these issues, see Zahn and McCall (1999).
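The discussion above moves back and forth between homicide rates per 100,000 and their natural logarithms. As a quick arithmetic check, the following sketch exponentiates each log rate quoted in the text to recover the corresponding rate; it uses only the figures stated above, and the labels are descriptive names chosen for illustration.

```python
import math

# Log homicide rates (per 100,000) quoted in the text.
reported_log_rates = {
    "1974 peak": 2.31,
    "next 15 years, low": 2.13,
    "next 15 years, high": 2.33,
    "end-of-century low": 1.81,
}

for label, log_rate in reported_log_rates.items():
    rate = math.exp(log_rate)  # undo the natural-log transform
    print(f"{label}: log rate {log_rate:.2f} -> {rate:.1f} per 100,000")

# The reverse direction: 6.1 homicides per 100,000 is a log rate of about 1.81.
print(f"ln(6.1) = {math.log(6.1):.2f}")
```

A log rate of 2.31 corresponds to about 10.1 homicides per 100,000, consistent with "just over 10," and the band from 2.13 to 2.33 corresponds to roughly 8.4 to 10.3 per 100,000, so the stated range of 8 to 10 is approximate.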

FIGURE 6-1 National annual homicide rate, 1935-2002. (Plot of the log national homicide rate by year.)

A variety of demographic, economic, and criminal justice factors are known to be correlated with this series and have been used to predict aggregate crime rates. Demographic characteristics of the population, namely the gender, age, and race distributions, have all played a primary role in crime forecasting models (see Land and McCall, 2001). Criminal justice policies, including the number of police and the incarceration rate, are also thought to be important factors in explaining aggregate crime rates and trends. Macroeconomic variables appear to play only a modest role in explaining aggregate crime rates, especially for violent crimes such as homicide (Levitt, 2004).

For this study, I use two primary covariates: the fraction of the population who are 18-year-old men and the incarceration rate (the number of prisoners per 100,000 population).4 Figure 6-2 displays the time series for these two variables along with the homicide rate series, with all three series normalized relative to a 1935 base. This figure reveals that the fraction of young men (18-year-olds) is closely related to the homicide rate. In contrast, the variation in incarceration rates does not mirror the analogous variation in crime rates. Rather, incarceration rates tended to increase over the entire period, with the sharpest increases beginning in the mid-1970s. The notable exceptions are the peak draft years of World War II and the Vietnam War.

4 Data on the population size and demographics come from the U.S. Census Bureau, and year-end incarceration counts for prisoners sentenced to more than one year were obtained from the Bureau of Justice Statistics. The national incarceration series can be found at http://www.census.gov/statab/hist/HS-24.pdf.

FIGURE 6-2 Homicide, incarceration, and demographics, 1935-2002. (Log-homicide rate, log-incarceration rate, and fraction of males aged 18, each plotted as a ratio to its 1935 value.)

I follow the convention in the literature by taking the natural logarithm of the crime and incarceration rates. I estimate the regression models using the annual data for 1935-2000, leaving out the pre-1935 data, because accurate homicide rate and covariate information is not readily available, and the post-2000 data, which are reserved to assess forecasting performance.

The means and standard deviations of the variables used in the analysis are displayed in Table 6-1. Statistics for 2001-2002 are reported separately, as these data are not used to estimate the model. Notice the difference between the historical series for 1935-2000, in which the mean homicide rate equals 7.26 per 100,000 persons, and the 2001-2002 mean of 6.10, which is more than one point lower.
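Figure 6-2 plots each series as a ratio to its 1935 value. The following is a minimal sketch of that normalization, assuming each series is stored as a year-to-value mapping and that logs are taken before normalizing (as the figure legend suggests). The function name and the sample values are hypothetical placeholders, not the chapter's data.

```python
import math


def to_base_year_ratio(series, base_year=1935):
    """Return series[year] / series[base_year] for every year (series: year -> value)."""
    base = series[base_year]
    if base == 0:
        raise ValueError("the base-year value must be nonzero")
    return {year: value / base for year, value in sorted(series.items())}


# Usage with hypothetical rates per 100,000 (placeholders, not the chapter's data).
homicide_rate = {1935: 8.0, 1960: 5.0, 1974: 10.0}
log_homicide = {year: math.log(rate) for year, rate in homicide_rate.items()}
for year, ratio in to_base_year_ratio(log_homicide).items():
    print(year, round(ratio, 3))
```

Normalizing each series to its own base year puts variables with very different units (log rates and population shares) on a common scale, so their co-movement can be compared visually, as in Figure 6-2.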

TABLE 6-1 Means and Standard Deviations by Selected Years for the National Homicide Rate Series

                          1935-2000          2001-2002
Variable                  Mean      SD       Mean
Homicide rate             7.26      2.00     6.10
Log-homicide rate         1.94      0.28     1.81
Log-incarceration rate    4.98      0.48     6.16
Fraction male, 18         0.008     0.001    0.007
N                         66                 2

The Best Linear Predictor

To forecast the homicide series in Figure 6-1, I fit the following autoregressive regression model:

$$y_t = \gamma_1 y_{t-1} + \gamma_2 y_{t-2} + x_t \beta + \varepsilon_t \qquad (1)$$

where $y_t$ is the log-homicide rate in year $t$, $x_t$ is a $1 \times K$ vector of observed covariates, and $\varepsilon_t$ is an iid unobserved random variable assumed to be uncorrelated with $x_t$.5 Finally, $\{\gamma, \beta\}$ are unknown parameters that are consistently estimated using least squares.

5 Several statistical tests were used to aid in the selection of the specification in Equation (1). Based on a visual inspection of the correlogram and on an augmented Dickey-Fuller test, I found no evidence of a unit root in the homicide series. Thus, there appears to be no need to difference this series. The AR(2) model was then selected using the AIC and BIC criteria, among the class of ARIMA(3,0,3) models. Finally, McDowall (2002) provides evidence favoring a linear specification over a number of nonlinear alternatives.

Table 6-2 displays estimates and standard errors from two variations on this specification: Model A includes only the AR(2) lags, and Model C presents estimates from the full unrestricted specification. Consistent with Figure 6-1, there is a strong autoregressive component to the series, with the period $t$ homicide rate being strongly associated with the lagged rates. In the unrestricted Model C, the regression coefficient associated with the incarceration rate is positive, small, and statistically insignificant. Likewise, the coefficient on the demographic variable is statistically insignificant and relatively small in magnitude: given the standard deviation of 0.001 reported in Table 6-1, a one-standard-deviation change in the fraction of 18-year-old men shifts the predicted log-homicide rate by only about 0.01.

In-Sample Forecasts

How well does this model do at forecasting crime in the 1980s and 1990s? Figure 6-3 presents the predicted series under different starting dates
for the forecast. In Panel A, the forecasted series begins in 1981 (i.e., 1980 is assumed to be the last observed year); in Panel B the forecasts begin in 1986, in Panel C in 1991, and in Panel D in 1996. In each case, the forecasts are dynamic in $y_{t-1}$ and $y_{t-2}$: the forecasted lagged homicide rates, not the actual rates, are used. Importantly, these forecasts are not dynamic in the covariates; for Model C, actual covariate data are used for all forecasts.

TABLE 6-2 National Homicide Rate Regression Model Estimates and Standard Errors

                      Model A     Model C
y_{t-1}               1.46        1.42
                      (0.12)      (0.12)
y_{t-2}               –0.50       –0.48
                      (0.12)      (0.13)
Ln(inc)                           0.01
                                  (0.03)
Fraction male, 18                 11.38
                                  (11.25)
RMSE                  0.06        0.06
R2                    0.96        0.96
N                     64          64

NOTE: Ln(inc) = log-incarceration rate; y_t = log-homicide rate in year t. Standard errors are in parentheses.

The forecasts beginning in 1981 (Panel A) and 1986 (Panel B) have the same qualitative errors found in the predictions made nearly three decades ago. In particular, the model forecasts a steady drop in homicide rates throughout the 1980s, yet the actual rates rose in the late 1980s. Ultimately, the ability of this model to forecast crime effectively depends on observed relationships continuing into future periods. The model cannot effectively capture new phenomena, such as the rise or fall of new drug markets.

What, then, should forecasters have predicted at the start of the 1990s? Is one to believe that the mid-1980s were just a deviation from the norm, or that there had been a regime shift? Normal deviations and turns in a series are notoriously difficult to predict, and the 1980s might be nothing more. If so, then the historical time series might have been used to accurately forecast crime in the 1990s, even if it mischaracterized crime trends in the late 1980s.
Instead, however, the forecasting errors in the 1980s might have reflected a structural change in the time-series process that cannot be identified by the historical data. With hindsight, one can see that the forecasts made for the 1990s based on the historical series are relatively accurate. Crime is forecasted to fall (see Figure 6-3c), although the regression models miss the steepness of the realized decline. Thus, the historical time series, as modeled in Equation (1), is sufficient to predict the direction, if not the full magnitude, of the drop in homicide rates during the 1990s. These general patterns are consistent with the hypothesized notion of a short "bubble" in the homicide rate that was induced by violence associated with the crack cocaine epidemic in the 1980s (Land and McCall, 2001).

FIGURE 6-3 Realized and forecasted homicide rates. (Panels A-D: realized log-homicide rate from 1970 onward, with lag-only Model A forecasts and full-model forecasts beginning in 1981, 1986, 1991, and 1996, respectively.)

A more systematic evaluation comes from measuring the errors associated with different fixed forecast horizons. In particular, I compute the two- and five-year-ahead forecasts for each year from 1980 to 2002. Given these predictions, I then report measures of forecast bias and accuracy. I compute the mean error (ME) as an indicator of the statistical bias of the forecasts and the root mean squared error (RMSE) and mean absolute error (MAE) as measures of their accuracy (Congressional Budget Office, 2005). The MAE and RMSE show the size of the error without regard to sign, with the RMSE giving more weight to larger errors. If small errors are less important, the RMSE gives the best indication of accuracy. Also, as a different indicator of systematic one-sided error or forecasting bias, I compute the fraction of positive errors (FPE).

Table 6-3 displays the realized log-homicide rates and the two- and five-year-ahead forecasts for each year from 1980 to 2002. I also include forecasts derived from a naïve random walk model that uses the last observed rate to predict future outcomes. So, in the two-period-ahead analysis, the naïve forecast is the rate observed in period t–2, and likewise the five-period-ahead naïve forecast is the rate in period t–5. The bottom rows of Table 6-3 display the four summary measures of bias and accuracy. Several general conclusions emerge from the results displayed in this table.
First, as expected, the two-period-ahead forecasts are more accurate than their five-period-ahead counterparts. The RMSE for the two-period-ahead forecasts is about 0.10 and the MAE is around 0.09, whereas for the five-period-ahead forecasts these measures are around 0.18 and 0.15, respectively. For comparison, the RMSE for the in-sample predictions is about 0.06 (see Table 6-2). Second, in general, the forecasting models outperform the naïve random walk model, especially for the longer-run forecasts. For the shorter two-period-ahead forecast, the naïve model performs nearly as well as the AR(2) model in Equation (1). At this shorter horizon, the differences in forecasting performance across the three models (Models A and C and the naïve random walk) appear small and, to a large degree, may reflect sampling variability. Finally, during this 23-year period, the forecasting models consistently underpredict homicide rates from 1989 to 1994 and overpredict them after 1995.
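The in-sample exercise described above (fit Equation (1) by least squares, iterate it forward dynamically, and score fixed-horizon forecasts against a random walk using ME, RMSE, MAE, and FPE) can be sketched in plain Python. This is an illustration under stated assumptions, not the chapter's code: it fits an AR(2) with an intercept and no other covariates (roughly Model A), defines the forecast error as actual minus forecast (the text does not state its sign convention), and runs on a synthetic series rather than the homicide data. All function names are hypothetical.

```python
def solve_linear(a, b):
    """Solve the small linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix [a | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_ar2(y):
    """Least-squares fit of y_t = c + g1*y_{t-1} + g2*y_{t-2} + e_t; returns [c, g1, g2]."""
    rows = [(1.0, y[t - 1], y[t - 2]) for t in range(2, len(y))]
    targets = y[2:]
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * v for r, v in zip(rows, targets)) for i in range(3)]
    return solve_linear(xtx, xty)  # normal equations (X'X) b = X'y


def dynamic_forecast(history, coefs, horizon):
    """Iterate the fitted AR(2) forward, feeding forecasts back in as the lagged values."""
    c, g1, g2 = coefs
    path = list(history[-2:])
    for _ in range(horizon):
        path.append(c + g1 * path[-1] + g2 * path[-2])
    return path[2:]


def fixed_horizon_forecasts(y, coefs, h, first_target):
    """For each target t >= first_target, forecast y[t] using data through t - h.

    Returns (AR(2) dynamic forecasts, random-walk forecasts); the random walk is y[t - h].
    """
    ar, naive = [], []
    for t in range(first_target, len(y)):
        history = y[: t - h + 1]  # last observed value is y[t - h]
        ar.append(dynamic_forecast(history, coefs, h)[-1])
        naive.append(y[t - h])
    return ar, naive


def error_summary(actual, forecast):
    """ME, RMSE, MAE, and fraction of positive errors, with error = actual - forecast."""
    errors = [a - f for a, f in zip(actual, forecast)]
    n = len(errors)
    return {
        "ME": sum(errors) / n,
        "RMSE": (sum(e * e for e in errors) / n) ** 0.5,
        "MAE": sum(abs(e) for e in errors) / n,
        "FPE": sum(e > 0 for e in errors) / n,
    }


# Usage on a synthetic series generated by a known AR(2) (not the homicide data).
y = [2.0, 1.9]
for _ in range(28):
    y.append(0.1 + 1.4 * y[-1] - 0.5 * y[-2])

coefs = fit_ar2(y[:20])  # estimation sample
for h in (2, 5):
    ar, naive = fixed_horizon_forecasts(y, coefs, h, first_target=8)
    actual = y[8:]
    print(f"h={h}  AR(2): {error_summary(actual, ar)}")
    print(f"h={h}  random walk: {error_summary(actual, naive)}")
```

Because the synthetic series is generated exactly by an AR(2), the fitted model forecasts it almost perfectly; on real data, the gap between the AR(2) and random-walk error measures at each horizon is the kind of comparison summarized in Table 6-3.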

Out-of-Sample Forecasts

In Figure 6-4, I display the actual log-homicide series and the one-step-ahead predictions for each year from 1970 to 2000. These in-sample predictions nearly match the realized crime rates; the regression model closely fits the observed data. I also display dynamic forecasts of the homicide rate for 2001-2010, assuming that the last observed year is 2000.6 In this setting, forecasts of the homicide series are sensitive to the choice of explanatory variables included in the regression model. Both models predict relatively modest changes to the homicide rate over the period, yet they have different qualitative implications: the Model A forecasts imply that homicide rates will continue to fall during the period, whereas the Model C forecasts suggest that homicide rates will increase.

FORECASTING CITY-LEVEL CRIME RATES

For the city-level analysis, the Committee on Law and Justice provided a panel dataset of annual crime rates in the 101 largest U.S. cities (approximately all cities with more than 200,000 residents) over the period 1980-2004.7 The data consist of rates of homicide, robbery, burglary, and motor vehicle theft as measured by the Federal Bureau of Investigation's Uniform Crime Reports. The data also include annual measures of drug-related arrests, state-level incarceration rates, and the number of police per 100,000 population. I supplemented these data with annual county-level demographic information on the fraction of the population who are men ages 20-29 and ages 30-39 and the natural logarithm of the total county population. As with the national series, I follow the convention in the literature by taking the natural logarithm of the crime, incarceration, and policing rates. Using these data, I provide out-of-sample city-level forecasts for 2005 and 2006.

When providing out-of-sample forecasts, one must either predict contemporaneous covariates or use lagged covariates in the forecasting model. I use lagged covariates: to address the practical problem that arises when forecasting with covariates, I lag all covariates by two periods. Given this lag structure, I estimate the models using data for 1982-2000, leaving out the pre-1982 data to accommodate the lagged covariates and the post-2000 data to assess forecasting performance. Thus, for each of the

6 For the Model C forecasts, observed covariate data from 2001 and 2002 are used in the corresponding forecasts. Unobserved covariate data for 2003-2010 are assumed to be unchanged from the 2002 realizations.

7 Most of the crime data from Kansas City are missing. Thus, while there are 101 cities in this sample, Kansas City is dropped from most of the analysis.
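The two-period covariate lag can be illustrated with a small panel. In this sketch (a hypothetical helper operating on a dictionary keyed by city and year, not the chapter's code), each city-year outcome is paired with the covariates observed two years earlier. The first two years of each city drop out, which is why estimation with data beginning in 1980 starts in 1982; the same lag is what makes Model C forecasts for 2005 and 2006 feasible with covariates observed only through 2004.

```python
def lag_covariates(panel, lag=2):
    """Pair each city-year outcome with the covariates observed `lag` years earlier.

    panel maps (city, year) -> {"crime": log crime rate, "x": dict of covariates}.
    City-years whose lagged covariates are unavailable are dropped.
    """
    lagged_panel = {}
    for (city, year), row in panel.items():
        earlier = panel.get((city, year - lag))
        if earlier is None:
            continue  # e.g., 1980 and 1981 when the data begin in 1980
        lagged_panel[(city, year)] = {"crime": row["crime"], "x_lag": earlier["x"]}
    return lagged_panel


# Usage with a hypothetical one-city panel for 1980-1984 (illustrative numbers only).
panel = {
    ("Denver", yr): {"crime": 5.0 + 0.1 * (yr - 1980),
                     "x": {"log_police": 1.0 + 0.01 * (yr - 1980)}}
    for yr in range(1980, 1985)
}
estimation_sample = lag_covariates(panel)
print(sorted(year for _, year in estimation_sample))
# At forecast time the same lag means a 2006 forecast needs only covariates from 2004.
```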

FIGURE 6-5c Realized and forecasted log-robbery rates, Madison. (Realized log-robbery rate, 1985-2005, with homogeneous- and heterogeneous-model forecasts.)

FIGURE 6-5d Realized and forecasted log-robbery rates, New York. (Realized log-robbery rate, 1985-2005, with homogeneous- and heterogeneous-model forecasts.)

FIGURE 6-5e Realized and forecasted log-robbery rates, Richmond. (Realized log-robbery rate, 1985-2005, with homogeneous- and heterogeneous-model forecasts.)

FIGURE 6-5f Realized and forecasted log-robbery rates, San Francisco. (Realized log-robbery rate, 1985-2005, with homogeneous- and heterogeneous-model forecasts.)

robbery rates over the four-year period, whereas the heterogeneous model leads to the opposite conclusion. Realized robbery rates over this four-year period closely track the forecasts from the homogeneous model in Denver and from the heterogeneous model in San Francisco, and they lie between the two forecasts for New York. Clearly, the heterogeneous model does not provide uniformly superior out-of-sample forecasts.

Table 6-10 displays the RMSE across all cities for these different models and forecast horizons. In addition to analyzing the forecasting performance of the models in Equations (2) and (3), I also consider two naïve models: one in which the forecast equals the city-level mean or fixed effect (the best constant forecast) and the other in which the forecast equals the last observed rate (the random walk forecast). Finally, I display the RMSE from the one-step-ahead forecasts, which, in practice, are feasible only if the period t–1 realization is known (or perfectly forecasted). Each model is used to provide forecasts of annual crime rates for three different overlapping horizons, 2003-2004, 2001-2004, and 1995-2004, and three different starting points, 2002, 2000, and 1994. Thus, dynamic forecasts starting in 1994 are used to make 10-year-ahead predictions for the 2004 crime rate. Importantly, these forecasts are not dynamic in the covariates; for Model C, actual covariate data are used for all forecasts, even those that go beyond two-year-ahead predictions.

Many of the findings reported in this table confirm the earlier results. In particular, for shorter-run forecasts, the restricted Model A seems to do at least as well as the unrestricted Model C, and both of these homogeneous models provide slightly less accurate forecasts than the naïve random walk model. As before, these differences are modest and may simply reflect sampling variability rather than true differences in forecasting performance. In both respects, however, these patterns are not consistent across forecast horizons; models that work relatively well for the shorter run do not necessarily provide accurate forecasts for longer horizons. Long-horizon random walk forecasts, for example, perform poorly: for the 1995-2004 homicide forecasts, the random walk RMSE is 0.52, much greater than the RMSE of 0.41 from the sample average (i.e., the best constant predictor) and 0.39 from Model A. Likewise, the ranking of the two homogeneous models reverses for longer-run forecasting problems, where the unrestricted Model C provides more accurate forecasts than the restricted alternative. For example, the RMSE for the 1995-2004 forecasts of the homicide rate using the unrestricted Model C is 0.33, 0.06 less than the analogous RMSE of the forecasts made from the restricted Model A. This finding, however, may reflect the fact that the long-run (more than two years forward) Model C forecasts use realized covariate data. In practice, the necessary covariate data will not be observed.
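The two naïve benchmarks in Table 6-10 are simple to compute. The following sketch pools squared errors across all cities and target years; it assumes the panel is stored as a mapping from city to a year-to-log-rate dictionary, and the function names are hypothetical. Following the table notes, the best constant forecast is each city's average rate over a fixed window (1980-2000 in the table), and the random walk repeats the rate in the last observed year.

```python
import math


def pooled_rmse(pairs):
    """RMSE over (actual, forecast) pairs pooled across cities and target years."""
    sq = [(actual - forecast) ** 2 for actual, forecast in pairs]
    return math.sqrt(sum(sq) / len(sq))


def naive_benchmarks(panel, last_observed, target_years, mean_window=(1980, 2000)):
    """Pooled RMSE of the best-constant (city mean) and random-walk forecasts.

    panel: city -> {year: log crime rate}.
    """
    lo, hi = mean_window
    const_pairs, rw_pairs = [], []
    for series in panel.values():
        window = [v for yr, v in series.items() if lo <= yr <= hi]
        city_mean = sum(window) / len(window)  # best constant forecast
        last_value = series[last_observed]     # random-walk forecast
        for yr in target_years:
            if yr in series:                   # skip missing city-years
                const_pairs.append((series[yr], city_mean))
                rw_pairs.append((series[yr], last_value))
    return pooled_rmse(const_pairs), pooled_rmse(rw_pairs)


# Usage with a hypothetical two-city panel (illustrative numbers only).
panel = {
    "City A": {2000: 1.0, 2001: 2.0, 2002: 3.0, 2003: 4.0, 2004: 5.0},
    "City B": {2000: 2.0, 2001: 2.0, 2002: 2.0, 2003: 2.0, 2004: 2.0},
}
const_rmse, rw_rmse = naive_benchmarks(panel, last_observed=2002,
                                       target_years=range(2003, 2005),
                                       mean_window=(2000, 2002))
print(round(const_rmse, 3), round(rw_rmse, 3))
```

In this trending toy example the random walk beats the constant forecast; the table shows the opposite ranking for the 10-year homicide horizon, which is the point the text makes about horizon dependence.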

TABLE 6-10 Root Mean Squared Forecast Error for Different Prediction Horizons and Models, All Cities

                          Homicide                  Robbery                   Burglary                  MVT
Model                     2003-04 2001-04 1995-04   2003-04 2001-04 1995-04   2003-04 2001-04 1995-04   2003-04 2001-04 1995-04
Homogeneous models
Lag, No Cov, 2002         0.37                      0.24                      0.16                      0.23
Lag, No Cov, 2000         0.43    0.39              0.32    0.26              0.22    0.19              0.32    0.26
Lag, No Cov, 1995         0.44    0.43    0.39      0.41    0.38    0.31      0.31    0.29    0.25      0.42    0.38    0.32
Lag, No Cov, t–1          0.35    0.34    0.32      0.21    0.18    0.16      0.17    0.15    0.14      0.21    0.19    0.17
No Lag, Cov               0.42    0.39    0.35      0.36    0.34    0.27      0.33    0.32    0.24      0.48    0.46    0.37
Lag, Cov, 2002            0.38                      0.24                      0.23                      0.28
Lag, Cov, 2000            0.42    0.38              0.33    0.28              0.33    0.28              0.43    0.36
Lag, Cov, 1995            0.40    0.36    0.33      0.37    0.33    0.25      0.30    0.27    0.21      0.49    0.43    0.32
Lag, Cov, t–1             0.37    0.35    0.31      0.21    0.20    0.16      0.21    0.19    0.16      0.25    0.23    0.19
Naïve, 2002               0.37                      0.20                      0.16                      0.22
Naïve, 2000               0.42    0.40              0.26    0.21              0.22    0.19              0.30    0.24
Naïve, 1995               0.61    0.60    0.52      0.57    0.54    0.45      0.50    0.48    0.39      0.55    0.51    0.42
Average                   0.46    0.45    0.41      0.45    0.42    0.34      0.58    0.57    0.48      0.45    0.42    0.37
Heterogeneous models
Lag, No Cov, 2002         0.19                      0.19                      0.16                      0.19
Lag, No Cov, 2000         0.23    0.20              0.21    0.17              0.20    0.16              0.23    0.20
Lag, No Cov, 1995         0.36    0.33    0.30      0.36    0.33    0.29      0.24    0.24    0.22      0.36    0.33    0.30
Lag, No Cov, t–1          0.18    0.16    0.16      0.17    0.15    0.14      0.15    0.13    0.13      0.18    0.16    0.16

NOTES: Lag: Autoregressive lag included in the regression. No Cov/Cov: Indicates whether covariates are included. 1995, 2000, 2002: Last observed year for dynamic forecasts. t–1: Year t–1 is assumed to be observed; this is the one-step-ahead forecast. Naïve: Forecast equals the crime rate in the "last observed" year, namely 2002, 2000, and 1995. Average: Forecast is the city-specific average crime rate from 1980-2000.
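Equations (2) and (3) are not reproduced in this section, so the following sketch only illustrates the general contrast behind the two blocks of Table 6-10. As an assumption, it takes the homogeneous model to be an AR(1) with city fixed effects and a common slope (estimated with the within transformation) and the heterogeneous model to be a separate AR(1) regression for each city; covariates are omitted. Function names and the simulated cities are hypothetical.

```python
def ar1_ols(series):
    """OLS of y_t on y_{t-1} for one city; returns (intercept, slope)."""
    xs, ys = series[:-1], series[1:]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope


def fit_heterogeneous_ar1(panel):
    """Separate AR(1) per city: intercept and slope both vary freely across cities."""
    return {city: ar1_ols(series) for city, series in panel.items()}


def fit_homogeneous_ar1(panel):
    """Fixed-effects (within) AR(1): one common slope, city-specific intercepts."""
    num = den = 0.0
    means = {}
    for city, series in panel.items():
        xs, ys = series[:-1], series[1:]
        mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
        means[city] = (mx, my)
        num += sum((x - mx) * (y - my) for x, y in zip(xs, ys))
        den += sum((x - mx) ** 2 for x in xs)
    slope = num / den
    return {city: (my - slope * mx, slope) for city, (mx, my) in means.items()}


def forecast_ar1(last_value, intercept, slope, horizon):
    """Dynamic AR(1) forecasts: each step feeds the previous forecast back in."""
    path, y = [], last_value
    for _ in range(horizon):
        y = intercept + slope * y
        path.append(y)
    return path


def simulate(c, b, start, n):
    """Noise-free AR(1) path y_t = c + b * y_{t-1} (illustrative data only)."""
    out = [start]
    for _ in range(n - 1):
        out.append(c + b * out[-1])
    return out


# Usage: two hypothetical cities whose log rates follow different AR(1) processes.
panel = {"City A": simulate(1.0, 0.5, 0.0, 12), "City B": simulate(0.2, 0.9, 0.0, 12)}
het = fit_heterogeneous_ar1(panel)
hom = fit_homogeneous_ar1(panel)
for city, series in panel.items():
    print(city, "heterogeneous:", forecast_ar1(series[-1], *het[city], 2),
          "homogeneous:", forecast_ar1(series[-1], *hom[city], 2))
```

Pooling imposes one persistence parameter on every city, which reduces estimation noise but biases forecasts for cities whose dynamics differ; the city-by-city fit avoids that bias at the cost of estimating more parameters from each short series. This is the trade-off the text weighs when comparing the two blocks of Table 6-10.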

Finally, the added flexibility of the heterogeneous forecasting model in Equation (3) leads to some improvements in forecasting accuracy. As might be expected, the results are especially striking for homicide, for which there is evidence of much heterogeneity in the parameter estimates. Assuming the 2002 log-homicide rate is the last observed data point, the RMSE for the 2003-2004 forecasts is 0.37 using the homogeneous Model A and 0.19 using the heterogeneous alternative.

For the other crimes, however, the forecasting gains from the heterogeneous model are less pronounced. For example, the RMSE for forecasts of burglary rates in 2003-2004 is 0.16 for both models, and the analogous RMSE for motor vehicle theft is 0.23 for the homogeneous model and 0.19 for the heterogeneous alternative. Except for the homicide series, the efficiency gains from the homogeneous model appear to nearly offset any biases due to heterogeneity.

Out-of-Sample Forecasts

As noted above, I also use all observations through 2004 to forecast city-level crime rates beyond the sample period. For this illustration, I use the panel data models from Equation (2) to provide forecasts of city-level crime rates for 2005 and 2006, and I also use Model A to forecast crime rates in 2009. Because the covariates enter with a two-period lag, Model C forecasts for 2005 and 2006 require covariate data only through 2004; without covariate data for later years, longer-run Model C forecasts are not feasible.

In Table 6-11, I present these out-of-sample forecasts for the six selected cities analyzed throughout this chapter. Except for New York City, the forecasted changes across these six cities are generally modest. For example, the log-robbery rates in Denver, Knoxville, and Madison are all predicted to change by no more than 0.03 points over the five-year period from 2004 to 2009. During the preceding five years, 1999-2004, log-robbery rates increased by 0.23 in Denver and 0.04 in Madison and decreased by 0.14 in Knoxville. The specific changes vary by city and by crime. To see this, consider the five-year-ahead forecasts.
In San Francisco, log-robbery rates are forecasted to increase by 0.38 points, whereas forecasts for the other three crime rates are slightly below the 2004 levels. In Madison, log-homicide rates are forecasted to increase by 0.31 and log-MVT rates by 0.15, whereas the log-crime rates for both robbery and burglary are forecasted to drop over the same period. Finally, there are notable differences in the predictions made from the two models. In several cases, Model A implies an increase in crime, whereas Model C predicts a slight drop, and in most cases the Model A forecasts exceed their Model C counterparts.

TABLE 6-11 Homogeneous Model Forecasts for Selected Cities, 2005-2009

                        2005            2006            2009
                2004    Model A Model C Model A Model C Model A   2009-2004
Homicide
Denver          2.73    2.65    2.55    2.61    2.47    2.59      –0.15
Knoxville       2.43    2.51    2.25    2.54    2.20    2.56      0.13
Madison         0.34    0.51    0.25    0.59    0.22    0.64      0.31
New York        1.95    2.47            2.70            2.88      0.93
Richmond        3.86    3.82    3.66    3.81    3.58    3.79      –0.06
San Francisco   2.45    2.45    2.41    2.45    2.40    2.45      –0.01
Robbery
Denver          5.54    5.55    5.58    5.56    5.59    5.58      0.03
Knoxville       5.69    5.70    5.60    5.71    5.54    5.72      0.02
Madison         4.89    4.88    4.72    4.88    4.60    4.87      –0.02
New York        5.71    5.94            6.12            6.44      0.73
Richmond        6.53    6.51    6.35    6.49    6.22    6.47      –0.06
San Francisco   5.99    6.11    6.19    6.20    6.32    6.37      0.38
Burglary
Denver          7.17    7.14    7.23    7.11    7.27    7.03      –0.14
Knoxville       7.27    7.24    7.01    7.21    6.99    7.14      –0.12
Madison         6.49    6.47    6.41    6.45    6.36    6.41      –0.08
New York        5.78    5.80            5.82            5.88      0.11
Richmond        7.24    7.24    7.21    7.25    7.19    7.26      0.02
San Francisco   6.69    6.67    6.76    6.66    6.81    6.62      –0.07
MVT
Denver          7.20    7.17    7.18    7.14    7.15    7.09      –0.11
Knoxville       6.67    6.69    6.61    6.71    6.57    6.75      0.08
Madison         5.54    5.58    5.44    5.61    5.37    5.69      0.15
New York        5.56    5.74            5.89            6.22      0.66
Richmond        7.01    7.09    6.89    7.08    6.73    7.06      –0.03
San Francisco   6.97    6.95    7.01    6.93    7.03    6.90      –0.07

NOTE: Blank cells indicate that no Model C forecast is reported.

Overall patterns regarding these forecasts can be found by examining Table 6-12, which displays summary measures for the forecasts in every city. In particular, for each crime and each forecast period, I report the mean forecast, the mean predicted change, the fraction of positive predicted changes, the interquartile range (IQR) of the predicted change, and the mean absolute predicted change.

The results in this table confirm that Models A and C provide different pictures of what to expect for crime in large cities over this period. Forecasts made using Model A generally imply modest increases (e.g., homicide)

OCR for page 177
0 TABLE 6-12 Homogeneous Model Forecasts Summary, All Cities, 2005-2009 2005 2006 2009 2004 Model A Model C Model A Model C Model A Homicide Mean forecast 2.29 2.42 2.17 2.49 2.14 2.53 Mean change from 2004 0.14 –0.07 0.20 –0.10 0.24 Fraction positive change 0.75 0.35 0.75 0.34 0.75 IQR [0.01, 0.26] [–0.23, 0.07] [0.01, 0.37] [–0.30, 0.10] [0.01, 0.46] Mean absolute change 0.19 0.20 0.27 0.27 0.33 Robbery Mean forecast 5.67 5.73 5.57 5.78 5.54 5.87 Mean change from 2004 0.06 –0.04 0.11 –0.07 0.20 Fraction positive change 0.82 0.35 0.82 0.36 0.82 IQR [0.01, 0.12] [–0.12, 0.01] [0.02, 0.21] [–0.20, 0.02] [0.04, 0.37] Mean absolute change 0.08 0.09 0.13 0.15 0.23 Burglary Mean forecast 6.99 6.99 6.91 6.99 6.88 6.99 Mean change from 2004 0.00 –0.06 0.00 –0.09 0.00 Fraction positive change 0.54 0.28 0.54 0.28 0.54 IQR [–0.01, 0.01] [–0.10, 0.01] [–.0.02, 0.02] [–0.18, 0.01] [–.0.04, 0.05] Mean absolute change 0.01 0.09 0.03 0.15 0.06 MVT Mean forecast 6.66 6.68 6.59 6.70 6.52 6.74 Mean change from 2004 0.02 –0.08 0.03 –0.14 0.07 Fraction positive change 0.68 0.19 0.68 0.17 0.68 IQR [–0.02, 0.05] [–0.14, –0.03] [–0.04, 0.10] [–0.25, –0.05] [–0.08, 0.20] Mean absolute change 0.05 0.11 0.09 0.20 0.18

OCR for page 177
0 FORECASTING CRIME or little overall change (e.g., burglary) in city-level crime rates throughout the reminder of this decade. Forecasts made using Model C paint a different picture, with crime rates continuing to fall, in general, over the period. For example, the IQR in the forecasted change in log-robbery rates from 2004 to 2006 is [0.02, 0.21] when using Model A but is [–0.20, 0.02] when using Model C. Likewise, the fraction of cities forecasted to see increases in the robbery rate is 82 percent under Model A but only 36 percent in Model C. Finally, Model C generally predicts slightly larger absolute changes in the crime rates, and both predict much larger absolute changes in the homicide rates than the other three crimes. Forecasts of the log-crime rate series are sensitive to variation in the choice of explanatory variables in the regression model. That is, whether one concludes that city-level crime rates will increase or decrease based on models of this type depends on which control variables are included. This variability in the forecasts is difficult to reconcile given the cur- rent state of the literature. As far as I can tell, there is almost no research on how best to forecast crime, and there is much disagreement about the proper set of covariates to include. The limited results presented here sug- gest that Model A provides somewhat more accurate forecasts for one- and two-year horizons. If true, this would imply that city-level crime rates will tend to increase over the period. Yet these results also reveal that, for short-run forecasts, the naïve random walk model provides slightly more accurate forecasts than either panel data model. That is, for these short- run forecasts, one might not be able to do better than the predicting that tomorrow will look like today. 
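The summary measures in Table 6-12 (mean predicted change, fraction of positive predicted changes, IQR of the predicted change, and mean absolute predicted change) are simple functions of the per-city predicted changes. The Python sketch below computes them. The function name `summarize_changes` is hypothetical, and because the chapter does not say which quartile convention underlies the reported IQRs, the linear-interpolation ("inclusive") method of the standard-library `statistics.quantiles` is an assumption. The example input is the six-city column of 2004-2009 Model A robbery changes from Table 6-11, so the output will not reproduce the all-city figures in Table 6-12.

```python
import statistics


def summarize_changes(changes):
    """Table 6-12-style summary of predicted changes in log-crime rates
    (forecast minus 2004 value), one entry per city. Hypothetical helper."""
    changes = list(changes)
    # Quartiles via linear interpolation ("inclusive"); the convention used
    # for Table 6-12 is not stated, so this choice is an assumption.
    q1, _, q3 = statistics.quantiles(changes, n=4, method="inclusive")
    return {
        "mean_change": statistics.fmean(changes),
        # Only strictly positive changes count as predicted increases.
        "fraction_positive": sum(c > 0 for c in changes) / len(changes),
        "iqr": (q1, q3),
        "mean_abs_change": statistics.fmean(abs(c) for c in changes),
    }


# 2004-2009 Model A changes in log-robbery rates from Table 6-11, in the
# order Denver, Knoxville, Madison, New York, Richmond, San Francisco.
robbery_2009 = [0.03, 0.02, -0.02, 0.73, -0.06, 0.38]

for key, value in summarize_changes(robbery_2009).items():
    print(key, value)
```

Treating a zero change as "not positive" is a design choice of this sketch; the chapter does not say how ties were handled.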
CONCLUSION

In this chapter, I compare the forecasting performance of a basic homogeneous model with its heterogeneous counterpart using the city-level panel data provided by the Committee on Law and Justice. The results reveal the fragility of the forecasting exercise. Seemingly minor changes to a model can produce qualitatively different forecasts, and models that appear to provide sound forecasts in some scenarios do poorly in others. In the end, naïve random walk forecasts, which predict that tomorrow will be like today, do well relative to the linear time-series models, especially at shorter forecast horizons.

Two factors contribute to the variability and uncertainty illustrated here. First, forecasting is an inherently difficult undertaking. Social phenomena such as crime can sometimes evolve in subtle but substantial ways that are very difficult to identify using historical data and can take a long time to understand. Forecasts are invariably error-ridden around turning points, especially when these movements are largely the result of external events that are themselves unpredictable.

Second, little serious attention has been devoted to crime rate forecasting, and there is no well-developed research program on the problem. Effective forecasts of social processes that evolve over time would seem to require a scientific process that evolves as well. Certainly, sporadic efforts to forecast crime or analyze forecasting models cannot hope to provide meaningful guidance. For further headway to be made, a focused and sustained research effort is needed. This research would necessarily include an applied component, providing and assessing crime rate forecasts at regularly scheduled intervals. To make notable advances, there would also need to be a sustained methodological research program aimed at developing and assessing the performance of different forecasting approaches. In this chapter, I consider a very limited set of models and estimators. There are many other forecasting approaches that could be considered. Baltagi (2006), for example, assesses a variety of forecasting models and estimators using the same structure as those considered in this chapter. More sophisticated models that incorporate, for example, structural breaks, cross-state or cross-crime interactions, and a larger set of observed covariates might also be evaluated. Model-averaging techniques similar to those described by Durlauf, Navarro, and Rivers (Chapter 7 in this volume) have been shown to be effective at reducing forecasting errors in other settings. Finally, one might consider entirely different approaches, such as the prediction-market forecasting techniques described by Gürkaynak and Wolfers (2006).
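The evaluation logic behind this chapter's comparisons can be made concrete with a small Python sketch. It scores three forecasts of a log-crime rate by root mean squared error (RMSE): the naïve random walk, an iterated forecast from a first-order autoregression, and their equal-weight average, which is the simplest form of forecast combination. Everything here is a hypothetical stand-in. The AR(1) form and its coefficients `ALPHA` and `RHO` are assumptions, not the chapter's Model A. The three-city panel is made-up toy data, not the Committee on Law and Justice data. Equal weighting is not the model-averaging method of Durlauf, Navarro, and Rivers.

```python
import math

# Hypothetical stand-in model (not the chapter's Model A):
#     y[t+1] = ALPHA + RHO * y[t], with y a log-crime rate.
ALPHA, RHO = 0.5, 0.9


def rmse(forecasts, actuals):
    """Root mean squared error over paired forecasts and outcomes."""
    if len(forecasts) != len(actuals) or not forecasts:
        raise ValueError("need two non-empty sequences of equal length")
    squared = [(f - a) ** 2 for f, a in zip(forecasts, actuals)]
    return math.sqrt(sum(squared) / len(squared))


def random_walk_forecast(y_last, h):
    """Naive forecast: every horizon h repeats the last observed value."""
    return y_last


def ar1_forecast(y_last, h, alpha=ALPHA, rho=RHO):
    """Iterate the AR(1) equation h steps ahead from the last observation."""
    y = y_last
    for _ in range(h):
        y = alpha + rho * y
    return y


# Made-up toy panel: last observed log rate, then outcomes at h = 1 and h = 2.
panel = {
    "city_1": (2.0, [2.1, 2.0]),
    "city_2": (5.5, [5.4, 5.3]),
    "city_3": (7.0, [7.1, 7.2]),
}

forecasts = {"random walk": [], "AR(1)": [], "equal-weight average": []}
actuals = []
for y_last, outcomes in panel.values():
    for h, y_true in enumerate(outcomes, start=1):
        rw = random_walk_forecast(y_last, h)
        ar = ar1_forecast(y_last, h)
        forecasts["random walk"].append(rw)
        forecasts["AR(1)"].append(ar)
        forecasts["equal-weight average"].append((rw + ar) / 2)
        actuals.append(y_true)

# RMSE pooled over cities and both horizons.
for name, values in forecasts.items():
    print(f"{name:<22} RMSE = {rmse(values, actuals):.3f}")
```

With 0 < `RHO` < 1, the iterated AR(1) forecast converges toward `ALPHA / (1 - RHO)` as the horizon grows, whereas the random walk stays at the last observed value. This is one simple mechanism by which two reasonable models can disagree about whether crime will rise or fall. The toy numbers support no conclusion about which approach forecasts crime better.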
ACKNOWLEDGMENTS

I have benefited from the comments of Phil Cook, Richard Rosenfeld, Jose Fernandez, Elizabeth Wittner, several anonymous referees, the University of Virginia Public Economics Lunch Group, and participants at the Committee on Law and Justice Workshop on Understanding Crime Trends. I thank Rosemary Liu for assistance in formatting and organizing the data file. The data used in this chapter were assembled by Rob Fornango and made available by the Committee on Law and Justice. The author remains solely responsible for how the data have been used and interpreted.

REFERENCES

Baltagi, Badi H. (2006). Forecasting with panel data. Center for Policy Research Working Paper No. 91, Syracuse University.

Congressional Budget Office. (2005). CBO's economic forecasting record: An evaluation of the economic forecasts CBO made from January 1976 through January 2005. Available: http://www.cbo.gov/ftpdocs/68xx/doc6812/10-25-EconomicForecastingRecord.pdf [accessed August 2008].

DeFina, Robert H., and Thomas M. Arvanites. (2002). The weak effect of imprisonment on crime: 1971-1998. Social Science Quarterly, 83(3), 635-653.

Diebold, Francis X. (1998). Elements of forecasting. Cincinnati: South-Western College.

Gürkaynak, Refet, and Justin Wolfers. (2006). Macroeconomic derivatives: An initial analysis of market-based macro forecasts, uncertainty, and risks. (NBER Working Paper #11929.) Cambridge, MA: National Bureau of Economic Research.

Land, Kenneth C., and Patricia L. McCall. (2001). The indeterminacy of forecasts of crime rates and juvenile offenses. In National Research Council and Institute of Medicine, Juvenile crime, juvenile justice, Panel on Juvenile Crime: Prevention, Treatment, and Control, Joan McCord, Cathy Spatz Widom, and Nancy A. Crowell (Eds.). Committee on Law and Justice and Board on Children, Youth, and Families. Washington, DC: National Academy Press.

Levitt, Steven D. (2004). Understanding why crime fell in the 1990s: Four factors that explain the decline and six that do not. Journal of Economic Perspectives, 18(1), 163-190.

McDowall, D. (2002). Tests of nonlinear dynamics in the U.S. homicide time series, and their implications. Criminology, 40(3), 722-736.

Zahn, Margaret A., and Patricia L. McCall. (1999). Trends and patterns of homicide in the 20th-century United States. In M. Dwayne Smith and Margaret Zahn (Eds.), Homicide: A sourcebook of social research (pp. 9-26). Thousand Oaks, CA: Sage.