forecasting crime may ultimately require more complex models, there is ample precedent for applying simple alternatives (Baltagi, 2006; Diebold, 1998).2 I thus focus on basic linear models that do not allow for structural breaks in the time-series process, do not incorporate cross-state or cross-crime interactions, and include only a small number of observed covariates. Finally, I focus on point rather than interval forecasts. Sampling variability plays a key role in forecasting, but a natural starting point is to examine the sensitivity of point forecasts to different modeling assumptions. Thus, my focus is on how forecasts vary across models rather than on sampling variability within a model. Adding confidence intervals would only increase the uncertainty associated with these forecasts.
I begin by considering the problem of forecasting the national homicide rate. This homicide series lies at the center of much of the controversy surrounding the few earlier forecasting exercises that have proven so futile. Using annual data on homicide rates, I estimate a basic autoregressive model that captures some important features of the time-series variation in homicide rates and does reasonably well at shorter-run forecasts. As for the longer-run forecasts, the statistical models clearly predict a sharp drop in crime during the 1990s, but they fail to forecast the steep rise in crime during the late 1980s.
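To make the mechanics of such an exercise concrete, the following is a minimal sketch of fitting an autoregressive model by ordinary least squares and iterating it forward to produce multi-year point forecasts. It assumes a first-order specification, y_t = a + b y_{t-1} + e_t; the lag order, the function names, and the short series are illustrative assumptions, not the specification or the homicide data used in the analysis.

```python
def fit_ar1(series):
    """OLS estimates (intercept, slope) for y_t = a + b * y_{t-1} + e_t."""
    x = series[:-1]  # lagged values
    y = series[1:]   # current values
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def forecast_ar1(last_value, intercept, slope, horizon):
    """Iterate the fitted equation forward; returns forecasts for steps 1..horizon."""
    forecasts = []
    value = last_value
    for _ in range(horizon):
        value = intercept + slope * value
        forecasts.append(value)
    return forecasts


# Made-up illustrative series (NOT actual homicide rates).
rates = [8.0, 9.0, 9.5, 9.75, 9.875]
a, b = fit_ar1(rates)
path = forecast_ar1(rates[-1], a, b, horizon=10)
long_run_level = a / (1 - b)  # level the forecasts approach when |b| < 1
```

When the estimated slope is below one in absolute value, the iterated forecasts move toward the level a / (1 - b) as the horizon lengthens, so the observed history has less and less influence on longer-run forecasts from a model of this form.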
After illustrating the basic approach using the national homicide series, I turn to the problem of forecasting city-level crime rates. Using panel data on annual city-level crime rates for the period 1980-2000, I again estimate a series of autoregressive lag models for four different crimes: homicide, robbery, burglary, and motor vehicle theft (MVT). Data for 2001-2004 are used for out-of-sample analyses.
The key objective is to compare the performance of various city-level forecasting models. First, I examine basic panel data models with and without covariates and with and without autoregressive lags. Most importantly, I contrast the homogeneous panel data model with heterogeneous models in which the process can vary arbitrarily across cities. I also consider two naïve models, one in which the forecast simply equals the city-level mean or fixed effect—the best constant forecast—and the other in which the forecast equals the last observed rate—a random walk forecast. In addition to considering the basic plausibility of the various model estimates, I examine differences in prediction accuracy and bias over 1-, 2-, 4-, and 10-year forecast horizons.
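The two naive benchmarks and the accuracy and bias comparison can be sketched as follows. This is an illustration under stated assumptions rather than the procedure used in the analysis: the two-city panel is made up, the function names are hypothetical, forecast error is defined as forecast minus actual, accuracy is summarized by root mean squared error, and bias by the mean error across cities.

```python
from math import sqrt


def mean_forecast(history):
    """Best constant forecast: the city's own historical mean."""
    return sum(history) / len(history)


def random_walk_forecast(history):
    """Random walk forecast: the last observed rate."""
    return history[-1]


def evaluate(panel, train_len, horizon, forecaster):
    """Return (rmse, mean_error) of horizon-step-ahead forecasts across cities.

    Error is forecast minus actual (an assumed sign convention).
    """
    errors = []
    for series in panel.values():
        history = series[:train_len]                 # estimation sample
        actual = series[train_len + horizon - 1]     # out-of-sample outcome
        errors.append(forecaster(history) - actual)
    rmse = sqrt(sum(e * e for e in errors) / len(errors))
    mean_error = sum(errors) / len(errors)
    return rmse, mean_error


# Made-up panel of annual rates for two hypothetical cities.
panel = {
    "city_a": [10.0, 12.0, 14.0, 15.0, 16.0],
    "city_b": [6.0, 5.0, 4.0, 4.0, 3.0],
}

results = {}
for h in (1, 2):
    results[("mean", h)] = evaluate(panel, 3, h, mean_forecast)
    results[("random_walk", h)] = evaluate(panel, 3, h, random_walk_forecast)
```

Because both benchmarks issue the same forecast at every horizon, any change in their measured accuracy across horizons reflects only how far the realized rates drift from the estimation sample.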
2 Diebold refers to this idea as the parsimony principle; all else equal, simple models are preferable to complex models. Certainly, imposing correct restrictions on a model should improve the forecasting performance, but even incorrect restrictions may be useful in finite samples. Simple models can be more precisely estimated and may lessen the likelihood of overfitting the observed data at the expense of effective forecasting of unrealized outcomes. Finally, empirical evidence from other settings reveals that simpler models can do at least as well and possibly better at forecasting than more complex alternatives.