Appendix D
Statistical Issues in the Evaluation of the Effects of Right-to-Carry Laws
Joel L. Horowitz
Different investigators have obtained conflicting estimates of the effects of right-to-carry laws on crime. Moreover, the estimates are sensitive to relatively minor changes in data and the specifications of models. This paper presents a statistical framework that explains the conflicts and why there is little likelihood that persuasive conclusions about the effects of right-to-carry laws can be drawn from analyses of observational (nonexperimental) data. The framework has two main parts. The first relates to the difficulty of choosing the right explanatory variables for a model. The second relates to the difficulty of estimating the relation among crime rates, the explanatory variables, and the adoption of right-to-carry laws even if the correct explanatory variables are known.
CHOOSING THE EXPLANATORY VARIABLES
The effect on crime of having a right-to-carry law in effect at a given time and place may be defined as the difference between the crime rate (or its logarithm) with the law in effect and the crime rate (or its logarithm) without the law. The fundamental problem in measuring the effect of a right-to-carry law (as well as in evaluating other public policy measures) is that at any given time and place, a right-to-carry law is either in effect or not in effect. Therefore, one can measure the crime rate with the law in effect or without it, depending on the state of affairs at the time and place of interest, but not both with and without the law. Consequently, one of the two measurements needed to implement the definition of the law’s effect is always missing. To estimate the law’s effect, one must have a way of “filling in” the missing observation.
The discussion of this problem can be streamlined considerably by using mathematical notation. Let i index locations (possibly counties) and t index time periods (possibly years). Let Y^{WITH}_{it} denote the crime rate that county i would have in year t with a right-to-carry law in effect. Let Y^{WITHOUT}_{it} denote the crime rate that county i would have in year t without such a law. Then the effect of the law on the crime rate is defined as Δ_{it} = Y^{WITH}_{it} − Y^{WITHOUT}_{it} under the assumption that all other factors affecting crime are the same with or without the law. The fundamental measurement problem is that one can observe either Y^{WITH}_{it} (if the law is in effect in county i and year t) or Y^{WITHOUT}_{it} (if the law is not in effect in county i and year t) but not both. Therefore, Δ_{it} can never be observed.
One possible solution to this problem consists of replacing the unobservable Δ_{it} by the difference between the crime rates after and before adoption of a right-to-carry law (in other words, carrying out a before-and-after study). For example, suppose that county i (or county i’s state) adopts a right-to-carry law in year s. Then one can observe the crime rate without the law, Y^{WITHOUT}_{it}, whenever t < s and the crime rate with the law, Y^{WITH}_{it}, whenever t > s. Thus, one might consider measuring the effect of the law by (for example) Y^{WITH}_{i,s+1} − Y^{WITHOUT}_{i,s−1} (the crime rate a year after adoption minus the crime rate a year before adoption). However, this approach has several serious difficulties.
First, factors that affect crime other than adoption of a right-to-carry law may change between years s – 1 and s + 1. For example, economic conditions, levels of police activity, or conditions in drug markets may change. If this happens, then the before-and-after difference Y^{WITH}_{i,s+1} − Y^{WITHOUT}_{i,s−1} (the crime rate a year after adoption minus the crime rate a year before adoption) measures the combined effect of all of the changes that took place, not the effect of the right-to-carry law alone. Second, this difference can give a misleading indication of the effect of the law’s adoption even if no other relevant factors change. For example, suppose that crime increases each year before the law’s adoption and decreases at the same rate each year after adoption (Figure C-1). Then Y^{WITH}_{i,s+1} − Y^{WITHOUT}_{i,s−1} = 0, indicating no change in crime levels, even though the trend in crime reversed in the year of adoption of the right-to-carry law. Taking the difference between multiyear averages of crime levels after and before adoption of the law would give a similarly misleading indication. This has been pointed out by Lott (2000:135) in his response to Black and Nagin (1998). Third, right-to-carry laws might be enacted in response to crime waves that would peak and decrease even without the laws. If this happens, then the before-and-after difference might reflect mainly the dynamics of crime waves rather than the effects of right-to-carry laws.
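The second difficulty can be illustrated with a small numerical sketch. The crime levels are hypothetical and are not taken from any data set: crime is assumed to rise by 5 units per year until the year of adoption and to fall by 5 units per year afterward. The function name crime_rate, the adoption year, and all numerical values are illustrative assumptions.

```python
from statistics import mean

ADOPTION_YEAR = 1990  # hypothetical year s in which the law is adopted


def crime_rate(year, peak=100.0, slope=5.0):
    """Hypothetical crime rate: rises by `slope` per year before adoption
    and falls by `slope` per year after adoption."""
    return peak - slope * abs(year - ADOPTION_YEAR)


s = ADOPTION_YEAR

# Crime rate one year after adoption minus crime rate one year before.
before_after = crime_rate(s + 1) - crime_rate(s - 1)

# Difference between three-year averages after and before adoption.
avg_after = mean([crime_rate(t) for t in range(s + 1, s + 4)])
avg_before = mean([crime_rate(t) for t in range(s - 3, s)])
multiyear = avg_after - avg_before

# Year-to-year changes on each side of adoption reveal the trend reversal.
change_before = crime_rate(s - 1) - crime_rate(s - 2)
change_after = crime_rate(s + 2) - crime_rate(s + 1)

print(before_after, multiyear, change_before, change_after)
```

Both the one-year difference and the difference of multiyear averages are zero in this sketch, although the yearly change switches from positive to negative at adoption.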
Finally, the states that have right-to-carry laws in effect in a given year may be systematically different from the states that do not have these laws in effect. Indeed, Lott (2000:119) found that in his data, “states adopting [right-to-carry] laws are relatively Republican with large National Rifle Association memberships and low but rising rates of violent crime and property crime.” Non-time-varying systematic differences among states are accounted for by the fixed effects, γ_{i}, in Models 6.1 and 6.2 in Chapter 6. However, if there are time-varying factors that differ systematically among states with and without right-to-carry laws and that influence the laws’ effects on crime, then the effects of enacting these laws in states that do not have them cannot be predicted from the experience of states that do have them, even if the other problems just described are not present.
The foregoing problems would not arise if the counties that have right-to-carry laws could be selected randomly. Of course, this is not possible, but consideration of the hypothetical situation in which it is possible provides insight into the methods that are used to estimate the effects of real-world right-to-carry laws. If the counties that have right-to-carry laws in year t are selected randomly, then there can be no systematic differences between counties with and without these laws in year t. Consequently, the average value of Y^{WITH}_{it} (the crime rate that a county would have with the law in effect) is the same across counties in year t regardless of whether a right-to-carry law is in effect. Similarly, the average value of Y^{WITHOUT}_{it} (the crime rate that a county would have without the law) is the same across counties. It follows that the average effect on crime of the right-to-carry law is the average value of Y^{WITH}_{it} in counties with the law minus the average value of Y^{WITHOUT}_{it} in counties that do not have the law. In other words, the average effect is the average value of the observed crime rate in counties with the law minus the average value of the observed crime rate in counties that do not have the law.^{1}
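A small simulation may help to make the role of randomization concrete. The sketch assumes a hypothetical effect of −2 on the crime rate in every county and a hypothetical selection rule in which low-crime counties adopt the law; none of the numbers comes from actual data, and the names mean_difference and TRUE_EFFECT are illustrative.

```python
import random
from statistics import mean

TRUE_EFFECT = -2.0  # hypothetical effect of the law on the crime rate


def mean_difference(n_counties, randomized, rng):
    """Average observed crime rate in counties with the law minus the
    average observed crime rate in counties without it."""
    with_law, without_law = [], []
    for _ in range(n_counties):
        y_without = rng.gauss(50.0, 10.0)  # crime rate without the law
        y_with = y_without + TRUE_EFFECT   # crime rate with the law
        if randomized:
            has_law = rng.random() < 0.5   # coin flip, unrelated to crime
        else:
            has_law = y_without < 50.0     # assumed rule: low-crime counties adopt
        # Only one of the two potential outcomes is ever observed.
        if has_law:
            with_law.append(y_with)
        else:
            without_law.append(y_without)
    return mean(with_law) - mean(without_law)


rng = random.Random(0)
randomized_estimate = mean_difference(20000, True, rng)
selected_estimate = mean_difference(20000, False, rng)
print(round(randomized_estimate, 2), round(selected_estimate, 2))
```

Under random assignment the difference in observed averages is close to the assumed effect. Under the assumed selection rule the same difference mostly reflects the preexisting gap between low-crime and high-crime counties.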
In the real world, the counties that have right-to-carry laws cannot be selected randomly, but one might hope that the benefits of randomization can be achieved by “controlling” the variables that are responsible for “relevant” systematic differences between counties that do and do not have right-to-carry laws. Specifically, suppose that the relevant variables are denoted by X. Suppose further that the average value of Y^{WITH}_{it} (the crime rate with the law in effect) is the same across counties that have the same value of X, regardless of whether a right-to-carry law is in effect. Similarly, suppose that the average value of Y^{WITHOUT}_{it} (the crime rate without the law) is the same across counties that have the same value of X. If these conditions are satisfied, then the average effect on crime of adoption of a right-to-carry law in counties with a specified value of X is the average of the observed crime rates in counties with the specified value of X that have the law in place minus the average of the observed crime rates in counties with the specified value of X that do not have the law. This is the idea on which all of the models of Lott and his critics are based.
The problem with this idea is that the variables that should be included in X are unknown, and it is not possible to carry out an empirical test of whether a proposed set of X variables is the correct one. This is because the answer to the question of whether X is a proper set of control variables depends on the relation of X to the unobservable counterfactual outcomes (the with-law crime rate Y^{WITH}_{it} in counties that do not have right-to-carry laws in year t and the without-law crime rate Y^{WITHOUT}_{it} in counties that do have the laws in year t). Thus, it is largely a matter of opinion which set to use. A set that seems credible to one investigator may lack credibility to another. This problem is the source of the disagreement between Lott and his critics over Lott’s use of the arrest rate as an explanatory variable in his models. It is also the source of other claims that Lott may not have accounted for all relevant influences on crime. See, for example, Ayers and Donohue (1999:464-465) and Lott’s response (Lott, 2000:213-215).^{2}
Lott is aware of this problem. In response, he argues that his study used “the most comprehensive set of control variables yet used in a study of crime, let alone any previous study on gun control” (Lott, 2000:153). There are two problems with this argument. First, although it is true that Lott uses a large set of control variables (his data contain over 100 variables, though not all are used in each of his models), he is limited by the availability of data. There is (and can be) no assurance that his data contain all relevant variables. Second, it is possible to control for too many variables. Specifically, suppose that there are two sets of potential explanatory variables, X and Z. Then it is possible for the average value of Y^{WITH}_{it} (the crime rate with the law in effect) to be the same among counties with the same value of X, regardless of whether a right-to-carry law is in place, whereas the average value of Y^{WITH}_{it} among counties with the same values of X and Z depends on whether a right-to-carry law has been adopted. The same possibility applies to Y^{WITHOUT}_{it} (the crime rate without the law). In summary, it is not enough to use a very large set of control or explanatory variables. Rather, one must use a set that consists of just the right variables and, in general, no extra ones.^{3}
In fact, there is evidence of uncontrolled (or, possibly, overcontrolled) systematic differences among counties with and without right-to-carry laws in effect. Donohue (2002: Tables 5-6) estimated models in which future adoption of a right-to-carry law is used as an explanatory variable of crime levels prior to the law’s adoption. He found a statistically significant relation between crime levels and future adoption of a right-to-carry law, even after controlling for what he calls “an array of explanatory variables.” This result implies that there are systematic differences between adopting and nonadopting states that are not accounted for by the explanatory variables. In other words, there are variables that affect crime rates but are not in the model, and it is possible that the omitted variables are the causes of any apparent effects of adoption of right-to-carry laws.^{4}
There is also evidence that estimates of the effects of these laws are sensitive to the choice of explanatory variables. See, for example, the discussion of Table 6-5 in Chapter 6. Thus, the choice of explanatory variables matters. As has already been explained, there is and can be no empirical test for whether a proposed set of explanatory variables is correct. There is little prospect for achieving an empirically supportable agreement on the right set of variables. For this reason, in addition to the goodness-of-fit problems that are discussed next, it is unlikely that there can be an empirically based resolution of the question of whether Lott has reached the correct conclusions about the effects of right-to-carry laws on crime.^{5}
^{3} Bronars and Lott (1998) and Lott (2000) have attempted to control for confounding variables by comparing changes in crime rates in neighboring counties such that some counties are in a state that adopted a right-to-carry law and others are in a state that did not adopt the law. Bronars and Lott (1998) and Lott (2000) found that crime rates tend to decrease in counties where the law was adopted and increase in neighboring counties where the law was not adopted. The issues raised by this finding (and by any conclusion that differential changes in crime levels in neighboring counties are caused by adoption or nonadoption of right-to-carry laws) are identical to the issues raised by the results of Lott’s main models, Models 6.1 and 6.2 in Chapter 6.
^{4} If the explanatory variables accounted for all systematic differences in crime rates, then the average crime rate conditional on the explanatory variables would be independent of the adoption variable. Thus, future adoption of a right-to-carry law would not have any explanatory power. Lott and Mustard (1997, Table 11) and Lott (2000:118) attempted to control for omitted variables affecting crime by carrying out a procedure called “two-stage least squares” (2SLS). However, the 2SLS estimates of the effects of right-to-carry laws on the incidence of violent crimes differ by factors of 15 to 42, depending on the crime, from the estimates in Lott’s Table 4.1 and are implausibly large. For example, according to the 2SLS estimates reported by Lott and Mustard (1997, Table 11), adoption of right-to-carry laws reduces all violent crimes by 72 percent, murders by 67 percent, and aggravated assaults by 73 percent. 2SLS works by using explanatory variables called instruments to control the effects of any missing variables. A valid instrument must be correlated with the variable indicating the presence or absence of a right-to-carry law but otherwise unrelated to fluctuations in crime that are not explained by the covariates of the model. In Lott and Mustard (1997) and Lott (2000), the instruments include levels and changes in levels of crime rates and are, by definition, correlated with the dependent variables of the models. Thus, they are unlikely to be valid instruments. It is likely, therefore, that Lott’s and Mustard’s 2SLS estimates are artifacts of the use of invalid instruments and other forms of specification errors.
ESTIMATING THE RELATION AMONG CRIME RATES, THE EXPLANATORY VARIABLES, AND ADOPTION OF RIGHT-TO-CARRY LAWS
This section discusses the problem of estimating the average crime rate in counties that have the same values of a set of explanatory variables X and that have (or do not have) right-to-carry laws in effect. Specifically, let Z_{it} = 1 if county i has a right-to-carry law in effect in year t, and let Z_{it} = 0 if county i does not have such a law in year t. Let Y_{it} denote the crime rate (or its logarithm) in county i and year t, regardless of whether a right-to-carry law is in effect. The objective in this section is to estimate the average values of Y_{it} conditional on Z_{it} = 1 and of Y_{it} conditional on Z_{it} = 0 for counties in which the explanatory variables X have the same values, say X = X_{0}. Denote these averages by E(Y_{it} | Z_{it} = 1, X_{0}) and E(Y_{it} | Z_{it} = 0, X_{0}), respectively. E(Y_{it} | Z_{it} = 1, X_{0}) is the average crime rate in year t in counties that have right-to-carry laws and whose explanatory variables have the values X_{0}. E(Y_{it} | Z_{it} = 0, X_{0}) is the average crime rate in year t in counties that do not have right-to-carry laws and whose explanatory variables have the values X_{0}. If the explanatory variables control for all other factors that are relevant to the crime rate, then D_{t} (X_{0}) = E(Y_{it} | Z_{it} = 1, X_{0}) − E(Y_{it} | Z_{it} = 0, X_{0}) is the average change in the crime rate caused by the law in year t in counties where the values of the explanatory variables are X_{0}.
The models of Lott and his critics are all aimed at estimating D_{t} (X_{0}) for some set of explanatory variables X. This section discusses the statistical issues that are involved in estimating D_{t} (X_{0}). The discussion focuses on the problem of estimating the function D_{t} for a given set of explanatory variables. This issue is distinct from and independent of the problem of choosing the explanatory variables that was discussed in the previous section. Thus, the discussion in this section does not depend on whether there is agreement on a “correct” set of explanatory variables.
Estimating D_{t} (X_{0}) is relatively simple if in year t there are many counties with right-to-carry laws and the same values X_{0} of the explanatory variables and many counties without right-to-carry laws and identical values X_{0} of the explanatory variables. D_{t} (X_{0}) would then be the average of the observed crime rate in the counties that do have right-to-carry laws minus the average crime rate in counties that do not have such laws. However, there are not many counties with the same values of the explanatory variables. Indeed, in the data used by Lott and his critics, each county has unique values of the explanatory variables. Therefore, the simple averaging procedure cannot be used. Instead, D_{t} (X_{0}) must be inferred from observations of crime rates among counties with a range of values of X. In other words, it is necessary to estimate the relation between average crime rates and the values of the explanatory variables.
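The simple averaging procedure, and the reason it breaks down, can be sketched in a few lines. The function matched_difference and both small data sets are hypothetical illustrations, not the data or code of Lott or his critics.

```python
from statistics import mean


def matched_difference(records, x0):
    """Estimate D_t(X_0) by simple averaging over exact matches.

    `records` is a list of (x, z, y) tuples for one year: x holds the
    explanatory-variable values, z is 1 if a law is in effect and 0
    otherwise, and y is the observed crime rate. Returns None when either
    group contains no county whose x equals x0.
    """
    with_law = [y for x, z, y in records if x == x0 and z == 1]
    without_law = [y for x, z, y in records if x == x0 and z == 0]
    if not with_law or not without_law:
        return None
    return mean(with_law) - mean(without_law)


# Hypothetical data in which several counties share the same X values.
shared = [
    (("urban", "low income"), 1, 40.0),
    (("urban", "low income"), 1, 44.0),
    (("urban", "low income"), 0, 45.0),
    (("urban", "low income"), 0, 47.0),
    (("rural", "high income"), 0, 20.0),
]
estimate = matched_difference(shared, ("urban", "low income"))

# Hypothetical data in which every county has unique X values.
unique = [((0.31, 12.5), 1, 40.0), ((0.29, 13.1), 0, 45.0)]
no_estimate = matched_difference(unique, (0.31, 12.5))

print(estimate, no_estimate)
```

When every county has its own value of X, one of the two groups is always empty and the averaging estimate is undefined, which is the situation described for the actual data.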
In principle, the relations between average crime rates and the explanatory variables with and without a right-to-carry law in effect can be estimated without making any assumptions about their shapes. This is called nonparametric estimation. Härdle (1990) provides a detailed discussion of nonparametric estimation methods. Nonparametric estimation is highly flexible and largely eliminates the possibility that the estimated model may not fit the data, but it has the serious drawback that the size of the data set needed to obtain estimates that are sufficiently precise to be useful increases very rapidly as the number of explanatory variables increases. This is called the curse of dimensionality. Because of it, nonparametric estimation is a practical option only in situations in which there are few explanatory variables. It is not a practical option in situations like estimation of the effects of right-to-carry laws, where there can be 50 or more explanatory variables.
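A rough way to see the curse of dimensionality is to count cells. The sketch below assumes that each explanatory variable is coarsened into only two categories and that about 3,000 county observations are available in a year; both figures are assumptions chosen for illustration, and cell counting is only a crude stand-in for the formal precision results discussed by Härdle (1990).

```python
def number_of_cells(num_variables, bins_per_variable=2):
    """Number of distinct cells when each explanatory variable is
    coarsened into `bins_per_variable` categories."""
    return bins_per_variable ** num_variables


N_OBSERVATIONS = 3000  # assumed order of magnitude for one year of county data

for k in (2, 5, 10, 20, 50):
    cells = number_of_cells(k)
    # Average number of observations available per cell.
    print(k, cells, N_OBSERVATIONS / cells)
```

With 20 two-category variables there are already more cells than observations, and with 50 the average cell is essentially empty, so averages within cells cannot be estimated with useful precision.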
Because of the problems posed by the curse of dimensionality, the most frequently used methods for estimation with a large number of explanatory variables assume that the relation to be estimated belongs to a relatively small class of “shapes.”^{6} For example, Models 6.1 and 6.2 assume that the average of the logarithm of the crime rate is a linear function of the variables comprising X. Lott and his critics all restrict the shapes of the relations they estimate. Doing this greatly increases estimation precision, but it creates the possibility that the true relation of interest does not have the assumed shape. That is, the estimated model may not fit the data. This is called misspecification. Moreover, because the set of possible shapes increases as the number of variables in X increases, the opportunities for misspecification also increase. This is another form of the curse of dimensionality. Its practical consequence is that one should not be surprised if a simple class of models (or shapes) such as linear models fails to fit the data.
Lack of fit is a serious concern because it can cause estimation results to be seriously misleading. An example based on an article that was published in the National Review (Tucker 1987) illustrates this problem. The example consists of estimating the relation between the fraction of a city’s population who are homeless, the vacancy rate in the city, an indicator of whether the city has rent control, and several other explanatory variables. Two models are estimated:
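FRAC = β_{0} + β_{1} RENT + β_{2} VAC + γ′X + ε   (D.1)
and
FRAC = β_{0} + β_{1} RENT + β_{2} (1/VAC) + γ′X + ε   (D.2)
where FRAC denotes the number of homeless per 1,000 population in a city, RENT is an indicator of whether a city has rent control (RENT = 1 if a city has rent control and RENT = 0 otherwise), VAC denotes the vacancy rate, X denotes the other explanatory variables, β_{0}, β_{1}, β_{2}, and γ are coefficients to be estimated, and ε is an unobserved random term. The two models differ only in whether the vacancy rate enters as VAC or as 1/VAC. The data are taken from Tucker (1987). The estimation results are summarized in Table D-1.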
TABLE D-1 Results of Estimating a Model of the Fraction of Homeless in a City (quantities in parentheses are standard errors)
Model | Coefficient of RENT | Coefficient of VAC or 1/VAC
(D.1) | 3.17 (1.51) | –0.26 (0.16)
(D.2) | –1.65 (3.11) | 18.89 (8.15)
According to Model D.1, there is a statistically significant relation between the fraction of homeless and the indicator of rent control (p < 0.05) but not between homelessness and the vacancy rate (p > 0.10). Moreover, according to Model D.1, the fraction of homeless is higher in cities that have rent control than it is in cities that do not have rent control. This result is consistent with the hypothesis that rent control is a cause of homelessness (possibly because it creates a shortage of rental units) and that the vacancy rate is unrelated to homelessness. However, Model D.2 gives the opposite conclusion. According to this model, there is a statistically significant relation between the fraction of homeless and the vacancy rate (p < 0.05) but not between homelessness and rent control (p > 0.10). Moreover, according to Model D.2, the fraction of homeless decreases as the vacancy rate increases. Thus, the results of estimation in Model D.2 are consistent with the hypothesis that a low vacancy rate contributes to homelessness but rent control does not. In other words, Model D.1 and Model D.2 yield opposite conclusions about the effects of rent control and the vacancy rate on homelessness. In addition, it is not possible for both of the models to fit the data, although it is possible for neither to fit. Therefore, misspecification or lack of fit is causing at least one of the models to give a misleading indication of the effect of rent control and the vacancy rate on homelessness.
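The significance statements for the two models can be checked approximately from the coefficients and standard errors reported in the table. The sketch below uses a two-sided normal approximation to the distribution of each coefficient divided by its standard error. That approximation is an assumption, because the sample size and the exact test used in the original estimation are not stated here.

```python
from statistics import NormalDist

# (coefficient, standard error) pairs as reported for Models D.1 and D.2.
ESTIMATES = {
    ("D.1", "RENT"): (3.17, 1.51),
    ("D.1", "VAC"): (-0.26, 0.16),
    ("D.2", "RENT"): (-1.65, 3.11),
    ("D.2", "1/VAC"): (18.89, 8.15),
}


def two_sided_p(coefficient, std_error):
    """Two-sided p-value for the ratio coefficient / std_error under a
    standard normal approximation (an assumption of this sketch)."""
    z = abs(coefficient / std_error)
    return 2.0 * (1.0 - NormalDist().cdf(z))


p_values = {key: two_sided_p(*value) for key, value in ESTIMATES.items()}

for (model, variable), p in p_values.items():
    print(model, variable, round(p, 3))
```

Under this approximation the rent-control coefficient is significant at the 0.05 level only in Model D.1 and the vacancy term only in Model D.2, in line with the pattern described in the text.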
It is possible to carry out statistical tests for lack of fit. None of the models examined by the committee passes a simple specification test called RESET (Ramsey, 1969). That is, none of the models fits the data. This raises the question of whether a model that fits the data can be found. For example, by estimating and testing a large number of models, it might be possible to find one that passes the RESET test. This is called a specification search. However, a specification search cannot circumvent the curse of dimensionality. If the search is carried out informally (that is, without a statistically valid search procedure and stopping rule), as is usually the case in applications, then it invalidates the statistical theory on which estimation and inference are based. The results of the search may be misleading, but because the relevant statistical theory no longer applies, it is not possible to test for a misleading result. Alternatively, one can carry out a statistically valid search that is guaranteed to find the correct model in a sufficiently large sample. However, this is a form of nonparametric regression, and therefore it suffers the lack of precision that is an unavoidable consequence of the curse of dimensionality. Therefore, there is little likelihood of identifying a well-fitting model with existing data and statistical methods.^{7} In summary, the problems posed by high-dimensional estimation, misspecified models, and lack of knowledge of the correct set of explanatory variables seem insurmountable with observational data.
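The idea behind a RESET-type test can be sketched on simulated data. The version below is a simplification, not the committee's procedure: it fits a linear model with one explanatory variable, adds only the square of the fitted value as an extra regressor, and reports the F statistic for that addition. The helper names, the simulated vacancy rates, and the two data-generating relations are all hypothetical.

```python
import random


def ols_fitted(rows, y):
    """Fitted values from least squares of y on the columns in `rows`,
    solving the normal equations by Gauss-Jordan elimination."""
    k = len(rows[0])
    # Augmented matrix [X'X | X'y].
    a = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))] for i in range(k)]
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(a[r][c]))  # partial pivoting
        a[c], a[p] = a[p], a[c]
        for r in range(k):
            if r != c:
                f = a[r][c] / a[c][c]
                a[r] = [u - f * v for u, v in zip(a[r], a[c])]
    beta = [a[i][k] / a[i][i] for i in range(k)]
    return [sum(b * v for b, v in zip(beta, r)) for r in rows]


def rss(y, fitted):
    """Residual sum of squares."""
    return sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))


def reset_f(x, y):
    """F statistic for adding the squared fitted value to a linear model."""
    base = [[1.0, xi] for xi in x]
    fit0 = ols_fitted(base, y)
    augmented = [[1.0, xi, fi ** 2] for xi, fi in zip(x, fit0)]
    fit1 = ols_fitted(augmented, y)
    rss0, rss1 = rss(y, fit0), rss(y, fit1)
    # One added regressor; three coefficients in the augmented model.
    return (rss0 - rss1) / (rss1 / (len(y) - 3))


rng = random.Random(1)
x = [1.0 + 9.0 * i / 199 for i in range(200)]  # hypothetical vacancy rates
y_linear = [10.0 - 0.5 * xi + rng.gauss(0.0, 0.5) for xi in x]
y_inverse = [2.0 + 20.0 / xi + rng.gauss(0.0, 0.5) for xi in x]

f_linear = reset_f(x, y_linear)    # the linear model is correct here
f_inverse = reset_f(x, y_inverse)  # the linear model is misspecified here
print(round(f_linear, 2), round(f_inverse, 2))
```

A large F statistic signals that the linear shape does not fit, as happens when the simulated outcome actually depends on the inverse of the explanatory variable. Repeating such a test over many candidate models without a valid stopping rule is the informal specification search whose pitfalls are described above.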
REFERENCES
Ayers, I., and J.J. Donohue 1999 Nondiscretionary concealed weapons law: A case study of statistics, standards of proof, and public policy. American Law and Economics Review 1:436.
Black, D.A., and D.S. Nagin 1998 Do right-to-carry laws deter violent crime? Journal of Legal Studies 27:209-219.
Bronars, S., and J.R. Lott, Jr. 1998 Criminal deterrence, geographic spillovers, and the right to carry concealed handguns. American Economic Review 88:475-479.
Donohue, J.J. 2002 Divining the Impact of State Laws Permitting Citizens to Carry Concealed Handguns. Unpublished manuscript, Stanford Law School.
Härdle, W. 1990 Applied Nonparametric Regression. Cambridge: Cambridge University Press.
Lott, J.R. 2000 More Guns, Less Crime: Understanding Crime and Gun-Control Laws. Chicago: University of Chicago Press.
Lott, J.R., and D.B. Mustard 1997 Crime, deterrence, and right-to-carry concealed handguns. Journal of Legal Studies 26(1):1-68.
Ramsey, J.B. 1969 Tests for specification errors in classical linear least squares regression analysis. Journal of the Royal Statistical Society Series B 31(2):350-371.
Tucker, W. 1987 Where do the homeless come from? National Review Sept. 25:32-43.