5

Administrative Data on Undocumented Migration Across U.S. Borders

Administrative data collected by the U.S. Department of Homeland Security (DHS) represent an important source of information about the activities of undocumented migrants. As part of its operations to secure U.S. borders and the U.S. interior against illegal immigration, DHS records the number of undocumented migrants it apprehends, the disposition of these migrants, and the resources it devotes to enforcement activities. The relevant administrative data come in three primary forms: apprehensions data collected by the U.S. Border Patrol (USBP), which contain individual records of migrants apprehended by USBP between ports of entry; data on apprehensions at ports of entry by the Office of Field Operations (OFO); and data on apprehensions in the U.S. interior by Immigration and Customs Enforcement (ICE).

The panel was tasked with reviewing administrative data collected by DHS, and it formally requested access to data from the enforcement database, indicating that DHS could provide the data to the panel in a format that would protect any information that DHS deemed operationally sensitive. The panel made this request with the understanding that any data given to it would need to be made publicly available, in accordance with the institutional rules governing National Research Council studies. However, DHS would not provide these data without an exemption from public disclosure requirements. It was the judgment of the panel that the quality of its published analysis and the timeliness of its deliberations would have been unduly impaired by the classification restrictions that would have accompanied such an exemption. Therefore, the panel did not pursue its request, and DHS did not provide the panel with access to its administrative data.



The National Academies | 500 Fifth St. N.W. | Washington, D.C. 20001
Copyright © National Academy of Sciences. All rights reserved.

ESTIMATING ILLEGAL ENTRIES AT THE U.S.–MEXICO BORDER

Nevertheless, the Office of Immigration Statistics (OIS) at DHS did provide information on the main fields included in the apprehensions database. Non-DHS data sources such as the Survey of Migration at the Northern Border of Mexico (EMIF-N) also provide some information on the apprehension of migrants at the border. In this chapter, we use the descriptions of the DHS data sources provided to the panel by OIS, as well as other non-DHS data, to assess the usefulness of administrative data for measuring the flow of unauthorized migrants into the United States.

Our conclusion is that administrative data alone do not permit reliable estimation of the inflow of unauthorized migrants across the U.S.–Mexico border. The data provide no direct information on the number of individuals who elude capture and enter the United States successfully. By making assumptions about the behavior of unauthorized migrants, one can use the volume of apprehensions to estimate the magnitude of unauthorized flows (see, e.g., Espenshade, 1995b; Massey and Singer, 1995). In this chapter, the panel discusses how one can generalize an approach, previously developed and reported in the sociology literature, that uses a “repeated trials” model. We note that all such estimates are based on strong assumptions that are difficult to validate empirically. As discussed in this chapter, estimation methods based on capture-recapture techniques (described in detail in Appendix B), which offer a sophisticated approach to determining the size of a population based on the fraction of initially “sampled” (i.e., apprehended) individuals who are subsequently “re-sampled” (i.e., re-apprehended), cannot, unfortunately, solve the problem posed by the lack of direct information on the number of individuals who elude capture and enter the United States successfully.
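The capture-recapture idea can be illustrated with the simplest textbook estimator, the two-sample Lincoln–Petersen estimator. This is a generic sketch, not necessarily the formulation used in Appendix B; it assumes a closed population in which everyone has the same chance of capture on each occasion, and the function name and counts are hypothetical.

```python
def lincoln_petersen(n1, n2, m2):
    """Two-sample capture-recapture estimate of a closed population's size.

    n1: individuals caught on the first occasion
    n2: individuals caught on the second occasion
    m2: individuals caught on both occasions (re-apprehended)
    """
    if m2 == 0:
        raise ValueError("no recaptures, so the population size is not estimable")
    # The share of the second sample already seen, m2 / n2, estimates n1 / N.
    return n1 * n2 / m2


print(lincoln_petersen(n1=200, n2=150, m2=30))  # 1000.0
```

Both assumptions are hard to defend for border crossers, some of whom enter successfully or give up after a first apprehension, which is one way to see why such estimates cannot by themselves measure the number who elude capture.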
While administrative data have limitations, they could still offer insights into unauthorized migration flows if they were combined with other data sources. The panel outlines some strategies for characterizing key features of unauthorized migration flows based on combining administrative and survey data. These approaches are discussed in more detail in Chapter 6.

SOURCES OF ADMINISTRATIVE DATA ON APPREHENSIONS

USBP seeks to apprehend all individuals who attempt to cross U.S. borders illegally. Data on these apprehensions are a major source of DHS administrative records on unauthorized migration. In the last decade, USBP resources have increased dramatically, with the number of USBP officers growing from 9,000 in 2001 to 21,000 today (see Kessler, 2011). The expansion in resources, combined with the drop-off in the number of migrants apprehended at the U.S.–Mexico border since 2007, means that USBP currently has the manpower to document virtually all individuals with whom it comes in contact. But these changes also make it difficult to compare information across time. In the mid-2000s, when USBP had fewer agents and the booming U.S. economy encouraged high levels of unauthorized entry, recordkeeping on apprehensions may have been incomplete. Additionally, the expansion of USBP resources has been uneven across the nine sectors of the U.S.–Mexico border (Borger et al., in press), which complicates spatial comparisons of administrative data.

Beginning in fiscal year 1999, USBP created an electronic record of each apprehension made by a USBP agent. Table 5-1 is a partial list of the variables contained in the ENFORCE database, in which USBP, OFO, and ICE apprehensions data are recorded. Individual USBP apprehension records contain demographic information on the person apprehended, including gender, date of birth, country of origin, and (if a Mexican national) state of birth. The records also include an event number, which permits linking of individuals apprehended together, and information on when and where the apprehension took place, including the nearest port of entry and the USBP station, district, and sector of the arresting officer. A fingerprint identification number (IDFINS), which since fiscal year 2000 has been based on all 10 fingers, essentially identifies an individual and enables one to “tag” migrants who are apprehended more than once. On days (or more precisely, evenings) when apprehensions run at high levels, individual USBP stations may not have the personnel to fingerprint or interview all apprehended migrants, a situation that appears to have been more common in the early 2000s, when the ratio of apprehensions to USBP officers was much higher than today. Currently a USBP station needs the approval of its station chief to forgo fingerprinting and interviewing a subject. USBP asserts that it enters information into its database on nearly all of the migrants it currently apprehends.

TABLE 5-1  Variables in ENFORCE Database

U.S. Border Patrol Data
  IDFINS
  Event number
  Date of apprehension
  Site of apprehension
  Nearest port of entry
  Border Patrol sector
  Disposition
  Time in U.S.
  Arrest method
  Country of citizenship
  Country of residence
  Country of birth
  State of residence (Mexico only)
  State of birth (Mexico only)
  Date of birth
  Gender
  Marital status

SOURCE: Office of Immigration Statistics, U.S. Department of Homeland Security (personal communication).

Additional information in USBP apprehensions data describes:

• Arrest method. The method for the overwhelming majority of arrests is capture by USBP at the border. Other arrest methods are capture by USBP agents in the U.S. interior, capture by other law enforcement agencies, and capture at traffic checkpoints.
• Status at entry. The overwhelming majority of records have this status as “Present without Authority from Mexico,” which indicates the individual was attempting to cross the U.S.–Mexico border as opposed to entering from Canada or by sea.
• Status when found. The overwhelming majority of records indicate the individual is in transit rather than working or seeking employment.
• Time in the United States. The overwhelming majority of records are for arrests at entry.
• Smuggler use. Whether the individual hired a smuggler to cross the border and, if so, the price paid.

The records also describe the disposition of the individual after apprehension. Most apprehendees are returned to their countries of origin rather than incarcerated in the United States.

Apprehensions data in aggregate form (i.e., the total number of apprehensions in a given month) have been used in a large body of academic research (see Hanson [2006] for a survey). Apprehensions tend to rise when average U.S. wages increase relative to average Mexican wages or when Mexico’s real exchange rate depreciates vis-à-vis the United States (Hanson and Spilimbergo, 1999). They tend to fall initially, but later recover, in response to increases in USBP enforcement activities (Bean et al., 1990; Cornelius and Salehyan, 2007; Dávila et al., 2002; Donato et al., 1992; Espenshade, 1994; Kossoudji, 1992; Orrenius and Zavodny, 2003).

USING THE APPREHENSIONS DATA TO EVALUATE UNAUTHORIZED MIGRANT FLOWS

Using Apprehensions to Infer Unauthorized Flows

There are complications with using apprehensions data to estimate the number of individuals crossing the border illegally. The more serious problem relates to the inherent nature of the data: while apprehensions provide data on the number of individuals captured at the border, they provide no direct information on those who elude capture, which is the population of interest for this study. Minor problems include misreporting of key variables (e.g., given USBP return policies, non-Mexicans have a strong incentive to claim Mexican nationality) and possible missing data during peak apprehension periods (owing to failure to record all arrests during these times).

Nevertheless, the data can be used to apply capture-recapture techniques to make inferences about the size of the undocumented population and the flow of individuals into this population at regular time intervals. The re-apprehension of individuals provides information that can, in theory, be used to make inferences about the size of the unauthorized population entering the United States successfully. Of apprehensions of Mexican men over the period from 1999 to 2009, approximately three-fifths were of individuals who were apprehended only once, one-fifth were of individuals who were apprehended twice over the period, and one-fifth were of individuals who were apprehended three or more times (Borger et al., in press). Re-apprehensions of individuals typically occur within a few days or weeks of the initial capture, and such re-apprehensions are likely part of a single crossing episode. Of apprehended Mexican men who are subsequently re-apprehended, three-quarters are re-apprehended within 90 days, the vast majority of them within the first 30 days.

With the appropriate data in hand, how would one describe apprehensions analytically?
Consider a simple model of the apprehensions process, whose main elements are described in Table 5-2. Suppose that at time t there are M(t) individuals in Mexico who are considering crossing the U.S.–Mexico border illegally (this exercise ignores other nationalities). Suppose further that a fraction m(t) of these individuals choose to attempt illegal migration, where m(t) may be affected by economic conditions in the United States and Mexico and by the intensity of enforcement at the border. Let the probability of apprehension by USBP at time t be a(t). Upon being apprehended, an individual decides whether to cross again or to return home (the latter group having been successfully deterred from further crossing attempts). Let the probability of retrying, conditional on being apprehended, be r(t). It is likely that m(t), a(t), and r(t) will vary across individuals according to demographic characteristics (age, gender, region of birth, marital status, family structure), skill (education, occupation, work experience), and knowledge about migration (previous crossing experience, access to migrant network), among other factors. For purposes of illustration, we first ignore these sources of individual heterogeneity and then consider some ways to address them below.

TABLE 5-2  Modeling Apprehensions

Variable   Definition
M(t)       Population of potential migrants in Mexico
m(t)       Probability of attempting migration
a(t)       Probability of apprehension, conditional on attempting migration
r(t)       Probability of re-attempting migration, conditional on being apprehended

In each period t, the number of individuals apprehended on their first attempt to cross the border is represented by the product a(t)m(t)M(t): the number of potential migrants in Mexico times the probability that an individual attempts to migrate times the probability of apprehension. Similarly, the number of individuals apprehended on their second attempt to cross the border is equal to the product a(t)r(t)[a(t – 1)m(t – 1)M(t – 1)]: the number of individuals apprehended last period (in brackets) times the probability of attempting to cross again (after an initial apprehension) times the probability of apprehension. Assuming that the IDFINS data are known for all apprehended migrants, one can separate apprehensions associated with first attempts to cross the border from those associated with repeat attempts. Ignored here are complications associated with individuals waiting for more than one period before retrying. Allowing for waiting complicates the math but does not change the basic structure of the problem.
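The two products above can be computed directly. The following Python sketch evaluates the expected first- and second-attempt apprehension counts in the simplified model of Table 5-2 (no individual heterogeneity, retries only in the period immediately after an apprehension). The function name and all parameter values are hypothetical.

```python
def expected_apprehensions(M, m, a, r):
    """Expected first- and second-attempt apprehensions in each period t.

    M, m, a, r are equal-length sequences holding M(t), m(t), a(t), r(t)
    from Table 5-2. Retries are assumed to occur only in the period right
    after an apprehension, so no second-attempt apprehensions occur in the
    first period.
    """
    first, second = [], []
    for t in range(len(M)):
        first.append(a[t] * m[t] * M[t])               # a(t)m(t)M(t)
        prev_first = first[t - 1] if t > 0 else 0.0    # a(t-1)m(t-1)M(t-1)
        second.append(a[t] * r[t] * prev_first)        # a(t)r(t)[...]
    return first, second


# Hypothetical values, chosen to be exact in binary floating point.
first, second = expected_apprehensions(
    M=[1_000_000, 1_000_000], m=[0.125, 0.125], a=[0.5, 0.5], r=[0.75, 0.75])
print(first)                 # [62500.0, 62500.0]
print(second)                # [0.0, 23437.5]
print(second[1] / first[0])  # 0.375, i.e., a(t)r(t)
```

The ratio in the last line recovers only the product a(t)r(t), not a(t) by itself.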
Using the value of first-time apprehensions in the previous period and second-time apprehensions in the current period, one can identify the value of a(t)r(t): the ratio of the latter to the former is the joint probability of retrying to cross the border and being apprehended. But unless one assumes that r(t) = 1, which means that all individuals who attempt to cross the border keep trying until they succeed, the value of a(t) cannot be determined. If a(t) could be identified, then the magnitude of apprehensions could be used to estimate the magnitude of illegal attempts to cross the border, as well as of successful crossings. Previous uses of apprehensions data to estimate unauthorized migration flows, such as the analyses by Espenshade (1995b) and by Massey and Singer (1995), have assumed that r(t) = 1. But the EMIF-N data do not support this assumption. Some individuals who are apprehended become discouraged and do not attempt to cross the border again. Furthermore, the EMIF-N data suggest that the fraction of attempted crossers who become discouraged has risen over time.

By imposing additional structure on the data, one can use the apprehensions data to infer more about unauthorized migration flows. One approach would be to develop an economic model of the process governing the decision to migrate, which would allow one to characterize how the migration probability, m(t), responds to changes in economic conditions, and how the apprehension probability, a(t), and the probability of retrying to cross the border, r(t), respond to changes in the level of enforcement activity. The data sets from ENOE, ENADID, the MMP, and the MxFLS contain information that permits the migration decision to be modeled (Gathmann, 2008; McKenzie and Rapoport, 2010; Orrenius and Zavodny, 2005), with some limitations (as discussed in Chapter 4). Furthermore, EMIF-N and the MMP contain information that allows examination of how re-apprehensions respond to changes in enforcement. The use of survey data would further allow one to address heterogeneity across individuals in how the decision to migrate responds to changes in environmental conditions. McKenzie and Rapoport (2010), for instance, find that the propensity to migrate varies across Mexican communities according to the past migration experience of community members. Chapter 6 discusses different classes of models that combine survey, administrative, and other types of data.

A second approach to using apprehensions data is to make assumptions about the stochastic process governing apprehensions, generalizing the repeated-trials method developed by Espenshade (1995b). In a section below, we use EMIF-N data on repeat apprehensions to illustrate such an approach (which could be explored more fully using ENFORCE data).
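The identification problem can be checked numerically. In the hypothetical sketch below, two different combinations of first-time attempters (m·M), apprehension probability a, and retry probability r, held constant across two periods, produce identical first- and second-time apprehension counts, because the counts depend on them only through a·m·M and a·r. Imposing r = 1, as in the earlier analyses cited above, then returns a·r in place of a. Function names and numbers are illustrative.

```python
def observed_counts(attempters, a, r):
    """Apprehension counts implied by the simple one-period-retry model.

    attempters: m*M, the number making a first attempt in period t-1.
    Returns (first-time apprehensions in t-1, second-time apprehensions in t),
    holding a and r constant across the two periods.
    """
    first = a * attempters
    second = a * r * first
    return first, second


def estimate_assuming_r_equals_1(first, second):
    """Back out a and m*M under the assumption that every apprehended migrant retries."""
    a_hat = second / first          # in truth this ratio equals a*r
    return a_hat, first / a_hat     # implied a and implied number of attempters


truth_1 = observed_counts(attempters=100_000, a=0.5, r=0.5)
truth_2 = observed_counts(attempters=200_000, a=0.25, r=1.0)
print(truth_1, truth_2)                         # (50000.0, 12500.0) (50000.0, 12500.0)
print(estimate_assuming_r_equals_1(*truth_1))   # (0.25, 200000.0)
```

If the first parameter set is the truth, the r = 1 assumption halves the apprehension probability and doubles the implied number of attempters.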
Using Data on Smuggling Costs

The price of smuggling services is a potentially useful indicator of the effectiveness of border enforcement. If intensifying enforcement raises smugglers’ risk of apprehension, incarceration, or physical harm, one would expect them to charge higher prices for ferrying migrants across the border. Gathmann (2008) finds in data from the MMP that higher levels of enforcement activity by USBP are associated with higher prices for smuggling services paid by migrants in surveyed communities. Because the ENFORCE database contains information on smuggling costs, it could in principle be used to construct a measure of the effectiveness of enforcement.

In practice, there are myriad problems with using existing smuggling cost data for analytical purposes. One is that USBP collects information on smuggling costs in an inconsistent manner. Borger and colleagues (in press) report that, in the USBP apprehensions data, fewer than 20 percent of apprehendees report whether or not they used a smuggler. Even if one were able to collect information on smuggling and smuggling prices from all individuals apprehended, the problem would remain that these prices correspond to just those individuals who were apprehended. One would therefore have a selected sample of individuals from which to extract price data. Smuggling prices in such a sample may be biased downward, because individuals who purchase inferior smuggling services at low-end prices may be more likely to be apprehended than those who pay higher prices.

A further problem is that there is likely to be differentiation in the market for smuggling services. Migrants are likely to vary in the extent of the smuggling services they purchase (transport immediately across the border versus delivery to an interior U.S. city), the size or composition of the group being guided (small numbers of adult males versus entire families including children), experience in crossing the border, mode of transportation (by land or by sea), and the physical risks being confronted (cold in winter, heat in summer, longer routes when crossing through the Sonoran Desert or mountainous regions). Borger and colleagues (in press) find that smuggling prices tend to be higher for groups that include women, children, or the elderly. With information on the characteristics of the services being purchased, one could in principle construct a hedonic price index for migrant smuggling that would adjust for product differentiation, similar to how the U.S. Bureau of Labor Statistics adjusts price indices for consumer goods to account for changes in the quality of goods over time. However, the information currently reported by USBP in its ENFORCE database is insufficient for such an exercise.
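One common way to build a hedonic price index is a time-dummy regression: log price is regressed on service characteristics plus year indicators, and the exponentiated year coefficients give a quality-adjusted index. The sketch below assumes NumPy, uses hypothetical attributes (a flag for groups that include women or children and a flag for interior delivery), and fits synthetic, noise-free data so the recovered index can be checked. It is not a BLS procedure or a DHS data layout.

```python
import numpy as np


def time_dummy_hedonic_index(log_price, characteristics, year, base_year):
    """Quality-adjusted price index from a time-dummy hedonic regression.

    log_price: (n,) array of log smuggling prices
    characteristics: (n, k) array of service attributes (hypothetical columns)
    year: (n,) integer array giving the year of each observation
    Returns {year: price index relative to base_year}.
    """
    others = sorted(y for y in set(year.tolist()) if y != base_year)
    dummies = np.column_stack([(year == y).astype(float) for y in others])
    X = np.column_stack([np.ones(len(log_price)), characteristics, dummies])
    coef, *_ = np.linalg.lstsq(X, log_price, rcond=None)
    index = {base_year: 1.0}
    for y, b in zip(others, coef[-len(others):]):
        index[y] = float(np.exp(b))   # exp(year effect), attributes held fixed
    return index


# Synthetic, noise-free data: the quality-adjusted price rises 20% in 2009,
# while the mix of services shifts toward costlier interior delivery.
rng = np.random.default_rng(0)
n = 400
year = rng.choice([2008, 2009], size=n)
family = rng.integers(0, 2, size=n).astype(float)   # group includes women/children
interior = (rng.random(n) < np.where(year == 2009, 0.7, 0.3)).astype(float)
log_price = np.log(1500.0) + 0.3 * family + 0.5 * interior + np.log(1.2) * (year == 2009)

index = time_dummy_hedonic_index(log_price, np.column_stack([family, interior]),
                                 year, base_year=2008)
naive = np.exp(log_price[year == 2009].mean() - log_price[year == 2008].mean())
print(index, naive)
```

Here the hedonic index for 2009 comes out at 1.2 up to rounding, while the naive ratio of average log prices is noticeably higher because more 2009 purchases include interior delivery; this is the kind of quality change the hedonic adjustment is meant to remove.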
The systematic collection of data on smuggling prices would expand the options available to DHS for analyzing undocumented migration at the U.S.–Mexico border.

Frequency of Apprehension Frequencies Analysis

An alternative approach for using apprehensions data to estimate the number of individuals crossing the border successfully is to impose assumptions about the underlying stochastic process that governs attempts to cross the border. If one treats apprehensions as a count variable and the number of apprehensions for a given individual as the outcome of a draw of a random variable from a defined distribution, then one can use the observed frequency of outcomes for one apprehension, two apprehensions, three apprehensions, and so forth to estimate the “missing category” of zero apprehensions, which corresponds to the number of individuals who cross the border without being apprehended. This approach generalizes the repeated-trials model used by Espenshade (1995b). Implementing this approach requires one to impose the untested assumption that data on observed behavior (the number apprehended) are informative about unobserved behavior (the number avoiding apprehension).¹

FIGURE 5-1  Frequency of apprehensions across time. [Fraction of people apprehended (0 to 1.0) plotted against the number of times an individual is apprehended (0 to 6), one series per year for 2005 through 2009.] SOURCE: Data from EMIF-N.

To provide an example of how one could use data on apprehensions, Figure 5-1 shows the number of times individuals surveyed in EMIF-N report being apprehended in a given series of attempts to cross the U.S.–Mexico border. The length of the time window used to define apprehensions for a single crossing episode is an important issue. In EMIF-N, the length of this window is not defined precisely. Because most subsequent apprehensions occur within a few weeks of the initial apprehension, the length of the crossing window may not be important for the results. As more of those apprehended are subject to consequence programs, however, the appropriate window for defining a migration episode may change.

¹ Related ideas have been used to estimate the number of animals or plants present in a community (Bunge and Fitzpatrick, 1993; Royle and Dorazio, 2008) and the number of unique records in a filing system with duplicates (Arnold and Beaver, 1988), and to assess statistical disclosure risk (Fienberg and Makov, 1998). In addition, there is a growing literature on Good-Turing methods to estimate the probability of types being unobserved (Good, 1953). The latter are widely used to estimate the frequencies of words in a corpus.
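The role of the window can be explored with a small sketch that links apprehension records by fingerprint identifier (IDFINS) and starts a new crossing episode whenever the gap since a person’s previous apprehension exceeds a chosen number of days. The (IDFINS, date) record layout, the function name, and the window values are illustrative assumptions, not the ENFORCE schema or a DHS rule.

```python
from collections import Counter, defaultdict
from datetime import date


def episode_frequency_table(records, window_days):
    """Frequency of apprehension counts per crossing episode.

    records: iterable of (idfins, apprehension_date) pairs (hypothetical layout).
    A person's next apprehension belongs to the same episode when it occurs
    within window_days of the previous one; otherwise a new episode starts.
    Returns Counter {apprehensions in episode: number of episodes}.
    """
    by_person = defaultdict(list)
    for idfins, when in records:
        by_person[idfins].append(when)

    table = Counter()
    for dates in by_person.values():
        dates.sort()
        n = 1
        for prev, cur in zip(dates, dates[1:]):
            if (cur - prev).days <= window_days:
                n += 1            # re-apprehension within the same episode
            else:
                table[n] += 1     # close this episode and start a new one
                n = 1
        table[n] += 1
    return table


records = [
    ("A", date(2009, 3, 1)), ("A", date(2009, 3, 4)), ("A", date(2009, 3, 20)),
    ("B", date(2009, 5, 2)),
    ("C", date(2009, 1, 10)), ("C", date(2009, 6, 1)),
]
print(episode_frequency_table(records, window_days=30))  # Counter({1: 3, 3: 1})
print(episode_frequency_table(records, window_days=7))   # Counter({1: 4, 2: 1})
```

Shortening the window from 30 to 7 days splits person A’s three apprehensions into two episodes, shifting mass toward the one-apprehension category of the frequency table.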

Each plot in Figure 5-1 shows the relative frequency of outcomes (as the fraction of all outcomes, or density) based on observations for the calendar year of the initial apprehension. For individuals first apprehended in 2009, 75.4 percent are apprehended only once, 16.1 percent are apprehended twice, 4.4 percent are apprehended three times, 1.3 percent are apprehended four times, 0.6 percent are apprehended five times, and 0.5 percent are apprehended six or more times. To estimate the number of zero apprehensions, the frequency of apprehensions is projected to the left, based on estimation of the distributional parameters governing apprehension, as discussed below.

The plots for 2005 to 2009 overlap one another to a considerable degree. There is a slight increase in the fraction apprehended two times in 2008 and 2009 compared with earlier years. This suggests that, if the same distribution governs apprehensions in each period, the fraction of individuals in the zero-apprehension category would be similar across time, although slightly lower in recent years. An important implication of such an outcome is that the probability of crossing the border successfully may have changed only modestly across time, despite the massive increase in border enforcement resources. The panel emphasizes, however, that drawing inferences about the effectiveness of enforcement policies from apprehensions data is problematic. In the simple analysis illustrated here, the probability that those who are apprehended re-attempt to cross the border is assumed to be stable over time (an assumption also implicit in the repeated-trials approach discussed earlier). If this assumption is incorrect, there may be less stability in the zero-apprehensions category than Figure 5-1 appears to suggest. We also note that this “frequency of frequencies” approach cannot separate the proportion of migrants who cross successfully from those who are deterred from further attempts.
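How much the projected zero category depends on the assumed distribution can be seen in a short sketch. It fits two of the candidate models considered in this section to a hypothetical frequency table: a zero-truncated geometric, whose maximum likelihood estimate has the closed form a = 1 – 1/mean, and a zero-truncated Poisson, whose likelihood equation λ/(1 – e^–λ) = mean is solved here by bisection. Each fit is then projected back to k = 0. Function names and counts are illustrative; this is not the panel’s estimation code.

```python
import math


def fit_truncated_geometric(freq):
    """Zero-truncated geometric: P(K = k) = (1 - a) * a**k for k = 0, 1, ...

    freq maps k >= 1 to the number of individuals apprehended k times.
    Returns (a_hat, log-likelihood, projected number never apprehended).
    """
    n = sum(freq.values())
    total = sum(k * c for k, c in freq.items())
    if total == n:
        raise ValueError("no repeat apprehensions; a_hat would be 0")
    a = (total - n) / total                          # closed-form MLE
    loglik = sum(c * (math.log(1 - a) + (k - 1) * math.log(a))
                 for k, c in freq.items())
    return a, loglik, n * n / (total - n)            # equals n * (1 - a) / a


def fit_truncated_poisson(freq, tol=1e-12):
    """Zero-truncated Poisson: solve lam / (1 - exp(-lam)) = mean by bisection."""
    n = sum(freq.values())
    mean = sum(k * c for k, c in freq.items()) / n
    if mean <= 1:
        raise ValueError("mean count must exceed 1")
    lo, hi = tol, mean                               # the root lies in (0, mean)
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if mid / -math.expm1(-mid) < mean:           # -expm1(-x) == 1 - exp(-x)
            lo = mid
        else:
            hi = mid
    lam = (lo + hi) / 2
    norm = -math.expm1(-lam)                         # P(K >= 1)
    loglik = sum(c * (k * math.log(lam) - lam - math.lgamma(k + 1) - math.log(norm))
                 for k, c in freq.items())
    return lam, loglik, n * math.exp(-lam) / norm


freq = {1: 800, 2: 160, 3: 30, 4: 10}                # hypothetical counts
for name, fit in [("geometric", fit_truncated_geometric),
                  ("poisson", fit_truncated_poisson)]:
    param, loglik, zeros = fit(freq)
    print(f"{name:9s} param={param:.4f} loglik={loglik:.1f} zeros={zeros:.0f}")
```

On this table the geometric model fits better (log-likelihood about –625.5 versus roughly –629 for the Poisson). Yet for the same 1,000 apprehended individuals, the two models project very different zero categories: 4,000 never-apprehended crossers under the geometric versus roughly 1,700 under the Poisson.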
We also note that this “frequency of frequencies” approach cannot separate the proportion of migrants who cross successfully from those who are deterred from further attempts. The data used for Figure 5-1 provide mild evidence of variation in outcomes for apprehensions across time (from year to year). What about variation across space (i.e., border sectors)? Figure 5-2 plots the frequencies of apprehensions from 2007 to 2009 for four USBP regions that account for a large share of apprehensions: Tijuana/San Diego, Nogales/Nogales, Ciudad Juarez/El Paso, and Nuevo Laredo/Laredo. Across sectors, there is minor variation in the frequency of apprehensions. Projecting back to the category of zero apprehensions, it would appear that the frequency of suc- cessful crossings is slightly higher for Ciudad Juarez/El Paso and lower for the other locations. The absence of notable regional variation in the zero apprehensions category suggests that, despite large cross-sector differences in the scale of enforcement activities, the probability of apprehension may be stable across regions. It would have been preferable to perform the analyses represented in Figures 5-1 and 5-2 using ENFORCE data. ENFORCE covers the universe

OCR for page 73
ADMINISTRATIVE DATA ON UNDOCUMENTED MIGRATION 83 1.0 Fraction of People Apprehended 0.8 0.6 0.4 0.2 0 0 1 2 3 4 5 6 Number of Times Individual Is Apprehended Tijuana/San Diego Nogales/Nogales Ciudad Juarez/El Paso Nuevo Laredo/Laredo FIGURE 5-2  Frequency of apprehensions across space. SOURCE: Data from EMIF-N. F igure 5-2 of individuals apprehended, whereas EMIF-N only covers those individuals questioned by survey enumerators, whose choice of survey zones and points to find individuals being returned to Mexico after apprehension may intro- duce unknown sources of bias into the sample. Of course, both ENFORCE and EMIF-N are subject to the limitation that the population of individuals who are apprehended once but not seen again includes both those who, on their subsequent attempt, cross into the United States successfully and those who, after the initial apprehension, become discouraged and return home to Mexico. The conflation of successful crossers and discouraged crossers contaminates the analysis. For instance, if no apprehended crossers became discouraged, then the ratio of those apprehended for a second time to those apprehended just once would equal the probability of apprehension. But if some apprehended crossers become discouraged, then this ratio equals the apprehension probability multiplied by the probability that initially appre- hended crossers do not become discouraged, a product that does not allow one to pin down the apprehension probability itself. Despite the concerns noted above, there would have been additional value in using the ENFORCE data for this analysis. Because ENFORCE data

OCR for page 73
84 ESTIMATING ILLEGAL ENTRIES AT THE U.S.–MEXICO BORDER contain the universe of apprehensions, as well as demographic information on captured migrants, they would permit one to evaluate the stability of the zero-apprehensions category across time, space, and individuals by age, gender, and region of birth within Mexico. The panel could have used the ENFORCE data to determine whether there were systematic changes in the zero-apprehensions category as DHS boosted enforcement along the border and imposed consequence programs at specific points along the border. Such changes may indicate that apprehension probabilities are responsive to changes in border enforcement (e.g., the zero-apprehensions category expands because more individuals are being caught) or that the composition of border crossers is responsive to changes in border enforcement (e.g., the zero-apprehensions category expands because more-determined crossers ac- count for a higher fraction of those crossing). Although the panel would not have been able to attach likelihoods to these or other explanations, know- ing whether the size of the zero-apprehensions category was correlated with the intensity of border enforcement would have been helpful to the panel in considering approaches (discussed in Chapter 6) to formally modeling migration flows. In conducting these analyses, moreover, the panel would have gained at least some insight into the quality, completeness, and reli- ability of the administrative data. More Detailed Frequency of Apprehension Frequencies The frequency of frequencies approach is based on fitting statistical distributions to the counts of the number of times an individual is appre- hended (within a given window).2 Three core assumptions are sufficient to make such models meaningful: 1. The individuals are apprehended independently. That is, their pro- pensity to be apprehended is independent of that of other individuals who attempt to cross. 2. 
Apprehended individuals are not deterred from subsequent attempts. In fact, the approach assumes that they will attempt to cross until they are successful. 3. If individuals cyclically migrate then their propensity to be appre- hended is independent of their prior attempts. That is, their prob- ability of apprehension is the same as if they were a new individual attempting to cross. 2  The frequency of frequencies may be thought of as a “species problem” (see Efron and Thisted [1976] on how many words Shakespeare knew).
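The consequence of violating assumption 2 can be shown exactly. Suppose, hypothetically, that each attempt ends in apprehension with probability a and that an apprehended migrant tries again with probability r. Then P(K = k) = a^k r^(k – 1)(1 – ar) for k ≥ 1, so among those apprehended at least once, P(K = k | K ≥ 1) = (ar)^(k – 1)(1 – ar), which depends on a and r only through their product. Because the number of first-time attempters is not observed, the sketch below shows that a population with heavy deterrence and one with none can produce identical observed frequency distributions while implying very different shares of successful crossers. Function names and parameter values are illustrative.

```python
def conditional_count_probs(a, r, k_max=5):
    """P(K = k | K >= 1) for k = 1..k_max in a hypothetical retry model.

    a: probability that an attempt ends in apprehension
    r: probability that an apprehended migrant tries again
    """
    q = a * r
    # P(K = k) = a**k * r**(k - 1) * (1 - q); dividing by P(K >= 1) = a
    return [q ** (k - 1) * (1 - q) for k in range(1, k_max + 1)]


def eventual_entry_share(a, r):
    """Share of first-time attempters who eventually enter the United States."""
    # Entry after j apprehensions and j retries: sum over j of (a*r)**j * (1 - a)
    return (1 - a) / (1 - a * r)


deterred = conditional_count_probs(a=0.5, r=0.4)    # 60% of those caught give up
persistent = conditional_count_probs(a=0.2, r=1.0)  # nobody gives up
print(deterred == persistent)                       # True
print(eventual_entry_share(0.5, 0.4), eventual_entry_share(0.2, 1.0))  # 0.625 1.0
```

A fit that assumes r = 1 would attribute the first pattern to a = 0.2 and conclude that every attempter eventually enters, whereas in the deterred population only 62.5 percent do.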

ADMINISTRATIVE DATA ON UNDOCUMENTED MIGRATION 85

BOX 5-1
Classes of Statistical Distributions

The following classes of distributions are relevant for modeling frequency of apprehension frequencies: Poisson, negative binomial, geometric, and Conway-Maxwell-Poisson.

The choice of a Poisson distribution can be motivated in a number of ways. A simple one is to assume that individuals have an "apprehendability," defined as a measure of their propensity to be apprehended. Specifically, assume that apprehendability is measured by the expected number of apprehensions before a successful attempt for the individual. Further assume that the ratio of the probability of being apprehended k – 1 times to that of being apprehended k times is proportional to k (unconditional on being apprehended the kth time). If this apprehendability is common to all individuals, then the number of times an individual is apprehended follows a Poisson distribution whose mean is their apprehendability.

The choice of a negative binomial distribution can be motivated by individual heterogeneity in apprehendability. Assume that the individual apprehendabilities vary but can be modeled as independent draws from a gamma distribution. Then the number of times an individual is apprehended follows a negative binomial distribution whose mean is the mean apprehendability of the group of individuals.

A geometric distribution is a special case of the negative binomial distribution. That is, it presumes a specific relationship between the mean apprehendability and the variance of the apprehendability in the population. There is an alternative motivation for choosing it. Suppose there is a common probability of apprehension per attempt, a, and the apprehension events are independent for the same individual over time. Then the number of times an individual is apprehended follows a geometric distribution whose mean is a/(1 – a), and the probability of not being apprehended on the first attempt is 1 – a.
The Conway-Maxwell-Poisson is a variant of the Poisson distribution that allows over-dispersion (like a negative binomial distribution) as well as under-dispersion relative to a Poisson distribution. While over-dispersion may be expected, it is possible that under-dispersion in the number of apprehensions occurs, and this possibility should be represented in the array of models whose fit to the data is tested.

All three assumptions are important, and the likelihood that actual crossing behavior departs from them is high. Details on statistical distributions that incorporate these assumptions are presented in Box 5-1.

The observed number of apprehensions is truncated at zero. That is, there are no observations of individuals who are not apprehended on their first attempt. However, candidate distributions can be fit to the available data, taking this truncation into account. The candidates can be fit, and their fits compared, by maximum likelihood estimation, such as that provided in the degreenet package from the Comprehensive R Archive Network (CRAN).3 Measures of the uncertainty due to sampling, including confidence intervals, can be estimated, although such estimates are not included in this discussion. We note that the assumptions of the model allow it to extrapolate beyond the data themselves (i.e., to the proportion of individuals who are apprehended zero times and therefore never appear in the data). If, however, the distribution is misspecified or the assumptions are false, then this extrapolation can be subject to substantial error.

3 Available: http://statnet.org and http://CRAN.R-project.org/package=degreenet (accessed August 2012).

FIGURE 5-3 Fits of naïve apprehensions models for 2009. [Plot: probability of n apprehensions versus number of apprehensions, n = 0 to 6, showing the observed proportions and the Conway-Maxwell-Poisson, Poisson, geometric, and negative binomial fits.] SOURCE: Data from EMIF-N.

Figure 5-3 shows the total proportions for the number of apprehensions from the 2009 EMIF-N data, represented by the circles on the plot (note that no value is shown for zero apprehensions). The colored lines on the plot represent the fits of the four types of statistical distributions described in Box 5-1. The negative binomial, geometric, and Conway-Maxwell-Poisson distributions all fit the observed counts closely; indeed, the lines overlap so that the separate colors are not visible. The Poisson distribution does not provide a good fit for one to three apprehensions. The numerical values for the probability of non-apprehension on the first crossing attempt, as estimated by each of the four distribution classes, are shown close to the vertical axis. (The values of the distributions at zero apprehensions are off the scale of this graph.) The probabilities for zero apprehensions for the three good-fitting distributions are close together (74 to 75 percent), while that of the poorer-fitting Poisson distribution is substantially lower (54 percent). Because the sample sizes are large, the nominal confidence intervals (not plotted) for the probability of non-apprehension on the first crossing are narrow, but these estimates are not adjusted for possible model misspecification.

A variant of this distribution-fitting approach is the Good-Turing frequency model (Good, 1953). The simplest version of this approach estimates the probability of non-apprehension as the proportion of those apprehended who were apprehended exactly once. Under the geometric model in Box 5-1, for example, the expected share of apprehended individuals who are caught exactly once is 1 – a, which is also the probability of avoiding apprehension on the first attempt. For the 2009 EMIF-N data, this estimate is 75.4 percent, which is in close agreement with the three good-fitting distributions.

CONCLUSION

At least three agencies within DHS collect administrative data on apprehensions of unauthorized immigrants. USBP collects data on apprehensions between ports of entry, OFO collects data on apprehensions at ports of entry, and ICE collects data on apprehensions in the interior of the United States. Because fingerprints of those apprehended are collected in all three data sources, DHS’s ENFORCE database can integrate data across the three sources at the individual level. However, conversations with representatives from DHS suggest that the linkages between the apprehensions records controlled by USBP, OFO, and ICE in the ENFORCE database are limited to uses that relate specifically to enforcement.
Linkages across the data sources for broader analytical purposes would require approval from each of the three agencies, and the full database has not been widely used for analysis. If one wants to analyze apprehensions at the border, integrating USBP and OFO apprehensions records is essential.4 To understand how U.S. enforcement, either at the border or in the interior, affects attempts at unauthorized entry, integration of the ICE and USBP databases is necessary. As discussed in Chapter 2, increasing enforcement in one border sector may redirect attempts at entry to other sectors, implying that analysis of enforcement requires looking at the border as an integrated whole.

4 See also National Research Council (2011:50-51) for a discussion of how the immigration enforcement data published in the widely used DHS Yearbook of Immigration Statistics do not completely reflect the immigration enforcement activities undertaken by all relevant DHS agencies.

That said, administrative data from DHS are alone insufficient to estimate the flow of unauthorized migrants across the border. These data report the number of individuals who are captured by USBP at entry but contain no information on those who elude capture. Data on re-apprehensions of individuals provide insight into the migration process but do not carry information about the "got aways." Because those apprehended may either try to cross the border again or return to their home in interior Mexico, knowing the fraction of those re-apprehended does not allow one to identify the probability of apprehension. Without knowledge of the apprehension probability, one cannot use the level of apprehensions to make inferences about the flow of undocumented migrants across the border. It is unlikely that having access to the DHS administrative data would have changed the panel’s conclusion.

Despite their limitations, the administrative data still have many uses for understanding unauthorized migration. By combining administrative data with survey data, one can produce a model of individuals’ migration decisions in Mexico that is informative about how attempts at illegal entry respond to changes in the economic environment. Such an approach would combine a behavioral model of the decision to migrate, analyzed using survey data, with a model of the stochastic process governing apprehensions, analyzed using administrative data. Although this approach can produce estimates of the flow of unauthorized migrants across the border, it incorporates assumptions about migrant behavior and the statistical properties of apprehensions that may not be open to empirical validation.
Still, models of migration that combine administrative and survey data would provide DHS with additional tools for analyzing the effectiveness of border enforcement and expected future workloads for USBP agents. Chapter 6 examines in more detail a variety of modeling techniques and approaches.

Because DHS administrative data have not been made public, they have never been evaluated by independent scholars for their quality, completeness, or reliability. This omission is significant in light of the role that apprehensions data could play in informing model-based approaches to estimating flows. As the discussion in Chapter 6 will also make clear, knowledge of and experience with the use of model-based approaches for estimating flows are so far limited, and the attendant complexities and uncertainties are considerable. In order to develop, apply, and continually refine specific modeling approaches, DHS will need to engage with the broader scientific community in a sustained and long-term fashion. This will be possible only if the administrative data discussed in this chapter are made widely available. Currently, however, DHS shares its administrative data on apprehensions of unauthorized immigrants only with those with whom it contracts to perform confidential analyses. Such analysis comes at high monetary cost to DHS and is rarely subject to peer review in the manner typical of academic research. There is a large community of scholars actively studying illegal immigration. Providing this community with access to the ENFORCE database would likely produce dozens of new academic studies that would be available to DHS at no charge. Because most of these studies would be subject to peer review by academic journals, they would arguably be of higher quality than the consulting reports that DHS currently acquires. The wide dissemination of data (along with the integration of data from surveys with data from administrative records) is also a recommended practice for federal statistical agencies, which have developed a number of procedures for providing research data access while protecting the confidentiality of the information (National Research Council, 2009).

A further benefit of putting individual-level apprehensions data in the public domain is that this could potentially improve the quality of data collection by EMIF-N and any future surveys that target individuals who have been apprehended by USBP. DHS’s administrative data presumably represent the full universe of apprehended migrants. As noted in Chapter 3, these data would be valuable in efforts, such as EMIF-N, to survey and estimate flows of such repatriated migrants. In particular, detailed data on the number and basic demographic characteristics of migrants repatriated, by time, date, and port of entry of return, would provide an independent measurement of this return flow that would be extremely helpful in both designing the survey’s sampling frame and correctly weighting estimates.
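One way administrative repatriation counts could feed into survey weighting is post-stratification: within each stratum, survey design weights are rescaled so that they sum to the administrative count. This is a minimal sketch under assumptions the chapter does not specify: the choice of post-stratification, strata defined by port of entry and month, the record layout, and the function name are all hypothetical, and the numbers are toy values rather than EMIF-N or DHS data.

```python
from collections import defaultdict


def poststratify(respondents, admin_totals):
    """Rescale survey design weights to match administrative totals per stratum.

    respondents: list of dicts with a "stratum" key (here a hypothetical
        (port_of_entry, month) tuple) and a design "weight".
    admin_totals: dict mapping each stratum to the number of repatriations
        recorded administratively for it.
    Returns new dicts with an added "ps_weight"; within each stratum the
    ps_weights sum to the administrative total. Inputs are not modified.
    """
    weighted = defaultdict(float)
    for r in respondents:
        weighted[r["stratum"]] += r["weight"]

    unmatched = set(admin_totals) ^ set(weighted)  # strata present on only one side
    if unmatched:
        raise ValueError(f"strata need collapsing before weighting: {sorted(unmatched)}")

    return [
        {**r, "ps_weight": r["weight"] * admin_totals[r["stratum"]] / weighted[r["stratum"]]}
        for r in respondents
    ]


if __name__ == "__main__":
    # Hypothetical toy numbers, not EMIF-N or DHS data.
    survey = [
        {"stratum": ("Port A", "2009-07"), "weight": 10.0},
        {"stratum": ("Port A", "2009-07"), "weight": 30.0},
        {"stratum": ("Port B", "2009-07"), "weight": 5.0},
    ]
    admin = {("Port A", "2009-07"): 80, ("Port B", "2009-07"): 20}
    for row in poststratify(survey, admin):
        print(row["stratum"], row["ps_weight"])
```

Strata that appear on only one side raise an error instead of being silently dropped; in practice such strata would first be collapsed with neighboring ones.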
Some in DHS have expressed concern that restricting the release of administrative data is necessary because information contained in the files is law-enforcement sensitive. However, the number of apprehended individuals subject to criminal prosecution for terrorism or the trafficking of drugs, arms, or people appears to be very small. Records on these individuals could be excised from the USBP apprehensions database before their release to the public, without affecting the value of these data for analytical purposes. Even though the smugglers of illegal aliens already appear to have relatively accurate information on the rates of apprehension and successful entry into the United States, important operational information could nevertheless be safeguarded through broad geographic identifiers that link, for example, to USBP sectors rather than individual USBP stations.

Others have argued that releasing administrative data risks violating the privacy of individuals who are apprehended. However, it is simple to transform individual identifiers in the USBP apprehensions database in a manner that would make the risk to privacy very low. For many research purposes, individual-level data would also be unnecessary. It would be sufficient to have aggregate data on the frequency of apprehensions for individuals broken down by age, gender, country of birth, sector of apprehension along the border, and time period (e.g., month and year). The panel believes that USBP would be able to release a much more complete individual-level file by implementing masking methods for problematic fields in the records and by releasing data with sufficient delay, for example a full year, to diminish their sensitivity for operational use and deployment.

It should be noted that there are other mechanisms in addition to properly constructed public use files for providing researcher access to data while protecting privacy. These mechanisms include establishing one or more secure enclaves for researchers to access microdata, similar to the U.S. Census Bureau’s network of Research Data Centers or its National Science Foundation–Census Research Network; developing remote, monitored online data access services such as the system maintained by the National Center for Health Statistics (NCHS); and providing licenses to individual researchers for using confidential data at their institutions, as is done by the National Center for Education Statistics (NCES). However, these systems, unlike the construction of one or more public use files, require a level of staff and resources that would likely be difficult for DHS to establish and maintain over a sustained period. Given the basic nature of the information included in the DHS enforcement administrative databases and the population in question (i.e., unauthorized border crossers), DHS’s confidentiality and privacy concerns may also be different from those of NCHS, NCES, and the Census Bureau.

•  Recommendation 5.1: DHS should integrate apprehensions data from USBP, OFO, and ICE for analytical purposes.

•  Conclusion 5.1: Administrative data from DHS are alone insufficient to estimate the flow of unauthorized migrants across the U.S.–Mexico border.
However, they could be combined with survey data to produce useful insights about migrant flows and the effectiveness of border enforcement. The use of modeling approaches in conjunction with disaggregated survey and administrative data is necessary for estimating these flows.

•  Recommendation 5.2: DHS should sponsor and conduct research on modeling approaches for estimating the flows of unauthorized migrants across the U.S.–Mexico border.

•  Conclusion 5.2: DHS would greatly benefit from making the administrative data from its immigration enforcement databases publicly available for research use, as that would allow DHS to engage with the broader scientific community to develop, apply, and continually refine specific modeling approaches. DHS could develop ways of constructing masked and/or aggregate files for public release in order to protect sensitive information.
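Conclusion 5.2 and the preceding discussion point to masked and aggregate public files. The sketch below shows one way to build both, following the steps the chapter describes: excising records of individuals prosecuted for terrorism or trafficking, coarsening stations to sectors, delaying release (one year here), transforming identifiers, and tabulating the frequency of apprehensions by age, gender, country of birth, sector, and month. It is a minimal sketch under stated assumptions: the record layout, the prosecution flag, the station-to-sector lookup, and HMAC-SHA256 keyed hashing as the identifier transformation are all hypothetical choices, not DHS practice or a method the panel prescribes.

```python
import hashlib
import hmac
from collections import Counter
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class Apprehension:
    """Hypothetical record layout; the actual ENFORCE schema is not public."""
    person_id: str
    apprehended_on: date
    station: str
    age: int
    gender: str
    country_of_birth: str
    prosecuted: bool  # hypothetical flag: terrorism or trafficking prosecution


def pseudonymize(person_id: str, key: bytes) -> str:
    """Keyed hash: the same person maps to the same token, but tokens cannot
    be recomputed from known IDs without the secret key."""
    return hmac.new(key, person_id.encode("utf-8"), hashlib.sha256).hexdigest()


def mask(records, station_to_sector, key, release_date, lag=timedelta(days=365)):
    """Yield masked rows, dropping prosecuted and too-recent records and
    coarsening date to month, station to sector, and age to a 10-year band."""
    cutoff = release_date - lag
    for r in records:
        if r.prosecuted or r.apprehended_on > cutoff:
            continue
        band = 10 * (r.age // 10)
        yield (
            pseudonymize(r.person_id, key),
            r.apprehended_on.strftime("%Y-%m"),
            station_to_sector[r.station],
            f"{band}-{band + 9}",
            r.gender,
            r.country_of_birth,
        )


def apprehension_frequencies(masked_rows):
    """Aggregate file: (cell, n) -> number of individuals apprehended n times in
    that cell, where cell = (month, sector, age band, gender, country of birth)."""
    per_person = Counter((row[0], row[1:]) for row in masked_rows)
    return Counter((cell, n) for (_, cell), n in per_person.items())


if __name__ == "__main__":
    # Toy records for illustration only.
    records = [
        Apprehension("id-001", date(2010, 3, 1), "Station 1", 24, "M", "Mexico", False),
        Apprehension("id-001", date(2010, 3, 9), "Station 2", 24, "M", "Mexico", False),
        Apprehension("id-002", date(2010, 3, 5), "Station 1", 27, "M", "Mexico", False),
    ]
    sectors = {"Station 1": "Sector A", "Station 2": "Sector A"}
    rows = mask(records, sectors, key=b"replace-with-a-secret-key", release_date=date(2011, 12, 31))
    for (cell, n), people in sorted(apprehension_frequencies(rows).items()):
        print(cell, n, people)
```

A keyed hash is used instead of a plain hash because anyone holding a known identifier could hash it and find the matching token. Small cells in the aggregate table would still need disclosure review before release; that step is not shown.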
