survey, or substantial modification of a current one, would be very costly. Moreover, its design would have to be sufficiently flexible to reflect the dynamic nature of the migration process.

Similarly, administrative data collected by the U.S. Department of Homeland Security (DHS) along the U.S.–Mexico border were gathered for purposes other than the estimation of migration flows. They are likely to provide only a partial picture of the activities of undocumented migrants and cannot be used in isolation to draw inferences about migration flows (see Chapter 5). The difficulty in estimating flows from current data sources persists even if statistical modeling techniques, such as capture-recapture methodology and other sampling strategies, are used to estimate these hard-to-count populations.
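
To make the capture-recapture idea concrete, the sketch below implements Chapman's bias-corrected form of the two-sample Lincoln-Petersen estimator. It is an illustration only, not a method the panel prescribes: the function name, the argument names, and the counts in the usage note are hypothetical. The estimator assumes a closed population and equal capture probability for every individual, and both assumptions are unlikely to hold for border crossers, which is one reason the difficulty described above persists.

```python
def chapman_estimate(marked_first, caught_second, recaptured):
    """Chapman's bias-corrected Lincoln-Petersen estimate of population size.

    marked_first  -- individuals identified in the first sample
    caught_second -- individuals observed in the second sample
    recaptured    -- individuals appearing in both samples
    Returns (estimated population size, approximate variance).
    """
    if min(marked_first, caught_second, recaptured) < 0:
        raise ValueError("counts must be non-negative")
    if recaptured > min(marked_first, caught_second):
        raise ValueError("recaptures cannot exceed either sample size")

    # Point estimate: (n1 + 1)(n2 + 1) / (m + 1) - 1
    n_hat = (marked_first + 1) * (caught_second + 1) / (recaptured + 1) - 1

    # Approximate variance of the Chapman estimator
    variance = (
        (marked_first + 1) * (caught_second + 1)
        * (marked_first - recaptured) * (caught_second - recaptured)
    ) / ((recaptured + 1) ** 2 * (recaptured + 2))

    return n_hat, variance
```

With hypothetical counts, `chapman_estimate(200, 150, 30)` treats 200 individuals identified in one period, 150 observed in a later period, and 30 seen in both as the two samples. The fewer the recaptures relative to the sample sizes, the larger both the estimate and its variance.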

Based on the panel’s conversations with U.S. Border Patrol (USBP) agents during site visits to Arizona and California, it is clear that USBP already attempts to combine information from different sources to forecast border-crossing activity, albeit in informal ways. In addition to whatever the surveys may indicate, agents make use of their own administrative data, their previous experience, and other sources of information that include, for example, occupancy rates of hotels on the Mexican side of the border, sign-cutting (i.e., observing and tracking footprints and other physical signs of migrant passage), and remote sensing data.

Building upon what USBP already does in practice, this chapter discusses more formal ways of combining varied sources of information to estimate unauthorized migration flows with geographic and annual or quarterly specificity. These methods include conventional approaches, such as probability models, regression models, and spatiotemporal processes, as well as more recent methods, such as agent-based modeling.
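
As a minimal illustration of what "formal" combination can mean, the sketch below pools several estimates of the same flow by inverse-variance (precision) weighting. This is a generic statistical device rather than one of the methods named above, and it assumes that the source estimates are independent and unbiased and that their variances are known. The function name and its inputs are hypothetical.

```python
def combine_estimates(estimates, variances):
    """Precision-weighted average of independent, unbiased estimates.

    estimates -- point estimates of the same quantity from different sources
    variances -- the (assumed known, strictly positive) variance of each estimate
    Returns (combined estimate, variance of the combined estimate).
    """
    if len(estimates) != len(variances) or not estimates:
        raise ValueError("need one variance per estimate and at least one estimate")
    if any(v <= 0 for v in variances):
        raise ValueError("variances must be strictly positive")

    # Each source is weighted by its precision, 1 / variance.
    weights = [1.0 / v for v in variances]
    total_weight = sum(weights)

    combined = sum(w * e for w, e in zip(weights, estimates)) / total_weight
    combined_variance = 1.0 / total_weight
    return combined, combined_variance
```

A noisy source, such as a small survey, pulls the combined figure less than a precise one does, and the combined variance is never larger than that of the best single source. The informal weighing of evidence described above has no such explicit accounting of uncertainty.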

To fit a model, one wants a training sample in which both the explanatory variables (such as economic pressure, enforcement effort, and point of origin) and the true values of the response variable (such as the flow of illegal immigrants across a specific portion of the border) are known. Such a training sample is difficult to obtain in this situation, and a complete one will never be available. Nevertheless, a model for illegal flow will include many components for which data exist. For example, each border station records the number of people in different demographic segments who are interdicted each month, and surveys are available that indicate how many people in a particular town chose to seek work in the United States. A mathematical model for illegal immigration that is founded on good social science theory can be fit to the available data, and it offers reasonable hope of correctly tracking the unmeasured data. This hope can be approximately validated, or disconfirmed, by checking whether the model’s broad predictions for, say, the total number of illegal Mexican immigrants are consistent with estimates obtained



The National Academies | 500 Fifth St. N.W. | Washington, D.C. 20001
Copyright © National Academy of Sciences. All rights reserved.