A significant difficulty with agent-based models is that their statistical properties are understudied. In general, it is not clear how one should make principled uncertainty statements about such models, nor how one can assess goodness-of-fit. On the other hand, these models enjoy a high level of face validity: if the rule sets are reasonable, then the model may seem more plausible than a model that encodes human behavior in complex mathematics. Agent-based models also admit a simple qualitative check: if the emergent behavior is unreasonable, then the model is inadequate.
In the context of modeling immigration flows with an agent-based model, administrative data and surveys offer important opportunities for model tuning and falsification. For example, consider rules of the following kind:
• An agent decides whether to attempt illegal immigration according to a coin toss, where the probability of heads is a function of the agent’s age, income, marital status, the distance from the U.S. border, and other relevant covariates.
• If the coin toss leads the agent to attempt to immigrate, then the agent tries a certain number of times, until discouragement, where the number of attempts is a probabilistic function of the agent’s covariates.
• If the agent succeeds, then the agent will attempt to engage in various kinds of activity in the United States, such as migrant labor, home construction, joining a family member, and so on.
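To make the three rules concrete, the following is a minimal sketch of how they might be coded for a single agent. Everything specific in it is an assumption made for illustration: the logistic form of the coin-toss probability, the geometric discouragement rule, the per-attempt crossing probability `p_cross`, the coefficient values in `COEF`, and all function and field names. None of these come from data or from any particular published model.

```python
import math
import random
from dataclasses import dataclass

# Hypothetical activity categories, taken from the examples in the text.
ACTIVITIES = ["migrant labor", "home construction", "joining a family member"]


@dataclass
class Agent:
    age: int
    income: float        # hypothetical units (thousands of dollars per year)
    married: bool
    distance_km: float   # distance from the U.S. border


def attempt_probability(agent, coef):
    """Rule 1: probability of heads as an (assumed) logistic function of covariates."""
    z = (coef["intercept"]
         + coef["age"] * agent.age
         + coef["income"] * agent.income
         + coef["married"] * float(agent.married)
         + coef["distance"] * agent.distance_km)
    return 1.0 / (1.0 + math.exp(-z))


def attempts_before_discouragement(agent, rng):
    """Rule 2: geometric number of tries; the quit probabilities are made up."""
    p_quit = 0.5 if agent.married else 0.3
    tries = 1
    while rng.random() >= p_quit:
        tries += 1
    return tries


def simulate_agent(agent, coef, p_cross, rng):
    """Run the three rules once for a single agent and report the outcome."""
    outcome = {"attempted": False, "tries": 0, "succeeded": False, "activity": None}
    if rng.random() >= attempt_probability(agent, coef):
        return outcome                       # tails: the agent never attempts
    outcome["attempted"] = True
    limit = attempts_before_discouragement(agent, rng)
    for n in range(1, limit + 1):
        outcome["tries"] = n
        if rng.random() < p_cross:           # assumed per-attempt success probability
            outcome["succeeded"] = True
            outcome["activity"] = rng.choice(ACTIVITIES)   # Rule 3
            break
    return outcome


# Illustrative coefficients only; none of these values come from data.
COEF = {"intercept": -1.0, "age": -0.02, "income": -0.05,
        "married": -0.3, "distance": -0.001}

rng = random.Random(0)   # fixed seed so the run is reproducible
agent = Agent(age=24, income=6.0, married=False, distance_km=400.0)
result = simulate_agent(agent, COEF, p_cross=0.4, rng=rng)
print(result)
```

Running `simulate_agent` over a synthetic population and tabulating the outcomes would give the kind of emergent aggregates (for example, the age mix of those who attempt a crossing) that can be set against administrative records.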
Obviously, these rules are simplistic and offered only as illustration. The important point is that one can tune these rules, in principle, according to data in the administrative records and surveys. If, in a given year, the age mix of those interdicted at the border does not match the mix generated by the agent-based model, then this indicates that the model is incorrectly specified. More directly, the data enable the modeler to fit the functions that determine how the covariates affect the coin toss, or how easily an agent with certain characteristics will be discouraged.
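As one illustration of the age-mix comparison, the sketch below computes a Pearson chi-square statistic between observed interdiction counts and the age shares produced by a simulation. The age bins, both sets of counts, and the function names are hypothetical. The check also treats the simulated shares as fixed, ignoring Monte Carlo noise in the simulation itself, so it should be read as a rough diagnostic rather than a formal test.

```python
AGE_BINS = ["<20", "20-29", "30-39", "40+"]

# Hypothetical counts: interdictions recorded in one year, by age bin.
observed_counts = [420, 310, 180, 90]
# Hypothetical counts: simulated agents interdicted, by the same age bins.
simulated_counts = [3500, 3300, 2000, 1200]


def shares(counts):
    """Convert counts to proportions that sum to one."""
    total = sum(counts)
    return [c / total for c in counts]


def chi_square_statistic(observed, model_shares):
    """Pearson statistic: sum over bins of (observed - expected)^2 / expected."""
    total = sum(observed)
    stat = 0.0
    for obs, share in zip(observed, model_shares):
        expected = total * share
        stat += (obs - expected) ** 2 / expected
    return stat


# Upper 5% point of the chi-square distribution with 3 degrees of freedom
# (four bins minus one).
CRITICAL_5PCT_DF3 = 7.815

stat = chi_square_statistic(observed_counts, shares(simulated_counts))
misspecified = stat > CRITICAL_5PCT_DF3
print(f"chi-square = {stat:.2f}; evidence of misspecification: {misspecified}")
```

A large statistic signals only that the simulated and observed age mixes disagree. It does not say which rule is at fault, so the modeler would still need to examine the individual functions that drive the coin toss and the discouragement behavior.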
One can address the problem of making inference from agent-based models in at least two ways. One way is to do sensitivity analyses and see how the outputs vary across reasonable ranges of inputs. This is particularly useful given that certain important information (e.g., the probability of successfully crossing the border in the first, second, or later attempts) is not available. A second way is to build an emulator, which creates a mathematically simpler model that approximates the agent-based model. Using methods introduced by O’Hagan (2001) and developed by Gramacy