
Suggested Citation:"Appendix B: Some Methodological Issues in Analyzing Data on Immigration." National Research Council. 1985. Immigration Statistics: A Story of Neglect. Washington, DC: The National Academies Press. doi: 10.17226/593.


Appendix B

SOME METHODOLOGICAL ISSUES IN ANALYZING DATA ON IMMIGRATION

INTRODUCTION

The body of this report has been concerned largely with the process of collecting and disseminating data on immigration and the foreign-born. Analytical issues have been touched on, but no detailed examination of analytical procedures has been attempted. This appendix incorporates three papers concerning analysis, the first two by Kenneth Hill, of the panel staff, and the third by Kenneth Wachter, a member of the panel. Although the papers have benefited from the comments of a number of reviewers, they nonetheless represent the views of the authors rather than of the panel as a collective entity. They are included here because they concern issues central to the panel's charge and were prepared as a part of its overall work plan. We hope they will serve to stimulate both discussion and new research.

The first paper outlines three methodological procedures that could be applied to data that either are available or could be made available at little expense and with little change in current administrative practices. The methods outlined are aimed at measuring stocks or flows that are poorly documented by existing statistics: emigration of immigrants admitted for permanent residence (and, coincidentally, an estimate of average coverage of the Alien Address Report system), the size and growth of the population of illegally resident aliens, and net flows of U.S. citizens. These methods are intended to be illustrations of ways in which particular types of data might be used for analytical purposes and to indicate the potential analytical value of compiling or processing data that are already collected. The estimates obtained by these methods are also intended to be illustrative rather than substantive; for substantive applications, the necessary data must be available and the extensive assumptions underlying the methods must be evaluated in the light of the results obtained. The methods proposed do have some promise for producing useful new estimates, to complement rather than to replace existing ones, and it is hoped that, even if these methods in the form presented do not prove viable or prove to be excessively sensitive to critical and unsupportable assumptions, their presentation will stimulate discussion and the development of new approaches to the use of available or easily generated data.

The second piece is concerned with estimating the size of the illegally resident population of the United States. Estimating the size of this population, and still more its characteristics, poses serious and special measurement problems, since the population itself is, for obvious reasons, anxious to avoid any unnecessary contact with officialdom. As a result, the methods applied, though often ingenious, also often rely on extensive assumptions that are hard to justify. Hill reviews the major empirical studies that have been made of the size of the illegal population and examines their results in the context of their methodology and assumptions. Several of the methods have been reviewed elsewhere, and little new about these methods is presented here; however, some of the methods have not been subjected to detailed examination before, and it seemed useful to cover all the major methods and their results in one place and to pull together and evaluate all the available empirical estimates of the size of the illegal population. The paper is intended as an evaluation of the various estimates and thus concentrates on the negative rather than on the positive aspects of the methodologies used. The reader should bear in mind, however, that the measurement problems involved are particularly severe and that any methods used will inevitably involve assumptions and approximations that are hard to justify. It is an area in which new approaches or the use of different data are to be welcomed, and in which a wide margin of uncertainty in the estimates derived should not be interpreted as a criticism of the methodology or of the attempt.

The third piece discusses the issues of imputation and treatment of missing data with particular reference to procedures of the Immigration and Naturalization Service (INS) and the presentation of data in the INS statistical yearbooks. Wachter argues strongly that the procedures currently used by INS should be reviewed in terms of their statistical validity and should be carefully documented in the Statistical Yearbook so that users can be aware of how the necessary imputations have been made and be alerted to how such imputations might affect the data.

Indirect Approaches to Assessing Stocks and Flows of Migrants

Kenneth Hill

INTRODUCTION

Statistics for U.S. migrant groups vary widely in quality. The best data made available by the INS cover first arrivals of permanent immigrants, or those changing status to permanent immigrant, and those naturalizing to U.S. citizenship. These data seem to be fairly reliable, in general if not with regard to all the available detail, even though they have suffered from severe processing and publication delays in the last few years. Figures on first arrivals of refugees, published very promptly by the Office of Refugee Resettlement, also seem to be reliable. Some elements of inflow are thus adequately covered by existing statistics.

Statistics on inflows of temporary visitors, returning citizens, and returning resident aliens are less satisfactory. Although total arrivals by air are reasonably well recorded by the INS, processing of arrival declarations for aliens has been sporadic in recent years, and permanent residents are no longer required to complete such declarations. The situation for the inflow through land border ports of entry is worse; in many cases no direct head count is made, the total flow being estimated as the product of numbers of cars and an average occupancy figure derived from semiannual surveys, also used to estimate citizen/noncitizen ratios (see Chapter 4 for a more complete description of these procedures). Since the gross inflow across land borders represents the great majority of total inflow, INS estimates of total inflow cannot be regarded as satisfactory and in any case exclude any inflow of undocumented aliens.

No systematic attempt is even made to record outflow; although temporary visitors are required to complete a declaration on departure, compliance is high only at airports. Departures of all passengers by air are recorded by the INS from airline reports, but coverage of charter flights appears to be incomplete (see Chapter 5).
No attempt is made to record departures of citizens or permanent residents at land border points or even to estimate the number of vehicles crossing. There is thus no basis for estimating gross outflow from the United States and no basis for monitoring changes in population stock. Until 1981, the INS attempted to monitor the stock of resident aliens through the Alien Address Reporting system; however, reporting was widely felt to be incomplete and the system was scrapped, although resident aliens are still required to register changes of address with the INS. (This requirement is seldom observed, however, and the forms are not processed.) Information on the population stock and inflows is available from the decennial census, which collects country of birth, citizenship, period of arrival for the foreign-born, and residence one year and five years before the census. The accuracy of some of the information, which is self-reported, is open to question, and the coverage by the census of undocumented aliens is unknown.

There are thus major deficiencies in U.S. international migration statistics, the two most important being the size and structure of the undocumented population and emigration of both U.S. citizens and noncitizens. Numerous ingenious approaches have been developed to obtain estimates of the stocks and flows involved. Siegel et al. (1980) provide a useful review of methods used to estimate illegal immigration, and methods of estimating emigration have been reviewed by Passel and Peck (1979) and Warren and Kraly (1985).

This paper describes three potentially useful new indirect approaches to the estimation of stocks and flows of U.S. migrants. Unfortunately, the approaches are based on data that are no longer collected, from the Alien Address Reporting system; on data that are collected but not processed, from records of deportable aliens located in the United States; or on data that are difficult to compile, from foreign census counts of U.S. citizens or the U.S.-born living abroad. The immediate practical applicability of the methods described is thus severely limited, but the methods are described in order to indicate some directions that analysis could take if fairly simple procedures for data collection, processing, or compilation were instituted. They are not proposed as final solutions to the measurement problems with which they are concerned. Like all indirect methods, they too involve assumptions and approximations that will affect the results. Rather, these approaches illustrate how certain types of data could be used to obtain estimates of stocks and flows of people in, into, and out of the United States. We hope this illustration of the application of somewhat different approaches to the problem will generate further thinking, which may stimulate additional future research in this area.
The first method uses information on deportable aliens located, by duration of illegal residence and other simple characteristics, to estimate the size and structure of the nonlegal population of the United States. The second method combines information from the Alien Address Reporting system with information on numbers of new immigrants and naturalizations to estimate both the coverage of the address reporting program and the emigration of resident aliens. The third method uses census data from other countries on the U.S. citizen or U.S.-born population resident in those countries to scale information from an administrative data source (Internal Revenue Service tax filer records) on the U.S. population living abroad.

THE SIZE AND STRUCTURE OF THE NONLEGAL POPULATION OF THE UNITED STATES

Numerous methods have been described for estimating the number of nonlegal residents of the United States or major components of this population (see the second paper in this appendix for a review of the more important studies). The approaches proposed here use information on locations of deportable aliens by duration of illegal residence to estimate the size and duration structure of the underlying population, first assuming the population to be demographically stable and second using duration-specific growth rates. The INS collects information on deportable aliens located on form I-213 (see Appendix A) but has not processed the data systematically, although Davidson (1981) has described the results of processing a sample of the forms completed in calendar 1978.

The use of I-213 data to estimate either numbers or characteristics of the nonlegal population is not straightforward for a number of reasons. First, the locations occur very largely at short durations of illegal stay; in fiscal 1982, for example, 75 percent of the 963,000 locations were of aliens with a duration of illegal stay of 30 days or less, and 50 percent occurred at entry; a high proportion of these locations may be of the same person located several times in the same year. Second, the located aliens cannot be regarded as a random sample of the underlying population, since probabilities of location are likely to vary by characteristics such as sex, nationality, and occupation. Third, the quality of the data on the I-213 is widely regarded as low, and although no thorough evaluation has been made, Davidson (1981) shows that employment and residence characteristics suffer from high levels of nonresponse. These shortcomings no doubt partly explain the INS's failure to process I-213 forms on a routine basis.

Before describing and illustrating the methods in detail, it is useful to provide a general explanation of why the methods might be expected to work at all. To make any analytical use of locations, it has to be assumed that the number of locations is related to the number of deportable aliens who can be located. If the number of locations is determined by INS targets, or the INS locates as many deportable aliens as it can given existing resources and manpower, there will be no systematic relationship between locations and population at risk, and locations will provide no basis for estimating the size of the population.
No empirical basis exists for assuming a relationship between locations and population, but it does seem plausible that if the deportable alien population were doubled, the INS would locate at least some more deportable aliens without any increase in effort, although locations might increase by a factor of less than two.

Even accepting this assumption of a positive elasticity of locations to population, it might appear at first sight that a series of numbers of locations by duration could only indicate relative, not absolute, rates of location by duration. Sets of location rates that are the same in duration pattern but different in level will produce the same numbers of locations at each duration when applied to populations that share a given distribution by duration but are appropriately scaled. It is not obvious, therefore, that recorded numbers of locations can tell us anything about the size of the underlying population. However, there is a link between the two because the number of locations affects the size of the population, in much the same way as deaths affect the size and age distribution of a closed population. If locations were the only source of attrition, the parallel with deaths would be exact, and methods for estimating population size from deaths by age for a stable population (that is, a population changing at a single, constant rate at all ages and thus maintaining a constant age structure though not a constant size), such as that proposed by Preston et al. (1980), or for a general population (Preston and Coale, 1982), could be applied to locations by duration. In practice, voluntary return migration, change of status, and deaths also contribute to the attrition of the population of illegal aliens, so estimates based on locations alone will underestimate the true size of the population unless allowance is made for other unobserved types of loss.

The first approach assumes that the illegal alien population is stable in the demographic sense of having a constant, unchanging rate of change at each duration of illegal residence. In such a stable population, the number reaching duration d in a year, N(d), can be expressed in terms of the number of entries in the year E, the stable rate of change r, and the probability of surviving from entry to d, p(d):

$$ N(d) = E\,e^{-rd}\,p(d) \tag{1} $$

The average population at all durations, P, can be found by integrating equation 1:

$$ P = \int_0^w N(d)\,dd = E \int_0^w e^{-rd}\,p(d)\,dd \tag{2} $$

where w is the highest duration attained. In any population, the rate of change r is equal to the entry rate, E/P, less the loss rate, L/P, where L is total losses; substituting rP + L for E in equation 2 and rearranging gives

$$ \frac{P}{L} = \frac{\int_0^w e^{-rd}\,p(d)\,dd}{1 - r \int_0^w e^{-rd}\,p(d)\,dd} \tag{3} $$

Also in a stable population, survival to duration d can be expressed in terms of losses by duration, l(d), and r:

$$ p(d) = \frac{\int_d^w l(x)\,e^{rx}\,dx}{\int_0^w l(x)\,e^{rx}\,dx} \tag{4} $$

If we now assume that losses from INS locations, D(d), form a constant proportion of all losses at all durations d, p(d) can be expressed in terms of D(d) and r, since the constant proportion will cancel out in equation 4.

We can now apply equations 3 and 4 to Davidson's data on locations by duration of illegal stay for 1978, assuming different growth rates, and limiting the analysis to locations at durations of one month or more. Equation 4 has been evaluated assuming that locations are distributed evenly over each duration group, applying a value of d for the midpoint of the interval, except for the open 7+ years interval, for which a value of 9.5 years was assumed. The integrals in equation 3 were then evaluated trapezoidally for each duration category. Calculations are shown in Table B-1.
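The evaluation of equations 3 and 4 described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the computation behind Table B-1: the function name, the duration boundaries, and the location counts are hypothetical, and the open-ended interval is assumed to be closed at 12 years so that its midpoint is 9.5 years. Locations are placed at interval midpoints for equation 4, and the integral in equation 3 is evaluated trapezoidally.

```python
import numpy as np

def population_to_loss_ratio(edges, locations, r):
    """Approximate P/L from equations 3 and 4 for a stable population.

    edges     -- duration boundaries in years (length n + 1); the open-ended
                 last interval must be closed at an assumed upper bound
    locations -- locations observed in each duration interval (length n)
    r         -- assumed stable annual growth rate
    """
    edges = np.asarray(edges, dtype=float)
    locations = np.asarray(locations, dtype=float)
    midpoints = (edges[:-1] + edges[1:]) / 2.0

    # Equation 4: weight each interval's locations by exp(r * d) at the
    # midpoint, then take the share of weighted locations at or beyond
    # each boundary. Survival at the final boundary is zero.
    weighted = locations * np.exp(r * midpoints)
    tail = np.append(np.cumsum(weighted[::-1])[::-1], 0.0)
    p = tail / tail[0]

    # Trapezoidal evaluation of the integral of exp(-r * d) * p(d).
    integrand = np.exp(-r * edges) * p
    integral = np.sum(np.diff(edges) * (integrand[:-1] + integrand[1:]) / 2.0)

    # Equation 3.
    return integral / (1.0 - r * integral)


# Hypothetical counts by duration, measured in years from the one-month
# threshold. These are NOT Davidson's 1978 figures.
edges = [0.0, 0.5, 1.0, 2.0, 4.0, 7.0, 12.0]
locations = [90_000, 40_000, 35_000, 30_000, 20_000, 15_000]

for rate in (0.0, 0.05, 0.10):
    print(rate, round(population_to_loss_ratio(edges, locations, rate), 3))
```

Because p(d) is a ratio of sums, multiplying every count by the same constant leaves P/L unchanged, which is why locations can stand in for total losses when they form a constant proportion of them.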
The ratio P/L, average population to average losses, increases from 1.655 for a zero growth rate to 1.882 for a growth rate of 5 percent to 2.168 for a growth rate of 10 percent; an annual growth rate of 10 percent implies a population doubling time of seven years. The estimated P/L is surprisingly insensitive to the assumed growth rate.

The estimates of P/L do not provide a basis for estimating P directly, since we do not know the value of L, average annual losses. However, we can obtain estimates of P for a range of assumptions about the value of L/D, total losses to location losses. Total locations at one month duration or more were 231,274 in 1978. If locations were 25 percent of total losses, the value of L would be 0.93 million, and the alien population present illegally in the United States for a month or more would then be 1.53 million for a growth rate of zero, 1.74 million for a growth rate of 5 percent, and 2.01 million for a growth rate of 10 percent. If locations were 50 percent of total losses, each estimate would be halved.

Locations data do not suggest a rapid growth rate of the population. In 1979, 245,118 deportable aliens illegally resident for a month or more were located, so if location rates remained constant the underlying population grew at 5.8 percent annually. A growth rate around 5 percent thus seems more likely than one of 10 percent. We have little guidance for a plausible figure for L/D, though Garcia y Griego (1980:Figure 3.3), using data from the Mexican CENIET border survey on migration histories of Mexicans returned by the INS, found that about 60 percent of returns to Mexico over the period 1970-1977 resulted from INS locations, and about 40 percent were voluntary. These results suggest that an L/D ratio of 2.0, allowing for deaths and legalizations in addition to voluntary returns, is more plausible than a ratio of 4.0, at least for Mexican illegal residents. Using these assumptions, the data suggest an illegal population resident one month or more that averaged around 0.9 million in 1978.

This procedure can also provide a number of other interesting results. For a growth rate of 5 percent, the ratio P/L is estimated at 1.882; this ratio is the inverse of the loss rate, which is therefore estimated as 0.531. The entry rate, E/P, is equal to the loss rate plus the growth rate, and is therefore estimated as 0.581. If the ratio L/D is taken to be 2.0, P is equal to 0.871 million, implying a value of E of 0.506 million.
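The arithmetic in the preceding paragraphs can be retraced with a short script. All figures are taken from the text; the variable and function names are illustrative only. The implied growth rate is computed here as a continuous (logarithmic) rate, which is an assumption on our part, chosen because it reproduces the 5.8 percent quoted above.

```python
import math

locations_1978 = 231_274   # locations at durations of one month or more
locations_1979 = 245_118
pl_ratio = {0.00: 1.655, 0.05: 1.882, 0.10: 2.168}   # P/L by assumed growth rate

def illegal_population(growth_rate, total_to_location_losses):
    """P = (P/L) * L, with L = D * (L/D)."""
    return pl_ratio[growth_rate] * locations_1978 * total_to_location_losses

# Growth implied by locations if location rates were constant
# (continuous rate, assumed).
implied_growth = math.log(locations_1979 / locations_1978)

# Populations when locations are 25 percent of all losses (L/D = 4.0).
for rate in pl_ratio:
    print(f"r = {rate:.2f}: P = {illegal_population(rate, 4.0) / 1e6:.2f} million")

# Case favored in the text: growth of 5 percent and L/D = 2.0.
population = illegal_population(0.05, 2.0)
loss_rate = 1.0 / pl_ratio[0.05]      # L/P is the inverse of P/L
entry_rate = loss_rate + 0.05         # E/P = L/P + r
entries = entry_rate * population     # E

print(f"implied growth rate: {implied_growth:.3f}")
print(f"P = {population / 1e6:.3f} million, E = {entries / 1e6:.3f} million")
print(f"loss rate = {loss_rate:.3f}, entry rate = {entry_rate:.3f}")
```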
This value is the number of illegals achieving a month's residence in 1978; since locations under a month in 1978 totalled 0.817 million, and the value of E is estimated at 0.506 million reaching a month without being located, total entries are estimated (assuming all losses at durations less than a month result from locations) at 1.323 million, of which the Border Patrol located 62 percent at entry or during the first month of illegal residence.

We can also use the p(d) function to calculate duration-specific annual location rates, dividing the life table losses p(d) - p(d+n) by person-years lived by the life table population, approximated by n[p(d) + p(d+n)]/2, where n is the length of the duration interval in years, and then dividing by 2.0 again to allow for the assumption that only half the losses resulted from locations. The resulting location rates nld are shown in the last column of Table B-1. One comforting feature of the rates is that those for the open interval, wld, which are set at 0.200 by assigning a uniform distribution over 5 years, are more or less consistent with the rates for shorter durations. A discomforting feature is that the rates are lowest for the duration interval 1-2 years, whereas we might expect them to decline steadily with duration. A possible explanation would be that location losses represent a lower proportion of all losses at long durations than at short durations. This explanation is tested in Table B-2, in which locations numbers are inflated by variable duration-specific factors, averaging 2.0 overall, and then manipulated using a growth rate of 5 percent. Three models are presented: (a) with the location proportion of all losses rising with duration, (b) with it falling, and (c) with it starting high for duration 1-6 months, falling sharply to a minimum for duration 7-12 months, then rising steadily as duration increases. Model (b) does indeed produce location rates that are essentially constant at durations over one year. More surprisingly, the results using these three models suggest that the procedure is not very sensitive even to substantial variations in the location to total loss ratios by duration, the estimated total population varying from 0.79 million for model (a) to 1.24 million for model (b).

The assumption of stability can be dropped if information is available on duration-specific growth rates. If duration-specific location rates were constant from year to year, population growth rates could be calculated directly from the numbers of locations in successive years, since the locations growth rates would be identical to the underlying population growth rates. Even if we wished not to assume constant rates, we could assume a constant duration pattern for the rates and an overall growth rate to which the duration-specific rates would be scaled. To apply this procedure, we need information on locations by duration for at least two consecutive years. Unfortunately, such useful data are not available, but we present the methodology required and illustrate the effects of departure from stability for two different cases.

Preston and Coale (1983) have shown that for a non-stable population,

$$ N(a) = B\,e^{-\int_0^a r(x)\,dx}\,p(a) \tag{5} $$

where N(a) is the population aged a, B the number of births, r(x) the growth rate at age x, and p(a) the probability of surviving to age a, all at some particular time t. By integration, the total population P is given by:

$$ P = \int_0^w N(a)\,da = B \int_0^w e^{-\int_0^a r(x)\,dx}\,p(a)\,da \tag{6} $$

In any population, the birth rate B/P is equal to the loss rate L/P plus the growth rate R, so equation 6 can be rewritten (replacing age by duration) as

$$ \frac{P}{L} = \frac{\int_0^w e^{-\int_0^d r(x)\,dx}\,p(d)\,dd}{1 - R \int_0^w e^{-\int_0^d r(x)\,dx}\,p(d)\,dd} \tag{7} $$

If we can estimate p(d) and r(x), we can then use this equation to estimate P/L.
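As a rough illustration of how equation 7 might be evaluated numerically, the sketch below takes a survival function and duration-specific growth rates as given and returns P/L. The function name and the discretization are assumptions made for illustration: growth rates are treated as constant within each duration interval, durations are measured from entry (so the first boundary is zero), and the integral is evaluated trapezoidally as in the stable case.

```python
import numpy as np

def nonstable_population_to_loss_ratio(edges, survival, growth_rates, overall_growth):
    """Approximate P/L from equation 7.

    edges          -- duration boundaries in years, starting at 0 (length n + 1)
    survival       -- p(d) at each boundary (length n + 1)
    growth_rates   -- r(x), assumed constant within each interval (length n)
    overall_growth -- R, the growth rate of the whole population
    """
    edges = np.asarray(edges, dtype=float)
    survival = np.asarray(survival, dtype=float)
    growth_rates = np.asarray(growth_rates, dtype=float)
    widths = np.diff(edges)

    # Cumulative integral of r(x) from 0 to each boundary.
    cumulative_growth = np.concatenate(([0.0], np.cumsum(growth_rates * widths)))

    # Trapezoidal evaluation of the integral in equation 7.
    integrand = np.exp(-cumulative_growth) * survival
    integral = np.sum(widths * (integrand[:-1] + integrand[1:]) / 2.0)

    return integral / (1.0 - overall_growth * integral)


# Hypothetical inputs: a survival curve falling to zero and growth rates
# that decline with duration around an overall rate of 5 percent.
edges = [0.0, 1.0, 2.0, 4.0, 7.0, 12.0]
survival = [1.0, 0.5, 0.3, 0.15, 0.05, 0.0]
growth_rates = [0.07, 0.06, 0.05, 0.04, 0.03]

print(nonstable_population_to_loss_ratio(edges, survival, growth_rates, 0.05))
```

When every element of growth_rates equals overall_growth, the expression reduces to the stable form in equation 3.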
The variable growth rate version of equation 4 is

$$ p(d) = \frac{\int_d^w l(y)\,e^{\int_0^y r(x)\,dx}\,dy}{\int_0^w l(y)\,e^{\int_0^y r(x)\,dx}\,dy} \tag{8} $$

Thus, given values of l(d) (or nld) and r(d) (or nrd) we can obtain p(d), the survival function needed in equation 7. Note that the values of l(d) again do not need to be the correct level, as long as they have the true duration pattern, since a constant level factor will cancel out from the top and bottom of equation 8. Thus we can use locations nDd in place of losses nld in equation 8 if we assume that locations make up a constant proportion of total losses for all durations.

We have no data to which to apply this more flexible approach, since Davidson's data on locations by duration are for 1978 only and provide no guidance concerning duration-specific changes in locations. However, we can test the sensitivity of the stable assumption estimates derived above to a non-stable underlying population by assuming different patterns of duration-specific growth rates. Using the basic model with an overall growth rate of 5 percent and a constant location to loss ratio of 0.5, we illustrate in Table B-3 the estimates obtained assuming first that duration-specific growth rates fall with duration and second that they rise. The P/L ratios obtained bracket the ratio for a stable population, lower for falling rates and higher for rising rates, but differ from it by only 4 or 5 percent. Thus it appears that the stable procedure is actually quite insensitive to departures from stability, at least for the range of growth rates tested, as it was to substantial differences in the stable growth rate used. This insensitivity arises from the heavy concentration of locations at short durations for which the growth rate has only a modest effect.

In conclusion, these methods make some strong assumptions, but the results are not very sensitive to many of them. Deviations from stability appear to be relatively unimportant, and the stability assumption can be relaxed if data are available for more than one year.
Similarly, the results are not highly sensitive to the stable growth rate assumed in the stable method or to the overall growth rate in the non-stable method. The results are more sensitive to locations to losses ratios that change sharply with duration, although ratios that change by more than a factor of two affect the overall population to loss ratio by less than 50 percent. The assumption to which the final estimate is directly proportional is the overall location to loss ratio; a value of this ratio of 0.25 will produce an estimate of the illegal population exactly twice as large as will a value of 0.50. However, overall the methodology turns out to be surprisingly robust to deviations from the assumptions. It is likely to work best for groups with similar location and other loss probabilities, so it could usefully be applied to data on locations classified by sex and nationality groups, though not by age since age would introduce entries to and departures from the population considered as a result of birthdays. Data for consecutive years would also prove useful for relaxing the assumption of stability and for examining the consistency of the results.

Given the limited data available, the results using location to loss ratios that fall with duration appear most plausible; with an overall location to loss ratio of 0.59 they suggest an average illegal alien population resident a month or more of 1.2 million for 1978, a figure by no means inconsistent with other empirical estimates available. This figure of course excludes the contribution of illegal immigrants at durations of 10 days or less, but their contribution in terms of person-years lived must be fairly small, even if their number is large; for 1978, it would increase the estimate of 1.2 million by less than 0.1 million. This estimate is of course only arrived at in order to illustrate how these methods work. More extensive data, permitting repeated applications, the relaxation of certain assumptions, and separate analyses for more homogeneous subgroups, are necessary to establish the ultimate value of the methods for estimation purposes.

ESTIMATING EMIGRATION OF RESIDENT ALIENS

Until 1981, most aliens resident in the United States were required to report their address to the INS in January every year. Reporting was made by completing and mailing to the INS a special card (form I-53) available at post offices and elsewhere. The information collected is described in Chapter 4, and the form is reproduced in Appendix A. Figures from the reporting system were published in the INS Statistical Yearbook by nationality and state of residence. Reporting under the system was widely regarded as being incomplete, one of the reasons why the Alien Address Reporting (AAR) system was dropped after 1981, and year-to-year fluctuations in the numbers of reporting foreigners can only be explained plausibly in terms of varying coverage. However, the information available provides some basis for estimating the emigration of permanent resident aliens.

If all recording is complete, the number of permanent residents reporting in year t+1, PR(t+1), should be equal to the number who reported in year t, PR(t), plus immigrants (both arriving and changing status), 1It, less naturalizations, 1Nt, emigration, 1Et, and deaths in the United States of permanent immigrants, 1Dt.
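Thus

$$ PR(t+1) = PR(t) + {}_1I_t - {}_1N_t - ({}_1E_t + {}_1D_t) \tag{9} $$

If reporting in years t and t+1 was k(t) and k(t+1) complete, and PRR(t) and PRR(t+1) are the numbers reporting, then

$$ \frac{PRR(t+1)}{k(t+1)} = \frac{PRR(t)}{k(t)} + {}_1I_t - {}_1N_t - ({}_1E_t + {}_1D_t) $$

or

$$ \frac{PRR(t+1)}{PRR(t)} = \frac{k(t+1)}{k(t)} - k(t+1)\,\frac{{}_1E_t + {}_1D_t}{PRR(t)} + k(t+1)\,\frac{{}_1I_t - {}_1N_t}{PRR(t)} $$

Since PRR(t) = k(t)[PR(t)], we can write

$$ \frac{PRR(t+1)}{PRR(t)} = \frac{k(t+1)}{k(t)} - \frac{k(t+1)}{k(t)}\,\frac{{}_1E_t + {}_1D_t}{PR(t)} + k(t+1)\,\frac{{}_1I_t - {}_1N_t}{PRR(t)} = \frac{k(t+1)}{k(t)}\,[1 - R(t)] + k(t+1)\,\frac{{}_1I_t - {}_1N_t}{PRR(t)} \tag{10} $$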

where R(t) is a loss ratio of deaths and emigrants divided by the initial population; if deaths and emigration are regarded as minimal for immigrants during their year of entry, R(t) can be regarded approximately as a loss rate equal to the sum of the death and emigration rates (note that the denominator of R(t) is the true, not the reported, population at time t). If over a number of years k(t) and R(t) are approximately constant, equation 10 becomes

$$ \frac{PRR(t+1)}{PRR(t)} = (1 - R) + k\,\frac{{}_1I_t - {}_1N_t}{PRR(t)} \tag{11} $$

where R is the loss rate, k is the average coverage completeness of the AAR system, and 1It, 1Nt, PRR(t), and PRR(t+1) can be obtained from INS statistics. R and k can thus be estimated by plotting the ratios in equation 11 and fitting a straight line of intercept (1-R) and slope k. The estimated value R is not an emigration rate but rather a combined emigration and death rate. The emigration element could be obtained by subtracting a death rate calculated on the basis of the age distribution of the population being considered; this death rate would probably not exceed 10 per 1,000 for the immigrant populations from most countries of origin.

The derivation above suggests some practical implications for applying the method. Since R(t) and k(t) are assumed to be constant, the method should be applied to groups as homogeneous as possible, such as country of origin by sex groups. It is also clear that the method will not work well if (a) the fluctuations in k(t) or R(t) are large, or (b) 1It - 1Nt is small relative to PRR(t), or (c) (1It - 1Nt)/PRR(t) varies little over time. Simulations suggest that the line should be fitted to the points using a group mean procedure, ordering the observations by the values of (1It - 1Nt)/PRR(t); that the resulting estimate of R is reasonably robust to random fluctuations in R(t) and k(t); but that the resulting estimate of k is much more sensitive to such fluctuations.
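The fitting step can be sketched as follows. The text does not spell out the group mean procedure, so this sketch assumes a simple two-group version: the observations are ordered by (1It - 1Nt)/PRR(t), split into lower and upper halves, and the line is drawn through the two group means. The function name and the simulated series are hypothetical and are not INS figures.

```python
import numpy as np

def fit_loss_and_coverage(reported, immigrants, naturalizations):
    """Estimate R and k from equation 11 with a two-group mean line.

    reported        -- PRR(t) for consecutive years (length T + 1)
    immigrants      -- 1It for each year t (length T)
    naturalizations -- 1Nt for each year t (length T)
    """
    reported = np.asarray(reported, dtype=float)
    net_additions = (np.asarray(immigrants, dtype=float)
                     - np.asarray(naturalizations, dtype=float))

    x = net_additions / reported[:-1]     # (1It - 1Nt) / PRR(t)
    y = reported[1:] / reported[:-1]      # PRR(t+1) / PRR(t)

    # Order by x and compare the means of the lower and upper halves
    # (with an odd number of points the middle one is left out of both).
    order = np.argsort(x)
    half = len(x) // 2
    lower, upper = order[:half], order[-half:]

    slope = (y[upper].mean() - y[lower].mean()) / (x[upper].mean() - x[lower].mean())
    intercept = y.mean() - slope * x.mean()

    # Intercept is (1 - R); slope is k.
    return 1.0 - intercept, slope


# Simulated check with constant R = 0.04 and k = 0.9 (hypothetical values).
immigrants = [50.0, 120.0, 30.0, 90.0, 60.0, 150.0]
naturalizations = [0.0] * len(immigrants)
reported = [1000.0]
for added in immigrants:
    reported.append(reported[-1] * (1 - 0.04) + 0.9 * added)

loss_rate, coverage = fit_loss_and_coverage(reported, immigrants, naturalizations)
print(loss_rate, coverage)
```

With series that follow equation 11 exactly, as in the simulated check, the fitted line recovers R and k; with real data the estimates depend on how much k(t) and R(t) fluctuate.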
It is also necessary to discuss in more detail the effects of the assumption that R(t) and k(t) can be summarized by average values R and k applying to the whole period. Simulations suggest that random variations around the average values will have little effect on R but will have a more pronounced effect on the estimate of k, tending to reduce its value. Underlying trends in R(t) and k(t) might be expected to have more substantial effects, however. Limited simulations suggest that trends in R(t) result in overestimates of R and k if R(t) is increasing, and underestimates of R and k if R(t) is declining; the effect on R is small, the estimate not deviating much from the average value, but the effect on k is substantial, and the estimate might be in error by as much as plus or minus 5 percent for a trend in R(t) over a 15-year period of about 1 percent per annum. A trend over time in k(t) has relatively little effect on the estimate of k, which works out close to the weighted average of k(t) regardless of the direction of the trend, but the estimate of R is biased upward by declining coverage and downward by increasing coverage. In general it can be concluded that the estimates of R and k are reasonably robust to trends in R(t) and k(t), so long as

there is reasonable year-to-year fluctuation in PRR(t+1)/PRR(t) and $({}_1I_t - {}_1N_t)/PRR(t)$.

The method is applied for the period 1959-1979 to data on permanent immigrants from Colombia, Mexico, the Philippines, and the United Kingdom in Table B-4; Figure B-1 shows scatterplots of the basic ratios and the fitted straight lines. These applications are quite interesting. For the United Kingdom, the independent variable $({}_1I_t - {}_1N_t)/PRR(t)$ is small and varies little, but considerable variation is found in the dependent variable PRR(t+1)/PRR(t), presumably arising from fluctuations in k(t). As a result, the points show no obvious linear trend, and estimation is impossible. Mexico is somewhat similar, though in this case neither variable shows much variation from year to year; though the fluctuations are small, the points appear to show some linearity, and the estimated loss rate is 4 percent, or about 40,000 a year on the reporting population of about 1 million. The estimated coverage of the AAR is 121 percent, however, and though the fit is by no means close, the points do seem to indicate overreporting by Mexicans; some nonpermanent residents such as those on educational or temporary worker visas, and possibly some illegals, may have reported themselves as permanent residents under the reporting system. For Colombia and the Philippines, the two ratios show much more variability and a more pronounced linear trend; for Colombia the loss rate is 2.6 percent and the coverage 85 percent, whereas for the Philippines the loss rate is higher, 7.8 percent, the coverage well over complete at 135 percent, and the fit quite respectable. Once again, overreporting may arise as a result of misreporting of residence status. It may be concluded that this method will not work in all applications, as a result of fluctuations in R(t) and k(t) from year to year, but that in some applications it seems to produce reasonable estimates of both coverage and loss rates.
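The fitting step can be sketched as follows for the Colombia columns of Table B-4. The text specifies a group mean procedure with observations ordered by the independent variable but does not say how the groups are formed; the version below assumes two groups, with the lower group taking the extra observation when the number of points is odd. The function name `group_mean_fit` is hypothetical.

```python
# Colombia, 1959-1979, from Table B-4:
# (x, y) = ((1It - 1Nt) / PRR(t), PRR(t+1) / PRR(t)).
COLOMBIA = [
    (0.201, 1.125), (0.208, 1.158), (0.238, 1.071), (0.268, 1.140),
    (0.337, 1.296), (0.337, 1.331), (0.249, 1.286), (0.155, 1.066),
    (0.115, 1.073), (0.134, 1.091), (0.119, 1.081), (0.113, 0.919),
    (0.080, 1.173), (0.064, 1.065), (0.069, 0.982), (0.070, 1.072),
    (0.069, 0.983), (0.070, 1.014), (0.095, 1.111), (0.108, 1.065),
    (0.098, 1.061),
]


def group_mean_fit(points):
    """Fit y = intercept + slope * x through the means of the lower and
    upper groups of points ordered by x (assumed two-group variant)."""
    ordered = sorted(points)            # order observations by x
    split = (len(ordered) + 1) // 2     # assumption: lower group takes the extra point
    lower, upper = ordered[:split], ordered[split:]

    def mean(values):
        return sum(values) / len(values)

    x_lo, y_lo = mean([x for x, _ in lower]), mean([y for _, y in lower])
    x_hi, y_hi = mean([x for x, _ in upper]), mean([y for _, y in upper])
    slope = (y_hi - y_lo) / (x_hi - x_lo)
    intercept = y_lo - slope * x_lo
    return slope, intercept


k, one_minus_r = group_mean_fit(COLOMBIA)
loss_rate = 1.0 - one_minus_r
```

Under this assumed split the Colombia series gives a slope of about 0.85 and an intercept of about 0.974, in line with the values reported in Table B-4. Whether the same grouping was used for the other countries is not stated in the text.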
The death of the AAR system in 1981 means that the method will not be of any use in the future, but it should be applied more widely to data for the 1960s and 1970s. Such applications should experiment with different fitting procedures, such as trimmed means, and with fitting to different time periods to assess the possible impact of trends in k(t) and R(t).

COMBINING ADMINISTRATIVE AND FOREIGN CENSUS DATA

Population censuses often collect, and sometimes tabulate, information on country of birth or country of citizenship. No systematic attempt has been made to estimate emigration from the United States on the basis of foreign census data on Americans abroad, for some very good reasons, among them difficulty of access to the data, lack of timeliness, variation in census dates, and variation in census content. However, the success of the IMILA project, coordinated by the U.N. Latin American Demographic Centre, which estimated migration flows in Latin America from birthplace data contained in samples from population censuses in the Americas, suggests that more could be done through international cooperation. The issue of timeliness is important. Censuses are generally taken at best only every 10 years, and detailed results rarely become available until 3 or 4 years later. Thus even if the required data on the

American-born or American-nationality population by age and sex are tabulated, the best that can be hoped for is a stock figure every 10 years and never less than 3 or 4 years out of date, or an intercensal flow averaging at best almost 10 years out of date. This problem can be alleviated by combining foreign census data, which suffer from timeliness problems, with U.S. administrative data sources that are potentially much more up to date. American citizens meeting residence requirements abroad can claim tax benefits as a result of such residence, and the Internal Revenue Service (IRS) has analyzed such returns for the tax years 1975 and 1979 and is currently processing returns for 1983. Unfortunately, changes in tax law regarding foreign residence and in IRS tabulation procedures invalidate a comparison of the 1975 and 1979 data; for instance, the number of returns claiming residence in Canada declined from 18,700 in 1975 to 2,500 in 1979. In addition, tax returns count households rather than individuals, so adjustment is required on the basis of foreign census data to obtain numbers of residents abroad from numbers of tax returns.

However, the use of foreign census data, and possible combination with IRS information, can be illustrated for the case of Japan. The number of American citizens resident in Japan is available by age and sex from the 1970, 1975, and 1980 censuses. Table B-5 shows the reported numbers and the expected numbers in 1975 and 1980 derived by projecting forward the populations from the 1970 and 1975 censuses (survivorship ratios for the projection were taken arbitrarily from a Coale-Demeny "West" model life table with a life expectancy at birth of 75 years for females and 71 years for males).
For each age group except that of 0-4, the difference between the reported and the expected numbers is taken as net (surviving) emigration from the United States to Japan; many of the 0-4 age group are likely to have been born in Japan and are thus not emigrants from the United States. The resulting net emigration figures are also shown in Table B-5. The age pattern of the estimates is plausible: negative up to age 20, reflecting net return migration to the United States, positive from 20-39, reflecting migration to Japan, and negative above age 40, except for the odd blip in old age, again reflecting return migration. Overall, the data suggest net return migration of about 100 between 1970 and 1975 (nearly 400 male emigrants being more than balanced by over 500 female returnees) and of nearly 1,200 between 1975 and 1980 (with only a slight preponderance of females). The IRS recorded 5,100 tax returns filed for 1975 by U.S. citizen residents of Japan (both those with bona fide residence and those under a 17-month foreign presence rule), compared with a 1975 Japanese census figure of 18,755 U.S. citizens living in Japan. On this basis, later IRS estimates of tax filers resident in Japan would have to be inflated by a factor of 3.7 to estimate the U.S. citizen total. The IRS number of tax filers from Japan in 1975 in fact looks quite reasonable given the age and sex structure of the population, which shows 6,566 male and 4,920 female U.S. citizens aged 20-64 in the 1975 census of Japan, and given that many of the tax returns will be joint returns covering husband and wife. Unfortunately, the IRS tables for 1979 tax returns do not classify Japan separately, so the application of the 1975 inflation factor to filers resident in Japan in 1979 cannot be tested by comparison with the 1980 Japanese census results.
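The two calculations described above, the projection residual and the inflation factor, can be sketched as follows. The age-group counts and survivorship ratios in the example are hypothetical and are not the Table B-5 values; only the 1975 census and IRS totals are taken from the text. The function name `net_migration_by_age` is an assumption made for illustration.

```python
def net_migration_by_age(pop_start, pop_end, survivorship):
    """Residual net migration over a 5-year interval.

    pop_start[i]    : census count in 5-year age group i at the first census
    pop_end[i]      : census count in 5-year age group i at the second census
    survivorship[i] : 5-year survival ratio from age group i to group i + 1
    Returns reported minus expected for age groups 1..n-1; the youngest
    group is skipped because it consists of children born in the interval.
    """
    expected = [pop_start[i] * survivorship[i] for i in range(len(pop_start) - 1)]
    return [reported - exp for reported, exp in zip(pop_end[1:], expected)]


# Hypothetical counts for age groups 0-4, 5-9, 10-14 (not Table B-5 values).
residuals = net_migration_by_age(
    pop_start=[1000, 800, 600],
    pop_end=[1100, 900, 750],
    survivorship=[0.99, 0.995],
)

# Inflation factor implied by the 1975 figures quoted in the text.
census_citizens_1975 = 18_755
irs_returns_1975 = 5_100
inflation_factor = census_citizens_1975 / irs_returns_1975
```

A negative residual indicates net return migration to the United States for that cohort, a positive one net emigration to Japan. The ratio of the two 1975 totals is about 3.68, which rounds to the factor of 3.7 quoted in the text.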

The use of citizenship information in the above analysis is less than ideal because citizenship can change. U.S. immigrants who naturalize but then return permanently to their country of origin are likely to be missed because they are likely to assume their original citizenship, and births abroad to U.S. citizen parents appear in the gross figures as emigrants. Information by birthplace would be less ambiguous, though such information would still exclude emigration of U.S. immigrants. The application of this procedure to all countries with resident U.S. emigrants would be tedious at best and impossible as a result of data unavailability at worst. However, it would not be impossible to focus on some 10 or 15 key countries covering a high proportion of U.S. emigrants, to obtain the necessary data either from published sources or by requesting special census tabulations from the statistical authorities involved, and to update the estimates of the U.S. population living in each country on the basis of IRS tax returns claiming foreign residence exemptions. The IRS data could also be used to extend the coverage of the analysis from the key countries selected to the entire world by assuming constant inflation factors for regions or continents. The resulting estimates, though by no means perfect, would at least add to the weak existing empirical basis for estimating net flows of the U.S.-born or U.S. citizen population.

CONCLUSIONS

U.S. international migration statistics are particularly poor regarding the nonlegal immigrant population as well as emigration both of U.S. citizens and immigrants admitted for permanent residence. Fairly straightforward though indirect methods of analysis using data that are collected but not processed or compiled, or data that could be collected relatively simply, could provide complementary estimates of the size of these stocks and flows at a modest cost.
The methods proposed in this paper require further empirical testing and theoretical refinement, but they illustrate how techniques developed in analytical demography can be usefully applied to migration statistics.

REFERENCES

Davidson, C.A. 1981 Characteristics of Deportable Aliens Located in the Interior of the United States. Paper presented at the annual meetings of the Population Association of America, Washington, D.C.

Garcia y Griego, M. 1980 El Volumen de la Migracion de Mexicanos no Documentados a los Estados Unidos (Nuevas Hipotesis). Secretaria del Trabajo y Prevision Social. Centro Nacional de Informacion y Estadisticas del Trabajo. Mexico City.

Passel, J.S., and Peck, J.M. 1979 Estimating Emigration from the United States--A Review of Data and Methods. Paper presented at the annual meetings of the Population Association of America, Philadelphia.

Preston, S.H., Coale, A.J., Trussell, T.J., and Weinstein, M. 1980 Estimating the completeness of reporting of adult deaths in populations that are approximately stable. Population Index 46(2):179-202.

Preston, S.H., and Coale, A.J. 1982 Age structure, growth, attrition, and accession: A new synthesis. Population Index 48(2):217-259.

Siegel, J.S., Passel, J.S., and Robinson, J.G. 1980 Preliminary Review of Existing Studies of the Number of Illegal Residents in the United States. Mimeo. Bureau of the Census, U.S. Department of Commerce, Washington, D.C.

Warren, R., and Kraly, E. 1985 The Elusive Exodus: Emigration from the United States. Population Trends and Public Policy, No. 8. Population Reference Bureau, Washington, D.C.

FIGURE B-1 Plots of PRR(t+1)/PRR(t), vertical scale, against $({}_1I_t - {}_1N_t)/PRR(t)$, horizontal scale, for Colombia, Mexico, the Philippines, and the United Kingdom, 1959 to 1979.

TABLE B-1 Estimation of Population to Loss Ratios P/L From Locations by Duration Assuming Population to be Stable; Growth Rates of 0, 5, and 10 Percent; 1978

Duration d, d+n   Locations   Midpoint   nDd e^(rd*)   p(d)    e^(-rd) p(d)   P/L     Loss rate
(years)           nDd         d*

(a) r = 0.0
0.0-0.416         1686        0.208      1686          1.000   1.000          -       0.759
0.417-0.916        399        0.667       399          0.519   0.519          -       0.247
0.917-2.916        698        1.917       698          0.405   0.405          -       0.164
2.917-4.916        408        3.917       408          0.205   0.205          -       0.197
4.917-6.916        196        5.917       196          0.089   0.089          -       0.230
6.917+             115        9.417       115          0.033   0.033          -       (0.200)
Total             3502        -          3502          -       1.655          1.655   -

(b) r = 0.05
0.0-0.416         1686        0.208      1704          1.000   1.000          -       0.686
0.417-0.916        399        0.667       413          0.555   0.544          -       0.216
0.917-2.916        698        1.917       768          0.447   0.427          -       0.144
2.917-4.916        408        3.917       496          0.247   0.213          -       0.179
4.917-6.916        196        5.917       263          0.117   0.091          -       0.209
6.917+             115        9.417       184          0.048   0.034          -       (0.200)
Total             3502        -          3828          -       1.720          1.882   -

(c) r = 0.10
0.0-0.416         1686        0.208      1721          1.000   1.000          -       0.609
0.417-0.916        399        0.667       427          0.595   0.571          -       0.185
0.917-2.916        698        1.917       845          0.494   0.451          -       0.126
2.917-4.916        408        3.917       604          0.295   0.220          -       0.158
4.917-6.916        196        5.917       354          0.153   0.094          -       0.189
6.917+             115        9.417       295          0.069   0.035          -       (0.200)
Total             3502        -          4246          -       1.782          2.168   -
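The arithmetic behind three of the columns of Table B-1 can be sketched as follows. This is a reading of the tabulated numbers rather than a statement of the author's procedure: it assumes that locations are inflated by exp(r d*) at the interval midpoint, that p(d) is the share of inflated locations occurring at durations d and beyond, and that the stable weighting uses exp(-r d) at the start of each interval. The function name `table_columns` is hypothetical.

```python
import math

LOCATIONS = [1686, 399, 698, 408, 196, 115]              # nDd, from Table B-1
MIDPOINTS = [0.208, 0.667, 1.917, 3.917, 5.917, 9.417]   # d*, in years
STARTS = [0.0, 0.417, 0.917, 2.917, 4.917, 6.917]        # lower bound d of each interval


def table_columns(r):
    """Return (adjusted locations, p(d), exp(-r d) * p(d)) for growth rate r."""
    adjusted = [n * math.exp(r * d_mid) for n, d_mid in zip(LOCATIONS, MIDPOINTS)]
    total = sum(adjusted)
    remaining = total
    p = []
    for a in adjusted:
        p.append(remaining / total)   # share not yet located at the start of the interval
        remaining -= a
    stable = [math.exp(-r * d) * p_d for d, p_d in zip(STARTS, p)]
    return adjusted, p, stable


adjusted, p, stable = table_columns(0.05)
```

For r = 0.05 this gives adjusted locations and proportions that agree with panel (b) to the precision shown in the table. The P/L totals are not computed here because the integration rule used for them is not given.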



TABLE B-4 Estimation of Emigration and Coverage of Alien Address Reporting System, 1959-1979: Colombia, Mexico, the Philippines, and the United Kingdom

For each country, the first column is PRR(t+1)/PRR(t) and the second is (1It - 1Nt)/PRR(t).

                    Colombia         Mexico           Philippines      United Kingdom
Year t              (1)     (2)      (1)     (2)      (1)     (2)      (1)     (2)
1959                1.125   0.201    1.011   0.043    1.007   0.016    0.873   0.051
1960                1.158   0.208    1.022   0.057    0.999   0.010    1.180   0.047
1961                1.071   0.238    1.054   0.074    0.787   0.014    0.937   0.047
1962                1.140   0.268    1.052   0.084    0.966   0.026    0.988   0.053
1963                1.296   0.337    1.058   0.063    1.008   0.023    1.040   0.066
1964                1.331   0.337    1.031   0.048    0.985   0.014    0.997   0.072
1965                1.286   0.249    1.024   0.055    1.070   0.043    1.028   0.057
1966                1.066   0.155    1.033   0.056    1.063   0.110    1.010   0.053
1967                1.073   0.115    1.023   0.053    1.173   0.176    1.072   0.063
1968                1.091   0.134    1.025   0.054    1.221   0.203    1.022   0.046
1969                1.081   0.119    1.017   0.054    1.248   0.225    1.018   0.022
1970                0.919   0.113    1.028   0.055    1.243   0.207    0.958   0.018
1971                1.173   0.080    1.046   0.066    1.170   0.165    1.036   0.011
1972                1.065   0.064    1.070   0.074    1.211   0.135    0.987   0.010
1973                0.982   0.069    1.057   0.075    1.040   0.121    0.970   0.009
1974                1.072   0.070    0.997   0.070    1.059   0.097    1.017   0.008
1975                0.983   0.069    1.020   0.061    1.028   0.103    0.956   0.009
1976                1.014   0.070    1.059   0.053    1.081   0.106    1.036   0.009
1977                1.111   0.095    1.003   0.052    1.002   0.105    0.962   0.013
1978                1.065   0.108    1.061   0.073    1.060   0.086    1.007   0.019
1979                1.061   0.098    0.992   0.045    1.032   0.106    0.982   0.023

Slope, k            0.850            1.212            1.347            0.464
Intercept, (1-R)    0.974            0.960            0.932            0.988



Illegal Aliens: An Assessment

Kenneth Hill

INTRODUCTION

Few population issues generate as much political heat as that of illegal immigration. Although limited legal immigration is generally accepted as a continuation of the American tradition and past immigration is associated with a halcyon period of dynamic growth, illegal immigration is almost universally viewed pejoratively, and numerous societal ills, from budget deficits and high unemployment to overcrowding and rising crime rates, are ascribed to it. Perhaps more than in any other area, however, the public debate on illegal immigration is founded on much smoke and little fire. It is argued, with some justification, that hard data on illegal aliens are difficult to obtain, so discussion has been based on soft data ranging from guesses or informed opinions to results of assumption-laden estimation procedures. The Staff Report of the Interagency Task Force on Immigration Policy (1979:382) sums up the pervasive attitude concisely: "The lack of existing data [on illegal immigrants] points up the most vexing aspect of the issue--the dilemma of either waiting for more data to become available (with no guarantee that it ever will) while the problem worsens, or acting on the available scanty data with the corollary risk of a misguided policy choice." The pessimistic views that the illegal alien population, being a clandestine population, is essentially unsurveyable and that reasonable data on it may never become available have affected attitudes to data collection, data use, and data evaluation from the reemergence of illegal aliens as an issue in the early 1970s to the recent discussion of legalization and control provisions in the Simpson-Mazzoli legislation. Results of imaginative data collection and analysis projects carried out over the last few years have not been given the credence they deserve because of the belief that the uncountable cannot be counted.
This paper reviews the more important of these projects and underexploited data sets to see what conclusions can be drawn about the numbers, trends, and characteristics of illegal aliens in the United States. For our purposes, an illegal alien is defined as a noncitizen physically present in the United States who entered the country illegally and has not regularized his or her situation, or who has violated his or her terms of entry. This definition thus includes those who enter without inspection or with falsified documents, those who enter legally but overstay their visa period, those who enter legally but violate their terms of entry (for instance, by taking employment), and those who enter as permanent residents but break the law in such a way as to become deportable. There are thus a number of ways of entering the illegal population, not all coinciding with an entry into the country. There are also a number of ways of leaving the population: death, deportation,

immediate "voluntary" departure after being located by the INS, genuinely voluntary emigration, and regularization of status. Nor is it clear exactly what we mean by the illegal population; at a particular moment, there is some number of illegal aliens in the United States (although the status of a permanent resident alien who has committed but not been convicted of a deportable offense is ambiguous), but the number varies from day to day, as do the actual individuals. The illegal population for a given year could be the number present on some particular date, July 1 for example, or the maximum number present at any moment during the year, or the minimum number present, or the total number of individuals present illegally at any time during the year, or an average such as the person-years of illegal residence during the year. This last measure is attractive, since it seems to reflect likely impact most closely, but it will not coincide with all measurement definitions--an estimate of the illegal population covered by the 1980 census, for example, has an April 1, 1980, reference date. In comparing estimates of the illegal population, it is most important to keep in mind the definitions of the population being estimated.

ESTIMATES OF NUMBERS OF ILLEGAL ALIENS

The reemergence of illegal immigration as an issue of public concern since the early 1970s has been accompanied by a number of attempts to estimate the size of the illegal population, or components of it.
Some of the early attempts were little more than guesses, and we will not review them here, beyond suggesting that they could be aptly characterized as coming out of the blue (figures of 1 million given by INS Commissioner Farrell for March 1972 and 6-7 million for September 1974 given by INS Commissioner Chapman), cumulated out of the blue (figures of 5 million for April 1975 and 6 million for November 1976 obtained by adding up estimates from INS district directors), or averaged out of the blue (a figure of 8 million for September 1975 obtained by Lesko Associates using a Delphic technique). Unfortunately, the press continues to quote similarly conjectural estimates (for example, a range of 6-12 million quoted by Thornton and Sieghart in the Washington Post, June 26, 1984) despite the existence of a number of empirical studies that, though making numerous assumptions and often giving rise to estimates within a broad range of uncertainty, are at least based on some sort of evidence. By reviewing the major studies and their critical assumptions, we aim to define the limits of the size of the illegal population, and its trend over time, that are indicated by the available data, so that public debate can be put on a sounder footing. We do not attempt to arrive at a single figure regarded as a best estimate, however, because we do not believe that the existing data and methods are strong enough to support more than ranges within which the true figure is likely to lie. A number of reviews of this sort have been prepared in the past, for example that by Siegel, Passel, and Robinson (1980), prepared as a working document for the Select Commission on Immigration and Refugee Policy. Because we do not wish to go over again the ground already adequately surveyed elsewhere, our review is limited to summarizing the main strengths and weaknesses of the approaches covered by the major

studies, clarifying and adding as necessary, to examining in more detail the most recent approaches, and to examining some scraps of data that apparently have not been utilized previously.

Empirical Estimates of the Size of the Illegal Alien Population

First, however, it is useful to summarize the main findings. Table B-6 shows the major empirical estimates of the size of the illegal population, or components thereof, prepared since the early 1970s. Several points can be made about these estimates: (a) for those studies that produced estimates of upper and lower population limits, the range between the limits is typically large; (b) variations from method to method are large; (c) the estimates do not show a clear trend over time, although no estimates are available for the period since 1980; (d) only two estimates, the maxima of Lancaster and Scheuren for 1973 and Bean, King, and Passel for 1980, are consistent with an illegal population in the range of 6-12 million. However, before trying to draw any conclusions, we briefly describe the methodologies and data used to arrive at the various and varying estimates.

Goldberg (1974)

The population by age and sex recorded by the 1960 Mexican census was projected forward to 1970 using a 1964 life table; differences by age and sex between the 1970 census population and the 1970 projected population were interpreted as emigration to the United States between 1960 and 1970, net of return migration and deaths of migrants in the United States during the period; the total obtained was 1,866,000. Legal net migration from Mexico was then estimated from data on the growth of the Mexican-born population enumerated by the 1960 and 1970 U.S. censuses as 269,000. The balance, 1,597,000, was taken as net illegal migration from Mexico to the United States between 1960 and 1970.
There is nothing wrong with the principle of this approach, except its reliance on the critical assumption that census coverage did not change between 1960 and 1970. The sensitivity of the results to this assumption can readily be demonstrated; the enumerated population in 1970 was 48.2 million, so if coverage had been 1 percent less complete in 1970 than in 1960, the 1970 population comparable to the 1960 population would have been about 48.7 million, and forward projection of the 1960 population would give an excess of 500,000 "emigrants" over the enumerated 1970 population. In practice, however, the application has two clear shortcomings. First, the 1960 population (reference date June 8) was projected forward for 10 years, to June 8, 1970; the reference date of the 1970 census was January 18, so the survivorship ratios used were somewhat too low. Goldberg argues that this difference in reference dates reduces the expected 1970 population and thus the estimated emigration. This argument is quite wrong, since the main influence on the size of particular age groups, at least for young adults, is not mortality but the underlying population growth rate. To take an example, if the population aged 15-19 in mid-1960 is projected forward by 10 years, the

result is the expected population aged 25-29 in mid-1970. The survivors of this cohort at the beginning of 1970 would be somewhat more numerous, since some deaths would occur during the first half of 1970, but they would be aged 24.5-29.5, rather than 25.0-29.99 years. If the population were growing at 3 percent per year, the difference would be about 1.5 percent, who would appear as emigrants when the enumerated population aged 25-29 was subtracted. This error would inflate Goldberg's estimate of emigration above age 10 by about 500,000.

The second shortcoming is the estimation of the emigration of children. Registered births between 1960 and 1970 and the enumerated population aged 0-4 in 1960 are not used in the forward projection because they are viewed as inaccurate. Instead, emigrants aged 0-4 are estimated as one-quarter of the product of the observed child-woman ratio (children 0-4 divided by women aged 15-44) and the number of estimated emigrant women aged 15-44; the factor one-quarter allows for the fact that children aged 0-4 in 1970 are alive for only about one-quarter of the period 1960-1970. Emigrants aged 5-9 are estimated as three-quarters of the product of the observed child-woman ratio (children 5-9 divided by women 20-49) and the number of emigrant women aged 20-49, and the emigrants aged 10-14 as the product of the child-woman ratio and emigrant women for children aged 10-14 and women aged 25-54. This procedure assumes that emigrant women have the same number of children as nonemigrant women and that they emigrate with their children; these assumptions are not plausible for illegal female migrants, but they do produce large numbers of emigrant children under age 15, 368,000 females or 46 percent of total female emigrants, and 383,000 males or 36 percent of total male emigrants.
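The arithmetic behind the coverage and reference-date arguments, and the child-woman ratio rule just described, can be sketched as follows. The 48.2 million population, the 1 percent coverage change, the 3 percent growth rate, and the half-year gap come from the text; the counts passed to the hypothetical function `emigrant_children` are invented for illustration.

```python
import math

# Coverage sensitivity: 1970 enumerated population from the text, in millions.
enumerated_1970 = 48.2
relative_coverage = 0.99   # 1970 coverage assumed 1 percent below 1960 coverage
comparable_1970 = enumerated_1970 / relative_coverage
spurious_emigrants = comparable_1970 - enumerated_1970   # in millions

# Reference-date effect: cohort boundaries displaced by about half a year
# in a population growing at 3 percent per year.
growth_rate = 0.03
date_gap_years = 0.5
cohort_size_shift = math.exp(growth_rate * date_gap_years) - 1.0


def emigrant_children(children, women, emigrant_women, exposure_factor):
    """Child-woman ratio rule: exposure factor (0.25, 0.75, or 1.0 by age
    group) times the observed child-woman ratio times emigrant women."""
    child_woman_ratio = children / women
    return exposure_factor * child_woman_ratio * emigrant_women


# Hypothetical counts: 700 children aged 0-4 per 1,000 women aged 15-44,
# and 200 emigrant women aged 15-44.
children_0_4 = emigrant_children(700, 1000, 200, exposure_factor=0.25)
```

The comparable 1970 population works out to about 48.7 million, a spurious excess of roughly half a million, and the reference-date shift to about 1.5 percent, as stated in the text.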
These estimates would of course be reduced by about half by adjusting the emigrant female population for the effects of the dating error described above, but would remain much the same proportion of total emigration. Standard forward projection of registered births and the population aged 0-4 in 1960 indicates a net inflow of 327,000 males and 176,000 females between the 1960 and 1970 censuses. Underregistration of births and underenumeration of the population under age 5 affect these results, and the development of suitable adjustment factors would require a major analysis of Mexican census and vital registration data, but at face value these data do not support an outflow of children of the magnitude suggested by Goldberg.

However, the Mexican census data clearly represent a valuable source of data on the possible magnitude of flows to the United States. We have reanalyzed the 1960 and 1970 census age distributions and included the age distribution from the 1980 census, using a rather different analytical technique. Preston and Coale (1982) propose a method for estimating age-specific migration rates using age-specific growth rates and an intercensal life table. The estimating equation used here, starting at age 10, is

$$-5\sum_{x=10}^{a} {}_5e_x = \ln\left[\frac{N(a+5)\,l(10)}{N(10)\,l(a+5)}\right] + 5\sum_{x=10}^{a} {}_5r_x \tag{1}$$

where ${}_5e_x$ is the emigration rate, and ${}_5r_x$ the recorded growth rate, for the age group x, x+5, N(10) and N(a+5) are the numbers having 10th and a+5th birthdays between the censuses, and l(10) and l(a+5) are life table survivors to ages 10 and a+5 respectively. N(10) and N(a+5)

were calculated as the geometric means between censuses of the geometric means of the recorded populations aged a-5,a and a,a+5 at each census. The use of equation 1 has two advantages for the present applications: first, the method accommodates an intercensal interval that is not an exact multiple of five years very conveniently and, second, the analysis can start at age 10, as above, avoiding the uncertainties of enumeration completeness under age 10. Table B-7 shows the details of the calculations for males for the period 1960-1970, and Table B-8 summarizes the results in terms of emigrants by age and sex for the periods 1960-1970 and 1970-1980. For the 1960-1970 period, emigration, net of returns but not of mortality, is estimated as some 420,000 males and 300,000 females aged 10-80. Over this period, some 350,000 permanent immigrants born in Mexico aged 10 and over were admitted to the United States, suggesting an illegal inflow to the United States of at least 370,000 over the period. Between 1970 and 1980, however, the method estimates a net outflow from Mexico of 78,000 males and a net inflow to Mexico of 381,000 females, despite the admission of some 530,000 Mexicans aged 10 and over to permanent residence status by the United States alone over the period. Clearly the method has not worked between 1970 and 1980, probably either because the 1980 enumeration was substantially more complete than that of 1970 or that of 1960 or because the results have been distorted by changes in age misreporting patterns between the two censuses.
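The calculation behind equation 1 can be sketched as follows. This is a minimal illustration, assuming census populations and life table survivors are held in dictionaries keyed by the lower bound of each five-year age group; the function names and the synthetic input data are hypothetical and do not reproduce Tables B-7 or B-8. The birthday numbers are computed only up to a constant factor, which cancels in the ratio N(a+5)/N(10).

```python
import math


def birthday_index(pop_first, pop_second, age):
    """Proportional to N(age): geometric mean across the two censuses of the
    geometric means of the age groups (age-5, age) and (age, age+5)."""
    within_first = math.sqrt(pop_first[age - 5] * pop_first[age])
    within_second = math.sqrt(pop_second[age - 5] * pop_second[age])
    return math.sqrt(within_first * within_second)


def emigration_rates(pop_first, pop_second, survivors, interval_years, start_age=10):
    """Return {x: emigration rate for the age group x to x+5} from equation 1."""
    rates = {}
    n_start = birthday_index(pop_first, pop_second, start_age)
    growth_sum = 0.0          # running sum of recorded growth rates
    previous_cumulated = 0.0  # 5 * sum of emigration rates up to the previous group
    x = start_age
    while x + 5 in pop_first and x + 5 in survivors:
        growth_sum += math.log(pop_second[x] / pop_first[x]) / interval_years
        n_end = birthday_index(pop_first, pop_second, x + 5)
        log_ratio = math.log(
            n_end * survivors[start_age] / (n_start * survivors[x + 5])
        )
        cumulated = -(log_ratio + 5 * growth_sum)
        rates[x] = (cumulated - previous_cumulated) / 5
        previous_cumulated = cumulated
        x += 5
    return rates


# Synthetic check: a stationary population (zero growth) that thins out with
# age at 3 percent per year while mortality accounts for only 2 percent.
ages = range(5, 35, 5)
census = {x: 1000 * math.exp(-0.03 * x) for x in ages}
life_table = {x: math.exp(-0.02 * x) for x in ages}
rates = emigration_rates(census, census, life_table, interval_years=10.0)
```

In this synthetic case the residual 1 percent per year is attributed to emigration in every age group. With real census data the same residual also absorbs changes in enumeration completeness and age misreporting, which is why the 1970-1980 results above are implausible.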
If we assume that legal emigration from Mexico was approximately evenly distributed by sex (an assumption supported by INS records) and that return migration of legal emigrants was also approximately evenly distributed by sex, the difference for the period 1970-1980 between the estimated male outflow from Mexico and the estimated female inflow to Mexico of 460,000 could be taken as an indicator of the balance of male over female illegal emigration during the period. If 85 percent of illegal emigrants were males, the total flow of illegal emigrants would be 652,000 males and 115,000 females, or 767,000 overall. However, this figure assumes no change in the sex differential of enumeration completeness and would be doubled by assuming that only 65 percent of illegal emigrants were males, so it cannot be taken in any way as a firm estimate.

In summary, data from recent Mexican censuses, although at first sight a promising source of estimates of emigration to the United States, prove to be of little value for this purpose, probably because of changes in enumeration completeness. However, it can be said that the data do not support a huge flood of Mexicans into the United States in the 1970s, nor the entry of 1.6 million illegal Mexicans in the 1960s. It should be pointed out that the Mexican censuses are conducted on a de jure basis, so short-term illegal migrants would often be included in the census counts, further clouding the conclusions that can be drawn from the data.

Lancaster and Scheuren (1978)

This study used a matching procedure to estimate the number of U.S. residents not included in the March 1973 Current Population Survey (CPS). Three administrative data sets, of IRS tax filers, workers covered by social security (SSA) contributions, and social security beneficiaries, were used to classify CPS households by whether they

contained a tax filer, an SSA worker, or an SSA beneficiary. Individuals were then classified by whether they were part of a household containing a tax filer, an SSA worker, or an SSA beneficiary. Log-linear models were then used to estimate the number of people not covered by any of the three data systems, the missing cell in the 2x2x2 table. The total population was then found by adding all the cells together, and the nonlegal population obtained as the difference between this population and a presumed legal population based on the 1970 census adjusted for coverage and projected forward to March 1973 allowing for births, deaths, and legal immigration. The estimate arrived at is 3.9 million nonlegal residents aged 18-44, with a subjective 68 percent confidence interval of 2.9 to 5.7 million. This study was the first major attempt to estimate the size of the nonlegal population empirically and remains the only major application of record matching for this purpose. Lancaster and Scheuren make no exaggerated claims for their results, describing the study as exploratory and pointing out possible shortcomings, notably failures to match correctly, nonindependent probabilities of omission from the various systems, and problems in calculating the 1973 legal population. To these may be added the problem posed by the high degree of overlap between the tax filer and SSA employment variables and the low degree of overlap of SSA beneficiaries with either of the other two variables. Siegel et al. (1980) provide an effective review of the application and conclude that ". . . the assumptions required for the estimate of 3.9 million illegal residents to be valid are very strong indeed. Consequently, the subjective 68-percent confidence interval cited by the authors is probably too narrow, especially for the lower limit of the interval, and could be broadened to include the statistical possibility of much less than 2.9 million illegal aliens in the United States."
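To make the idea of estimating the missing cell concrete, the sketch below uses one standard log-linear model for three overlapping lists, the model with no three-factor interaction, for which the unobserved cell has a closed-form estimate. The text does not say which log-linear models Lancaster and Scheuren fitted, so this should be read as an illustration of the technique rather than their procedure; the function name and the cell counts are hypothetical.

```python
def missing_cell(counts):
    """Estimate the unobserved (0, 0, 0) cell of a 2x2x2 table under the
    log-linear model with no three-factor interaction.

    counts[(t, w, b)] is the number of people with t, w, b equal to 1 when the
    household is covered by the tax filer, SSA worker, and SSA beneficiary
    systems respectively, and 0 otherwise.
    """
    numerator = (
        counts[(1, 1, 1)] * counts[(1, 0, 0)]
        * counts[(0, 1, 0)] * counts[(0, 0, 1)]
    )
    denominator = counts[(1, 1, 0)] * counts[(1, 0, 1)] * counts[(0, 1, 1)]
    return numerator / denominator


# Hypothetical counts: three independent systems that each cover half of a
# population of 1,000, so every observed cell holds 125 people.
observed = {
    (1, 1, 1): 125, (1, 1, 0): 125, (1, 0, 1): 125, (0, 1, 1): 125,
    (1, 0, 0): 125, (0, 1, 0): 125, (0, 0, 1): 125,
}
uncovered = missing_cell(observed)
total_population = sum(observed.values()) + uncovered
```

The estimate is a ratio of products of cell counts, so a small cell in the denominator (for example, one involving the thinly populated beneficiary category) makes the result very sensitive to sampling and matching errors.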
Given the dependence of the results on the small number of households receiving social security benefits (only 2.5 percent of all households) and on the accuracy of the legal population estimate for 1973 based on adjusted 1970 census data, Lancaster and Scheuren's estimates should not be given undue weight.

Robinson (1980)

This study examines trends in age-specific death rates over the period 1950-1975, for the United States as a whole and for three groupings of states: a southwestern group (Texas, Colorado, New Mexico, Arizona, and California), an eastern group (New York, New Jersey, Illinois, Michigan, and Florida), and the remainder of the country. Trends for certain population groups (under age 20, aged 45-64, age 65+, white females aged 20-44, and black and other females aged 20-44) are similar for all three groups of states of residence; however, trends for white males aged 20-44 and black and other males aged 20-44 differ between the five-state groups and the 40-state group. Specifically, death rates for white males aged 20-44 increase more rapidly (1960-1970) and fall more slowly (1970-1975) for the two five-state areas than for the 40-state area; death rates for black and other males aged 20-44 rise more

rapidly (1960-1970) and fall more slowly (1970-1975) for the eastern five-state group than for the 40-state group, while for the southwestern group they rise more slowly (1960-1970) and fall more slowly (1970-1975) than for the 40-state group. The irregularities for the black and other death rates for males aged 20-44 are assumed to reflect real changes in mortality risks, since the increases between 1960 and 1970 are largely accounted for by violent deaths, and the declines from 1970 to 1975 are similar for all three state groups. The irregularities for death rate trends of white males aged 20-44, on the other hand, are taken as indicating an increase in the illegal population contributing to registered deaths but not to census-based population denominators, on the assumptions that deaths of illegal residents are likely to be registered, whereas such residents may not be included in census enumerations. It may be noted that the reasons for deciding that trends for black and other males aged 20-44 are real, whereas rather similar trends for white males result from illegal residents, seem rather thin. Estimates of the illegal white male population aged 20-44 are then obtained as follows. An estimate of deaths of illegal residents is found as the number of excess deaths in the two five-state groups over the number that would have occurred if death rates had changed at the same rates as in the 40-state group. This number of excess deaths is the key to the estimates of the illegal population, which are obtained from it by assuming different death rates, different levels of death registration completeness, and different levels of enumeration completeness for illegal residents.
The estimates vary widely, the lowest being obtained by assuming that all deaths to illegal residents are registered, that no illegal residents appear in the population denominators, and that illegal residents experience the mortality rates of black and other males, while the highest are obtained by assuming that 90 percent of deaths to illegal residents are registered, that 50 percent of illegal residents are included in the population denominators, and that illegals experience the death rates of white males aged 20-44 in the United States from violent causes only. The lowest and highest estimates vary by a factor of about 10, though they are all based on a single number of excess deaths, for which no variability is allowed, though Robinson describes it as only a rough estimate. This estimate itself must have a large margin of possible error; the death rate may have been affected by the substantial legal immigration into the two five-state groups, the observed disproportionate effects on deaths from violent causes and deaths in metropolitan areas being not inconsistent with this explanation, or by differential underenumeration of the legal population of these states by the 1970 census, an explanation not inconsistent with the metropolitan area effect. Indeed, the fact that much of the death rate differential in the two five-state groups arises from an increased death rate in metropolitan areas argues against the presence of a substantial rural illegal population. Thus the true range of the size of the illegal population indicated by these data is wider even than that suggested by Robinson, and the analysis cannot even be taken as conclusive evidence that there was at least some increase in the illegal population not included in the death rate denominators over the period 1960-1975, although it is certainly suggestive of an increase.
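The sensitivity of such estimates to the three assumptions can be illustrated with a simple accounting sketch. This is not Robinson's published formula; it is one possible reading of the description above, in which registered deaths of illegal residents, less the deaths expected from those illegal residents who are counted in the denominators, equal the excess deaths. The function name, parameter names, and all numbers are hypothetical.

```python
def implied_illegal_population(
    excess_deaths,
    death_rate,
    registration_completeness,
    share_in_denominator,
    reference_rate,
):
    """Population implied by a given number of excess deaths.

    Assumed accounting: excess = population * (registration_completeness *
    death_rate - share_in_denominator * reference_rate), where reference_rate
    is the death rate used to compute expected deaths.
    """
    net_rate = (
        registration_completeness * death_rate
        - share_in_denominator * reference_rate
    )
    if net_rate <= 0:
        raise ValueError("assumptions imply no excess deaths per resident")
    return excess_deaths / net_rate


# Hypothetical inputs: 1,000 excess deaths under two sets of assumptions.
low = implied_illegal_population(1000, 0.004, 1.0, 0.0, 0.002)
high = implied_illegal_population(1000, 0.002, 0.9, 0.5, 0.002)
```

With these made-up inputs the two sets of assumptions differ by a factor of five even though the number of excess deaths is held fixed, which is the kind of spread described above.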

Heer (1979)

This study examines the change in the population of Mexican origin reported in the CPS between 1970 and 1975, interpreting the difference between the reported change and expected change from natural increase and legal immigration as net illegal immigration. Siegel et al. (1980) raise serious questions about the results, pointing out, first, that the sampling error of the difference in the CPS populations of Mexican origin is quite large, in fact larger than Heer's lowest estimate of net illegal inflow; second, that the results are sensitive to the choice of starting and ending points, because the CPS estimate of the population of Mexican origin fluctuated irregularly during the mid-1970s, such that the choice of 1973-1977 as the period of study would have resulted in estimates of a net outflow; and third, that the estimates depend heavily on inadequately supported assumptions about levels of CPS coverage of the population in question, particularly on the coverage of illegal residents. Using CPS data on the Mexican-born population, Siegel et al. conclude that the change from 1969 to 1975 can be entirely accounted for by natural increase and legal immigration, suggesting two possible conclusions, one that the net flow of illegal Mexicans balanced the return migration of legal immigrants, and the other that the CPS covered negligible numbers of illegal Mexicans. Heer's analysis should not be taken even as confirming a small inflow of illegal Mexicans in the early 1970s, nor as ruling out a substantial inflow during the period.

Garcia y Griego (1980)

In response to concerns about the political impact of migration to the United States, the Mexican government established a research program in the late 1970s aimed at measuring the flow and characteristics of Mexican migrants seeking employment in the United States. One part of this study was a series of surveys of illegal Mexican migrants returned by the INS from the United States.
Migration histories provided by the surveyed returnees, in combination with INS data on locations by duration of illegal stay in the United States and information on prior U.S. residence of a sample of Mexican immigrants (Cue, 1976), are the basis for Garcia y Griego's study. The methodology is based on the fact that the illegal Mexican population of a given entry cohort at any time is equal to the cohort losses that will occur after that time. Thus if cohort losses after a given date can be estimated, the population stock at that date for the cohort is also estimated; by summing across cohorts, the total illegal stock is then obtained. If the stock can be estimated in this way for two dates, the rate of growth of the illegal population can be found. The difficulty clearly lies in estimating cohort losses in the future. Garcia y Griego defines four mutually exclusive and exhaustive ways in which cohort losses can occur: return to Mexico by the INS, voluntary return, legalization, and death. The first element, return by the INS, is estimated using INS data on locations by duration of illegal stay for the years 1972-1977, with locations at one year or over being distributed according to the distribution reported for each entry cohort by the illegal aliens

returned to Mexico by the INS in October-November 1977 and interviewed by the border survey. Since location may not result in loss, two alternative assumptions are used to estimate losses from locations: that all locations lead to losses, and that only 40 percent of locations lead to losses, at all durations of illegal stay. This step places heavy reliance on the accuracy of the migration histories reported by the returnees at the border survey; a comparison of the reported durations of stay ended by INS return to Mexico with INS locations by duration suggests the omission from the histories of short stays. The second element, voluntary return to Mexico, is based on losses estimated from INS locations and ratios of voluntary to enforced return by year and entry cohort as obtained from the border survey migration histories. As with enforced returns, two alternative assumptions are used, the first using ratios as observed (the low hypothesis) and the second using ratios adjusted upward by a factor of 1.72 (the high hypothesis) to allow for possible selection bias, that enforced returnees may have had a higher than average probability of being located by the INS, and may therefore underestimate true voluntary returns in their migration histories. The third element, legalization, is based on reports to a small survey conducted in 1973 of 822 Mexican males aged 18-60 at initial entry to the United States as permanent residents. Of these immigrants, 61.5 percent had lived in the United States before; the study assumes all such prior residence to have been illegal and uses the reported distribution by duration of residence applied to cohorts of legal Mexican immigrants to estimate period and cohort losses to the illegal population resulting from legalization.
This step seems the most problematic of the four: since the data may not represent only periods of illegal residence, the distribution obtained will be affected by cohort effects and sampling errors, and there seems to be some possibility for overlap between voluntary return as estimated in step 2 and voluntary return followed by legal immigration as estimated in step 3. Once the first three steps have been completed, losses are projected forward from 1977. For durations of stay up to 7 years, ratios of losses for successive durations by cohort obtained for the period 1972-1977 and averaged across the years available are used to project future losses; for durations from 8 to 40 years, a model of losses was used to project future losses. Stocks by date and cohort in the absence of mortality were then obtained by cumulating losses beyond the date, and mortality was then incorporated as the fourth type of loss by applying survivorship ratios to the estimated stocks of survivors. Once the dead are added in, all four elements of loss are accounted for, and entries, stocks, and loss rates can be calculated for each year from 1972 to 1977. The above account is a very inadequate description of the extremely elaborate procedures used by Garcia y Griego, but it gives a broad idea of the bases and manipulations of data used in the estimation, if not of the detail given of the Lexis diagrams, separation factors, and justification of the hypotheses involved. Though some minor methodological changes would appear suitable--such as some smoothing of the ratios of voluntary to enforced returns across cohorts and the use of duration values other than the midpoint of the duration intervals for the highly convex to the origin distribution of INS locations by duration--the effects of such changes would be relatively minor.
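The core identity of the method, that the stock of an entry cohort at a given date equals the losses that cohort will suffer after that date, can be sketched as follows. The data structure, function names, and loss counts are hypothetical and far simpler than Garcia y Griego's actual tabulations; the losses are assumed to combine all four types (INS return, voluntary return, legalization, and death).

```python
def stock_at_start_of(year, losses):
    """Stock at the start of `year`: all losses occurring in `year` or later
    to cohorts that entered before `year`.

    `losses` maps (entry_year, loss_year) to a number of persons.
    """
    return sum(
        count
        for (entry_year, loss_year), count in losses.items()
        if entry_year < year <= loss_year
    )


def annual_growth_rate(stock_first, stock_second, years_apart=1):
    """Average annual growth rate between two stock estimates."""
    return (stock_second / stock_first) ** (1 / years_apart) - 1


# Hypothetical losses by entry cohort and year of loss.
losses = {
    (1970, 1971): 10,
    (1970, 1973): 5,
    (1971, 1972): 8,
    (1972, 1974): 4,
}
stock_1972 = stock_at_start_of(1972, losses)
stock_1973 = stock_at_start_of(1973, losses)
growth = annual_growth_rate(stock_1972, stock_1973)
```

The sketch makes the main weakness visible: stocks for recent dates depend almost entirely on losses that have not yet been observed and must be projected.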

The final estimates must be interpreted with caution, however, for the following reasons. First, the ratios of voluntary to enforced returns and distributions of enforced returns by duration of stay are based on migration histories of uncertain validity from a fairly small and possibly unrepresentative sample of 9,930 returnees. Second, the number of legalizations is based on data of uncertain relevance from a very small sample; although the number of losses from legalization is small, only 30,000 of an estimated 1.67 million losses in 1976, they are concentrated at high durations of illegal stay and thus have considerable weight in the forward projection process. Third and most important, the estimates at short durations for 1975 and 1976 depend heavily on forward projection of losses based mainly on histories for 1972-1977. Such projection is a hazardous process, especially given the very sharp increase in INS locations at durations of a year or more from 1975 (34,491) to 1977 (89,793). INS records show lower numbers of locations at a year or more for 1978 and 1979, although the application of Garcia y Griego's location rates for 1977 to the beginning 1978 population would imply a sharply higher number of locations for 1978. Fourth, the linking of voluntary to enforced returns implies that if INS location rates rise, so do voluntary returns, the reverse of the effect one would expect. In summary, the analysis is ingenious, but constructs rather large numbers from rather small ones in ways that are bound to be sensitive to data errors.
The final estimates, of a stock between 482,000 and 1,224,000 at the beginning of 1977, net annual inflows ranging from 75,000 to 284,000 for 1975 and 1976, and a growth rate in 1976 of around 27 percent, cannot be regarded as solid limits on the size of the illegal Mexican population, although the stock estimates for the beginning of 1972, from 234,000 to 436,000, may be more robust since they are based more on data and less on projection than the stock estimates for later in the decade.

CENIET (1982)

Another part of the Mexican research program that collected the data on returnees used in the analysis by Garcia y Griego just described was a major national household survey conducted in December 1978 and January 1979, covering 62,500 households and focused on labor migration to the United States. This survey was carried out on a de jure basis, comparable to the Mexican censuses, and incorporated special questionnaires for persons with migration experience in the United States. Two migrant populations were identified: (1) those age 15 and over normally resident in the household but at the time of the survey working or looking for work in the United States and with a family member in the household to provide information about them and (2) those age 15 and over actually present in the household who had spent at least one day working or looking for work in the United States during the preceding five years. When inflated to the national level, the former population, of normal residents actually in the United States working or seeking work, was estimated as 519,000, and the latter population, of actual residents who had worked or looked for work in the United States in the preceding five years, was estimated as 471,000. In addition to numbers,

the survey report provides a considerable amount of information on the characteristics of these two populations. The problem with the results of this survey is how to interpret them in terms of illegal migration to the United States from Mexico. The report argues that both populations represent illegal migrants, and the age and sex compositions of the populations support this contention. However, the absent resident population is likely to include some recent legal migrants and to exclude both those illegal migrants who have been absent for a long period and those illegal migrants who left no family members behind to report on them or who were under age 15 or not working or looking for work. Thus the 519,000 can be taken, plus or minus some allowance for sampling error, as a minimum estimate of the illegal Mexican population in the United States at the end of 1978. The returned migrant population will include some returned legal migrants, but since about half these returnees had apparently worked or looked for work in the United States during 1978, and only 15 percent reported having entered the United States legally, a substantial proportion of them were probably illegal migrants who had entered the United States for seasonal summer work but had returned to Mexico for the winter. Interpretation would be easier if the results had been published more fully. Particularly useful would be tabulations of both populations by age group and sex, by month and year of departure for the absent population, and by month and year of both most recent departure and return for the returned migrants. With the results given in the report and the hints dropped in the text, it is really not possible to do more than speculate about the relationship between the CENIET-defined populations and the population of interest to us, of Mexicans illegally in the United States.
The figures could be taken to indicate a net annual flow of around 250,000, if the 500,000 or so absent residents had left in 1978 and half the returnees had returned in 1978. Alternatively, the figures could be taken to indicate a stock of 500,000 (those actually in the United States), or 600,000 illegal person-years lived, if the 500,000 spent all 1978 in the United States and half the returnees had spent half a year each in the United States in 1978; to these stock estimates an unknown amount should be added for migrants not covered by the survey, namely those under 15, or not employed or seeking employment, or no longer regarded as normal Mexican residents, or with no household member still in Mexico to report on them. It can thus be regarded as unlikely that the illegal Mexican population in the United States at the beginning of 1979 was less than half a million, but upper limits of a million or more would not be inconsistent with the survey results. It might be noted for future reference that a household survey of this type would be an ideal vehicle for applying multiplicity-type questions concerning residence of children or siblings to estimate emigration (see for example IUSSP, 1981).

Warren and Passel (1983)

This study used data from the 1980 census and the INS to estimate the number of nonlegal aliens included in the 1980 census by age group, sex, period of entry, and country of birth. The basis of their methodology is to compare the recorded 1980 census population with a constructed legal

population based on the INS Alien Address Registration system combined with data on new immigrants and naturalizations. Extensive adjustments were required to the data on both sides of this comparison. The number of people in the census reporting U.S. citizenship through naturalization by period of entry was higher than recorded INS naturalization figures, so the noncitizen foreign-born population was adjusted upward from close to 7 million to about 8 million on the basis of the reported INS data on naturalizations, by period. A further, though smaller, adjustment was incorporated for incorrectly reporting (or overstating) the United States as the place of birth. The legally present population was obtained by adjusting the 1980 alien registration data (reported on form I-53) for undercoverage averaging 11 percent overall, though the undercoverage was estimated separately for 40 countries or groups of countries of birth. The procedure used to estimate I-53 coverage in 1980 is complex, but the principles may be summarized as follows. Annual change in the resident alien population can be represented as

$$P_2 + U_2 = P_1 + U_1 + I - N - D - E \qquad (1)$$

where $P_1$ and $P_2$ are INS registration figures for permanent residents for two successive years; $U_1$ and $U_2$ represent underreporting for the respective years; $I$ is the number of aliens admitted for permanent residence during the year; $N$ is the number of aliens becoming naturalized citizens during the year; $D$ is the estimated number of deaths;* and $E$ represents emigration of resident aliens. No information is available for $E$, $U_1$, or $U_2$, but the net effect of all three can be approximated as follows:

$$E + (U_2 - U_1) = P_1 - P_2 + I - N - D \qquad (2)$$

Application of equation 2 produces a series of figures for the net effects of emigration and change in coverage for each year. The next step was to derive an annual series of emigration estimates.
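Equation 2 is a simple residual, and its application can be sketched as follows. The function and argument names and all figures are hypothetical; they are not INS data.

```python
def emigration_plus_coverage_change(
    registered_start, registered_end, admissions, naturalizations, deaths
):
    """Right-hand side of equation 2: the combined effect of emigration and
    the change in underreporting between two successive registrations."""
    return registered_start - registered_end + admissions - naturalizations - deaths


# Hypothetical year: registration rises from 4.0 to 4.1 million while
# 400,000 aliens are admitted, 150,000 naturalize, and 30,000 die.
residual = emigration_plus_coverage_change(
    registered_start=4_000_000,
    registered_end=4_100_000,
    admissions=400_000,
    naturalizations=150_000,
    deaths=30_000,
)
print(residual)
```

The residual cannot by itself separate emigration from a change in coverage; a further assumption about one of the two is needed before the other can be derived.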
First, an estimate of total emigration for the 1965-1976 period was made by assuming equivalent proportional coverage in 1965 and 1977. The estimated total emigration for 1965-1976 was then allocated by year, two-thirds on the basis of annual immigrants and one-third on the basis of registering aliens. These ratios were also used to estimate emigration for 1977-1979. The estimates of emigration for each year were then subtracted from the annual series of combined emigration and change of coverage to obtain annual estimates of change of coverage. By cumulating the estimates of $U_2 - U_1$ from 1965 onward, it is possible to identify the year with the lowest absolute "omission." Registration for this "best" year was adjusted upward by 2 percent and used along with the annual components of alien population change thereafter to calculate the adjusted legal resident population for 1980.

* The annual numbers of deaths were estimated by applying crude death rates calculated from 1970 census data for aliens by age and sex and U.S. mortality rates for each country of origin to the registered, rather than true, alien populations.

Warren and Passel conclude that some 2.1 million illegal aliens were recorded in the 1980 census, an estimate that is quite sensitive to the

numerous and sometimes substantial adjustments made to the basic data. In support of the estimate, the characteristics of the illegal population are plausible in terms of origin (over half born in Mexico), age and sex distribution (mainly young adult males), period of entry (largely 1975-1979) and state of residence (predominantly California, Texas, and Illinois for Mexicans). An analysis of November 1979 CPS data by Warren (1982) also provides support for this approach. In that study two separate estimates of the legally resident population were derived for comparison with CPS data. One estimate, based on I-53 data adjusted as described above, was compared with the CPS alien population (adjusted for overreporting of naturalizations). The other estimate, based on INS immigration figures and an allowance for emigration and mortality, was compared with the CPS foreign-born population entering during the 1970-1979 period. The two comparisons produced comparable estimates not only for the total for 1970-1979 (1.2 and 1.1 million, respectively) but also for the age-sex distribution of the estimated illegal population included in the CPS. Uncertainties remain, however. First, the adjustments for false claims to naturalized citizenship and misreporting of country of birth more than double the number of illegals from under 1 million to over 2 million, and thus need powerful justification. The adjustment for naturalizations was based on comparisons of the number of naturalizations by country of birth and year of immigration from INS data, with some small allowance for emigration and mortality, and numbers of naturalized citizens by country of birth and period of entry from the census. If the latter exceeded the former, the balance was taken as representing false claimants to naturalized citizenship.
INS data on naturalizations are regarded as very reliable, primarily because the administrative procedure is clearly defined, but the INS classification by year of immigration may not necessarily agree with the census classification by period of entry, which might be interpreted as most recent entry, or might precede year of immigration in the INS sense of legal status. Such an inconsistency between the two sources would affect the estimated distribution of illegals by period of entry. The total number of illegals would be directly affected by incorrect allowance for mortality and emigration of naturalized citizens and by any omission of naturalizations from INS data. Second, the procedure used to estimate I-53 coverage is only approximate. Its most important area of uncertainty, from the point of view of the subsequent use of its results, is that the completeness estimates are relative to an arbitrarily assumed level of 98 percent completeness of registration in the "best" year. Using a figure of 90 percent, or 110 percent, neither of which can be ruled out from the data, would have a substantial effect on the final estimates of illegals of plus or minus half a million or so. Some other assumptions in the procedure are given little justification, although they probably have little effect on the final estimates. In summary, the estimates of coverage of the I-53 system have a substantial margin of error (the first paper in this appendix shows that, at least for the Philippines and possibly also for Mexico, overregistration of the legal alien population by the I-53 is consistent with the annual fluctuations in registration, immigration, and naturalization). An additional source of uncertainty in the Warren-Passel study arises from the fact that the adjusted I-53 data estimate the actual number of

aliens residing legally in the United States in April 1980. However, it is not the "true" legally resident population that is needed for comparison with the census data, but the legally resident population actually included in the census. Any undercount of legal residents by the 1980 census would reduce the number of illegal aliens estimated as included in the census by the same amount, whereas any overcount would inflate the estimate of illegals. Warren and Passel's analysis clearly establishes that a substantial number of illegal aliens were included in the 1980 census population but does not establish the precise number because of the nature of the adjustments made to the underlying data. However, the range of uncertainty of the estimate is narrower than that of the estimates produced by other methods reviewed here, and the analysis represents a major advance in measuring the size of the illegal population. Two final reflections are that the method should also be applied to the 1970 census, to get an idea of both the stability of the methodology and of the rate of growth of the illegal population over the 1970-1980 decade, and that a composite procedure might be used to estimate the coverage of the I-53 data, using Hill's approach (described in the first paper of this appendix), applied perhaps to the period 1964-1976, to estimate absolute coverage by country of origin, and then applying the Warren-Passel approach to carry this estimate forward to 1980.

Bean, King, and Passel (1983)

If emigration is age- and sex-selective, it will affect the sex ratios by age of the population of origin. This study analyzes the sex ratios of the Mexican population as reported by the provisional results of the 1980 census for the age range 15-39 to derive estimates of net Mexican emigration.
The advantage of using sex ratios from one census over using intercensal residual methods is that sex ratios are not affected by overall census omission, although they are affected by sex differentials in enumeration completeness and by sex differentials in age misreporting. Bean et al. assume that immigration and emigration between Mexico and countries other than the United States are either negligible or not sex-selective, and that a deficit of males aged 15-39 in the 1980 census is a function of differential undercount of such males in the census and differential emigration by sex to the United States. Surviving emigrants to the United States aged 15-39, $E_{15-39}$, can then be estimated as

$$E_{15-39} = \frac{PM_c \left( \dfrac{M_{15-39}}{c_m} + \dfrac{F_{15-39}}{c_f} \right) - \dfrac{M_{15-39}}{c_m}}{PM_e - PM_c}$$

where $PM_c$ is the proportion male of the population aged 15-39 in the absence of migration, $M_{15-39}$ and $F_{15-39}$ are the numbers of males and females respectively enumerated by the census, $c_m$ and $c_f$ are the male and female enumeration completenesses in 1980, and $PM_e$ is the proportion of males among the surviving emigrants aged 15-39. For the unknowns, $PM_c$ was estimated using assumed sex ratios at birth ranging

from 103 to 105 males per 100 females, life table ${}_nL_a$ values by sex, and the enumerated female populations by age group as weights; $PM_e$ was assumed to be 0.60 or 0.65; $c_f$ was assumed to be 0.97; and $c_m$ was assumed to range from 0.91 to 0.95. Total emigrants to the United States were then obtained from the estimates of $E_{15-39}$ by assuming that either 60 or 65 percent of all emigrants were aged 15-39. All possible combinations of assumptions produced 60 estimates of total Mexican emigrants to the United States. Each estimate was then evaluated by seeing whether it satisfied three additional constraints, namely that the proportion male among illegal Mexicans aged 15-39 not included in the 1979 CPS should fall between 0.65 and 0.85, that the proportion aged 15-39 among illegal Mexicans not included in the 1979 CPS should not fall below 0.65 nor exceed 1.0, and that the total number of Mexican emigrants should not be lower than the total legal and illegal number included in the 1979 CPS. These constraints eliminate 42 of the 60 estimates, no less than 30 by the last constraint alone. The estimates of illegal emigrants are then obtained from the remaining estimates of total emigrants by subtracting legal emigrants, taken from INS I-53 data for 1980, adjusted for 9.2 percent underreporting, as 1.208 million. The highest estimate is 3.8 million, for a sex ratio at birth of 104 males per 100 females, a proportion male of all emigrants aged 15-39 of 0.6, and a proportion of all emigrants aged 15-39 of 0.65. The lowest estimate is zero, since many combinations of parameters fail to account even for legal immigration from Mexico. To assess this range of possible estimates, we need to assess the parameter values assumed since the basic equation is clearly correct.
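The estimating equation and the grid of assumptions can be sketched in a few lines of code. This is only an illustration under stated assumptions: the enumerated counts and the three values of the no-migration proportion male are hypothetical placeholders (the text gives only the sex ratios at birth from which the latter were derived), the 0.01 step for male completeness is an assumption chosen because it reproduces the count of 60 combinations, and the function and variable names are invented here.

```python
from itertools import product


def surviving_emigrants(males, females, c_m, c_f, pm_c, pm_e):
    """Emigrants aged 15-39 implied by the sex-ratio balance equation.

    males, females: enumerated census counts aged 15-39
    c_m, c_f:       male and female enumeration completeness
    pm_c:           proportion male expected in the absence of migration
    pm_e:           proportion male among surviving emigrants
    """
    true_males = males / c_m        # enumerated males corrected for undercount
    true_females = females / c_f    # enumerated females corrected for undercount
    return (pm_c * (true_males + true_females) - true_males) / (pm_e - pm_c)


# Hypothetical enumerated counts in thousands; NOT Mexican census figures.
M_ENUM, F_ENUM = 12_000.0, 13_000.0

# Assumption grid. The pm_c values are placeholders standing in for
# sex ratios at birth of 103, 104, and 105 males per 100 females.
PM_C_VALUES = (0.503, 0.504, 0.505)
PM_E_VALUES = (0.60, 0.65)
C_M_VALUES = (0.91, 0.92, 0.93, 0.94, 0.95)  # assumed 0.01 steps
C_F = 0.97
SHARE_15_39_VALUES = (0.60, 0.65)


def all_estimates():
    """Total emigrants (all ages) for every combination of assumptions."""
    results = []
    for pm_c, pm_e, c_m, share in product(
        PM_C_VALUES, PM_E_VALUES, C_M_VALUES, SHARE_15_39_VALUES
    ):
        e_15_39 = surviving_emigrants(M_ENUM, F_ENUM, c_m, C_F, pm_c, pm_e)
        # Scale up from ages 15-39 to all ages using the assumed share.
        results.append(((pm_c, pm_e, c_m, share), e_15_39 / share))
    return results
```

Calling `all_estimates()` returns one estimate per combination, to which the three plausibility constraints could then be applied as filters.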
The sex ratio at birth in Mexico is likely to be close to 105 males per 100 females, lower observed ratios probably arising from sex-selective underregistration or delayed registration of births; combined with a 1969-1971 Mexico life table developed by the United Nations (1982), this sex ratio at birth gives a value of $PM_c$ of 0.5047, within the range of values used in the paper. The estimate of total emigration at ages 15-39 is highly sensitive to the value assumed, however: the difference between 0.505 and 0.504 changes the estimate of emigration by over 10 percent. The assumed values of male and female enumeration completeness, $c_m$ and $c_f$, and most critically the difference between them, are impossible to assess, the values in the paper being justified in terms of estimates of completeness of coverage of the Hispanic population by the 1970 U.S. census, a basis of doubtful validity and uncertain accuracy. The estimates of emigration are highly sensitive not to the absolute values assumed but to the sex differential; assuming female coverage of 97 percent, the use of a male coverage of 94 percent instead of 93 percent increases the estimated emigration by over 30 percent. The proportion male among all emigrants aged 15-39, $PM_e$, is assumed to be 0.60 or 0.65, a range based on a proportion male among legal immigrants of 50.3 percent and an estimate derived by Warren (1982) of 55.2 percent among illegal Mexican immigrants included in the 1979 CPS. The estimates are again highly sensitive to the values used, the use of a value of 0.70 instead of 0.65 reducing the estimate of total emigration by 25 percent. The proportion male of counted illegals seems surprisingly low, given that most studies indicate a proportion around 0.80 or 0.85 (e.g., Reichert and Massey, 1979; CENIET, 1982), and the higher the proportion illegals make up of total emigrants, the higher would be the overall

proportion male; the range assumed in the paper may thus be too low, at least for higher proportions illegal. The final parameter introduced is the proportion of all emigrants who are aged 15-39; a range of 0.60 to 0.65 is used again, also based on INS statistics on legal immigrants (0.502) and Warren's (1982) estimate for illegals counted in the 1979 CPS (0.70). Estimates of total emigration are somewhat less sensitive to this assumption, but it might have been preferable to avoid it altogether by concentrating solely on the age range 15-39; it may also be noted that the higher the proportion illegal among total emigrants, the higher this parameter would be, thus reducing the upper limit on illegal emigration somewhat. The same procedures applied to the provisional results of the 1980 Mexican census can also be applied to the 1960, 1970, and final 1980 census age distributions. Using common parameter values for all three ($PM_c$ = 0.5061, $c_m$ = 0.93, $c_f$ = 0.97, $PM_e$ = 0.65), total emigrants aged 15-39 are estimated as 0.838 million for 1960, 0.970 million for 1970, and 1.486 million for 1980 (the estimate for 1980 using the provisional census results was 1.743 million). Given the magnitude of legal immigration from Mexico from 1960 to 1970, the estimated increase from 1960 to 1970 looks low, whereas the increase from 1970 to 1980 is some 190,000 higher than legal immigration from Mexico at ages 15-39 over the period. The increases, though possibly affected by changes in relative underenumeration by sex from census to census, are certainly not consistent with a flood of Mexican illegal immigrants to the United States in the 1970s and do not seem to be consistent with the higher estimates of illegals given by Bean et al.
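The sensitivity to the proportion male among emigrants noted in the discussion of parameters can be checked directly from the estimating equation, because that proportion enters only through the denominator. The following worked step assumes $PM_c = 0.505$ purely for illustration (a value within the range discussed); the enumerated counts cancel out of the comparison.

$$E_{15-39} = \frac{PM_c \left( M_{15-39}/c_m + F_{15-39}/c_f \right) - M_{15-39}/c_m}{PM_e - PM_c}
\quad\Longrightarrow\quad
\frac{E_{15-39}\,(PM_e = 0.70)}{E_{15-39}\,(PM_e = 0.65)} = \frac{0.65 - 0.505}{0.70 - 0.505} = \frac{0.145}{0.195} \approx 0.74$$

That is, raising $PM_e$ from 0.65 to 0.70 lowers the estimate by roughly a quarter, in line with the 25 percent reduction quoted for this parameter.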
In summary, this method is particularly sensitive to its assumptions, and even the ranges used do not seem to encompass all plausible values, particularly in those cases leading to the highest estimates of numbers of illegal Mexican immigrants in the United States. The results certainly suggest that estimates of illegal Mexicans living in the United States in mid-1980 in excess of 4 million would be hard to reconcile with Mexican census data, though even this conclusion is weakened by the de jure nature of the 1980 census. Intercensal changes suggest an illegal flow between 1970 and 1980 that is unlikely to have greatly exceeded half a million. Once again, the sensitivity of the estimation procedure to uncertain assumptions about the underlying parameters and the nature of the data used indicate that the results must be viewed with great caution; they do, however, provide some additional empirical support for dismissing the more irresponsible and excessive guesses of the growth and size of the illegal Mexican population in the United States.

Further Indications of the Size and Growth of the Illegal Population

Apart from the procedures of Garcia y Griego described above, little analytic use has been made of INS data on locations of deportable aliens to draw inferences about the size and growth of the illegal immigrant population of the United States. Nonanalytic uses of total locations are quite common, however; for example, in a recent article in the Washington Post (10 September 1984) Representative Daniel Lungren noted that apprehensions by the Border Patrol in the last 2 years were nearly 2 million, and that this figure implied that at least 4 million illegal

immigrants had successfully entered the United States over this period. The basis for this assumed 2:1 ratio is not given, and clearly such simple rule-of-thumb adjustment factors need to be examined carefully before it is concluded that high numbers of locations really indicate yet higher numbers of successful entries. The relationships between locations, successful entries, and the size of the illegal population are unlikely to be simple, since they will depend on the number of attempted entries, location rates at entry and thereafter, and voluntary return rates. There are a number of possible explanations why INS data on locations have received little analytical attention. First, there is a definitional problem of the population covered; the same person may make several attempts to enter the United States illegally and may therefore show up several times in a year's locations data. Second, INS data on locations are not tabulated or published in an optimal way for analytic purposes. The source of the data is the INS form I-213, Report of Deportable Alien, completed for all deportable aliens located, even at or immediately after entry, but these forms are not processed by machine; instead, such data as are available are hand extracted from the I-213 and summarized on the G-23 report (pages 18 and 19) and in some internal INS intelligence reports. The detail available from this hand extraction on the G-23 form is limited to 20 countries or country groups of nationality, classes of status at entry, grouped duration of illegal stay in the United States (the groups being at entry, within 72 hours, 4-30 days, 1-6 months, 7 months to 1 year, and over 1 year), and status when found.
Limited additional detail is available from a study of a sample of I-213 forms for 1978 covering locations of deportable aliens in the United States illegally for at least 4 days (Davidson, 1981), giving the distribution of the aliens located for duration of illegal stay categories 4-30 days, 1-6 months, 7 months to 1 year, 1-2 years, 3-4 years, 5-6 years, and 7 or more years. The analytic value of the data would be greatly enhanced by regular tabulation by sex, broad age group, major nationality groups, and single-year durations of illegal stay, but it should be possible to draw some general inferences about the illegal population given just the limited detail currently available. Third, locations of deportable aliens will be affected by INS enforcement practices and allocations of manpower. For instance, a concentration of resources on the Mexican border rather than on internal enforcement would produce a high concentration of locations at short durations, and a change in staffing levels would affect all location rates. These considerations provide one reason for analyzing locations data by country of origin, since more homogeneous groups would tend to be less affected by changes. Table B-9 shows the number of deportable aliens located by duration of illegal residence for fiscal years 1977-1984 (data for fiscal 1984 are partially estimated, based on locations in the first 9 months of the year rated up to full-year estimates using 9-month/1-year ratios from 1983); also shown is the percentage change for each duration category from one year to the next. Total locations remain remarkably consistent from year to year during the period, except for a drop of 15 percent in 1980 and a jump of 30 percent in 1983; locations in 1984 were 25 percent higher than in 1977. This aggregate consistency masks much more marked fluctuations

by duration category, however. The drop in 1980 appears for all duration groups and may reflect shifts in resources to deal with the Mariel boatlift. Locations in 1981 and 1982 at durations of 30 days or less remain around their 1980 level, whereas locations for longer durations recover to their pre-1980 levels. The sharp increase in locations in 1983 is entirely accounted for by much higher locations at durations of 30 days or less, the numbers located at longer durations actually declining somewhat. Overall, locations at durations over 30 days fluctuate substantially but show no clear trend, whereas locations at durations of 30 days or less show no very marked trend until jumping substantially to high levels in 1983 and 1984. How can we interpret these location numbers in terms of flows and stocks? One extreme assumption would be that locations are unrelated to the underlying flows and stocks, for example if INS enforcement activities were so overwhelmed by the volume of illegal migrants that location numbers represent physical ceilings of locations per officer-day. If this were the case, no interpretation would be possible since the location numbers would be determined by the level of enforcement activity alone. However, this assumption would fail to account for the drop in locations in the early 1980s and for the sharp rise in 1983; it is also hard to believe that the 90,000 or so illegal aliens located at durations of more than a year really represent the maximum number that the INS can possibly locate, and the decline in 1984 would be hard to explain. An alternative assumption is that the location rates are essentially constant, in which case the numbers of locations directly reflect flows (locations at durations of 30 days or less, say) and stocks (the cases of longer duration).
Under this assumption, the long-term illegal population has not changed substantially over the last eight years, whereas the number of attempted entries rose gradually in the late 1970s, fell some 10 percent in the early 1980s, and then increased sharply to high levels in 1983 and 1984. This assumption is not consistent with the data, however, since the sharply higher inflow in 1983 should, given constant location rates, have resulted in higher long-term locations in 1984, whereas these locations in fact fell. Other possible assumptions include that location rates have been rising, a possibility at least at short durations given the high-tech equipment introduced by the Border Patrol, in which case flows and stocks of illegals may have been constant or declining, or that rates have been falling, in which case flows and stocks may have been rising. However, the most tempting conclusions are that pressure of attempted entries has risen in 1983 and 1984 but the long-term illegal population has not increased much since the late 1970s; accepting the former conclusion further implies that the Border Patrol does a much better job of locating illegal aliens shortly after entry than it is generally given credit for; the latter conclusion has to be accepted unless it is believed that location rates at longer durations of illegal residence have declined at least as fast as the population has increased since the late 1970s, a belief hard to sustain since chance locations, for instance through police referrals, would be expected to reflect the underlying population size even if investigations activities did not. Some further use of the locations data can be made by considering the person years lived illegally in the United States by located deportable

aliens. The person years lived approach provides a measure of impact on U.S. society and also avoids the problem of multiple entries, since an individual making several entry attempts but being located shortly after each one will contribute only modestly to person years lived, though substantially to numbers of locations. Each duration category in Table B-9 has been assigned an average duration of illegal residence (zero for the at-entry category, 0.002 of a year for the less than 72 hours category, 0.035 of a year for 4-30 days, 0.26 for 1-6 months, 0.75 for 7-12 months, and 3.70 years, derived from Davidson's data, for over 1 year). The locations in each duration group and fiscal year of location can then be converted into person years of illegal residence, and these person years can be allocated to fiscal years of residence. Thus, for example, the 74,556 aliens located at durations of over a year in 1984 are assumed to have contributed half a year each to 1984, one year each to 1983, 1982, and 1981, and 0.2 of a year to 1980. The results are shown in Table B-10 by fiscal year of residence. Totals can only be obtained for years up to 1980, since the 1981 and subsequent totals will be affected by locations at durations over one year in 1985 and later. However, the totals obtained show little trend to 1980 (and the 1981 total is very unlikely to be substantially different), averaging around 360,000 per year. These totals represent the person years lived by, or average population of, deportable aliens who are subsequently located by the INS. They thus represent estimates of the lower limits for the average size of the deportable alien population each year, though clearly they do not define upper limits and are quite sensitive to the assumed average length of the open interval: increasing this average to 5 years would add over 100,000 to each estimate, while reducing it to 2.5 years would reduce each estimate by a similar amount.
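The allocation rule for the over-1-year category can be sketched as follows. This is a minimal illustration of the worked example in the text (half a year in the year of location, full years before that, and the remainder in the earliest year reached), not the procedure actually used to build Table B-10; the function name and its signature are invented here, and the other duration categories are allocated differently and are not covered.

```python
def allocate_person_years(located, year_located, avg_duration=3.7):
    """Spread person-years of illegal residence over fiscal years.

    Each located alien is assumed to have lived `avg_duration` years in
    total: half a year in the fiscal year of location, one full year in
    each preceding year, and any remainder in the earliest year reached.
    Returns a dict mapping fiscal year -> person-years.
    """
    allocation = {}
    remaining = avg_duration
    year = year_located
    share = min(0.5, remaining)  # half a year in the year of location
    while remaining > 1e-9:
        allocation[year] = located * share
        remaining -= share
        year -= 1
        share = min(1.0, remaining)  # full years, then the leftover fraction
    return allocation


# The 74,556 aliens located in fiscal 1984 at durations over one year.
for fiscal_year, person_years in sorted(allocate_person_years(74556, 1984).items()):
    print(fiscal_year, round(person_years))
```

Summing such allocations over all years of location, and adding the shorter-duration categories, gives column totals by fiscal year of residence.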
CONCLUSIONS

As a result of this review of empirical estimates of the size of the illegal population of the United States, what can we conclude? First, the procedures that have been used, though often imaginative and sometimes elaborate, all invoke numerous assumptions that often cannot be adequately justified and to which the estimates obtained are sensitive. Second, even the commonly quoted range of 3-6 million illegals may be too high, though none of the procedures reviewed produces compelling upper or lower limits. The study by Warren and Passel suggests that it is unlikely that less than 1.5 million or more than 2.5 million illegal aliens were included in the 1980 census; locations data suggest that a figure under half a million is unlikely; Mexican census data fail to confirm the permanent absence in 1980 of more than half a million Mexicans who might be illegally resident in the United States, and the figure could be substantially lower. Though no range can be soundly defended, a population of 1.5 to 3.5 million illegal aliens in 1980 appears reasonably consistent with most of the studies. Third, there is no empirical basis at present for the widespread belief that the illegal alien population has increased sharply in the late 1970s and early 1980s; the only data available on recent trends, INS records of locations of deportable aliens, in fact suggest that the population has increased little if at all since 1977, although entry attempts may have increased,

possibly for no other reason than that the efficiency of the Border Patrol has increased, causing more entries to fail early and thus to be repeated. The size and growth of the illegal alien population may not be problems of the magnitude sometimes suggested, although any substantial number of illegal residents may cause social and economic problems, particularly at the local level; these wider issues are not considered in this discussion, which is limited to the size of the population only.

REFERENCES

Bean, F.D., King, A.G., and Passel, J.S. 1983 The number of illegal migrants of Mexican origin in the United States: Sex ratio-based estimates for 1980. Demography 20(1):99-110.
CENIET 1981 Informe Final: Los Trabajadores Mexicanos en los Estados Unidos (Encuesta Nacional de Emigracion a la Frontera Norte del Pais y a los Estados Unidos--ENEFNEU). Secretaria del Trabajo y Prevision Social. Centro Nacional de Informacion y Estadisticas del Trabajo. Mexico City.
Cue, R.A. 1976 Men from an Underdeveloped Society: The Socioeconomic and Spatial Origins and Initial Destination of Documented Mexican Immigrants. Unpublished Doctoral Dissertation. University of Texas at Austin, Austin, Texas.
Davidson, C.A. 1981 Characteristics of Deportable Aliens Located in the Interior of the United States. Paper presented at the annual meetings of the Population Association of America, Washington, D.C.
Garcia y Griego, M. 1980 El Volumen de la Migracion de Mexicanos no Documentados a los Estados Unidos (Nuevas Hipotesis). Secretaria del Trabajo y Prevision Social. Centro Nacional de Informacion y Estadisticas del Trabajo. Mexico City.
Goldberg, H. 1974 Estimates of Emigration from Mexico and Illegal Entry into the United States, 1960-1970, by the Residual Method. Unpublished graduate research paper. Center for Population Research. Georgetown University, Washington, D.C.
Heer, D.M. 1979 What is the annual net flow of undocumented Mexican immigrants to the United States? Demography 16(3):417-423.
Interagency Task Force on Immigration Policy 1979 Staff Report. Departments of Justice, Labor and State. Washington, D.C.
IUSSP 1981 Indirect Procedures for Estimating Emigration. IUSSP Papers, No. 18. IUSSP, Liege, Belgium.
Lancaster, C., and Scheuren, F.J. 1978 Counting the Uncountable Illegals: Some Initial Statistical Speculations Employing Capture-Recapture Techniques. 1977 Proceedings of the Social Statistics Section. Part 1, pp. 530-535. American Statistical Association.
Preston, S.H., and Coale, A.J. 1982 Age structure, growth, attrition, and accession: A new synthesis. Population Index 48(2):217-259.
Reichert, J.S., and Massey, D.S. 1979 Patterns of migration from a Mexican sending community: a comparison of legal and illegal migrants. International Migration Review 13:599-623.
Robinson, J.G. 1980 Estimating the approximate size of the illegal alien population in the United States by the comparative trend analysis of age-specific death rates. Demography 17(2):159-176.
Siegel, J.S., Passel, J.S., and Robinson, J.G. 1980 Preliminary Review of Existing Studies of the Number of Illegal Residents in the United States. Mimeo. Bureau of the Census, U.S. Department of Commerce, Washington, D.C.
Warren, R. 1982 Estimation of the Size of the Illegal Population in the United States. Paper presented at the 1982 annual meetings of the Population Association of America, San Diego.
Warren, R., and Passel, J.S. 1983 Estimates of Illegal Aliens from Mexico Counted in the 1980 United States Census. Paper presented at the annual meeting of the Population Association of America, Pittsburgh.

[Tables on pages 246 and 247: content not recoverable from the machine-read text.]

TABLE B-8 Estimated Mexican Emigrants by Age Group and Sex, 1960-1970 and 1970-1980, Using Variable Growth Rate Procedure (in thousands)

                     Males                     Females
Age Group   1960-1970a   1970-1980a   1960-1970a   1970-1980a
10-14              112          196          -67         -143
15-19              209          190          -48         -141
20-24               89           58           48           42
25-29               62           39          209          179
30-34              -67         -196           27         -163
35-39               35          -42           25          -28
40-44               -7          -12           27           45
45-49              -30           56           22            2
50-54              115          -52          137          -43
55-59              -41          -88          -38          -57
60-64               28           24           24           18
65-69              -56           -4          -43            4
70-74              -16          -17            4          -15
75-79              -10          -74          -19          -81
Total              423           78          308         -381

a The life tables used were from the United Nations model life tables (UN 1982) selected with an expectation of life at age 10 of 56.53 years (males 60-70), 58.21 years (males 70-80), 60.28 years (females 60-70) and 62.09 years (females 70-80).

[Table B-9, deportable aliens located by duration of illegal residence, fiscal years 1977-1984, with year-to-year percentage changes: not recoverable from the machine-read text.]

TABLE B-10 Distribution of Person-Years Lived by Located Deportable Aliens and Fiscal Year of Residence, 1977-1984

Duration     Year of                      Fiscal Year of Residence
Category     Location    1977    1978    1979    1980    1981    1982    1983    1984
<4 days      1977         404       -       -       -       -       -       -       -
             1978           -     421       -       -       -       -       -       -
             1979           -       -     405       -       -       -       -       -
             1980           -       -       -     360       -       -       -       -
             1981           -       -       -       -     361       -       -       -
             1982           -       -       -       -       -     352       -       -
             1983           -       -       -       -       -       -       b       -
             1984           -       -       -       -       -       -       -     589
4-30 days    1977        3309       -       -       -       -       -       -       -
             1978          57    3215       -       -       -       -       -       -
             1979           -      56    3148       -       -       -       -       -
             1980           -       -      38    2141       -       -       -       -
             1981           -       -       -      39    2202       -       -       -
             1982           -       -       -       -      41    2275       -       -
             1983           -       -       -       -       -      49    2767       -
             1984           -       -       -       -       -       -      52    2920
1-6 months   1977       25099       -       -       -       -       -       -       -
             1978        3518   23547       -       -       -       -       -       -
             1979           -    4232   28319       -       -       -       -       -
             1980           -       -    3280   21948       -       -       -       -
             1981           -       -       -    3447   23069       -       -       -
             1982           -       -       -       -    3490   23358       -       -
             1983           -       -       -       -       -    3408   22805       -
             1984           -       -       -       -       -       -    3224   21575
7-12 months  1977       17166       -       -       -       -       -       -       -
             1978        8613   14355       -       -       -       -       -       -
             1979           -   10242   17070       -       -       -       -       -
             1980           -       -    8963   14938       -       -       -       -
             1981           -       -       -   11099   18498       -       -       -
             1982           -       -       -       -   11214   18689       -       -
             1983           -       -       -       -       -    9985   16642       -
             1984           -       -       -       -       -       -    8858   14763
>1 year      1977       63541       -       -       -       -       -       -       -
             1978       99493   49747       -       -       -       -       -       -
             1979       81527   81527   40764       -       -       -       -       -
             1980       63550   63550   63550   31775       -       -       -       -
             1981       14609   73047   73047   73047   36524       -       -       -
             1982           -   18992   94960   94960   94960   47480       -       -
             1983           -       -   18971   94853   94853   94853   47427       -
             1984           -       -       -   14911   74556   74556   74556   37278
Total (thousands)       380.9   342.9   352.5   363.5       -       -       -       -

b Entry illegible in the machine-read text.

The Imputation and Treatment of Missing Data

Kenneth Wachter

Every statistical reporting system needs well-defined, routine procedures for the treatment of missing data. That data are missing does not in itself reflect badly on administrators who collect or report them; all good data collection involves missing values--they are an ordinary fact of life. What does reflect badly on those who report statistics is to fail to recognize the importance of obtaining as complete a response as possible, to deny that there were missing values, or to pretend that the reported values are free from gaps in coverage and free from nonresponse. Nothing undermines confidence in an administrative tabulation so badly as the absence of a column showing the number of offices that failed to report or whose reports could not be included in the tabulation, the number of forms submitted incomplete, and so forth. Reporting the extent of missing values builds confidence in a report, by showing that the agency seeks a realistic view of the completeness that its compilations achieve. Of course, it is better when data are not missing, although no large reporting system comes close to 100 percent reporting. There is no statistical magic that can make up for information that is not there. There does now exist a large body of statistical know-how for minimizing the bad effects that missing data could otherwise have on reported totals and on inferences about patterns and trends. Some of these statistical methods are already routine and are in continual use by government agencies, including the Census Bureau. Others are at the stage of research development and testing. A good recent overall account can be found in the entry on "Incomplete Data" in the Encyclopedia of Statistical Sciences (Little and Rubin, 1982). The first rule of treating missing data is that procedures must be as uniform, standardized, and well-documented as possible.
It is more important to know what the numbers before one mean and how they were arrived at than to have them compiled by superior but unfathomable methods. It is better to have a run-of-the-mill but uniform reporting system than to have a system in which (unbeknownst to readers of its reports) certain district offices have pursued every last elusive case while others have misplaced whole bundles of forms. Even if star offices can be identified, they cannot be regarded as a random subset of all offices. A good approach to missing data is to target a random sample of offices for intensive follow-up of missing cases and then to present correction multipliers based on the sample follow-up. Sampling can be an efficient, cost-effective way to allocate resources for improvement in data quality. Using the INS Statistical Yearbook as an example, these general considerations lead immediately to the observation that all tables in the yearbook should include a row or column enumerating cases with status unknown. If unknown or uncertain cases have been distributed among other

252 rows or columns, the formula for distributing them should be stated in the table notes. For example, Table lOA in the 1979 Statistical Yearbook on page 28 illustrates this point in having a row for "no occupation reported." In contrast, however, the row for "unknown marital status" in Table lOA contains zeroes for all years and both sexes. The zeroes suggest that ambiguous cases have been assigned by some rules that are not explained, since it is not credible that of some 2 million people not a single person failed to check a box; not a single coder accidentally punched a nonsense digit, and not a single error eluded the agents who accepted the forms. Furthermore, age must have been missing from some records, since age nonresponse is commonplace. A row showing the number of cases that lacked ages next to the median age row would be reassuring. For another example, consider Table 17C on page 54. A row showing numbers of temporary visitors whose region of last permanent residence was uncertain, or a formula showing how such cases have been allocated among regions would bolster confidence and enhance the value of this table. A further observation to be noted is that the footnotes in each table in the Statistical Yearbook should include a statement of the basic data . sources or sources from which the table is derived. For instance, Table 17C of the 1979 Statistical Yearbook is probably derived from the I-94 Nonimmigrant Arrival/Departure Forms. But that cannot be determined from the yearbook itself. Such footnotes would aid both outside readers and those in the INS who trace errors and regenerate the tables in later years. Which tables derive from the same basic data and which from different sets of data? The yearbook should also contain a statement of the total numbers of I-94 forms accounted for in central office tabulations and the number of those that were incomplete, missing matching departure records, and so forth. 
In that way the information that would help in the interpretation of all the tables based on those particular forms would be assembled in one place.

It is very important in treating missing data to know whether the data are missing at random. For example, suppose that a border office typically fails to report at all when it is so busy with exceptional numbers of apprehensions that there is no time for statistical work. Or suppose that at border crossing points, fewer booths are staffed at peak weekends or peak hours, because of staff holidays or staff shortages. In such situations the data are not missing at random. There is a relationship between the values that would have been reported if they were not missing and the fact that those values are missing, a relationship that seriously undermines the statistics. Such situations can sometimes be prevented by astute staffing decisions, but they are bound to occur. The preface to the INS Statistical Yearbook should discuss the most salient such situations, on the basis of direct consultation with the officers in the field who know the realities firsthand.

Formalized statistical techniques that compensate for or diminish the bad effects of missing data come under the general heading of imputation. Missing information can be imputed or made up on the basis of information that is not missing. The first rule of imputation is that information must never be imputed to records in a way that does not allow the imputed cases to be separated from the actually reported cases at every later stage of the analysis. For instance, values should never be imputed into the basic records, like I-94 forms, unless they are coded in such a way that the imputed values will be clearly distinguished from the real values whenever totals are assembled. When data are computerized, it is generally easy to add a code to each value indicating whether it is imputed. Then totals can be run off with and without the imputed values. In this way, the effects of imputation can be observed.

Of the various imputation strategies now in use, three are mentioned here: "hot deck imputation," generalized regression single or multiple imputation, and incomplete data likelihood maximization with algorithms like those of the "E-M" type. Good accounts of these methods can be found in the entry on incomplete data in the Encyclopedia of Statistical Sciences. For an account of hot deck imputation, see Ford (1983). This is the type of method in widest use among government agencies, especially in the Census Bureau. For regression-based imputation, a new variant that allows unbiased estimation of standard errors in cross-tabulations has been pioneered by Rubin and called "multiple imputation" (Rubin, 1980). It is not restricted to sample surveys alone. A large experiment using this method to insert 1980 occupational codes into the 1970 census public-use sample is now under way at the Census Bureau. For likelihood maximization methods, more formal statistical expertise in model building is required, although the results can repay the extra effort if the data are otherwise of high quality. A good entree to this extensive statistical literature is the cautionary article by Little and Rubin (1983).

The statistical virtues of these methods are not the only considerations in a decision about which to use. The simplicity of the methods and the feasibility of implementing them in practice under the difficult conditions that the INS often faces must be taken into account.
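
As a rough illustration of the simplest of these strategies, the sketch below applies a sequential hot deck to a handful of invented records and carries an explicit flag on every imputed value, so that tabulations can be run with and without the imputations. It is a minimal sketch under stated assumptions, not a description of INS or Census Bureau procedure: the field names, the sample records, the function names, and the choice of country of citizenship and decade of birth as matching variables are all hypothetical.

```python
from collections import Counter

# Hypothetical records in processing order; None marks a missing occupation.
records = [
    {"citizenship": "MX", "birth_decade": 1950, "occupation": "farm worker"},
    {"citizenship": "CA", "birth_decade": 1940, "occupation": "engineer"},
    {"citizenship": "MX", "birth_decade": 1950, "occupation": None},
    {"citizenship": "CA", "birth_decade": 1960, "occupation": None},
    {"citizenship": "MX", "birth_decade": 1950, "occupation": "teacher"},
    {"citizenship": "MX", "birth_decade": 1950, "occupation": None},
]


def hot_deck_occupation(records, match_keys=("citizenship", "birth_decade")):
    """Fill a missing occupation from the last-processed record that reported
    one and agrees on every variable in match_keys (an assumed matching rule).
    Every output record carries an 'imputed' flag."""
    last_donor = {}  # tuple of matching values -> most recent reported occupation
    filled = []
    for rec in records:
        rec = dict(rec)  # copy, so the original records are never altered
        key = tuple(rec[k] for k in match_keys)
        if rec["occupation"] is not None:
            rec["imputed"] = False
            last_donor[key] = rec["occupation"]  # only reported values donate
        elif key in last_donor:
            rec["occupation"] = last_donor[key]
            rec["imputed"] = True
        else:
            rec["imputed"] = False  # no donor available: the value stays missing
        filled.append(rec)
    return filled


def tabulate(filled, include_imputed):
    """Count records by occupation, with imputed values either included or
    returned to the 'not reported' row."""
    counts = Counter()
    for rec in filled:
        missing = rec["occupation"] is None
        if missing or (rec["imputed"] and not include_imputed):
            counts["not reported"] += 1
        else:
            counts[rec["occupation"]] += 1
    return counts


filled = hot_deck_occupation(records)
reported_only = tabulate(filled, include_imputed=False)
with_imputed = tabulate(filled, include_imputed=True)
print("Without imputations:", dict(reported_only))
print("With imputations:   ", dict(with_imputed))
```

Only reported values are allowed to act as donors in this sketch, so an imputed occupation is never copied forward to a later record; that is a design choice made here for illustration rather than a requirement stated in the text. A record for which no donor has yet been seen is left in the "not reported" row in both tabulations.
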
When missing values on individual forms like the I-94 are at issue, the simplest and most easily implemented formal imputation method is the hot deck method. When missing blocks of data in aggregate tables are at issue, for example if a computer tape or the transmissions from one district office should be garbled or misplaced, a likelihood maximization method would be efficient and appropriate.

The idea of hot deck imputation is to substitute for values missing on one record the values that occur on another record whose values have high probability of agreeing with those that are not missing. As an example, consider I-94 forms that are missing entry 11, occupation. A person coding the I-94 forms, finding a missing occupation, would go back, either manually or by computer, to the last-processed I-94 form that showed, say, the same country of citizenship (entry 3) and decade of birth (entry 2). The value from that form would then be coded as occupation for the form with occupation missing, along with a code showing that this occupation value was an imputed value rather than a true value. The final tabulations of admissions by occupation would then show separate values for occupations without imputations and occupations including imputations.

It is essential that any hot deck imputation use a set of formal rules that state that if such and such entries are missing, then the donor form from which the missing values are supplied shall be the first form that is similar in a number of prespecified variables. The selection of the donor form should not be left to the judgment of the coder. It is also essential that the rules be simple, particularly if they are being implemented by hand, but also, in the interests of efficiency, if they are being implemented by computer. Thus the requirements for a match to a donor should not be overly rigid, yet they should be appropriate for the variable to be imputed.

The use of likelihood maximization methods, especially those that employ E-M algorithms, demands the specification of a statistical model and therefore demands a trained statistician to formulate the model. In an example such as that of a missing tape, the need might be to estimate cells in a cross-tabulation, in which the total number of records with missing age and sex values might be known, but in which their distribution among the cells of the table might be uncertain. In such a case, a standard model for contingency tables could be used. It remains essential, however, to present the table without as well as with the entries adjusted for the missing data. The impact of the statistical adjustments needs to be visible, so that readers and administrators can assess their plausibility. Likelihood maximization methods would be recommended if fairly large quantities of data, numbering, say, into the thousands of records or 5 percent of the total sample, proved missing. For cases in which few values were missing, either in absolute amounts or relative to the size of the sample, it is generally not cost-effective to adjust.

Any statistical system has to deal with missing data. With a record-generating system as large and complex as that of the INS, what is simplest and most easily implemented is undoubtedly best. It is therefore right to advocate a pragmatic approach, rather than a fancy theoretical solution, and to encourage, above all, common sense, uniformity, and candor.

REFERENCES

Ford, B.L.
1983 An overview of hot-deck procedures. Pp. 185-207 in W.G. Madow, I. Olkin, and D.B. Rubin, eds., Incomplete Data in Sample Surveys: Theories and Bibliographies. Vol. 2. New York: Academic Press.

Little, R.J., and Rubin, D.
1982 Incomplete data. In S. Kotz and N.L. Johnson, eds., Encyclopedia of Statistical Sciences. New York: Wiley.
1983 On jointly estimating parameters and missing data by maximizing the complete-data likelihood. American Statistician 37:218-220.

Rubin, D.B.
1980 Pp. 1-9 in Bureau of the Census, Handling Nonresponse in Sample Surveys by Multiple Imputation. Washington, D.C.: U.S. Department of Commerce.


