Data Gaps and Ways to Fill Them

INTRODUCTION

In Chapter 3 we reviewed the kinds of data that are needed for legislators, program managers, and program staff to design and implement immigration policy. Chapters 4 through 7 described the data that are actually available, and the processes by which they are collected. In this chapter we compare the two in order to determine major data gaps--that is, areas in which data are needed by policy makers or by analysts examining the consequences of immigration policies but are not currently available.

The treatment inevitably must be rather general. It is not possible to foresee future needs in full detail, nor to define every last piece of information that should be collected from a particular alien, because the exact nature of future policy issues cannot be predicted with precision. Such details must be left to the design stage of a data collection initiative. The planning of such an initiative should aim to incorporate the demographic, social, and economic information likely to be of general relevance to policy issues in a format sufficiently flexible to accommodate future needs as they arise. It is possible, however, to identify both major areas for which data are currently needed and general approaches by which such data can be obtained.

This chapter has two major sections, the first discussing data gaps and the second discussing approaches to filling the gaps. We start, however, with a brief discussion of the costs and benefits of data improvements to set the stage for the lengthier discussion of the gaps and ways to plug them.

COSTS AND BENEFITS OF DATA IMPROVEMENTS

All improvements have some cost attached to them, and at a time of acute concern with government spending it is important to weigh the costs of different improvements against the value of the expected improvement in data quality or quantity. In this context, approaches can be listed in ascending order of their likely cost.
All the approaches listed require, of course, that the basic data are of good quality. The first essential for any improvement of immigration statistics is thus the implementation of quality-control processes at the data generation stage; in the absence
of such quality control, the returns to implementing any of the further approaches listed, however sophisticated they may be, will be disappointing.

Given this overriding need for emphasis on data quality, the least expensive way to improve immigration statistics is to improve the presentation of data already collected and available in machine-readable form; costs are limited to initial computer programming time and recurrent marginal computer execution time. The next least expensive way is to process data that are collected but not used; the costs are higher because of the inclusion of recurrent data entry. The third way is to integrate existing data sets; even if the data sets are already in machine-readable form, system planning, interagency coordination, data set preparation, and final execution all have substantial and, except for planning, recurrent costs attached to them. The fourth way is to modify existing data collection procedures; planning, testing, and processing design are the one-time costs, while data collection, preparation, and tabulation are recurrent costs. Finally, the most expensive way to improve immigration statistics is to undertake new data collection initiatives; this approach requires major additional costs, including questionnaire, sample, and data processing design, testing, and implementation.

Evaluating the Benefits of Better Data

The information gains from each approach must be weighed against the relative costs of putting them into effect, to facilitate selecting those that offer the best value. Unfortunately, it is much more difficult to determine the value of a data improvement, or even rank order the values of such improvements, than it is to estimate their likely costs. The best we can do is to indicate the nature of the improvement that would result from a particular strategy and to state that in our collective judgment the potential benefits of our recommendations more than justify their modest costs.
The judgment of those to whom we direct our recommendations--the Congress and several executive agencies--must be based on their assessment of the benefits to them of the improved data that our strategies offer.

The Costs of Data Improvements

Costs here should not be interpreted narrowly as merely dollars and cents of government expenditure. Data collection exercises involve costs to those providing the data both in terms of the time spent answering questions or filling in forms and in terms of concerns regarding confidentiality of sensitive information. Public goodwill toward data collection activities will wear thin very rapidly, with adverse effects on data quality, if demands for data are perceived as excessive. Immigrants may be more tolerant than other groups of the time costs of data collection activities, although possibly more suspicious of motivation and official interference, but goodwill toward the INS has already worn thin because of the number, complexity, and repetitiveness of the forms to be filled in and because of the excessive waiting time people spend when dealing with the agency.
Issues of confidentiality and civil liberties are still more thorny. The INS already imposes conditions on the alien population that would be unacceptable to the public at large: permanent residents are required to carry "green cards" at all times, and the INS maintains both machine-readable and hard-copy files on aliens with very limited restrictions on accessibility. Public concern with privacy is probably the major barrier to the linkage of data files between agencies. Ultimately it does not matter whether such concerns are well founded or not (experience over the last decade or so suggests they may be): if a majority of the public regards the construction of "super files" on individuals as an unwarranted intrusion on their civil liberties, the construction of such files will be politically unacceptable. Furthermore, if the development of such a system is opposed by the population at large, it is highly questionable whether a similar system should be imposed on the politically underprivileged population of aliens. This conclusion does not mean that no data set linkages can or should be attempted, but rather that they should be made with due regard for legitimate concerns, with adequate safeguards of privacy, and with adequate protection against use for other than statistical purposes.

Data Generation

The vast majority of the data available about aliens is generated when they come into contact with U.S. officials. Thus information about a permanent immigrant is obtained either at the time of applying for a visa and at first entry to the United States or when a nonimmigrant applies for adjustment to permanent resident status.
Further information is obtained at subsequent contacts: in theory at every address change (although in practice such changes probably go unreported quite frequently); when applying for naturalization or other immigration benefits; through income tax returns and social security benefits or contributions; at census enumerations or survey interviews; and when registering births or deaths. The number of observations depends on the number of contacts, which may be with a wide range of government agencies, including the INS, the Internal Revenue Service, the Social Security Administration, the Bureau of the Census, and the National Center for Health Statistics. Some of these contacts will happen for all immigrants (application for status, first entry to the United States, census enumeration); some will happen for a large majority (income tax filing, social security contributions); and the remainder (address change, application for naturalization or other benefit, social security benefit, registration of births or death) will depend on events in the immigrant's life in the United States.

Data Linkages

If it were possible to link together the information from all these contacts, our knowledge of what happens to immigrants would be greatly expanded (but there would still be gaps and uncertainties arising from noncoverage of departures from the United States, from the less-than-universal coverage of other systems, and from the inability to
collect all the desirable information for each contact; the census, for example, cannot reasonably ask about the visa status of noncitizens). In practice, it is often not possible to link records across agencies, either because of confidentiality restrictions or because of a lack of suitable and accurate identifiers. Even within agencies, notably the INS, opportunities for record linkage--for example between immigrant and naturalization applications or between entries and annual address reports--have not been exploited. However, since not all unmet data needs could be met even by complete linkage, and since complete linkage is not a politically acceptable proposition, we must examine carefully what the most pressing unmet data needs are and how they can be met acceptably. The next section discusses the major unmet data needs of policy makers in the area of immigration. This discussion provides the framework for the third section, which explores what can be obtained by implementing different improvements.

UNMET DATA NEEDS

There are five groups of aliens that are of major importance for policy formation: permanent resident aliens, refugees, asylees, temporary workers, and illegal residents. Minor policy issues arise for some other groups (such as the Simpson-Mazzoli bill's visa waiver scheme to make it easier for visitors to enter the country), but by and large there is no dispute about either the principle (whether they should be admitted) or the magnitude (how many should be admitted) of entries of temporary visitors, employees of international organizations, crew members, treaty traders, intracompany transfers, full-time students in higher education, and the like. The information needed for the five important groups shares common elements but also differs in key respects, reflecting the different policy questions relevant to each group.
Immigrants

For immigrants, the most obvious issues are how many should be admitted each year and what criteria should be used to decide which applicants to admit. Both issues involve judgments that are not immediately amenable to quantitative assessment--for instance, to answer the question of whether a higher level of immigration, though beneficial overall, would impose unacceptable costs on the poor requires not only data to estimate the possible effects but also a definition of what is unacceptable--and are also too broad for determining information needs. What are needed are enough data to evaluate current policy rather than all possible policies and to answer specific questions.

For example: do new immigrants put legal residents out of work or do they create additional jobs? To answer this question, data are needed on where immigrants first settle and on their initial labor market experience (activity, type of employment, wage rate or earnings, type of employer, nature of work), and parallel data are needed for the existing resident population (both citizen and noncitizen). Many such data exist, at least with regard to participation in the formal economy, in IRS, Social Security
Administration, or Bureau of the Census records; however, there is insufficient detail, particularly to distinguish between new permanent residents, existing permanent immigrants, nonimmigrants, and illegal residents. Given the lack of INS data on settlement and secondary migration patterns of immigrants, the necessary data would be difficult to construct even if perfect interagency data linkage were possible.

Are immigrants net contributors to, or recipients from, public revenues? Again, many relevant data exist, but it is not possible to link records for a particular individual or even group, such as all nonimmigrants. Whatever the question, the data gaps are similar--detail, individual identifiers for record linkages, visa history, history of life in the United States, and history of life before coming to the United States.

Turning to admission criteria, the policy questions are rather more concrete. Since 80 percent of immigrants are admitted under family reunification preferences, one can examine the underlying rationale for such a policy by examining the results in terms of the family structure of such admissions. Do the families remain united? One could find out whether emigration rates of principal aliens whose spouses or children are admitted under the second preference are higher or lower than those of other aliens; whether emigration or secondary migration rates are higher or lower for those admitted under the family reunification preferences than for other immigrants and how they vary by preference category; whether naturalization rates are higher or lower for some preference admissions than for others. One could also determine whether immigrants admitted under the various family reunification preferences perform better or worse than other immigrant groups in terms of income, assimilation, naturalization, and the like.
For immigrants admitted with occupational preferences, policy makers would probably like to know whether the immigrants so admitted actually alleviate labor market shortages, whether they continue to work in the same field after admission, how well they perform relative to native-born workers in that field and to earlier cohorts of immigrants, and what proportions become naturalized or emigrate. What are needed are data on the history of life in the United States by the preference category of entry and country of origin of the immigrant; existing sources provide very little, since it is not possible to link data across data sets and agencies for given individuals.

Refugees

Data needs for refugees are somewhat different, since the admissions policy is at least partly altruistic, numbers of admissions are set by the perceived world refugee pressure, and refugee admissions have an immediate cash cost in terms of resettlement assistance. Selection, however, is based only partly on need and partly on family ties or other connections in the United States. Data on the world refugee situation come largely from the U.N. High Commissioner for Refugees; although the data are of limited scope, covering mainly refugees living in camps, and of limited accuracy (as detailed in Chapter 7), they provide a broad indication of the numbers and geographical concentrations of refugees throughout the world. Improvement of that data system, though useful, is not essential for U.S. policy purposes and would require an international
cooperative program that the United States could stimulate but could not run.

Given that the number of refugees who need resettlement is known with adequate accuracy, the question of how many the United States should admit depends in part on how much they cost in terms of cash and program assistance; how quickly they become self-supporting; how well they assimilate; whether on a lifetime (and suitably discounted) basis their contributions to public revenues exceed their receipts; whether they displace domestic workers; how much impact they have on local communities in which they settle; and so on--much the same subquestions, with a few additions about cash assistance, as for permanent immigrants. Data availability is substantially higher, however, for refugees than for immigrants, with a tracking system for the 3-year period during which they are eligible for benefits and a regular though small follow-up survey by telephone. After the 3-year eligibility period, responsibility for refugees passes from the Office of Refugee Resettlement to the INS, and data availability declines drastically; it ceases to be possible to track individual performance or to distinguish refugees from other foreign-born residents.

Thus, the most important unmet data needs relate to the long-term performance in the United States of those admitted as refugees and the impact on future immigration of refugees who become permanent residents or naturalized citizens and apply for family reunification benefits. Reasonable data, though lacking depth of detail, already exist for most policy and program purposes for the early stages of the resettlement process thanks to the efforts of the Office of Refugee Resettlement.
Asylees

The data needs for establishing policy concerning the granting of asylum share some common elements with the data needs concerning the admission of refugees, since the justification for asylum is largely altruistic, although the issue of cash benefits, to which asylees are not entitled, does not arise. Asylum is granted on the grounds of well-founded fear of persecution or discrimination in the country of origin, so information is needed to establish how well-founded such fears are in particular cases. However, asylees have a social and economic impact on the United States, so the question of how many applications to grant depends not only on the numbers meeting the formal requirements but also on their costs and benefits to society, implying data needs similar to those for immigrants.

Temporary Workers

Temporary workers are admitted to the United States for short, specified periods to meet temporary labor shortages or for such special purposes as musical or sporting events. The number of people thus admitted is small, about 40,000 in fiscal 1981, and their long-term economic and social impact probably is also small. The policy issues involved are whether labor shortages really justify the admission of such workers or the workers thus admitted are taking jobs that legal residents would
otherwise take. This question is not as simple as it sounds: residents may not be willing to take such jobs for the minimum wages offered, but might take them at the higher wage levels that would have to be offered if temporary workers were not available, thus increasing costs and prices, but also increasing domestic employment and reducing losses from remittances abroad. There is also the question of whether the workers actually leave the country when their work is completed (or the admission period runs out) or stay on illegally.

The first issue requires estimates of the wage elasticity of the supply of domestic labor and of the wages paid to the temporary workers, as well as information on the potential for substituting capital for labor; such information is best provided by micro-level studies of particular industries rather than by a national immigration statistics system. The second issue, of compliance with terms of entry, requires the sort of linkage within the INS of arrival, departure, and location of deportable alien records that will become available when the INS long-range ADP plan is fully implemented. Thus, apart from a need for small-scale industry studies and a need for more complete coverage of departing aliens, the data needs for this group are in the process of being met.

Illegal Aliens

Illegal aliens are important for a number of policy reasons. First, they attract more political attention and generate more political passion than any other group of noncitizens. The presumed ill effects, both social and economic, of the presence of illegal aliens in the United States also affect public attitudes to, and debate about, broader issues of immigration and refugee policy. The policy questions related to illegal aliens are very similar to those about legal immigrants. Do they take jobs that legal residents would otherwise fill, or do they take jobs that legal residents do not want at the going wage rates?
Do they hold down wage rates for menial jobs and slow productive investment? Do they take more in services than they contribute to revenue, and at which levels of government? Do heavy concentrations of illegal aliens increase crime rates, either as perpetrators or as victims? Do they overburden education and health services? Do they come to work temporarily or to settle permanently? Since it costs money to keep illegal aliens out and would cost a very large amount of money to reduce illegal immigration to a trickle, policy makers have to decide how much should be spent on the Border Patrol and other INS activities in trying to keep illegal aliens out: if a steady stream of illegal migrants is beneficial overall, then legal immigration limits could be increased and enforcement activities could be cut back. Although the data needs for illegal aliens are much the same as those for legal immigrants, virtually no large-scale data sets are available about illegal aliens, and the official collection of such data, with illegal aliens voluntarily identifying themselves as such, is impossible. Some data are collected involuntarily, for instance by the INS from located deportable aliens, but there is no information about either how representative located aliens are of all illegal aliens or what the location rate is. Some illegal aliens are included in official
statistics--for example, in the 1980 census results and in birth and death registration--but are not directly identifiable as such.

So-called informed guesses of the number of illegal aliens in the country made in the early 1970s have given way in recent years to estimates derived from a variety of empirical bases; these estimates, reviewed in more detail in Appendix B, are all indirect and rely on numerous assumptions; in general, however, they suggest a range of between 2 and 4 million illegal immigrants in the United States around 1980. Furthermore, there is no evidence to support the view that the illegal population has grown rapidly since 1980, and INS locations data by duration of illegal stay suggest little general change. These estimates of the number of illegal aliens include their distribution by age, sex, and country of origin (though the estimates may be wrong by a factor of two), but little else is known about this 1 to 2 percent of the U.S. population. What is known comes from small-scale, often ethnographic studies carried out by nongovernment researchers, and it is of uncertain generalizability to the total illegal population. An ethnographic study of Mexican immigrants described by Massey in Appendix C illustrates the information that can be obtained from such an approach.

Program Needs

There are also program, as opposed to policy, needs for immigration data. The Bureau of the Census, for instance, is a major user as well as a producer of data on immigration. Current data on international migration are needed to derive postcensal population estimates that are used, among other purposes: as independent controls for the monthly Current Population Survey; for evaluating the coverage of decennial censuses; for the distribution of revenue-sharing funds; and in the computation of widely used and important ratios, ranging from birth and death rates to life insurance survival probabilities.
The immigration data used to derive population estimates for the United States have serious deficiencies in addition to the lack of timeliness already mentioned. No reliable information is available on the flow of illegal immigrants to the United States or on emigration from the country. Furthermore, estimates of the migration between the United States and Puerto Rico are computed annually as the residual between the arrival and departure of millions of people to and from Puerto Rico. Finally, the estimates of international migration used by the Census Bureau to derive population estimates exclude any allowance for migration of civilian citizens who are not affiliated with the U.S. government (e.g., employees of international corporations, university personnel, students, retirees, etc.). These needs are for information on the international migration of all U.S. residents, rather than just immigrants or the foreign-born.

APPROACHES TO DATA IMPROVEMENTS

Unmet data needs of immigration policy and program management can be seen to range from a lack of timeliness and quality of data that are produced to data that are not, and never have been, available or even collected. Approaches to improvement, ranked in cost from improved tabulation
through improved quality control and broadened scope to new data collection processes, have already been outlined above. We now turn to a consideration of what each of these approaches can be expected to contribute to meeting unmet needs for data.

Improved Data Tabulation

The cheapest and quickest way of increasing the usefulness of data is by improving the tabulation of machine-readable data sets or by preparing public-use data tapes. However, the potential for this method of improvement is limited by what exists; one cannot tabulate what is not there.

The most important improvement that can be made is speed, since the more up-to-date the information, the more useful it is. The INS statistical yearbook for fiscal 1980 was issued in early 1984 and that for fiscal 1981 was issued in mid-1984; these time lags compromise the value of the data. The ADP systems now being implemented make an improvement in timeliness readily attainable. No obvious improvements in data tabulation are necessary, but some tables in the statistical yearbook could be simplified to reduce both detail and the number of empty cells by grouping countries, could have revised layouts to improve readability, and could make use of fuller, more comprehensible footnotes. The addition of a glossary to the 1981 yearbook represented a major improvement. Public-use tapes of samples of both immigrants and nonimmigrants admitted should be prepared each year as a matter of routine. The panel therefore recommends that the INS:

o Maintain its efforts to bring the statistical yearbook up to date;
o Reinstate the publication of figures on temporary entrants;
o Review the content of each table;
o Publish the statistical yearbook no later than 6 months after the end of the fiscal year; and
o Prepare and release public-use samples covering both immigrants and nonimmigrants.
The Bureau of the Census is to be commended for meeting United Nations recommendations for tabulations of data on the foreign-born and on households including foreign-born members from the 1980 census. However, the gain has been eroded by the excessive time lag involved; the tables were not available until mid-1984. The Bureau should ensure that comparable tables are prepared more quickly from the 1990 census. Given the data collected and the form in which they were collected, there are no clear ways to improve the tabulation program. However, the collection method could be improved by, for example, using precoded periods of entry for the foreign-born that correspond to the periods used for the 1970 census. The panel therefore recommends that the Bureau of the Census:

o Ensure speedier tabulation of data on the foreign born from the 1990 census; and
o Ensure the maximum comparability with data from earlier censuses, particularly concerning period of entry.

The Office of Refugee Resettlement collects considerable amounts of cross-sectional and longitudinal data concerning refugees, but staff time constraints have limited the amount of data published or made available for outside analysis. Substantially better use could be made of the data through more extensive tabulation or through the release of public-use tapes, to permit analysis of the data beyond the bare reporting requirements specified by Congress. Such improvements cannot be achieved given current ORR staffing levels and would thus require either some increase in staff or collaborative arrangements with outside organizations, either of which could be readily justified given the relative costs of data collection on the one hand and of data processing on the other. The panel therefore recommends that the Office of Refugee Resettlement:

o Allocate the additional resources necessary to ensure the adequate dissemination of existing data in both tabular and machine-readable form.

The Social Security Administration is an agency that offers some potential for improved data tabulation. It is not primarily interested in statistics--and still less in statistics about immigrants--but it collects information that could be useful for statistical studies of immigration. Systematic tabulation of data from the NUMIDENT file (new applications for social security numbers) for foreign-born people could provide revealing information about patterns of first settlement. We note that tabulations of beneficiaries receiving payments abroad have been used to study the extent of return migration of elderly immigrants.

The Internal Revenue Service also processes some data of potential value for estimating flows of U.S. citizens out of, and back into, the country. Citizens living abroad can, under certain conditions, claim tax allowances for foreign residence.
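As a rough illustration of how foreign-residence claims might be turned into minimum flow figures, the sketch below counts a new claim as an outflow and a lapsed claim as an inflow, each weighted by the filer plus dependents. The record layout, the field names, the sample returns, and the reading of "weighted by number of dependents" as one filer plus the dependents claimed are all assumptions made for illustration; none of them describes an actual IRS file or procedure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Filing:
    """One annual tax return. Every field name here is hypothetical."""
    taxpayer_id: str
    tax_year: int
    claims_foreign_residence: bool
    dependents: int


def persons_covered(filing: Filing) -> int:
    # Assumption: the weight is the filer plus the dependents claimed.
    return 1 + filing.dependents


def minimum_flows(filings: list[Filing], year: int) -> tuple[int, int]:
    """Return (minimum gross outflow, minimum gross inflow) for `year`."""
    by_key = {(f.taxpayer_id, f.tax_year): f for f in filings}
    outflow = 0
    inflow = 0
    for f in filings:
        if f.tax_year != year:
            continue
        previous = by_key.get((f.taxpayer_id, year - 1))
        # Assumption: a missing prior-year return counts as "no prior claim".
        claimed_before = previous is not None and previous.claims_foreign_residence
        if f.claims_foreign_residence and not claimed_before:
            outflow += persons_covered(f)   # new claim: moved abroad
        elif claimed_before and not f.claims_foreign_residence:
            inflow += persons_covered(f)    # claim lapsed: returned home
    return outflow, inflow


# Invented returns for four taxpayers, for illustration only.
EXAMPLE_FILINGS = [
    Filing("A", 1982, False, 2), Filing("A", 1983, True, 2),   # leaves in 1983
    Filing("B", 1982, True, 1), Filing("B", 1983, False, 1),   # returns in 1983
    Filing("C", 1982, True, 0), Filing("C", 1983, True, 0),    # stays abroad
    Filing("D", 1983, True, 0),                                # first return on file
]

print(minimum_flows(EXAMPLE_FILINGS, 1983))
```

Both figures are minimums in the sense used in the text: citizens who never file, or who move abroad without claiming the allowance, do not appear in either count.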
A minimum figure for gross outflow in a year can be obtained as the number of new claims for foreign residence allowances, weighted by number of dependents claimed, while a minimum figure for gross inflow in a year can be obtained as the number of returning residents, with no claims to foreign residence allowances when such a claim had been made the year before (again weighted by number of dependents). Though the policy value of data on inflows and outflows of citizens is low, and the estimates would be affected by changes in tax law, by filing delays, or by citizens not filing at all, the costs of producing suitable tabulations, by country of residence, would not be high, and the program value of the information would be substantial.

Processing of Data Already Collected

Some data collected for administrative purposes may have a statistical value that goes unrealized. Processing and tabulation of such data may be a cost-effective way of increasing data availability. A case in point
is the INS form I-213, record of a deportable alien located. While very little is known about the population of deportable or illegal aliens, a considerable amount of information of uncertain quality is collected, supposedly for administrative purposes, for each such person located by the INS. With somewhat more emphasis on data quality and with regular processing, insights into the structure, economic activity, and even size of the illegal alien population could be obtained with very little increase in workload. Indeed, workload might not be increased at all, since the regular processing of I-213 forms would eliminate the need for hand tallies of locations of deportable aliens for summarized reporting on form G-23 (see Chapter 4). The panel therefore recommends that the INS:

o Process and tabulate data on a regular basis from at least a substantial sample of I-213 forms, and put more emphasis on the quality of the basic data collected.

Improved Record Linkage

Record linkages across and within agencies offer tremendous potential for improving the statistical base for studies of migration. Linkages across agencies would be most valuable. If it were possible to link INS records on immigrant admissions with decennial census data on residence, current and past occupation, income, and recent internal migration, and with Social Security Administration or IRS data on income (or covered earnings) and residence, much of what policy makers need to know about immigrants, nonimmigrants, and even illegal immigrants would become available at modest cost. Unfortunately, such linkages have never been made; the INS has never participated in such an interagency data linkage project, perhaps because of an understandable modesty about its own data sets.

One stumbling block to attempts to link files across agencies is the rules concerning the confidentiality of the respective files.
Each agency that collects and maintains data from or about individuals or business establishments, whether for administrative, program, or statistical purposes, strictly limits its release of information to ensure that the persons (or firms) cannot be identified. In many instances, release of individual information beyond the collecting agency is prohibited by statute (as in the case of the Census Bureau); in others, it reflects an administrative decision consistent with maintaining credibility for the program. As a general rule, adherence to confidentiality has been accomplished by deleting the name and specific address of individuals from any publicly released files; by limiting geographic detail to a sufficiently high level (such as a city with 250,000 or more people) to eliminate any possibility of individuals' being identifiable; or, in some instances, by deleting what might be perceived as unique information from the file (such as exact dollar amounts for people with incomes in excess of $100,000). The confidentiality issue, and the responses to it in terms of the record file structures of various agencies, raise a number of problems related to the linking of files produced by two or more agencies. The
necessity for a high degree of accuracy in the matching process requires the presence of a common, unique characteristic in each file; name, for example, is insufficient, since there may be many John Smiths in any file. Adding other characteristics, such as address, date of birth, wife's maiden name, or number of children, will improve matching precision but at the same time will inevitably increase the risk that a particular record in the file can be identified subsequently as that of a particular individual. Thus files that in themselves do not violate confidentiality become suspect in the matching process as the number of characteristics expands. Unique identifiers such as social security number, by their very nature, permit the unique identification of an individual.

In recent years, serious discussion has taken place about the issues of privacy and confidentiality. Studies have been undertaken to explore public perception of the meaning of confidentiality and public concerns with the issues (see, for example, National Research Council, 1979). Debate also has taken place on how confidentiality can be maintained and individual privacy protected while, at the same time, data are provided for important policy purposes. One approach that has been proposed would recognize the major federal statistical agencies as a single entity within which data files could be exchanged for linkage or other statistical use while still honoring the requirements for confidentiality. Research also is under way on methods by which individual data can be modified sufficiently to ensure the confidentiality of the individual, without harming the data for analytic or linkage purposes. Given the ever-growing resource of administrative data, the large savings to be had in terms of cost and respondent burden, and the gains to be made in analytic terms from linking files, it is essential that efforts continue to develop acceptable solutions to the problem.
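The trade-off described above, in which each added matching characteristic makes a record easier to match and also easier to identify, can be illustrated with a small sketch. The records, field names, and the helper `unique_share` below are hypothetical and invented purely for illustration; they do not represent any agency's file layout or matching procedure.

```python
from collections import Counter

# Hypothetical records (name, birth_year, city); purely illustrative data.
FIELDS = ("name", "birth_year", "city")
RECORDS = [
    ("John Smith", 1950, "Chicago"),
    ("John Smith", 1950, "Houston"),
    ("John Smith", 1962, "Chicago"),
    ("Maria Lopez", 1950, "Chicago"),
    ("Maria Lopez", 1971, "Miami"),
    ("Anh Nguyen", 1962, "Houston"),
]


def unique_share(records, key_fields):
    """Return the share of records whose values on key_fields occur exactly once.

    A record that is unique on the chosen keys can be matched without
    ambiguity, and for the same reason it can be singled out as an individual.
    """
    positions = [FIELDS.index(field) for field in key_fields]
    keys = [tuple(record[i] for i in positions) for record in records]
    counts = Counter(keys)
    return sum(1 for key in keys if counts[key] == 1) / len(keys)


if __name__ == "__main__":
    # Add one matching characteristic at a time and report the unique share.
    for n_keys in range(1, len(FIELDS) + 1):
        chosen = FIELDS[:n_keys]
        print(chosen, round(unique_share(RECORDS, chosen), 2))
```

In this toy file, name alone singles out only one record in six, while name, year of birth, and city together single out every record. The same figure therefore measures both how well the file can be matched and how exposed it is to identification, which is the tension the text describes.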
The potential of interagency linkage may at present be limited by a lack of suitable identifiers. Although the Social Security Administration, the Internal Revenue Service, and some Census Bureau surveys all collect social security number, all machine-readable data sets suffer from some nonresponse, reporting error, or keying error, which reduces match rates and increases mismatches. The INS data sets do not include social security number in general, more commonly using the A-file number, so linking INS files with records from other agencies would not in practice be easy. Thus, although the potential benefits of interagency linkage are obvious, the practical obstacles make its implementation doubtful. However, intra-agency linkages are feasible and offer solid though less spectacular benefits. In the past, INS data systems have been designed and operated as discrete entities, not surprisingly given their administrative rather than statistical origins. The new ADP systems being implemented now represent a major change of direction, with data sets generated by each INS process being viewed as modules of a grand, integrated system linked through the Central Index. Once operational, the new systems will make it straightforward to link records of immigration or adjustment of status with subsequent naturalizations; to link apparent overstayers from the I-94 form (arrival records with no matching departure record) with I-213 records of deportable aliens located; to link petitions for immigration benefits with characteristics of the principal alien; and to link notifications of change of address to other records of an alien. It is
important that the INS recognize not only the statistical but also the program value of such linkages, and implement regular, routine data tabulation across functionally independent data sets. The panel thus recommends that the INS:

o Examine and implement procedures to exploit the potential of linking data sets for statistical and program management purposes as an integral part of the long-range ADP plan.

The Social Security Administration is another agency with data sets that could usefully be linked. Current records of contributions and benefits provide information on area of residence, employment, and earnings, while records of initial applications for social security numbers provide background information on age, sex, year of application (a potential surrogate for year of admission), and country of birth. Though gaps in the record would be impossible to interpret (such gaps might result from absence from the United States, low income, or employment not covered by the system), the linkage of data sets internal to the agency would still provide a substantial amount of information about the economic activity of foreign-born residents, and make possible a direct assessment of contributions paid in against benefits paid out.

Modification of Existing Data Collection Procedures

Existing data collection procedures can be improved by raising data quality and by collecting additional pieces of useful information. Data that fail to meet minimum quality standards waste resources devoted to their collection, processing, and analysis and, worse, can result in misleading analytical conclusions and poor policy decisions. As detailed in Chapter 4, many of the INS data collection activities suffer from shortcomings of design, standardization, adequate supervision, and quality control.
These shortcomings are particularly serious for data provided by INS administrative data sets--for example, data for border crossers--but have also affected the timeliness and quality of data on immigrants, temporary admissions, and naturalizations. The highest priority must be given to instituting sound collection and processing procedures incorporating step-by-step quality control, without which the collection of additional data would be pointless. Specific recommendations for necessary improvements are presented in Chapter 4 (and repeated in Chapter 9), and in fairness to the INS, some progress has already been made through the introduction of new ADP systems. At the level of particular data elements, emphasis must be put on the quality of occupational data for immigrants, by ensuring that INS interviewing officers probe the type of work performed by the applicant; at present, these data are virtually useless. There are also some items that could usefully be added to existing collection processes; applications for immigrant status should include a question on formal education; the I-94 arrival and departure form should reinstate questions on gender and port of embarkation or disembarkation; petitions to naturalize should also include questions on formal education. This list
is meant to be illustrative rather than exhaustive; a thorough review of the content of all INS forms is overdue and recommended in Chapter 4. Other agencies have traditionally paid more attention to data quality than has the INS, but they could still improve the usefulness of their data for purposes of U.S. immigration policy by modifying or adding to questions included in existing collection systems. As recommended in Chapter 5, the Bureau of the Census should continue to include questions relevant to the foreign-born consistent with earlier censuses (in particular, it should reinstate questions on birthplace of parents in the 1990 census) and should clarify the question on date of entry to the United States so that it refers clearly to date of entry to take up residence, coding the responses to be consistent with periods used in previous censuses. A module to measure emigration of both immigrants and native-born citizens should also be included in the Current Population Survey, since little is known about emigration levels or patterns and the cost would be modest.

New Data Collection Initiatives

The modifications to data tabulation, processing, linkage, and collection procedures outlined in the previous four sections represent cost-effective improvements of the statistical base available for policy formation, but they cannot fill the largest single lack: good-quality longitudinal data on the process of settlement in the United States by immigrants and refugees, and on the social and economic impact of such settlement on the existing resident population.
Even an automated system of record linkage, in which each contact of an immigrant with any official agency would be added to a historical file for the individual, would go only part of the way toward meeting the longitudinal data need, since the individual records would include gaps for periods without official contact and omit important occurrences such as further education, short- or long-term absence from the country, and changes in family and household relationships. To meet such needs, the panel strongly recommends that Congress mandate that the INS be the lead agency in:

o The establishment of a longitudinal panel survey of a sample of aliens entering the United States or changing visa status during a 1-year period. This sample of an entry cohort would be followed up for a minimum period of 5 years. The survey should be repeated by drawing a new sample of entrants every 5 years thereafter.

The sample would consist of:

(a) Those admitted to permanent resident status, both new immigrants and those changing status;
(b) Those admitted as temporary residents under educational, training, and short-term work visas; and
(c) Illegal aliens given legal status under amnesty provisions included in any future amendments to the INA.

For each participant, data would be collected on:
(a) Initial characteristics: sex, age, country of birth, education, occupational history, year of entry, marital status, visa status and admission preference, family ties in the United States, place of initial settlement, and household structure;
(b) Demographic changes, including marital status, births, death, internal migration, temporary absence from the United States, emigration, formal or vocational education, and household characteristics;
(c) Income and labor force experience in the United States; and
(d) Program participation and service use, including educational and health costs of children; local, state, and federal taxes paid; and social security benefits and contributions.

We recommend that the survey be funded by the INS but conducted under contract by a recognized survey research organization, either public or private, experienced in longitudinal panel design and execution. The sample should be selected from a 1-year cohort of entrants or those changing status, to ensure that the sampling frame is complete and that potential respondents can be located (at time of entry or change of status). Every effort, including the collection of social security numbers and names and addresses of close relatives or friends and the provision of incentives to respondents, should be incorporated as part of the survey approach in order to minimize the dropout rate and to help to locate those who migrate during the life of the study. The study design should incorporate the use of administrative data sets, partly to obtain data and partly for mutual evaluation. To obtain broad support for the study, as well as to identify key data items and to ensure sound design, an advisory panel of representatives of key agencies and experts in the field of immigration research and immigration policy should be established.
Implementing this survey will not be inexpensive--we estimate a cost of around $5.5 million over 5 years for a sample of about 6,000 cases--although this cost is small relative to the $58 million budgeted by the INS for fiscal 1984 alone on ADP development and data processing. Such a survey is overdue and data needs are pressing, so work on the survey should start as soon as possible. A longitudinal sample survey such as that outlined above will meet many data needs, but it cannot be expected to meet all data needs, particularly for small-area or small-group data for which the sample would be too small. There will remain a need for continued analysis of other data and for in-depth studies of particular areas, issues, or groups; such work is best left to universities and other nongovernment research organizations. It should also be stressed that the proposed survey is complementary to other administrative data collection activities. It cannot tell policy makers how many entries of particular categories of aliens there are in a year, but it will provide a basis for predicting what the effects of such entries will be, and of what the effects of changing the numbers in each category would be.
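The cost and sample-size figures quoted above can be put on a per-case footing with a few lines of arithmetic, which also shows roughly why small-group estimates would be weak. The sketch below takes the dollar amounts and the sample of 6,000 from the text; the subgroup share passed to `subgroup_margin`, the assumption of simple random sampling, and the neglect of panel attrition are illustrative assumptions, not part of the panel's design.

```python
import math

# Figures taken from the text.
TOTAL_COST = 5_500_000          # dollars, over the life of the survey
YEARS = 5                       # follow-up period in years
SAMPLE_SIZE = 6_000             # approximate number of cases
ADP_BUDGET_FY1984 = 58_000_000  # INS ADP and data processing budget, fiscal 1984


def cost_summary():
    """Return (cost per case, cost per case per year, share of the ADP budget)."""
    per_case = TOTAL_COST / SAMPLE_SIZE
    per_case_year = per_case / YEARS
    share_of_adp = TOTAL_COST / ADP_BUDGET_FY1984
    return per_case, per_case_year, share_of_adp


def subgroup_margin(group_share, p=0.5, z=1.96):
    """Return (expected cases, approximate 95% margin of error) for a subgroup.

    Assumes simple random sampling and no attrition (illustrative only);
    group_share is a hypothetical fraction of the sample, and p is the
    proportion being estimated (0.5 gives the widest margin).
    """
    n = SAMPLE_SIZE * group_share
    return n, z * math.sqrt(p * (1 - p) / n)


if __name__ == "__main__":
    print(cost_summary())
    print(subgroup_margin(0.02))
```

On these figures the survey costs about $917 per case, or about $183 per case per year, and the whole 5-year cost is roughly 9.5 percent of the single-year ADP figure. A hypothetical group making up 2 percent of the sample would contribute only about 120 cases, giving a margin of error of around 9 percentage points on an estimated proportion under the stated assumptions.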
IMPLEMENTATION OF RECOMMENDATIONS

Immigration is an important and emotional area of public policy, yet as we have seen in this report the statistics on which informed debate and policy formation are based are woefully inadequate. Two of the major reasons for this inadequacy have been a lack of interest in or commitment to the production of relevant, high-quality statistics by the agencies having contact with aliens, and the failure of any one agency to take the lead in fostering a governmentwide coordinated system for collecting, processing, and analyzing the necessary data. This leadership role belongs by right to the INS as the agency primarily concerned with immigration policy and process. However, the INS has consistently failed to look beyond its immediate management needs for information, an attitude clearly expressed in its mission plan and in the assumptions underlying its ADP program, and it has on occasion actually impeded existing collaborative interagency agreements by introducing process changes without consultation or regard for outside needs. Blame for the lack of enthusiasm for producing immigration statistics shown by the agencies involved lies partly with the agencies themselves, but some part of the blame must also be borne by the executive branch and Congress: the agencies have not been told clearly enough to produce useful data.

Two courses of action are necessary to improve the present unsatisfactory situation. One is a congressional initiative to mandate specific reporting requirements, particularly for the INS. The Refugee Act of 1980 shows what Congress can do in the area of data production by legislation, and the Simpson-Mazzoli bill also was a clear movement in the right direction.
The panel therefore recommends to Congress that:

o Specific language covering data collection, analysis, and reporting requirements be incorporated into an amendment to the INA and into all other legislation dealing with immigrants, refugees, and other aliens.

The second is for the executive branch to establish an interagency review group to ensure coordination between agencies, action on necessary new initiatives, and due regard for quality control within agencies. Accordingly, the panel recommends:

o That an interagency review group for immigration statistics be established under the aegis of the Statistical Policy Office of the Office of Management and Budget.

This group would be charged with ensuring the coordination across agencies of data collection and processing in the area of migration and refugees and with overseeing the implementation of improvements within agencies, particularly with reference to timeliness, quality control, and responsiveness to changing data needs. An early task for the group would be to examine the recommendations on statistics of international migration of the United Nations (1980), with a view to proposing changes leading to greater conformity with the recommendations. The group would thus provide the leadership that has been so lacking in the past.
Improved coordination alone will go some way toward remedying the past neglect of immigration statistics, but the interagency review group must go further, to bear responsibility for the testing and implementation of the panel's specific, agency-directed recommendations. With the exception of the proposed longitudinal survey, we have not provided detailed cost estimates for the recommendations given in this chapter. The reason for this omission is that we believe that the costs of the proposed measures, again with the exception of the longitudinal survey and the reorganization within the INS, are small enough to be met within existing budgets, at least in the initial stages of implementation. Some reallocation within current appropriations will be necessary to effect these actions, but such changes fall well within the scope of normal managerial discretion. Estimates of the cost of reorganizing statistical activities in the INS are given in Chapter 4; as noted, we estimate additional recurrent expenditures of some $2.5 million per year once the proposed system is fully operational. This money must be spent to reverse past neglect, but the returns in terms of improved policy making and program monitoring fully justify the additional expenditure.

REFERENCES

National Research Council
1979 Privacy and Confidentiality as Factors in Survey Response. Committee on National Statistics, Assembly of Behavioral and Social Sciences. Washington, D.C.: National Academy of Sciences.

United Nations
1980 Recommendations on Statistics of International Migration. Department of International Economic and Social Affairs, Statistical Office. Statistical Papers Series M No. 58 (ST/ESA/STAT/SER.M/58). New York: United Nations.