Small-Area Income and Poverty Estimates: Priorities for 2000 and Beyond

5

Future Model Development: The Role of Administrative Records

OVERVIEW

Estimates for school districts and other subcounty areas from the Census Bureau's Small Area Income and Poverty Estimates (SAIPE) Program currently cannot be produced by using regression models similar to the state and county models. The latter models are advantageous not only because they use updated information to form the dependent and predictor variables, but also because the modeling procedure improves the precision of the resulting estimates. Instead, for subcounty areas a shares approach must be used: estimates from the previous census long-form sample of the shares or proportions for each subcounty area of the county total are applied to updated estimates from the county model. The estimates of shares are subject to high levels of sampling variability for many small areas and also necessarily assume that the relative proportions of poor people among areas within each county have not changed since the census. If appropriate variables could be found to use in regression models to predict poverty or income for subcounty areas, such models would likely be better than the current shares procedure.
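The shares calculation described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the Census Bureau's implementation: the function name, the district labels, and all counts are hypothetical.

```python
def allocate_by_shares(census_counts, county_estimate):
    """Split an updated county estimate among subcounty areas.

    census_counts: previous-census estimates of poor people for each
        subcounty area in one county (hypothetical values below).
    county_estimate: updated county-model estimate for that county.
    """
    county_census_total = sum(census_counts.values())
    if county_census_total <= 0:
        raise ValueError("census counts must sum to a positive total")
    # Each area keeps the share of the county total it had in the census.
    return {
        area: county_estimate * count / county_census_total
        for area, count in census_counts.items()
    }


# Hypothetical county with three school districts.
census_counts = {"District A": 300, "District B": 100, "District C": 100}
estimates = allocate_by_shares(census_counts, county_estimate=600)
print(estimates)
```

Because the shares are frozen at their census values, any shift in the distribution of poor people within the county since the census is invisible to this calculation, and sampling error in the census counts carries straight through to the estimates.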

The difficulty is that no administrative records data sources currently exist that can provide consistently measured, updated predictor variables for a subcounty model, in the way that tax return and food stamp data are used in the state and county models. There is also an issue of the source of the dependent variable in a subcounty model: the American Community Survey (ACS) might serve this purpose, perhaps calibrated in some manner to the March Current Population Survey (CPS), as discussed in Chapter 4. Alternatively, it might be possible to develop a bivariate model (see Chapter 3) in which both the ACS and March CPS provide dependent variables.1

In this chapter we first review the advantages and problems of developing two possible data sources for predictor variables for subcounty income and poverty regression models: IRS tax return records, which could be used in modeling both income and poverty, and food stamp records, which could be used in modeling poverty. Both of these sources currently provide significant variables in the Census Bureau's state and county models; their use for subcounty models would require further development of the Census Bureau's capabilities for geocoding addresses to small geographic areas. Use of food stamp data would also require arrangements to obtain microlevel data on a regular basis from state agencies, or, alternatively, to enable state agencies to geocode the records and provide area summaries to the Census Bureau. In contrast, the Census Bureau already has access to selected information on individual income tax returns.

If geocoding capabilities can be improved for subcounty areas but access arrangements cannot be worked out for food stamp data, it would not be possible to develop a subcounty poverty model with both IRS and food stamp data. However, it is possible that an acceptable subcounty poverty regression model could be developed with IRS data (including 2000 census data as another predictor variable), but not including food stamp data.2 Alternatively, it may be possible to use geocoded IRS data to develop within-county shares to apply to updated estimates from the Census Bureau's county model. Whether it is preferable to form within-county shares by using IRS data or ACS data (once they become available) is an open question.
Another question is whether it might be possible to combine ACS and IRS data in some manner to form the shares. It is possible that a shares model along these lines, which would use more current data to form the shares than is currently done by using the previous census, could be as effective as a regression model.

1   The recent availability of funding to adjust the CPS sample size and design to support reliable state estimates of low-income children who lack health insurance coverage could make that survey more valuable as the source of a dependent variable for subcounty areas.

2   The Census Bureau conducted preliminary work on estimating a county model of poor school-age children, excluding the food stamp variable that is used as a predictor variable in the current model. The results demonstrated somewhat poorer performance for the model without food stamps, based on comparisons with 1990 census estimates and an estimate of goodness of fit. However, the model without food stamps was still an improvement for estimates of poor school-age children in 1989 over simpler procedures that assumed little change from the 1980 census (see Siegel, 1997; National Research Council, 2000c).

Following the discussion of IRS and food stamp data and the prospects for geocoding these sources to subcounty areas, we consider the potential uses of data from the National School Lunch Program for improving poverty estimates specifically for school districts. School lunch data, which do not require geocoding, might be used, alone or in some combination with ACS data (and possibly 2000 census data, as well), to form within-county shares to apply to updated county model estimates. Alternatively, school lunch data might be used as a predictor variable in a regression model for school district poverty estimates.

We then discuss data needs for improved population estimates, which are required for many uses of small-area income and poverty estimates from SAIPE: for example, in fund allocation formulas for which SAIPE estimates of numbers of poor need to be converted to poverty rates. Population estimates between censuses are developed by using administrative records, such as tax returns (see Chapter 3), and we discuss future directions for research and development to improve the data sources and methods for producing small-area estimates of total population and population by age.

Based on its review and analysis, the panel lastly presents its recommendations on the possible uses of administrative records data to improve income, poverty, and population estimates from the SAIPE Program, which are listed at the end of the chapter.

The panel is cognizant that enhancements to administrative records data sources and improved geocoding are likely to be costly. As part of planning its research program for the next decade, the Census Bureau should consult with user agencies about their needs for small-area estimates, particularly at the subcounty level.
It could be useful to develop rough cost-benefit calculations jointly with these agencies to help guide further research and development. Such calculations might weigh, for example, the benefits for fund allocation and other program purposes from improving the accuracy of estimates of poor school-age children for school districts (in terms of bias and variance) against the costs of developing the necessary data and geocoding capabilities. In its planning, the Census Bureau should also consider the benefits of possible improvements to administrative data and geocoding capabilities for other Bureau programs.

TAX RETURN DATA

Tax return data from the Internal Revenue Service (IRS) have long been used by the Census Bureau for small-area estimation. Each year the Bureau obtains a file from IRS of selected information on 1040 tax returns, including type of return (e.g., joint, single), adjusted gross income, and other variables, which the Bureau uses for statistical purposes.3 The 1040 data contribute importantly to the Census Bureau's population estimates for states and counties by providing the basis for estimating year-to-year net internal migration rates for people under age 65 between counties (see Chapter 3).4 They were used from 1971 to 1987 to estimate per capita income for local government jurisdictions for fund allocation under the General Revenue Sharing program. Currently, they are being used to form predictor variables in the SAIPE income and poverty models for states and counties.

The process of assigning poverty status to the IRS records, in general terms, involves comparing adjusted gross income for families on tax returns to a poverty threshold that corresponds to the number of adult and child exemptions reported on the return (including exemptions reported for children away from home). Although there are differences between the definitions of families and income in IRS records and in household surveys and the census, they are not critical for purposes of developing a predictive model. More important is that the data provide consistent measures across areas.

IRS data have the advantage that the rules for reporting of income are uniform across the nation. However, the Census Bureau's evaluation has found some differences across states in the completeness of the tax files that it obtains from IRS that may affect use of the data in models (Cardiff, 1998). These differences occur because the Census Bureau receives an early version of the data for each tax filing year from the IRS. The Census Bureau should further investigate the quality of the data from the early version and determine if a somewhat later version would be preferable and could be used without delaying preparation of the estimates.
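The poverty-status assignment described in this section can be illustrated with a short Python sketch. It is an illustration only: the record layout, the function names, and especially the dollar amounts in `hypothetical_threshold` are placeholders chosen for the example and are not official poverty thresholds or Census Bureau code.

```python
from dataclasses import dataclass


@dataclass
class TaxReturn:
    agi: float              # adjusted gross income reported on the return
    adult_exemptions: int
    child_exemptions: int


def hypothetical_threshold(adults, children):
    # Placeholder amounts for illustration only; real thresholds vary by
    # family size and composition and are published officially.
    return 8000.0 + 3000.0 * max(adults - 1, 0) + 2500.0 * children


def is_poor(ret):
    # A return is flagged as poor when its income falls below the threshold
    # matching the exemptions reported on it.
    threshold = hypothetical_threshold(ret.adult_exemptions, ret.child_exemptions)
    return ret.agi < threshold


def poor_child_exemption_share(returns):
    # Share of all child exemptions that appear on returns flagged as poor.
    total = sum(r.child_exemptions for r in returns)
    poor = sum(r.child_exemptions for r in returns if is_poor(r))
    return poor / total if total else 0.0


# Hypothetical returns for one area.
returns = [
    TaxReturn(agi=12000, adult_exemptions=2, child_exemptions=2),
    TaxReturn(agi=40000, adult_exemptions=2, child_exemptions=1),
    TaxReturn(agi=9000, adult_exemptions=1, child_exemptions=1),
    TaxReturn(agi=30000, adult_exemptions=1, child_exemptions=0),
]
share = poor_child_exemption_share(returns)
print(share)
```

The point of the sketch is that the flag depends only on fields reported uniformly on every return, which is why such a measure can be consistent across areas even though it does not match the survey definition of poverty.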
In the Census Bureau's current CPS-based state model for estimating proportions of poor school-age children, the IRS data contribute two of the four predictor variables for each state: (1) proportion of child exemptions reported by families in poverty on tax returns and (2) proportion of people under age 65 who are not included on an income tax return, which is obtained by subtracting the estimated number of exemptions on income tax returns for people under age 65 from the estimated total population under age 65 that is derived from demographic analysis. The reason for including a variable to estimate people not reported on tax returns is that they are believed to be poorer on average than other people. IRS data also contribute to the SAIPE state models for median household income, total poverty, poverty for children under age 5, and poverty for people under age 18 (see Table 3-2 in Chapter 3).

3   There is no reverse flow of information: that is, the Census Bureau does not provide individually identifiable information of any kind to the IRS (nor to anyone outside the Bureau).

4   Demographic information contained in the Social Security Administration Numident File, linked to IRS tax returns, also contributes to state population estimates by age, sex, race, and Hispanic origin (see Chapter 3). The Numident File will likely play an even more important role in small-area population estimates, now that the Census Bureau has access to 100 percent of the records and not only a 20 percent sample of them (see “Data Needs for Population Estimates” below; see also National Research Council, 1994).

In the Census Bureau's current CPS-based model to estimate numbers of poor school-age children for counties, the IRS data contribute two of the five predictor variables for each county with poor sampled households containing school-age children in 3 years of the March CPS: (1) log (number of child exemptions reported by families in poverty on tax returns) and (2) log (number of child exemptions reported on tax returns). The second variable is included together with another predictor variable, log (estimated population under age 18 from demographic analysis), to cover children not reported on tax returns (i.e., in nonfiling families), who are assumed to be poorer on average than other children. IRS data also form predictor variables in the SAIPE county models for median household income, total poverty, and poverty for people under age 18 (see Table 3-3 in Chapter 3).

IRS tax return data are not identified by county of residence. In order to use the data in the county model, the Census Bureau must first assign the address on each tax return record to a geographic area. Over the years, the Census Bureau has refined its methods for geocoding addresses to counties so that the process is believed to work well in most instances.
For both states and counties, there are errors in assigning addresses to an area of residence because some tax returns are filed from a person's business address or the address of the tax preparer, which may not be in the same state or county as the taxpayer's residence. The extent of nonresidential tax-filing addresses, and, in particular, the number of such addresses that differ from the filer's state or county of residence, is not known.

To use IRS tax return data to form predictor variables for an income or poverty model for school districts or other subcounty areas, or, alternatively, to form within-county shares or proportions for subcounty areas, the Census Bureau will need to further refine its geocoding capabilities so that addresses can be assigned geographic codes below the county level. As discussed below (see “Geocoding with TIGER and MAF”), the development of the 2000 census Master Address File (MAF) and the refinement of the TIGER geocoding system may make it possible to geocode addresses to subcounty areas with acceptable accuracy, although the problem of nonresidential tax-filing addresses will remain.
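The IRS-based predictor variables described in this section amount to simple arithmetic on area totals, which the following Python sketch illustrates. The function names and input numbers are hypothetical. Two points are assumptions rather than statements from the text: that the nonfiler proportion is normalized by the population under age 65, and that "log" means the natural logarithm.

```python
import math


def nonfiler_proportion(pop_under_65, exemptions_under_65):
    # Estimated people under 65 not on any return, as a share of the
    # demographic-analysis population under 65 (assumed denominator).
    if pop_under_65 <= 0:
        raise ValueError("population must be positive")
    return (pop_under_65 - exemptions_under_65) / pop_under_65


def county_log_predictors(poor_child_exemptions, child_exemptions, pop_under_18):
    # Natural logs are an assumption; the text only says "log".
    values = {
        "poor_child_exemptions": poor_child_exemptions,
        "child_exemptions": child_exemptions,
        "pop_under_18": pop_under_18,
    }
    for name, value in values.items():
        if value <= 0:
            raise ValueError(f"{name} must be positive to take a log")
    return {f"log_{name}": math.log(value) for name, value in values.items()}


# Hypothetical state and county totals.
state_share = nonfiler_proportion(pop_under_65=1_000_000, exemptions_under_65=900_000)
county = county_log_predictors(
    poor_child_exemptions=500, child_exemptions=4000, pop_under_18=5000
)
print(state_share, county)
```

Entering both the child exemption count and the population under age 18 in logs lets a regression pick up the gap between them, which is how the model can account for children in nonfiling families.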

FOOD STAMP DATA

States and Counties

The Census Bureau uses the proportions and numbers of food stamp recipients, respectively, as predictor variables in the SAIPE state and county poverty models. Two key eligibility requirements for food stamps, which make the data suitable for modeling poverty, are that households must have gross income below 130 percent of the applicable poverty guideline and net income, after certain deductions, below 100 percent of the applicable poverty guideline.5 The gross and net income limits for eligibility and the ceilings on allowable deductions are higher in Alaska and Hawaii than in the other states due to their higher cost of living.

The Census Bureau obtains monthly totals of food stamp recipients for states from the U.S. Department of Agriculture (USDA). The Bureau conducted research to determine how best to use these data for input to the state poverty models. Based on that research, the Bureau decided to use the monthly counts averaged over a 12-month period centered on January 1 of the calendar year subsequent to the income reference year for the poverty estimates. The Census Bureau further refines the food stamp counts in three ways: it subtracts counts by state of the numbers of people who received food stamps due to specific natural disasters from the counts of the total number of recipients; it uses the results of time-series analysis of monthly state food stamp data from October 1979 through September 1997 to smooth outliers; and it adjusts the counts of food stamp recipients in Alaska and Hawaii downward to reflect the higher eligibility thresholds for those states.
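A 12-month period centered on January 1 of the year after income year Y runs from July of year Y through June of year Y+1. The Python sketch below illustrates that window together with the disaster subtraction. It is a simplified illustration with hypothetical names and counts; the time-series smoothing of outliers and the Alaska and Hawaii adjustment are not shown.

```python
def centered_window(income_year):
    # July-December of the income year, then January-June of the next year.
    first_half = [(income_year, month) for month in range(7, 13)]
    second_half = [(income_year + 1, month) for month in range(1, 7)]
    return first_half + second_half


def average_recipients(monthly_counts, income_year, disaster_counts=None):
    """Average monthly recipients over the centered 12-month window.

    monthly_counts and disaster_counts map (year, month) to counts;
    disaster-related recipients are subtracted before averaging.
    """
    disaster_counts = disaster_counts or {}
    adjusted = [
        monthly_counts[key] - disaster_counts.get(key, 0)
        for key in centered_window(income_year)
    ]
    return sum(adjusted) / len(adjusted)


# Hypothetical state series: steady at 1,000 recipients, with a
# disaster-related spike of 600 extra recipients in January 1996.
window = centered_window(1995)
monthly = {key: 1000 for key in window}
monthly[(1996, 1)] = 1600
disaster = {(1996, 1): 600}
average = average_recipients(monthly, income_year=1995, disaster_counts=disaster)
print(window[0], window[-1], average)
```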
For counties, the Census Bureau obtains counts of food stamp recipients from USDA and, in some instances, from state agencies, but the information obtained is not always the same for different counties: in most counties, the counts of food stamp recipients pertain to July; for some counties, they are an average of the monthly counts for the year. For input to the county poverty models, the Census Bureau rakes the county food stamp numbers to the adjusted food stamp state numbers.

Although there are nearly uniform rules for administration of the Food Stamp Program across states, estimated participation rates (the proportion of eligible households that apply for and receive benefits) differ appreciably by state. Such differences, which may stem from differences in outreach efforts, the stigma associated with participation, or other factors, could affect the comparability of food stamp counts across areas in terms of how well they relate to poverty.

5   The poverty guidelines used for determining program eligibility are derived by smoothing the official poverty thresholds for different size families (see Fisher, 1992).

Interarea comparability may also be affected by changes in the design and administration of income support programs consequent to recent welfare reform legislation (the 1996 Personal Responsibility and Work Opportunity Reconciliation Act [PRWORA] and subsequent amendments). The legislation denied food stamp benefits to many immigrants, who are not distributed uniformly across the country. It also greatly limited benefits for able-bodied adults without dependents who do not meet work requirements, and, at the same time, it permitted waivers from those provisions for high-unemployment areas, which could affect interarea comparability.

Another possible effect on interarea comparability may result from the marked decline that has occurred in welfare caseloads under the Temporary Assistance to Needy Families (TANF) program established by PRWORA. The extent of the decline has differed among states, in part due to differences in state efforts to move families off the welfare rolls. These differences appear to have affected food stamp caseloads as well, perhaps because families who leave TANF are discouraged from applying for other benefits, such as food stamps, for which they may still be eligible.

A priority for the Census Bureau should be to assess the comparability across states and counties of food stamp data for years subsequent to passage of the 1996 legislation in terms of how well the data relate to poverty (see Recommendation 5-1, below). For example, analysis of trends in estimated participation rates for states before and after welfare reform could indicate whether some states have diverged from national trends.
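The raking of county food stamp counts to the adjusted state total, mentioned earlier in this section, reduces to proportional scaling when there is a single control total. The Python sketch below shows that one-margin case under that assumption; the function name and the counts are hypothetical, and the Census Bureau's actual procedure may differ in detail.

```python
def rake_to_state_total(county_counts, state_total):
    """Scale county counts so that they sum to the state control total."""
    raw_total = sum(county_counts.values())
    if raw_total <= 0:
        raise ValueError("county counts must sum to a positive total")
    # Every county is multiplied by the same factor, so relative sizes
    # among counties are preserved.
    return {
        county: count * state_total / raw_total
        for county, count in county_counts.items()
    }


# Hypothetical state with two counties whose reported counts (1,000 in all)
# exceed the adjusted state figure of 900.
county_counts = {"County A": 400, "County B": 600}
raked = rake_to_state_total(county_counts, state_total=900)
print(raked)
```

Raking makes the county inputs consistent with the state figure, but it cannot remove differences among counties in what the reported counts measure, such as July counts in some counties and annual averages in others.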
If comparability appears to have markedly decreased, it may not be appropriate to use food stamp data as a predictor variable in the state poverty models or even in the county poverty models as they are currently specified. However, if the data remain reasonably comparable across counties within states, then it might be possible to use food stamps in estimating poverty for counties. For example, it might be possible to develop a form of county model that would predict changes in poverty on the basis of changes in food stamp data and other predictor variables. The results could then be controlled to estimates from state models that did not include food stamp recipients as a predictor variable.

Another problem with the use of food stamp data in the county models concerns the time that is required to obtain the data, which are not available until almost 2 years after the year to which they refer. This delay is one of the reasons that the Census Bureau currently produces county poverty estimates with a minimum 3-year time lag (e.g., estimates for income year 1995 were completed in fall 1998). As discussed in Chapter 3, a priority for the Census Bureau should be to evaluate ways to reduce the time lag between the reference year of the estimates and the date when they are released. A way to reduce the delay that is due to lags in obtaining food stamp data could be to use the data for the year prior to the income reference year in the models.

Subcounty Areas

For subcounty poverty estimates, it is not now possible to consider using counts of food stamp recipients in a model because, in contrast to tax returns, the Census Bureau does not have access to individual food stamp records and hence cannot undertake to geocode the addresses to local areas. State agencies, and not the USDA, have custody of and control over the individual food stamp records, and state record systems differ in format and provisions for access. In some states, county agencies maintain and control their own food stamp databases.

To obtain food stamp data for subcounty poverty models would most likely require a substantial investment of staff time and resources to build cooperative arrangements for geocoding and tabulating the data (see Becker, 1998). Such a cooperative enterprise would need to involve the Census Bureau, the U.S. Department of Education, the U.S. Department of Agriculture, other federal agencies interested in subcounty poverty estimates, and state agencies. Arrangements would need to be worked out for geocoding the microlevel records (assuming geocoding capabilities are improved, as discussed below) and for resolving discrepancies and errors in geocoding. Arrangements would also need to be worked out for access to the geocoded files.
Given state confidentiality requirements as well as the benefits of local knowledge for geocoding, it may be that a workable arrangement would be to have state agencies perform the geocoding of food stamp records in their state and then provide summaries to the Census Bureau for counties, school districts, and other subcounty areas.

Critical to the success of a decentralized system, such as that just outlined, or any other arrangement for geocoding and tabulating food stamp records for small areas, is the development of compelling incentives for the different stakeholders to participate. The benefits to the Census Bureau and to the Department of Education and other federal agencies of having food stamp data available for use in poverty models are clear, provided that the costs are bearable. State agencies perhaps could benefit from having geocoded food stamp records available for such purposes as streamlining the enrollment of children in school lunch and breakfast programs who are automatically eligible because their families are enrolled in food stamps.6 However, it would likely require resources from federal agencies for the geocoding work and other aspects of running a successful cooperative program, such as training and documentation, to enlist the full participation of state agencies.

We are not optimistic about the prospects for developing an effective federal-state cooperative program for geocoding food stamp records to subcounty areas. Moreover, changes in the operation of the Food Stamp Program, as discussed above, raise questions about the usefulness of food stamp data for modeling in the future. But, if the demand for updated small-area poverty estimates continues to grow, then the benefits, costs, and feasibility of some type of cooperative program could be investigated. A possible approach would be for the Census Bureau and the Department of Education to identify one or two interested states that might be willing to establish pilot programs to serve as feasibility studies.

GEOCODING WITH TIGER AND MAF

The Census Bureau's TIGER (Topologically Integrated Geographic Encoding and Referencing System) database was developed after the 1980 census to provide a complete mapping of every line segment in the United States, including streets, rivers, other physical features, and invisible boundaries of governmental and statistical areas; to link address ranges for city-style addresses to line segments;7 and to link codes for census geographic areas (counties, census tracts, blocks, etc.) to the map spaces defined by the line segments.
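The two linkages just described, address ranges to line segments and line segments to geographic codes, are what allow an address to be assigned to a small area. The Python sketch below illustrates the idea with a deliberately simplified, hypothetical data structure; the real TIGER database is far richer (it distinguishes the two sides of a street, for example), and none of the names or codes here come from it.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Segment:
    street: str   # normalized street name, e.g. "MAIN ST"
    low: int      # lowest house number on the segment
    high: int     # highest house number on the segment
    block: str    # hypothetical geographic code attached to the segment


def geocode(address: str, segments: list) -> Optional[str]:
    """Return the block code for a house-number address, or None."""
    number_text, _, street = address.strip().partition(" ")
    if not number_text.isdigit() or not street:
        # No leading house number (e.g. a post office box): cannot be
        # matched against address ranges.
        return None
    number = int(number_text)
    street_key = street.upper().rstrip(".")
    for seg in segments:
        if seg.street == street_key and seg.low <= number <= seg.high:
            return seg.block
    return None


# Hypothetical segments along one street.
segments = [
    Segment("MAIN ST", 100, 198, "block-1001"),
    Segment("MAIN ST", 200, 298, "block-1002"),
]
print(geocode("104 Main St.", segments))
print(geocode("P.O. Box 8", segments))
```

Addresses without a house number and street name fall through this kind of lookup, and so do house numbers outside every stored range, which is why the completeness of address range coverage matters for geocoding.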
TIGER thus makes it possible to geocode (assign) addresses on administrative records to small census geographic units when the addresses are in city-style format, that is, when the address has a house or building number and street name, such as “104 Main St.” For larger areas, such as counties, the Census Bureau has developed methods for geocoding not only city-style addresses to those areas, but also non-city-style addresses, which include rural route numbers and post office box numbers. Essentially, the Bureau uses ZIP-plus-4 codes to assign non-city-style addresses to counties. The geocoded records can then be tabulated to provide statistics of interest for small areas.

However, TIGER cannot now be used for geocoding addresses to subcounty areas because a significant percentage of addresses are not in city-style format, and the coding method used for counties for such addresses would not likely be accurate for subcounty areas. In addition, the address ranges in TIGER do not reflect all of the city-style addresses that exist.

6   For this purpose, each school district in a state might be provided with the list of families participating in food stamps whose addresses were geocoded to that district, provided that such a procedure is compatible with the state's confidentiality provisions for food stamp data.

7   Because TIGER generates public use products, it contains address ranges and not individual street addresses; the Census Bureau regards the latter as confidential under Title 13 of the U.S. Code.

The completion of the Master Address File (MAF) for the 2000 census will make it possible to expand the address coverage in TIGER. The MAF contains individual addresses for housing units, separately identifying units in apartment buildings and other multiunit structures, together with the applicable codes for the state, county, census block, and other geographic entities. The MAF includes not only units with city-style addresses, but also units that lack such addresses. For the latter type of unit, MAF contains not only an address that, together with an appropriately marked map, can be used by an enumerator to locate the unit (e.g., “white trailer with green shutters”), but also, to the extent field staff are able to obtain the information, the mailing address for that unit (e.g., P.O. Box 8).

Development of TIGER/MAF

TIGER began with 1:100,000-scale maps of the entire country from the U.S. Geological Survey and obtained input from three previously separate sources of geographic information that were used in the 1980 census. These sources were Geographic Base Files, developed for the densely settled portions of metropolitan areas, which linked address ranges to blocks; census maps; and Geographic Reference Files, which linked blocks to other geographic units (census tracts, towns, counties, etc.). Originally, about 65 percent of total addresses were contained in the address ranges that were associated with line segments in TIGER. By adding information from the Address Control File that was developed for the 1990 census, TIGER address range coverage was expanded to about 85 percent of addresses.
The remaining addresses could not be linked to line segments because they were not in city-style format. By adding information from the 2000 MAF, the address range coverage in TIGER is being expanded yet further.

The MAF began with the 1990 census address list and has been updated over the decade with the U.S. Postal Service's Delivery Sequence File (DSF). As the MAF is updated, new addresses are geocoded by TIGER to the extent possible. When new street names are identified from the DSF (e.g., a new subdivision), the Census Bureau's regional offices check them using local maps or in the field, and the map locations and address ranges for the segments are added to TIGER.

Several operations were conducted to further update the MAF and TIGER in preparation for the 2000 census. These operations included field canvassing of the entire country by Census Bureau staff and review of the MAF and TIGER-derived maps by localities and tribal governments. Also, before the census, consistency checks were run between MAF and TIGER.

To update governmental unit boundaries in TIGER, the Census Bureau every year conducts a Boundary and Annexation Survey to ascertain boundary changes for counties, cities, townships, and American Indian areas. In addition, beginning with the 1995-1996 school year, the Department of Education is providing funding for school district boundaries to be updated and put into TIGER every 2 years.

Prospects for Improved Geocoding

The Census Bureau hopes to obtain funding for a TIGER/MAF modernization and continuous updating program following the 2000 census. As part of this program, the Census Bureau would exchange electronic files of addresses and geocodes with local and tribal governments, when possible, and perhaps use satellite imagery data for more precision for structures and physical features. The Census Bureau will in any case use the U.S. Postal Service's DSF to update both the MAF and TIGER on a continuous basis after 2000, at least as often as every year and perhaps two or three times a year.

In addition, plans are being developed in conjunction with the American Community Survey for a Community Address Updating System (CAUS) as another source of input for the continuous development of TIGER/MAF, primarily in parts of the United States that do not have city-style addresses for mail delivery. In turn, the MAF will be used as the sampling frame from which the ACS monthly samples are drawn. As outlined in Alexander (1999), the CAUS will involve field work conducted by ACS and other Census Bureau survey field staff to check areas of housing growth and to correct errors and omissions in TIGER/MAF.
In areas with city-style addresses, CAUS staff will field check growth areas to identify omissions in the DSF updates. In areas with non-city-style addresses that have locally developed geocoded address lists, CAUS staff will validate the local lists and check growth areas for errors and omissions. In non-city-style areas without local lists, CAUS enumerators will field check all areas of growth. Growth areas will be identified from community sources, administrative records, and observations by interviewers. As a result of all these activities, the address range coverage in TIGER should become ever more complete for geocoding purposes. The 2000 version of TIGER, after a full cross-check with the 2000 MAF, should have more complete address range coverage than the pre-2000 census version

affected by such factors as perceived stigma (it is believed that high school students are less likely to participate than elementary school students for this reason) and the extent of outreach by school officials to encourage families to sign up for the program. Students approved for free lunches include children enrolled in participating schools in the district, whereas the Census Bureau is charged to produce estimates of poor school-age children who reside in the district. The two populations differ to the extent that poor resident children attend nonparticipating private schools or schools outside their district (nonresident poor children may also attend schools in the district).12

If the relationship between students approved for free lunches and poor school-age children varies across jurisdictions, it would not be possible to use school lunch data to estimate school-age poverty for school districts directly (e.g., by applying a constant factor to the school lunch counts to obtain estimated numbers of poor school-age children). If school district estimates are obtained by suballocating or distributing county-level estimates, as is done in the current county shares approach, then school lunch data could be used in modeling the suballocation if the relationship between school lunch participants and poor school-age children is constant across school districts within counties. However, variations in the relationship within counties would be a problem for such modeling.

There are two other reasons that such modeling could be problematic if school lunch data appeared suitable to use in models for some but not all states and counties.
First, there would be practical difficulties for the Census Bureau in collecting the data and in developing and evaluating different estimation procedures for different sets of school districts, even when it might be possible to improve the accuracy of the estimates in some cases. Second, if the use of different estimation procedures produced estimates with different biases across school districts, there could be a problem of equity for education programs, such as Title I concentration grants, that have a sizable threshold for allocating funds: given a fixed appropriation and a threshold, the allocations to one area can affect the allocations to other areas (see Chapter 6).

12   The increased numbers of charter schools, which may have ill-defined boundaries that overlap existing school districts, could also make it difficult to relate the number of students approved for free lunches to the number of poor school-age children who reside in a district.

Yet the number of students approved for free lunches is an indicator of low income that relates specifically to the population of school-age children and is available annually. Moreover, it is not subject to the sampling error that is such a serious problem for school district estimation for indicators based on sample data, such as the census long form and the American Community Survey. Thus, if school lunch data were available and determined to relate in a reasonably consistent manner to school-age poverty across jurisdictions, the Census Bureau could consider using such data to modify its current estimation process. For example, as noted above, it could use school lunch counts instead of 1990 (or 2000) census data to develop within-county shares for school districts to apply to updated estimates from the county poverty model. Or it could consider using a combination of school lunch and census data or school lunch and ACS data (when those become available) to form within-county shares. Alternatively, changes in school lunch counts, instead of shares, could be applied to updated county estimates.13 Yet another alternative is the possibility of developing a school district poverty model similar to the state and county regression models, and using school lunch counts, or year-to-year changes in those counts, as a predictor variable in the model (assuming comparability of school district school lunch data over counties and states).

Evaluations

The panel undertook a limited evaluation of a school lunch-based shares approach for estimating school-age poverty in two states for which it was able to obtain complete free and reduced-price school lunch data for almost all public schools and assign them to school districts: 1989-1990 for New York and 1990-1991 for Indiana.14 The analysis compared three sets of estimates of poor school-age children in 1989 for school districts in each of the two states with 1989 estimates from the 1990 census.
The three sets were developed by allocating 1990 census county estimates of poor school-age children to school districts using three different methods: (1) a method, similar to the Census Bureau's shares model, in which 1980 census within-county school district shares of poor school-age children were applied to the 1990 census county estimates;15 (2) a method in which 1989-1990 (or 1990-1991) within-county school district shares of the number of students approved for free lunches were applied to the 1990 census county estimates; and (3) a method in which 1989-1990 (or 1990-1991) within-county school district shares of the combined number of students approved for free or reduced-price lunches were applied to the 1990 census county estimates.

13   To illustrate, a change model could produce school-district estimates for, say, 2005 by calculating the ratios of school lunch counts from 2005-2006 to the counts in 1999-2000, applying those ratios to 2000 census estimates, and then controlling the sums of the adjusted 2000 census estimates for the school districts (or school district parts) within each county to the 2005 estimates from the county model.

14   The New York State evaluation was carried out at the State University of New York-Albany by Dr. James Wyckoff, a member of the panel, assisted by Frank Papa (see National Research Council, 2000c:App. D). The Indiana evaluation was carried out at the University of Notre Dame by Dr. David Betson, a member of the panel (see Betson, 1999b).

We found that even though the school lunch data pertained to the same year as (or 1 year later than) the 1990 census comparison estimates, neither set of school lunch-based estimates was much more accurate in either state than the estimates that were based on 1980 census data, which were 10 years out of date. Looking at both overall differences and differences for categories of school districts, the estimates based on the number of students approved for free lunches were marginally more accurate than those from the other two methods that were evaluated.16

These results are not encouraging for the use of school lunch data as a consistent measure of poverty for school-age children. However, the finding that free lunch counts are marginally more effective than the previous census for estimating within-county shares of poor school-age children for school districts suggests that it could be worthwhile for the Census Bureau to further evaluate the potential uses of school lunch data for SAIPE (see recommendation 5-3, below). Also, school lunch data are widely used by states as a proxy measure for poverty in allocating state funds and suballocating federal funds to school districts (see Chapter 2), and they carry considerable face validity in that context.
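The share-based allocations in methods (2) and (3), and the change model outlined in footnote 13, amount to simple proportional arithmetic. The following Python sketch is illustrative only: the function names, the district labels, and all counts are hypothetical and are not taken from the panel's evaluation or from Census Bureau procedures.

```python
def allocate_by_shares(county_poor, indicator):
    """Split a county estimate of poor school-age children among districts
    in proportion to each district's share of an indicator
    (for example, students approved for free lunches)."""
    total = sum(indicator.values())
    if total <= 0:
        raise ValueError("indicator counts must sum to a positive number")
    return {d: county_poor * v / total for d, v in indicator.items()}


def allocate_by_change(county_poor, base_estimates, lunch_base, lunch_now):
    """Change model: scale each district's base-year estimate by the ratio of
    current to base-year lunch counts, then control the adjusted values so
    they sum to the updated county estimate."""
    adjusted = {
        d: base_estimates[d] * lunch_now[d] / lunch_base[d]
        for d in base_estimates
    }
    return allocate_by_shares(county_poor, adjusted)


# Hypothetical county with three districts (all numbers are made up).
free_lunch = {"District A": 300, "District B": 100, "District C": 100}
shares_est = allocate_by_shares(1000, free_lunch)

census_base = {"District A": 500, "District B": 300, "District C": 200}
lunch_base = {"District A": 250, "District B": 150, "District C": 100}
lunch_now = {"District A": 250, "District B": 300, "District C": 100}
change_est = allocate_by_change(650, census_base, lunch_base, lunch_now)

print(shares_est)
print(change_est)
```

In both functions the district values are forced to add up to the county figure, so any error in the county estimate carries through to the districts; the lunch data affect only how the county total is distributed.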
Further evaluations by the Census Bureau could thus be helpful not only for the SAIPE Program, but also for other uses of school lunch data.

For further evaluation, the Census Bureau could replicate the panel's analysis for a few more states if there are states other than Indiana and New York for which it is possible to obtain 1989-1990 (or 1990-1991) school lunch counts for school districts. When 2000 census data become available, the Bureau could also conduct similar evaluations that compare estimates of school-age poverty for 1999 (instead of 1989). The evaluations could examine the performance of models in which changes in the numbers of students approved for free lunches are used to develop estimates, as described above, as well as the performance of a method in which school lunch data and census data are used in combination to develop estimates of within-county school district shares of poor school-age children.

15   The 1980 census share estimates were not ratio-adjusted, as was done for the 1990 census share estimates (see Chapter 3).

16   For example, the average absolute difference between the 1990 census estimates of poor school-age children for school districts in New York State and the estimates from each of the three methods, as a percentage of the average number of poor school-age children in the districts, was 23.9 percent for the method that used 1980 census data, 22.3 percent for the method that used free lunch data, and 24.2 percent for the method that used free and reduced-price lunch data.

Because some formula allocations, such as Title I concentration grants, impose a threshold for receiving funds, it is important for the evaluations to include an analysis of the threshold effects. For example, the panel's analysis for New York found that using school lunch data that were not adjusted to county estimates greatly overestimated the number of districts that exceeded the Title I concentration grant eligibility threshold of more than 15 percent or more than 6,500 poor school-age children. The reason for this result is that school lunch counts include children in families with incomes that are near but not below the poverty threshold. Adjusting the school lunch data to add up to county estimates of poor school-age children–that is, using the school lunch data to form within-county shares–greatly improved the accuracy of estimates of districts that were eligible for Title I concentration grants.

The results of a more extensive set of evaluations along the lines suggested could indicate whether the Census Bureau should continue to consider the use of school lunch data for school district poverty estimates. If these data are to be used, a major effort would be needed to improve the reporting of the data to NCES for use by the Census Bureau for estimation purposes.
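The threshold effect can be illustrated with a small Python sketch. It applies the eligibility rule stated in the text (more than 15 percent, or more than 6,500, poor school-age children) first to unadjusted lunch counts and then to lunch-based shares controlled to a county estimate. The district labels, counts, populations, and function names are hypothetical and chosen only to show the mechanism; they are not data from the New York analysis.

```python
RATE_THRESHOLD = 0.15    # more than 15 percent poor school-age children
COUNT_THRESHOLD = 6500   # or more than 6,500 poor school-age children


def is_eligible(poor, school_age_pop):
    """Apply the concentration grant threshold as stated in the text."""
    return poor > COUNT_THRESHOLD or poor / school_age_pop > RATE_THRESHOLD


def eligible_districts(poor_by_district, pop_by_district):
    """Return the sorted list of districts that pass the threshold."""
    return sorted(
        d for d, poor in poor_by_district.items()
        if is_eligible(poor, pop_by_district[d])
    )


# Hypothetical inputs: lunch counts include near-poor children, so they
# sum to more than the county estimate of poor school-age children.
lunch_counts = {"A": 400, "B": 900, "C": 200}
school_age_pop = {"A": 3000, "B": 5000, "C": 2000}
county_poor = 1000

# Treating raw lunch counts as if they were counts of poor children.
raw_eligible = eligible_districts(lunch_counts, school_age_pop)

# Using lunch counts only to form within-county shares of the county estimate.
total_lunch = sum(lunch_counts.values())
adjusted = {d: county_poor * n / total_lunch for d, n in lunch_counts.items()}
adjusted_eligible = eligible_districts(adjusted, school_age_pop)

print("Eligible using raw lunch counts:", raw_eligible)
print("Eligible using controlled shares:", adjusted_eligible)
```

In this made-up county, district B passes the 15 percent test when raw lunch counts are used but not once the counts are scaled to the county total, which is the direction of the overestimate the panel reports.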
DATA NEEDS FOR POPULATION ESTIMATES

Uses

The Census Bureau's program of population estimates serves a variety of needs of federal, state, and local government agencies. National-level estimates by age, sex, race, and Hispanic origin are used as controls for weighting the responses to such surveys as the CPS and the Survey of Income and Program Participation (SIPP), and the ACS uses county-level estimates by age, sex, race, and Hispanic origin for weighting. Population estimates are also used as denominators for vital rates (e.g., birth and death rates), and they have extensive uses in fund allocation: currently, $180 billion in federal funds is allocated to states and other areas by formulas that include population estimates (U.S. Census Bureau, 1999d; see also U.S. General Accounting Office, 1999).

Fund allocation programs that use poverty estimates from the SAIPE state and county models often require state and county population estimates to convert estimated numbers of poor to estimated proportions of poor and vice versa. This use requires population estimates of persons under age 5 (states only), aged 5-17, under age 18, and total population.

The SAIPE poverty models for states require state estimates of total population and persons under age 65 to serve as predictor variables in one or more of the models. State estimates of total population and population by age are also used to convert estimated poverty rates from the state poverty models to estimated numbers of poor. The SAIPE poverty models for counties require county estimates of total population and people under age 18 to serve as predictor variables in one or more of the models.

For school districts, population estimates of children aged 5-17 are needed to convert SAIPE model estimates of numbers poor to proportions poor for determining eligibility for Title I concentration grants. Also needed for Title I allocations are school district estimates of total population–due to a provision in the legislation whereby states can use estimates other than SAIPE estimates to allocate funds to school districts with fewer than 20,000 people.

Future Research and Development

Although evaluations have shown that the population estimates are considerably more accurate than the poverty estimates for counties and school districts and appear to have relatively little effect on the poverty estimates (see Chapter 3), there is still room to improve the population estimates, particularly for school districts. In this section we discuss how improvements in population estimates may be achieved in the next decade either by the use of new data becoming available or by new applications of existing data series (see recommendation 5-4, below).
Administrative records have been the mainstay in the preparation of population estimates for many decades (see Chapter 3), and we discuss possible new uses and improvements in two major administrative sources: tax returns (linked with Social Security data for population estimates by age) and school enrollment data. We also consider possible new roles in the population estimates program of the Master Address File and the American Community Survey.

Tax Returns

Total Population

Federal income tax return (Form 1040) files are critical for state and county total population estimates because they are used to estimate the intercounty migration component of the county estimates

(for people under age 65), which, in turn, are summed to states. About 85-90 percent of the population is covered by the tax files but with significant geographic variation: the lowest state population coverage is about 80-85 percent, but for many small counties, the population coverage averages under 70 percent (Creech and Sater, 1999). The proportion of the population serving as the basis of the estimates is further reduced by the year-to-year matching process used to estimate net migration. Thus, a large proportion of the population is being estimated indirectly by using the migration rates of matched taxfilers as proxies for persons not covered or matched in the tax files. It would likely improve the population estimates if the tax files covered a higher proportion of the population.

A possible approach for improving overall coverage is illustrated by research done at the IRS (Sailer and Weber, 1998), which involved unduplicating files of information documents (Forms 1099 and W-2) and matching them to 1040 forms. Information documents are forms that employers, government agencies, and other organizations are required to file to report income paid to individuals. The Information Returns Master File (IRMF) includes information from many different information documents, the bulk of which (1993 tax year) are Form W-2, wages and salaries (27%); Form 1099-INT[erest] and Form 1099-DIV[idends] (42%); Form 1099-B, sales of capital assets other than real estate (10%); Form 1099-G, government transfer payments, and Form 1099-SSA, Social Security benefits (11%). The challenge in using the IRMF is to identify the small percentage of forms that relate to people who are not already included on the individual tax returns (Form 1040). The frequency of appearance of a type of 1099 form in the IRMF is no indicator of its importance to improving the population count.
For example, there are many more 1099-INT and 1099-DIV forms than there are 1099-SSA forms; however, most recipients of 1099-INT and 1099-DIV forms already file tax returns, whereas many Social Security recipients do not, so the 1099-SSA forms will make a greater contribution to the population count. In Sailer's study, unduplicating and merging the 1099 forms into the IRS 1040 files by using Social Security numbers (SSNs) and other information increased the overall coverage of the population from 85-90 percent to 97 percent, which likely reduced the geographic variation in coverage as well. This magnitude of coverage increase would likely improve the quality of the migration estimates–and, in turn, the population estimates–derived from the tax files. Methods and procedures for regularly using information returns–which amount to some 1 billion documents annually–are yet to be developed, but they warrant the Census Bureau's close attention.

The Census Bureau is planning to conduct research and experimentation as part of the 2000 census on the use of tax return and other administrative records to obtain population and housing information. In a limited set of sites, the Census Bureau will merge and unduplicate several administrative files obtained from other federal agencies, geocode the addresses to census blocks, and compare the block-level population counts to census counts.17 The Census Bureau will also match the merged administrative records file to the MAF to compare household-level data. The results of this work could lead to improved data for developing population estimates, particularly if files are included that expand coverage of the population beyond what federal agency files are likely to provide. A National Research Council (2000a) panel has recommended that the Census Bureau obtain food stamp files for the areas for which the experiment is to be conducted.

Population by Age

At present, tax files are used only to derive migration rates in developing county estimates of total population. However, since SSNs for filers and all dependents are now required on tax returns, it should be possible to generate county estimates by age group using the same methodology as for the total population, assuming that IRS can provide the full file with all the necessary codes to the Census Bureau.18 At present, IRS provides SSNs for filers and the first four dependents on each tax return on the file extract furnished to the Census Bureau. If SSNs were provided for all dependents, the Census Bureau could obtain their ages and those of the filers by matching to the Social Security Numident File, which the Bureau now regularly receives and which contains birthdates.
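The matching step described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the record layouts, the placeholder identifiers, the July 1 reference date, and the age groups are hypothetical and do not reflect the actual structure of the IRS extract or the Numident file.

```python
from collections import Counter
from datetime import date


def age_on(birthdate, ref):
    """Age in completed years on the reference date."""
    years = ref.year - birthdate.year
    if (ref.month, ref.day) < (birthdate.month, birthdate.day):
        years -= 1
    return years


def county_counts_by_age_group(tax_persons, birthdates, ref_date):
    """Tally persons listed on tax returns by county and age group.

    tax_persons: iterable of (person_id, county) pairs, one per filer or
                 dependent (hypothetical layout).
    birthdates:  dict mapping person_id to date of birth, standing in for
                 a lookup against a birthdate file (hypothetical layout).
    Returns the tallies and the number of persons with no birthdate match.
    """
    counts = Counter()
    unmatched = 0
    for person_id, county in tax_persons:
        birth = birthdates.get(person_id)
        if birth is None:
            unmatched += 1  # e.g., a missing or erroneous identifier
            continue
        age = age_on(birth, ref_date)
        if age < 18:
            group = "under 18"
        elif age < 65:
            group = "18-64"
        else:
            group = "65 and over"
        counts[(county, group)] += 1
    return counts, unmatched


# Made-up records with placeholder identifiers.
tax_persons = [
    ("ID-0001", "County X"),
    ("ID-0002", "County X"),
    ("ID-0003", "County Y"),
    ("ID-0004", "County Y"),
]
birthdates = {
    "ID-0001": date(1990, 3, 15),
    "ID-0002": date(1960, 8, 1),
    "ID-0003": date(1930, 1, 1),
}
counts, unmatched = county_counts_by_age_group(
    tax_persons, birthdates, date(2000, 7, 1)
)
print(dict(counts), unmatched)
```

The count of unmatched records is returned separately because the quality of any age estimates built this way would depend on how often identifiers fail to match, especially for dependents.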
With this information, the Census Bureau could develop updated county estimates by age directly instead of using the current raking-ratio procedure in which county age estimates from the previous census are adjusted to agree with updated county total population estimates and updated state population estimates by age. This method would need to be evaluated, including the extent of errors in SSNs, particularly for dependents.

If the information documents (1099 forms) could be merged with the 1040 tax files and the population coverage of the files thereby increased significantly, it would be possible to develop simpler methods with which to estimate the population under age 18 for counties by making use of aggregate data on tax returns on the number of child dependents. However, it is not clear to what extent merging information documents with the 1040 tax files will improve coverage of children (rather than adults).

17   Files the Census Bureau plans to use include IRS 1040 tax returns and 1099 information documents, the SSA Numident file, Medicare enrollment records, Selective Service registration files, Department of Housing and Urban Development tenant rental assistance certification files, and Indian Health Service patient registration files.

18   The Census Bureau is experimenting with such an approach for state population estimates by age; see Chapter 3.

School District Estimates

For school districts, there are no indicators of population change that are currently available for use in the population estimation process, either for the total population or for the population aged 5-17. As a consequence, school district population estimates are less accurate than county or state population estimates. The small size of most school districts also makes estimates for them less accurate than estimates for states and counties.

Data from IRS tax files could likely contribute to improved school district population estimates if the individual records could be geocoded to school districts, as they are for other levels of geography. We recommend that the Census Bureau assign high priority to evaluating the extent of geocoding of tax records to school districts that can be achieved with the TIGER system after the 2000 census. Assuming the results are reasonably positive, the Census Bureau should proceed with research to determine how best to use tax records for improved small-area population estimates, as well as improved small-area income and poverty estimates. Research will also be required to determine how best to maintain and improve the geocoding capabilities of TIGER/MAF throughout the decade.

School Enrollment

For many decades, information on school enrollment, both public and private, was an important element of the Census Bureau's population estimation methodology for counties and states. Data on enrollment in the elementary grades were especially useful because school attendance at the relevant ages is compulsory.
As a result, the number of children enrolled in elementary school was close to the total population of elementary school-age children, and the relationship between the two numbers was fairly stable over time. This close relationship permitted the development of a methodology (component method 2) to derive relatively reliable net migration rates of the school-age population, which in turn were used to estimate net migration rates of the total population of areas (for a detailed description of the methodology, see U.S. Census Bureau, 1987). The method was dropped in the 1980s for a variety of reasons, including the disappointing results of evaluations, carried out with 1980 census data, of population estimates that used school enrollment for estimating

total migration; the extensive data collection required to obtain reliable data for all counties in the United States; and the deterioration of the relationship between enrollment and school-age population over time, possibly due to the growth of private schools, for which county of residence and area of attendance do not always coincide, and of busing across county lines, among other reasons. However, school enrollment information is still used in estimating the population by age for states (for people under age 65).

In light of the need for estimates of the population aged 5-17 at the county level and the very close relationship between school enrollment and the age group of interest, we encourage the Census Bureau to reexamine the school enrollment approach for developing these estimates, including an assessment of data sources. This methodology should be evaluated as part of the Bureau's 2000 census test program for evaluating population estimates.

School enrollment data may be useful in two ways. They could be used to derive migration estimates to feed into county population estimates for children aged 5-17. They could also perhaps be used directly to measure changes in the distribution of the school-age population among counties within states and among school districts within counties. For this purpose, the U.S. Department of Education's Common Core of Data school enrollment information may be useful, although the data pertain only to public school enrollment. For school districts, it could be possible to estimate within-county changes over time in contrast to the current system of maintaining the relative distribution from the last decennial census.

Master Address File

The Master Address File, the list of addresses on which the 2000 census enumeration is based, will be maintained and updated continuously throughout the decade (see above, “Geocoding with TIGER and MAF”).
Sources for updating the MAF, and the associated TIGER geocoding system for assigning addresses to geographic areas, will include the U.S. Postal Service's Delivery Sequence File, input from local communities, and listing operations in selected areas by enumerators for the American Community Survey. A continuously updated MAF will provide a current nationwide inventory of residential addresses and housing units. The MAF can very likely be used to improve the methods for population estimates in future years. To begin with, it would provide a firm starting point and control for the housing unit method of population

estimation, which the Census Bureau currently uses for population estimates for places and other county subdivisions.19 Beyond that, a more far-reaching application of the MAF for population estimates would be to explore matching administrative records and merging population information onto MAF address records to provide data on the characteristics of the population for areas of interest. For example, it might be possible to develop population estimates by age from such matching operations. As noted earlier, work along these lines is planned as part of the Census Bureau's 2000 census research and experimentation program for use of administrative records.

The ACS might also be able to contribute to improved population estimates. For example, ACS data on vacancy rates, household size, and type of structure, averaged over several years, might be used together with housing unit control counts from the MAF to improve the housing unit estimation method. ACS data on measures of change over time, including migration, could perhaps also augment measures derived from other sources, such as tax files, to improve estimates for the total population and age groups. These and other uses of the ACS for population estimation will require evaluation of such aspects of the ACS as the sampling variability in the estimates and the differences between ACS and census residence rules (see Chapter 4). The census is the basis for carrying forward population estimates, and differences in residence rules could affect the comparability of census and ACS data, particularly for areas with transient populations.

RECOMMENDATIONS

5-1 The Census Bureau and other agencies that produce small-area estimates by using administrative records, such as tax returns and food stamp data, should regularly devote resources to reviewing the quality, comparability, and timeliness of those administrative data for their use in estimation.
The review should consider possible changes to administrative records systems that would benefit estimation without undue cost to the data collection agency or undue burden on respondents. For the Census Bureau's small-area models of poverty, it is particularly important to review the interarea comparability of food stamp data before and after the 1996 welfare reform legislation in terms of how these data relate to differences in poverty.

19   In this method, estimates of changes since the previous census in the housing stock, derived from building permits and other sources, are combined with census-based estimates of housing vacancy rates and the number of people per housing unit to estimate the change in population for an area since the previous census.

5-2 The Census Bureau should give high priority to enhancing the capabilities of its TIGER/MAF system to geocode addresses from administrative records to small areas. The Bureau should conduct a study, as soon as possible after the 2000 census is completed, of the extent to which TIGER can be used to geocode addresses on IRS tax returns to school districts.

5-3 The Census Bureau should consider conducting evaluations of the possible uses of National School Lunch Program data to develop improved estimates of poor school-age children for school districts.

5-4 The Census Bureau should conduct research on improved data and methods for small-area estimates of total population and population by age. In particular, such research should include: ways to improve population coverage in tax return files on the basis of information documents, to use tax returns for estimates of population by age, and to geocode tax returns to subcounty areas; reassessment of the usefulness of school enrollment data for county and school district estimates of school-age children; and ways to use the Master Address File and, perhaps, the American Community Survey to improve population estimates.