Small-Area Income and Poverty Estimates: Priorities for 2000 and Beyond

4

Future Model Development: The Role of Surveys

USER OVERVIEW

Evaluation studies of the Census Bureau's estimates of poor school-age children, produced as part of its Small Area Income and Poverty Estimates (SAIPE) Program, have established that the updated estimates are more accurate than outdated estimates from the decennial census (see Chapter 3). However, these same studies have also highlighted a need for further improvement in the estimates, particularly for subcounty areas. Research and development of the state and county models, as recommended by the panel, can help. However, marked improvement in the SAIPE estimates, particularly for school districts or other very small areas, will require new data sources. Possible new sources of household survey data, discussed in this chapter, may support significant improvements in the quality of the estimates in the next decade and beyond. (Improved administrative records data that may also play an important role are discussed in Chapter 5.)

Estimates from the SAIPE Program now reflect the income and poverty measurements in the Current Population Survey (CPS) March Income Supplement, which asks each March about the previous year's income for a sample of about 50,000 households. The state and county models are tied to the CPS in that the dependent variable in the regressions (the variable being predicted) is from 1-year CPS estimates in the state model and from 3-year average CPS estimates in the county model. Other data sources, including the 1990 census and administrative records, provide predictor variables in the models, but the goal is to predict CPS-measured income and poverty. The school district model is tied to the CPS as well: 1990 census shares or proportions of poor school-age children for school districts within counties are applied to updated estimates from the CPS-based county model.

The use of the CPS as the dependent variable in the SAIPE models reflects a shift from the previous standard of measurement for many uses of small-area income and poverty estimates (e.g., allocating Title I funds), which was the decennial census long-form survey. The definitions of income and poverty are the same in the census and CPS, in that both use the official concept of income (before-tax money income for a calendar year), the official poverty thresholds for different size and type families, and the official unit of measurement (families and unrelated individuals as defined by the Census Bureau). However, differences in data collection procedures and other aspects of the two surveys result in somewhat different measurements. For example, the 1990 census estimate of U.S. median household income (for 1989) was 4 percent higher than the corresponding estimate from the March 1990 CPS, continuing a pattern from previous censuses (see Citro, 1996). Similarly, the 1990 census estimate of the proportion of U.S. poor school-age children was 6 percent lower than the corresponding March 1990 CPS estimate (National Research Council, 2000c:Ch.3).

The CPS is currently the source of official annual income and poverty statistics, and it has several advantages over the decennial census for that purpose. It is conducted more frequently than the census and so permits more regular updating of estimates.
Also, the CPS is believed to provide more accurate measures of poverty and income than the census, primarily because it asks more questions about income and is conducted by personal and telephone interviewing instead of mailout/mailback techniques.[1] A main drawback of the CPS, which the regression modeling procedure is intended to address, is the small size of the sample compared to the census long-form sample. This small sample size, together with the clustering of the CPS sample design, results in sizable sampling variability of the CPS state estimates and a lack of any sample in most counties and school districts.

[1] In the evaluations of the SAIPE estimates of poor school-age children, the 1990 census was used as a standard of comparison for SAIPE estimates produced for 1989 because of a lack of other sources for external evaluation (see Chapter 3). However, this use does not make the census a “better” standard of measurement than the CPS.

Looking to the future, several household surveys could contribute to improved estimates from the SAIPE Program, and, in addition, the sample size of the March CPS itself may increase. These surveys are:

The 2000 census long form, which will provide small-area estimates of income and poverty for 1999 from a sample of about 18 million housing units (about one-sixth of total housing units, similar to the 1990 census long-form sample size);
The American Community Survey (ACS), which is currently under development and contains content similar to the census long form (see Chapter 2); and
The Survey of Income and Program Participation (SIPP), which plans to start a new panel in 2001 (see Chapter 2).

In the remainder of this chapter, we first compare the major features of the 2000 census long-form survey, ACS, March CPS, and SIPP. We then consider alternative uses for these surveys in the SAIPE Program, including:

direct estimates for some areas;
estimates to use as dependent variables in models;
estimates to use as predictor variables in models;
estimates for smaller areas of their shares or proportions of the poor population in larger areas; and
estimates for controlling or calibrating other estimates on selected characteristics.

The chapter ends with a summary of the panel's conclusions and recommendations on these uses.

To evaluate which uses would be feasible and desirable for one or more of the surveys, we focus on the reliability of survey estimates in terms of their error due to sampling variability; how frequently survey data are available and on what time schedule; and the quality of survey income measurements and how they compare with CPS measurements. Comparability is particularly important if another survey (e.g., the ACS) is to provide the basis for the dependent variables in the SAIPE models in place of the CPS. Depending on the extent of comparability, such a change could alter the standard of measurement and have unintended consequences for the use of estimates in formula allocations (see Chapter 6).
However, using another survey for this purpose would be warranted if the change is judged likely to significantly improve the estimates.

Because no survey can provide direct estimates of sufficient reliability, timeliness, and quality to replace all of the SAIPE estimates, the panel concludes that SAIPE must continue to rely primarily on models for updated estimates for small areas. To determine how SAIPE models can best use the income and poverty data from surveys, the Census Bureau will need to learn more about measurement differences among them. To this end, the panel recommends exact matches and other comparisons of the CPS, ACS, and SIPP with the 2000 census records.

If it is implemented as planned, the ACS will provide subnational estimates that are available as frequently as estimates from the March CPS and are more reliable than those estimates. For states, the ACS estimates, averaged over a year, will be sufficiently reliable that they could be used directly for SAIPE. For most smaller areas, the ACS estimates will not be sufficiently reliable to be used directly, even when averaged over several years, but they could be used in models. For the SAIPE county models, the panel recommends that the Census Bureau begin research and development now to explore the use of ACS estimates either to provide one of the predictor variables in CPS-based models or to serve as the dependent variable in the models. The Census Bureau should also conduct research on using ACS estimates, in place of or possibly combined with estimates from the previous census, to form within-county shares or proportions for school districts and other subcounty areas to apply to updated county model poverty estimates. The shares approach for subcounty estimates is necessary until such time as appropriate administrative data are developed for subcounty areas that can support a statistical model similar to the state and county models.

If the ACS is to play a major role in the SAIPE Program along the lines suggested by the panel, the survey needs to have consistent levels of funding over the next decade that are sufficient for the planned sample sizes. Insufficient funding would likely lead to reduced sample sizes and other discontinuities in the data that could jeopardize the usefulness of the ACS for SAIPE and, more generally, make it difficult to assess the potential of ACS data for small-area estimation.

The panel sees a continuing role for indirect use of the census long-form estimates in the SAIPE Program. The Census Bureau should plan to use 2000 census estimates as predictor variables in the current SAIPE state and county models. The role of the 2000 census direct estimates is less clear. These estimates will be quite reliable for states, many counties, and some smaller areas and will have face validity with users.
However, to use these estimates as the SAIPE estimates (for 1999) could result in inconsistencies in the time series of estimates. Also, 2000 census long-form estimates will be unreliable for many school districts and other small areas, and the estimates may not be available in time to meet the Census Bureau's current production schedule, which calls for 1999 SAIPE estimates to be released in fall 2002. The panel recommends that the Census Bureau review alternative approaches for the 1999 SAIPE estimates with key users, so that the Bureau's decisions about whether and how to use the 2000 census direct estimates for 1999 are well understood.

Finally, work is under way at the Census Bureau on experimental measures of poverty, based on the report of a National Research Council panel (1995a), which recommended revising the poverty threshold concept and the definition of family income and using income estimates from SIPP, which is believed to obtain better measures of income and poverty than the CPS. Should the Census Bureau decide to use SIPP for official poverty statistics based on a revised concept (changes in the SIPP design and more timely data processing would be needed to make this feasible), then it would be important to consider how to adjust SAIPE estimates to agree with SIPP totals for selected characteristics, such as age, race, and geographic region.

The panel has outlined an ambitious program of research and development for the Census Bureau to determine the best uses of household survey data for SAIPE models. Such a program may be quite costly, and the Census Bureau will need to monitor progress carefully to try to identify the most promising approaches on which to focus scarce resources. Offsetting the costs, many of the activities recommended, such as exact matches of survey and census records, will be helpful for many other uses of household survey data, in addition to SAIPE.

SURVEY FEATURES

This section describes the main features of the 2000 census long-form sample, ACS, March CPS, and SIPP, including content, sample size and design, data collection schedule and procedures, residence rules, response rates and other quality measures, and data processing and release. Table 4-1 summarizes the key features of each survey.

2000 Census Long Form

The 2000 census, like every census since 1960, included a long-form questionnaire that was administered to a sample of households. The long form contains the short-form questions that are asked of all households and additional questions. The added questions include total income and income by type from seven different sources (e.g., wages, Social Security) for the previous calendar year for each household member aged 15 or older. Both the short-form and long-form census questions are mandatory.

Design

The sample design for the 2000 census long form was somewhat modified from that used in the 1990 census.
In 1990 the overall sampling rate was about 1 in 6, producing a sample of about 15.7 million occupied housing units. Variable sampling rates were used to provide somewhat more reliable estimates for small areas and to decrease respondent burden in more densely populated areas. Specifically, the sampling rate was 1 in 2 housing units for governmental areas with an estimated 1988 population of fewer than 2,500 people. For other areas, the sampling rate was 1 in 6 housing units in census tracts and block numbering areas with a precensus housing count of fewer than 2,000 housing units (fewer than about 5,200 people) and 1 in 8 housing units in larger census tracts and block numbering areas. The definition of areas for the 1-in-2 sampling rate included counties, towns, and townships, but not school districts (unless they happened to be coterminous with another governmental area).

In 2000 the overall sampling rate was also about 1 in 6, producing a sample of about 18 million housing units, but the variable rates were somewhat different from 1990. In 2000 the sampling rate was 1 in 2 for governmental areas with fewer than 800 housing units (fewer than about 2,100 people); 1 in 4 for governmental areas with 800-1,200 housing units (about 2,100-3,100 people); 1 in 6 for census tracts with fewer than 2,000 housing units (fewer than about 5,200 people); and 1 in 8 in larger census tracts. This design adds one more sampling rate, so that governmental areas with populations only slightly larger than areas with a 1-in-2 sampling rate will have a smaller increase in the proportional sampling error of their estimates compared with the 1990 sample design. For determining sampling rates in 2000, governmental areas were defined to include school districts in addition to counties, towns, and townships.

Data Collection

Data collection in the census is mainly by self-enumeration: a respondent for each household fills out a questionnaire received in the mail. Enumerators follow up those households that fail to return a questionnaire and collect the information through direct interviews. The follow-up enumerators are usually temporary workers who are given limited training.
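The 2000 long-form sampling rates described under Design can be sketched in a few lines of Python. This is an illustration only, not the Census Bureau's procedure: the function names are hypothetical, the order in which the governmental-area and census-tract rules are checked is an assumption, as is the treatment of the 800-1,200 band as inclusive, and the error comparison assumes proportional sampling error varies roughly with one over the square root of the sample size (ignoring finite population corrections and design effects).

```python
from fractions import Fraction
from math import sqrt


def long_form_rate_2000(gov_area_units, tract_units):
    """Sampling rate for a housing unit under the 2000 design in the text.

    gov_area_units: housing units in the governmental area containing the unit.
    tract_units: housing units in the unit's census tract.
    Assumption: the governmental-area rule is checked first, and the
    800-1,200 band includes both endpoints.
    """
    if gov_area_units < 800:
        return Fraction(1, 2)
    if gov_area_units <= 1200:
        return Fraction(1, 4)
    if tract_units < 2000:
        return Fraction(1, 6)
    return Fraction(1, 8)


def expected_sample(units, rate):
    """Expected number of sampled housing units in an area."""
    return float(units * rate)


def error_increase(rate_before, rate_after):
    """Approximate growth factor in proportional sampling error when the
    sampling rate drops, for areas of about the same size."""
    return sqrt(rate_before / rate_after)


# Stepping from 1 in 2 to 1 in 4, versus dropping straight to 1 in 6.
with_quarter_step = error_increase(Fraction(1, 2), Fraction(1, 4))
without_quarter_step = error_increase(Fraction(1, 2), Fraction(1, 6))

print(long_form_rate_2000(1000, 1500), expected_sample(1000, Fraction(1, 4)))
print(round(with_quarter_step, 2), round(without_quarter_step, 2))
```

Under these assumptions, an area just above the 1-in-2 cutoff sees its proportional sampling error rise by a factor of about 1.41 with the added 1-in-4 rate, compared with about 1.73 if it fell directly to 1 in 6.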
Residence Rules

Residence rules for reporting household members in the census are that people who “usually” live at a residence should be reported and that people who are temporarily visiting should be excluded, unless they have no other permanent home. The usual residence for college students is their college residence and not their home residence; similarly, the usual residence for people who work away from home is their workplace residence if they live there most of the time. The usual residence for people with two homes is their permanent residence and not their vacation home.


TABLE 4-1 Key Features of Major Household Surveys

Type of Survey, Frequency
2000 Census Long Form: Mandatory survey, part of census every 10 years since 1960
American Community Survey: Mandatory monthly survey, tested in 4 sites in 1996, 8 sites 1997-1998, 31 sites 1999-2001, national survey 2000-2002, full implementation planned beginning in 2003
March Current Population Survey: Voluntary monthly labor force participation survey, begun in 1940s; income supplement every March
Survey of Income and Program Participation: Voluntary panel survey: each of 1984-1993 panels covered about 2.5 years; 1996 panel covered 4 years; 2000 panel to cover 1 year; 2001 panel to cover 3 years

Income Data
2000 Census Long Form: Total income; income from seven sources for previous calendar year
American Community Survey: Total income; income from seven sources for previous 12 months
March Current Population Survey: Detailed questions on about 28 sources for previous calendar year
Survey of Income and Program Participation: Detailed questions on about 65 sources for each month or for 4-month period preceding interview

Sample Size and Design
2000 Census Long Form: Systematic sample of household addresses and residents of group quarters: average sampling rate of 1-in-6; rates of 1-in-2 or 1-in-4 for small governmental units and 1-in-8 for large census tracts; total sample size about 18 million housing units
American Community Survey: Similar design to 2000 census long form; planned sample size before nonresponse of 3 million housing units (including vacant units) per year; design alternatives being considered that would oversample rural and hard-to-enumerate areas
March Current Population Survey: Clustered sample of household addresses with state-representative design: addresses are in the sample for 4 months, out for 8 months, and in again for 4 months; total sample size of 50,000 occupied households plus 2,500 Hispanic households interviewed in previous November
Survey of Income and Program Participation: Clustered sample of household addresses: original sample of occupied households was 12,500-23,500 for 1984-1993 panels; 37,000 for 1996 panel, with oversampling of low-income households; 11,000 for 2000 panel; 37,000 planned for 2001 panel

Data Collection Mode
2000 Census Long Form: Mail survey, personal follow-up for nonresponse
American Community Survey: Mail survey, telephone follow-up, and then personal follow-up for one-third of mail and phone nonrespondents
March Current Population Survey: 1st and 5th interviews in person; other six interviews by phone
Survey of Income and Program Participation: 1st, 2nd, and one interview in each subsequent year of a panel in person; other interviews by phone

Residence Rules
2000 Census Long Form: Usual residence; college students in dorms counted at college location
American Community Survey: “Current” or 2-month residence rule
March Current Population Survey: Usual residence; college students in dorms counted at parents' address
Survey of Income and Program Participation: Similar to CPS; members of originally sampled households followed for life of panel

Response Rates
2000 Census Long Form: 1990 mail response rate 74% for occupied households; net undercount of 1.8% after follow-up; 19% of aggregate income imputed
American Community Survey: Mail response rate 61% in 4 test sites, plus 8% from phone follow-up, plus 9% from one-third follow-up of remaining nonrespondents, for weighted response rate of more than 95%; item response may be better than census, but not coverage
March Current Population Survey: 94-95% households respond, but some do not respond to income supplement or for all household members; coverage estimated at 92% of census; 20% of aggregate income imputed
Survey of Income and Program Participation: 91-95% households respond to 1st wave, but sample attrition occurs; cumulative response only 69% by wave 8 of 1996 panel; coverage similar to CPS; 11% of aggregate income imputed

Publication
2000 Census Long Form: Long-form data planned to be released in 2002; planned to be controlled to short-form data adjusted for undercount; data are published for such small areas as census tracts
American Community Survey: Annual reports planned of 12-month averages for areas with 65,000 or more people, 5-year averages for areas with fewer than 20,000 people; goal is to publish 6 months after data collection
March Current Population Survey: Income and poverty data published for nation and population groups 6 months after data collection; limited data published for states on basis of 3-year averages
Survey of Income and Program Participation: No regular publication series; special reports published for nation and population groups; historically 1-2 year (or more) lag from data collection to publication

Proposed Changes
2000 Census Long Form: Long form may not be included in 2010 or later censuses
American Community Survey: May replace census long form
March Current Population Survey: Recently received funding to expand sample size for state estimates of low-income children not covered by health insurance
Survey of Income and Program Participation: Funding being requested to expand sample size and number of panels and for state-representative design

Response Rates

Household response rates to the census mailout have declined between 1970, when mailout-mailback techniques were first used, and 1990. In 1990 approximately 74 percent of U.S. households returned their questionnaires with some or all of the requested information; the response rate for households receiving long forms was somewhat lower (70%) than that for households receiving short forms (75%). Data from the balance of the population were obtained by personal interviews (National Research Council, 1995b:189-190).

As in all censuses, some people were uncounted in 1990, and there were also duplications and other erroneous enumerations. The net undercount in 1990 (gross undercount minus gross overcount) was estimated at 1.8 percent for the total population, but there were substantial differences among population groups.
For example, the net undercount was estimated at 5.7 percent for blacks and 1.3 percent for nonblacks. The net undercount also varied significantly by age: almost two-thirds of the estimated omitted population consisted of children under age 10 and men aged 25-39 (Robinson et al., 1993:13). The undercount was higher in large cities than in other areas, and it was disproportionately concentrated in the inner areas of those cities. It is likely that undercount rates were higher for lower income groups.

Item nonresponse rates in 1990 were generally higher for income than for most other items. When household income information is missing, the Census Bureau uses statistical techniques to impute it on the basis of nearby households with similar characteristics. On average, 19 percent of aggregate household income was imputed for 1990 (National Research Council, 1995b:387).

Publication

Processing and release of the long-form sample data occur later than for the short-form, and long-form estimates on such characteristics as age, race, and sex are controlled to match the corresponding estimates from the short form for various levels of geography. For 2000, the long-form data are planned to be controlled to short-form data that have been corrected for measured population undercount. Long-form data, including income and poverty estimates, are provided for areas as small as census tracts, school districts, and block groups. Typically, long-form data products are released beginning in year 2 and continuing through year 3 after the census year.

American Community Survey

The American Community Survey is planned to be a large-scale, continuing monthly sample survey of housing units in the United States, conducted primarily by mail. Its content will be similar to that of the decennial census long-form sample, including questions that permit constructing income and poverty estimates for households in small areas. The income questions ask about total income and income from seven different sources for the 12 months preceding the interview month. It is planned that the ACS will be mandatory, like the census, rather than a voluntary survey (although some or all of the ACS questions could be made voluntary in the future). If the ACS is successfully implemented, there will likely be no long form in the 2010 and subsequent censuses.

Development and Design

The ACS was tested in four sites in 1996 and in eight sites in 1997-1998. Beginning in 1999 and extending through 2001, the ACS will be conducted in 31 sites, chosen to facilitate comparison with the 2000 census long-form data for census tracts and other areas. In 25 of the 31 sites, about 0.4 percent of housing units are being sampled each month, which will generate a sample of about 5 percent of housing units for each of the 3 years, or 15 percent for the 3-year period. In the other 6 sites, for budgetary reasons, the 3-year sample will be about 9 percent in 5 of the sites and 3 percent in 1 site. For each year from 2000 to 2002, a nationwide survey, using the ACS questionnaire, will sample about 700,000 housing units, using a clustered sample design.

Beginning in 2003, the full ACS sample will be 250,000 housing units each month throughout the decade, for an annual sample size of about 3 million housing units spread across all counties in the nation. Over a 5-year period, the addresses selected for the ACS sample will cumulate to about 15 million housing units, similar to but somewhat smaller than the expected 2000 census long-form sample size of about 18 million housing units. Some of the ACS sample housing units will be vacant, and the sample size that is available for analysis will be further reduced by the ACS data collection procedures (see below).

Each month's ACS sample will be drawn from the Census Bureau's Master Address File (MAF) for the entire nation.
The MAF is a comprehensive residential address list developed for the 2000 census that the Census Bureau intends to update on a continual basis following the census (see Chapter 5). The current design calls for the ACS to use a sample design similar to that of the 2000 census long form, with higher sampling rates for small governmental units (including school districts) and lower sampling rates for large census tracts. The sampling rates would be applied by systematic sampling from the MAF. Some alternative sampling rates are being considered for the ACS. One scheme would make sampling rates decline as a smooth function of population size rather than vary by population size categories, until reaching a maximum sampling rate for very small areas. The maximum rate, cumulated over 5 years, could be higher than the highest long-form sampling rate in order to provide more reliable data for rural communities.
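The sample-size figures quoted for the ACS in this section follow from simple cumulation, which the short Python sketch below reproduces. The function names are hypothetical, and the calculation assumes that no address is selected more than once within the period being cumulated; it covers selected addresses only, before the reductions for vacant units and follow-up subsampling mentioned in the text.

```python
MONTHS_PER_YEAR = 12


def cumulative_units(monthly_units, years):
    """Housing-unit addresses selected over a period, assuming no address
    is selected more than once within it."""
    return monthly_units * MONTHS_PER_YEAR * years


def cumulative_rate(monthly_rate, years):
    """Share of an area's housing units selected over a period, under the
    same no-repeat assumption."""
    return monthly_rate * MONTHS_PER_YEAR * years


# Full implementation: 250,000 housing units per month.
annual_units = cumulative_units(250_000, 1)
five_year_units = cumulative_units(250_000, 5)
share_of_long_form = five_year_units / 18_000_000

# Comparison sites: about 0.4 percent of housing units per month.
site_annual_rate = cumulative_rate(0.004, 1)
site_three_year_rate = cumulative_rate(0.004, 3)

print(f"{annual_units:,} per year; {five_year_units:,} over 5 years")
print(f"5-year ACS sample is {share_of_long_form:.0%} of the long-form sample")
print(f"sites: {site_annual_rate:.1%} per year, {site_three_year_rate:.1%} over 3 years")
```

The results (3 million per year, 15 million over 5 years, and 4.8 and 14.4 percent in the comparison sites) match the rounded figures of "about 5 percent" and "15 percent" given in the text.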

produce indirect estimates of income and poverty for small areas. None of these surveys can provide direct estimates that are of sufficient reliability, quality, and timeliness to replace all of the small-area estimates produced by SAIPE.

The 2000 census long-form estimates will be reliable for all states and many counties, in that they will have acceptably low levels of error due to sampling variability, but the census estimates are only available for income year 1999. Also, the census estimates will not be reliable for most subcounty areas, such as most school districts, even for income year 1999.

One-year average estimates from the monthly ACS, once it is fully implemented, will be reliable only for states and a small percentage of counties, while 5-year average estimates will be reliable for a larger percentage of counties. However, there will still be a sizable proportion of counties and many smaller areas for which the estimates will have low reliability. Also, 5-year average estimates will not begin to be available until very late in this decade, and they could be viewed as problematic for some program uses because they will reflect changes in income and poverty with a considerable lag. For example, two areas may have the same 5-year average poverty rate, but one area may have a sharply increasing poverty rate over the period and the other area a sharply decreasing poverty rate.[13] Moreover, the quality of the ACS income and poverty estimates has yet to be established. Consequently, using the ACS to provide direct estimates for the SAIPE Program, except for states, does not seem warranted absent considerable evaluation work.

The March CPS provides high-quality annual estimates, but it does not currently provide reliable direct estimates for any subnational areas, except for the very largest states.
However, the CPS may provide reliable state estimates in the future, given the recent appropriation to adjust the sample size and design to provide reliable state estimates of low-income children who lack health insurance coverage. The estimates from SIPP at present are neither reliable for any subnational area nor available on a timely basis. Thus, we conclude that some type of modeling must be used for most SAIPE estimates for the foreseeable future, which may involve using one or more or all of the available surveys: 2000 census long form, ACS, March CPS, and SIPP.

[13] For fund allocation, the use of 5-year averages would gradually shift funds from areas with declining poverty rates to areas with increasing poverty rates, which could be viewed as beneficial if localities value stability of funding more than faster response to changing levels of need (see Chapter 6; see also Waksberg, Levine, and Kalton, 1999). Also, 5-year averages from the ACS could be preferable to the currently available SAIPE model-based estimates because the ACS estimates could likely be produced on a faster time schedule and so use more current data. For example, it should be possible in late 2010 to produce 5-year average estimates for counties from ACS data for 2005-2009. In contrast, it is likely that estimates released in late 2010 from the current county model would be based on 3-year average data from the CPS for 2006-2008 because of the lags in obtaining administrative data for the model (see Chapter 3).

Measurement Research

Although the 2000 census, ACS, CPS, and SIPP currently measure the same concepts of income and poverty, differences in their measurements can be expected due to the many differences in their design and operation. Detailed understanding of measurement differences is essential to determine the best ways to use the data from these surveys in the SAIPE Program. To date, only limited data and information are available for this purpose.

As part of a measurement research program, we urge the Census Bureau to conduct a planned exact match of the March 2000 CPS and the 2000 census long-form sample (exact CPS-census matches were performed for the 1950-1980 censuses). The Census Bureau should also conduct an exact match of the 1996 SIPP panel, for which the last year of interviews covers 1999 income, with the 2000 census.[14] The Census Bureau should also carry out a planned set of aggregate comparisons between the 2000 ACS and the 2000 census. An exact ACS-census match for 2000 will not be possible because of a decision not to send long-form questionnaires to any of the ACS households in the sample around the time of the census, in order to minimize respondent burden and confusion between the two surveys. However, a planned exact match of the ACS with the census short-form may help evaluate within-household population coverage in the census and ACS and should be carried out.
Another useful set of comparisons would be exact matches of the 2000 census, 2000 ACS, 2000 March CPS, and 1996 SIPP with Internal Revenue Service (IRS) tax return records for 1999.15 Census-IRS, CPS-IRS, and SIPP-IRS matches have been performed in the past (see, e.g., Childers and Hogan, 1984; Coder, 1991, 1992; David et al., 1986). Such matches for income year 1999 could provide valuable information not only for comparing income reports among the household surveys as they relate to the IRS records, but also for assessing the performance of IRS data in small-area estimation models. One issue that could be addressed, for example, is the extent to which the IRS records cover the low-income population (see further discussion in Chapter 5). For matching purposes, it will be important to include 1999 tax returns that were filed late as well as returns that were filed on time.

14   A SIPP-census match might be restricted to SIPP rotation groups that were interviewed close to census day.

The Census Bureau should explore ways to make the exact matches of census, IRS, and household survey data available to the research community, for example, by providing access to the files at the secure research centers that the Bureau has established in cooperation with several universities around the country. The availability of such files, with appropriate safeguards to protect the confidentiality of individual responses, would likely stimulate research on measurement error and modeling that would be beneficial to the SAIPE Program.

Role of the ACS

County Models

Careful evaluation of the strengths and weaknesses of the ACS, including in-depth comparisons with other surveys, will be needed to determine the best strategy for using ACS data for SAIPE estimates.16 For states, it appears possible to use direct estimates from the ACS, averaged over a year, once the survey has been fully implemented. For counties, our necessarily preliminary review of reliability, timing, and response quality issues suggests that two possible uses of ACS data merit serious consideration. Both uses, for which the Census Bureau should begin research and development now, involve indirect rather than direct estimation.

One approach is for the Census Bureau to continue to base county estimates on statistical models for which the March CPS estimates form the dependent variable and ACS estimates are used as one of the predictor variables, along with the other variables that are currently in the models.
(For the school-age poverty model, these variables include IRS tax return data, food stamp data, census data, and population estimates.) For this purpose, the ACS estimates could be averaged over the same 3 years as the CPS estimates to make them consistent for the time period covered. This averaging would also reduce the sampling variability of the ACS estimates, which could improve the predictive power of the ACS variable in the models. It could also be possible to use ACS estimates for several years in a time-series or multivariate modeling approach (see Chapter 3).

15   The Census Bureau obtains limited tax return information each year from the IRS, such as wages and salaries and adjusted gross income, for research and estimation purposes (see Chapter 5).

16   See also National Research Council (2000b) for discussion of issues in using the ACS.

Continuing to base the county models on the CPS, which is the official source of poverty statistics, could be advantageous because the CPS can be expected to have less bias in the measurement of annual income and poverty than the ACS. However, CPS-based models for income and poverty estimates for counties do have some limitations. Even when 3-year averages are used, the sampling variability of the CPS county estimates is high, so that very few counties receive a significant weight on the direct estimates when they are combined with the model estimates in the estimation procedure (see National Research Council, 2000c). Also, many counties are excluded from the modeling because they have no sample households (due to the clustered sample design), or, in the case of poverty estimates, no poor households (or no poor households with school-age children) in the sample.

In contrast, the ACS uses an unclustered design with sample households in every county each month. Hence, a second strategy to investigate is to construct statistical models for county income and poverty estimates in which the dependent variable is taken from the ACS estimates. An issue for evaluation is whether the dependent variable is best constructed as an annual average of 12 monthly samples centered on the calendar year (i.e., using months from July of year t to June of year t + 1) with appropriate inflation adjustments, or as an average of, say, 24 or 36 monthly samples centered on the calendar year. In either case, there would be less reliance on the models, compared with the CPS-based models, because ACS direct estimates would be available for all (or almost all) counties.
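
The weight that a county's direct estimate receives when it is combined with a model estimate can be illustrated with a minimal sketch. It assumes the common composite (shrinkage) form, in which the weight equals the model error variance divided by the sum of the model error variance and the sampling variance. The chapter does not spell out this formula, so the function name, the variance inputs, and all numbers below are hypothetical.

```python
def composite_estimate(direct, model, sampling_var, model_var):
    """Combine a direct survey estimate with a model prediction.

    Assumed shrinkage form (an illustration, not the SAIPE procedure itself):
        weight = model_var / (model_var + sampling_var)
    The weight is the share given to the direct estimate.
    """
    if sampling_var < 0 or model_var < 0:
        raise ValueError("variances must be non-negative")
    total_var = model_var + sampling_var
    if total_var == 0:
        raise ValueError("at least one variance must be positive")
    weight = model_var / total_var
    combined = weight * direct + (1.0 - weight) * model
    return combined, weight


# Illustrative poverty rates and variances only (not taken from any survey).
# A county with a noisy direct estimate leans heavily on the model.
noisy_est, noisy_weight = composite_estimate(
    direct=0.30, model=0.20, sampling_var=0.09, model_var=0.01
)
# The same county with a much more precise direct estimate.
precise_est, precise_weight = composite_estimate(
    direct=0.30, model=0.20, sampling_var=0.01, model_var=0.01
)
```

Under these assumed inputs, cutting the sampling variance from 0.09 to 0.01 raises the weight on the direct estimate from 0.1 to 0.5, which is the sense in which lower sampling variability reduces reliance on the model.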
The use of 2-year or 3-year average ACS estimates would place more weight on the direct estimates when they are combined with the model estimates than if average annual estimates, which have greater sampling variability, were used.17

Given the likely measurement biases for ACS income and poverty estimates, estimates from the ACS-based county models could perhaps be improved by calibrating them in some way to selected estimates from the March CPS. For example, counties could be grouped into broad categories on such dimensions as race, ethnicity, and geographic region, and raking factors could be developed that would achieve consistency between the ACS model-based estimates for each county group and the corresponding March CPS estimates. For this purpose, the CPS estimates could be based on weighted 3-year averages in order to reduce their sampling variability. Alternatively, calibration could be achieved by a bivariate model in which ACS and CPS estimates form the dependent variables in two linked equations (see Chapter 3).

17   However, average annual estimates may have less bias than 2-year or 3-year average estimates.
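
The group-level calibration described above can be sketched as follows. This is a minimal illustration that assumes a single grouping dimension, so that each raking factor reduces to a simple ratio of the CPS group total to the summed ACS model-based estimates for that group. County names, group labels, and totals are hypothetical; raking over several dimensions at once would iterate such ratio adjustments and is not shown.

```python
from collections import defaultdict


def group_raking_factors(acs_model_est, county_group, cps_group_totals):
    """Return one factor per county group: CPS total / summed ACS estimates."""
    acs_totals = defaultdict(float)
    for county, estimate in acs_model_est.items():
        acs_totals[county_group[county]] += estimate
    factors = {}
    for group, acs_total in acs_totals.items():
        if acs_total <= 0:
            raise ValueError(f"no positive ACS total for group {group!r}")
        factors[group] = cps_group_totals[group] / acs_total
    return factors


def calibrate(acs_model_est, county_group, factors):
    """Scale each county estimate by the factor for its group."""
    return {
        county: estimate * factors[county_group[county]]
        for county, estimate in acs_model_est.items()
    }


# Hypothetical ACS model-based counts of poor school-age children.
acs_est = {"A": 100.0, "B": 300.0, "C": 200.0}
groups = {"A": "north", "B": "north", "C": "south"}
# Hypothetical March CPS (e.g., 3-year average) totals for each group.
cps_totals = {"north": 440.0, "south": 180.0}

factors = group_raking_factors(acs_est, groups, cps_totals)
calibrated = calibrate(acs_est, groups, factors)
```

After calibration, the county estimates within each group sum to the CPS total for that group, while the relative sizes of counties within a group are unchanged.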

If a calibration procedure is adopted, it should then be applied to ACS estimates for states as well as counties, so as to achieve a consistent measurement standard for the direct state estimates and the model-based county estimates. The goal of any calibration procedure would be to reduce the mean square error of adjusted ACS estimates by taking advantage of the lower variance of the ACS data and the presumed lower bias of the March CPS data. If SIPP becomes the preferred source for national estimates of poverty (see “A Revised Poverty Measure,” below), there would be reason to calibrate the ACS estimates to the SIPP estimates and not to the March CPS estimates. Substantial research and development could be required to develop an appropriate calibration approach in either case.

School District Estimates

A possible role for the ACS (once it is fully implemented) that could improve the SAIPE estimates for school districts and other subcounty areas is to use ACS data to form within-county shares to apply to updated county poverty estimates.18 The advantage of this approach, in comparison with the current estimation procedure in which the most recent census data are used to develop within-county shares to apply to updated county model estimates, is that the ACS estimates will be more current. Also, if the ACS estimates of shares are applied to estimates from an ACS-based county model, the two sets of estimates would reflect the same measurement standard. However, the ACS estimates of shares will exhibit higher sampling variability than the census estimates of shares, particularly if the ACS estimates are averaged over, say, a 3-year rather than a 5-year period.
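
The within-county shares approach can be sketched in a few lines. This is a minimal illustration, assuming that each share is a subcounty area's proportion of the county's survey-based count of poor school-age children and that the shares are applied to a single updated county estimate. The district names and all numbers are hypothetical.

```python
def allocate_by_shares(county_estimate, subcounty_counts):
    """Split an updated county estimate across subcounty areas.

    subcounty_counts holds survey-based (census or ACS) counts of poor
    school-age children per area; only their proportions are used.
    """
    total = sum(subcounty_counts.values())
    if total <= 0:
        raise ValueError("subcounty counts must sum to a positive number")
    return {
        area: county_estimate * count / total
        for area, count in subcounty_counts.items()
    }


# Hypothetical survey counts for three school districts in one county.
survey_counts = {"district_1": 50.0, "district_2": 30.0, "district_3": 20.0}
# Hypothetical updated county estimate from a county model.
updated_county_estimate = 1200.0

district_estimates = allocate_by_shares(updated_county_estimate, survey_counts)
```

By construction the district estimates sum to the updated county estimate, so any sampling error in the survey counts affects only how the county total is divided among districts.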
For use in a shares model, statistical smoothing of the ACS estimates for subcounty areas within counties should be investigated to reduce their sampling variability (see Chapter 3).19 Another possibility for investigation is whether the ACS estimates could be combined in some way with 2000 census estimates to form within-county shares. If in the future it proves feasible to assign IRS tax return data to subcounty areas (see Chapter 5), then it might be possible to combine ACS estimates and IRS data for this purpose.

18   The shares would be each subcounty area's proportion of the total number of poor school-age children (or other population group) in the county.

19   Whether smoothing county and subcounty estimates in order to reduce the mean square error of the latter would be successful with the ACS is not clear, given the sizable sampling variability of the ACS estimates for many counties. The other techniques suggested in Chapter 3 for reducing the variability of census long-form estimates for subcounty areas, which involve using short-form and long-form data in a simple or stratified ratio adjustment, are not applicable to the ACS. There is no ACS short form that is to be completed for all households.

For the greatest improvement in subcounty estimates of income and poverty, it will likely be necessary to develop a statistical regression model for these areas that makes use of administrative data for predictor variables (see Chapter 5). However, development of appropriate administrative data is a long-range effort, so the Census Bureau should pursue the alternative of using ACS estimates, perhaps together with 2000 census estimates, in a within-county shares model.

ACS Funding

The ACS has the potential to play a major role in the SAIPE Program because of its large sample size and continuous operation. To do so, the ACS has to have consistent levels of sufficient funding over the next decade for the planned sample sizes. Reductions in funding would likely lead to reduced sample sizes and other discontinuities in the data that could jeopardize the usefulness of the ACS for SAIPE and make it difficult to evaluate how effective the ACS could be for SAIPE if carried out as now planned. More generally, if the ACS does not receive consistent funding, it will be difficult to properly assess its potential for small-area estimation for such important purposes as fund allocation and program evaluation.

Role of 2000 Census Long Form

Models

Estimates from the 1990 census long form are used, in somewhat different ways, as predictor variables in the current SAIPE state and county regression models, and these variables contribute importantly to the models (see National Research Council, 2000c:Ch.6).
It makes sense to plan for a similar role for estimates from the 2000 census long form and perhaps to include predictor variables from both the 1990 and 2000 censuses for a time.20 Long-form census estimates may prove to be less effective predictors in the models for some years than others because of different economic conditions. Economic changes may occur immediately following a census as well as later in a decade. This problem is naturally handled in a modeling framework, given that the model is refitted for each estimation year.

20   Planning to use the 2000 census as a predictor variable in models for states as well as counties is necessary, given that it will not be feasible to use ACS direct state estimates at least until data are available for income reference year 2003 and the quality of the data has been determined.

For smaller areas, such as school districts and other subcounty areas, it is likely that income and poverty estimates from the 2000 census will exhibit high sampling variability even with oversampling of small governmental units and the use of the most effective procedures to reduce variance. The census estimates will also be available for only one year. However, it will be necessary to use the 2000 long-form estimates to form within-county shares to apply to updated county estimates until it becomes possible to use the ACS for this purpose or until it becomes possible to develop a subcounty model similar to the state and county models. The development of such a model depends on obtaining appropriate subcounty administrative records data. If such data can be developed for use in a subcounty model, then the 2000 census estimates are a likely candidate to serve as one of the predictor variables. The sampling variability in the census estimates would weaken the predictive power of the census variable, but the model would produce unbiased predictions.

Direct Estimates for 1999

While it clearly makes sense to plan to use 2000 long-form estimates as predictor variables in SAIPE state and county regression models and, for the time being, in a county shares model for school districts and other subcounty areas, it is far from clear what use, if any, to make of the direct long-form estimates for income year 1999. On the one hand, direct estimates will be reliable for states and many counties, and they will have considerable face validity for users, so that not to use these estimates for income year 1999 seems problematic.
However, their use would likely produce anomalies in the time series of estimates because the standard of measurement provided by the census direct estimates would not likely be the same as that underlying the estimates produced for prior years from the SAIPE CPS-based models nor that underlying the estimates produced for subsequent years from another model (e.g., one based on the ACS or CPS or either of these two surveys adjusted to SIPP controls). Moreover, the census estimates for 1999 will not be reliable for many counties and most subcounty areas, and they may not be available in time to meet the Census Bureau's SAIPE schedule, which calls for 1999 estimates to be delivered to users by fall 2002 (although the census estimates could be used to produce revised 1999 SAIPE estimates when they become available).

We do not believe there is a clearly preferred answer for whether and how to use 2000 long-form direct estimates for SAIPE for income year 1999. We urge the Census Bureau to consider several options, which include: using the direct long-form estimates (either for release on the SAIPE schedule or for later release); using the long-form estimates with a ratio adjustment to short-form data to reduce the sampling variability of the estimates (see Chapter 3); using the long-form estimates with a calibration to CPS aggregate estimates; not using the direct long-form estimates but, instead, using the current SAIPE models to produce indirect estimates for income year 1999. The Bureau should convene a meeting of key users to discuss these options so that the basis for the Bureau's decision is well understood.

A Revised Poverty Measure

U.S. poverty statistics for the total population and population groups are currently based on a measure in which annual before-tax money income for a family or unrelated individual is estimated from the March CPS and compared with the applicable poverty threshold for the family size. A report of the National Research Council (1995a) concluded that the current measure is not adequate to inform public policy and recommended that it be replaced with a revised measure, in which disposable after-tax money and near-money income would be estimated from SIPP and compared with an appropriate poverty threshold (see also Betson, Citro, and Michael, 2000). An earlier National Research Council report (1993) also recommended that SIPP become the basis for measuring poverty.

The revised poverty measure would differ from the current measure in how the thresholds are developed, updated, and adjusted for different size families and areas of the country. The revised measure would also differ in how family resources are measured from survey data.
Starting with gross money income, as in the current measure, the revised measure would add the value of near-money in-kind benefits (e.g., food stamps, subsidized housing, school lunch, energy assistance), and subtract the following items: payroll taxes; net federal and state income taxes (for some recipients of the earned income tax credit, a positive amount would be added to income); expenses necessary for work, including work-related transportation and child care costs; child support payments to another family; and out-of-pocket medical expenditures.

Whether some or all of the recommendations in the 1995 report will be adopted is not known at this time. A report of the U.S. Census Bureau (1999c) illustrated the use of the revised poverty measure with March CPS data for 1990-1997, and the Bureau plans to regularly release revised estimates (labeled “experimental”) on its Internet web site at the same time that the official estimates are released each fall. The Bureau is also working to make it possible to implement a revised measure with SIPP by adding some questions and seeking funding to revise the design so that a new panel is introduced each year, which could be used to equalize the bias due to sample attrition across years. Additional funding is also being sought for expanded sample size to support direct estimates for the largest states.

Using a revised official poverty measure in SAIPE would mean changing the measurement standard to a measure that is more appropriate for policy purposes because it takes account of taxes and in-kind transfer programs and other family circumstances that are not reflected in the current measure. Such a change raises issues of implementation. The 2000 census does not ask questions on in-kind benefits or nondiscretionary expenses (e.g., work expenses) that would be needed to calculate family resources under the revised measure. The ACS includes questions on in-kind benefits (food stamps, energy assistance, school lunch, and subsidized housing), but not on nondiscretionary expenses, and it is unlikely that the questionnaire could be expanded to provide all of the elements of the revised definition. In contrast, the revised definition of disposable money and near-money income can fairly readily be calculated from either the March CPS or the SIPP, although imputations for some kinds of expenses needed to calculate disposable money and near-money income are required (more so in CPS than in SIPP).

The limitations of the 2000 census and ACS income data constrain but do not preclude the use of these sources in estimating poverty for small areas with a revised measure. Direct estimates could not be obtained from either the 2000 census or the ACS that fully implemented a revised measure.
However, if the CPS remains the official source of poverty statistics with a revised measure, then 2000 census or ACS estimates that are based on the current measure could be used as predictor variables in CPS-based regression models that use a revised measure. Poverty estimates reflecting the current measure from the 2000 census or the ACS could also be used to form within-county shares for subcounty areas to apply to updated county poverty estimates developed from CPS-based models for which the dependent variable reflected a revised measure.21 Finally, if ACS estimates are calibrated in some manner to March CPS estimates of poverty developed with a revised measure, then the calibrated ACS estimates could be used as a dependent variable in regression models. If SIPP replaces the CPS as the official source of poverty statistics, then such calibrations should be implemented with that survey instead of the CPS.

21   This use of census long-form or ACS estimates of shares would require the assumption that the distribution of poverty within counties is similar under the current and revised poverty measures.

Conclusions and Recommendations

4-1 The Census Bureau's Small Area Income and Poverty Estimates (SAIPE) Program must continue to rely primarily on models for updated income and poverty estimates for small areas. None of the existing or planned surveys can produce direct estimates of sufficient reliability, timeliness, and quality to provide all of the SAIPE income and poverty estimates.

4-2 To inform decisions about the use of the 2000 census long form, American Community Survey, CPS March Income Supplement, and Survey of Income and Program Participation for SAIPE, the Census Bureau should conduct research to understand and document the differences in their measurement of income and poverty. For this purpose, the Census Bureau should conduct a series of exact matches and analyses: the planned exact match of the March 2000 CPS and the 2000 census long form; an exact match of interviews from the 1996 SIPP panel covering 1999 income and the 2000 census long form; the planned set of aggregate comparisons of income and poverty estimates from the 2000 ACS and the 2000 census long form; an exact match of the 2000 ACS and the 2000 census short form to examine differences in measurement of household composition and demographic characteristics that relate to income and poverty; and exact matches of Internal Revenue Service tax returns for income year 1999 with the 2000 census long form, 2000 ACS, March 2000 CPS, and 1996 SIPP.
4-3 Research and development by the Census Bureau should begin now to explore two possible uses of ACS estimates in SAIPE models for counties: to form one of the predictor variables in statistical models for which the March CPS continues to provide the dependent variable and to serve as the dependent variable in county models. For the latter use, the ACS estimates might possibly be calibrated in some way to selected estimates from the March CPS.

4-4 The Census Bureau should conduct research on using ACS estimates for school districts and other subcounty areas, possibly combined with 2000 census estimates, to form within-county shares or proportions to apply to updated county model poverty estimates.

4-5 If the ACS is to fulfill its potential to play a major role in the SAIPE Program, it is important that the survey have sufficient funding for planned sample sizes over the next decade. Reductions in funding could jeopardize its usefulness for SAIPE and, more generally, make it difficult to properly assess the potential uses of ACS data in small-area estimation.

4-6 The Census Bureau should plan to use 2000 census long-form estimates to form one of the predictor variables in the SAIPE state and county models.

4-7 For SAIPE estimates for income year 1999, it may be possible to use the direct estimates from the 2000 census long form, but whether this is feasible or desirable is not clear. The Census Bureau should consider the available options and discuss them fully with users.

4-8 If the recommendations of the National Research Council for changes in the official measure of poverty are adopted, the Census Bureau will need to consider the implications for the SAIPE Program. In particular, it may become feasible and desirable to use estimates from the Survey of Income and Program Participation for calibration purposes.