Using the American Community Survey: Benefits and Challenges

5
The Weighting of ACS 1-Year Period Estimates

As described in earlier chapters, the American Community Survey (ACS) comprises a time series of monthly samples of housing units selected each year from the Master Address File (MAF). The Census Bureau accumulates sets of monthly samples to produce 1-year, 3-year, and 5-year estimates, based on calendar years; 1-year estimates are produced only for areas with populations of 65,000 or more, 3-year estimates are produced for areas with populations of 20,000 or more, and 5-year estimates are produced for all areas (refer back to Table 2-5). This chapter presents a description and evaluation of the Census Bureau’s weighting methods for producing 1-year estimates. Chapter 6 examines the weighting methods used for producing 3-year and 5-year estimates. These chapters provide a more detailed examination of the ACS weighting procedures than earlier chapters and are intended primarily for survey methodologists.

5-A
OVERVIEW

As described in Chapter 2, the data collection for the ACS sample selected for a given month is spread over 3 months: mail responses are collected in the first month; computer-assisted telephone interviewing (CATI) responses are collected in the second month; and in the third month, computer-assisted personal interviewing (CAPI) responses are collected from a subsample of the housing units that have not yet responded. For purposes of analysis, the Census Bureau classifies a monthly sample as the sample units resolved in that month (the tabulation month), not the sample selected for that month, in order to make all the data for each monthly sample relate to the same time period. The units resolved in a given tabulation month comprise the mail, CATI, and CAPI responses received in that month and also the units determined in that month to be final nonresponding households, vacant housing units, and ineligible units. This procedure can be viewed as a form of nonresponse “replacement procedure” (Kish and Hess, 1959), in which sampled units resolved in the given month that were selected for prior months are treated as replacements for units selected for the given month that were resolved in later months. Given this definition of the monthly samples, all the data used for analysis for a 1-year or multiyear period are collected during the specified calendar year or years. (An attraction of using tabulation months is that data collection is completed at the end of the year; if the monthly samples were defined in terms of sample months, it would be necessary to wait until the following February before all the data were collected for a given year.)

Survey sampling weighting methods are applied to the respondents for the given period in order that valid estimates can be produced. These methods include weights to compensate for unequal selection probabilities, weighting adjustments for nonresponse, and calibration adjustments that compensate for noncoverage and can improve the precision of some survey estimates. Separate sets of weights are developed for person-level and housing unit-level analyses.

The Census Bureau has developed a nine-step weighting process for each 1-year data file, as summarized in Box 5-1. This box and the chapter text apply only to the weighting process for the housing unit population; see Box 5-2 for a brief description of the weighting process for the group quarters population.
Step 1 in Box 5-1 is the standard inverse selection probability weighting: if, say, a housing unit is selected with a probability of 1 in 10, the unit is assigned a base weight of 10, since it represents 10 housing units in the population. Subsequent steps adjust the base weights to compensate for deficiencies in the sample and to improve the precision of some estimates. These adjustments are performed within “estimation areas,” which are single larger counties or combinations of smaller counties (the nonresponse adjustments in step 3 are carried out at the tract level; see below). Steps 1 to 5 are adjustments made to the housing unit weights. The weights resulting from steps 1 to 5 apply to the household and all persons in it. Step 6 is an adjustment that is applied at the person level, leading to different weights for persons in the same household, and a revised household weight is developed in step 7. The last two steps are final adjustments to the weights. Section 5-B describes these nine steps in more detail, and Sections 5-C and 5-D examine steps 5 and 6 more carefully. The calibration of the weights to make the sample conform to independent housing unit estimates (step 5) and to independent population estimates (step 6) raises a number of issues that require special attention.

BOX 5-1
The Nine-Step Weighting Process for Housing Units and Household Members in 1-Year ACS Data Files

1. Base weights. The base weights are the inverses of selection probabilities, including an allowance for the CAPI subsampling, computed for all selected housing units.
2. Variation in monthly response factor. This factor is associated with the “replacement procedure.” It compensates for variations in the number of sample cases resolved across months.
3. Noninterview factors 1 and 2. These factors adjust for housing unit nonresponse.
4. Mode bias noninterview factor. This factor aims to compensate for the fact that the noninterview factors are applied to all responding households, not the households responding by CAPI.
5. Housing unit control factor 1. The weights developed up to step 4 are adjusted to make the weighted total of the number of housing units in an estimation area conform to an independent housing unit estimate obtained by updating counts from the last census.
6. Population control factor. The person weights are adjusted to make the weighted person counts for major demographic subgroups in an estimation area conform to independent population subgroup estimates obtained by updating counts from the last census.
7. Housing unit control factor 2. To obtain a household weight, the weight of the principal person is assigned to the household, and the housing unit weights are recalibrated to conform to the independent housing unit estimates.
8. Adjustments to eliminate extreme weights. If some weighting adjustments to the base weights exceed a factor of 8 in an estimation area, the weighting adjustment process is revised to eliminate such large weights.
9. Rounding of weights. All weights are rounded to be integers.

SOURCE: U.S. Census Bureau (2006:Ch. 11).

5-B
THE 1-YEAR NINE-STEP WEIGHTING PROCESS

The Census Bureau’s weighting scheme for 1-year estimation starts with the standard base weights that are inverses of selection probabilities, and then makes adjustments to those weights to compensate for sample deficiencies. The adjustments are made within estimation areas, which are individual larger counties or groups of smaller counties; county size is defined as the number of people living in housing units in the 2000 census. For weighting purposes the Census Bureau collapsed the 3,141 U.S. counties into 1,951 estimation areas with a minimum population size of about 16,000 persons (there are about 50 estimation areas in Puerto Rico).

BOX 5-2
The Weighting Process for Residents of Group Quarters (GQ) in the 2006 ACS

1. Base weights. The base weights are the inverses of selection probabilities, computed for all selected GQ residents. This weight is 40 in most instances. When a GQ facility has more people than expected, a subsample of residents is selected so that only 10 people are eligible for interview. The base weights of these people equal 40 times the inverse of the subsampling factor (see Section 4-C for a description of the GQ sampling procedures).
2. Noninterview factor. A single factor is used in which the noninterview adjustment cells are defined by combinations of GQ types, as determined by research. Each cell must contain at least 10 people to be retained as a separate cell for the adjustment. The noninterview adjustment is carried out by cell within each state.
3. Population control factor. The GQ person weights are adjusted to make the weighted GQ person counts in a state conform to independent population estimates developed in the Census Bureau’s postcensal population estimates program by major GQ types. These estimates start with the 2000 census state counts of GQ residents by GQ type and are updated from information provided by state partners in the Federal State Cooperative Program for Population Estimates, the Defense Department, and other agencies (http://www.census.gov/popest/topics/methodology/2005_co_char_meth.html).

SOURCE: U.S. Census Bureau (2006:11-9).

An important consideration in assessing the individual adjustments is the extent to which they change the weights.
In particular, substantial variation in the weighting adjustments can appreciably affect some ACS estimates, likely reducing bias, but probably also lowering precision. Information on the distribution of the weighting adjustments is provided below for some of the adjustments that were used with the 2004 ACS test survey.

5-B.1
Base Weights

Each housing unit is selected from the MAF for the ACS with a probability specified for the block in which it is located. For the 2005 ACS these probabilities range from 1.6 percent (1 in about 63) to 10 percent (1 in 10), depending on the estimate of occupied housing units for the smallest governmental unit or the census tract in which a block is located (see Table 2-3, Part A). The base weights (step 1) for sampled housing units not subjected to CAPI subsampling are simply the inverses of the MAF selection probabilities. For housing units subject to subsampling, the overall selection probability is the product of the original selection probability from the MAF and the subsampling rate. The subsampling rate is 66.7 percent (2 in 3) for unmailable addresses; it varies between 33.3 percent (1 in 3) and 50 percent (1 in 2) for mailable addresses, with higher subsampling rates for census tracts expected to have lower mail and CATI response rates (see Table 2-3, Part B). The base weights thus vary from about 189 (a housing unit in the CAPI subsample with an initial sampling rate of about 1 in 63 and a subsampling rate of 1 in 3) to 10 (a housing unit not subject to CAPI subsampling that is selected with an initial probability of 1 in 10).

The base weights are determined by design decisions and are changed only by changing the design. The variation in the initial selection probabilities resulted from the need to satisfy precision requirements for estimates for governmental units of different sizes. The subsampling rates for CAPI interviews were determined by cost factors and the need to retain adequate sample sizes for census tracts with lower expected mail and CATI response rates. However, as illustrated in Box 4-2, the variation in base weights has the effect of lowering precision for analyses that include households or persons with differing base weights, as compared with an analysis in which the sample size is the same and the weights are constant.
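
The base-weight arithmetic described above can be sketched in a few lines of Python. This is an illustrative sketch only: the function names `base_weight` and `kish_design_effect` are hypothetical, and the design-effect formula is Kish's standard approximation for the precision loss from unequal weights, which is assumed here and may not be the illustration used in Box 4-2.

```python
def base_weight(selection_prob, capi_subsampling_rate=None):
    """Return the inverse of the overall selection probability.

    capi_subsampling_rate is None for units not subject to CAPI subsampling.
    """
    overall = selection_prob
    if capi_subsampling_rate is not None:
        overall *= capi_subsampling_rate
    if not 0 < overall <= 1:
        raise ValueError("overall selection probability must be in (0, 1]")
    return 1.0 / overall


def kish_design_effect(weights):
    """Kish's approximate variance inflation: n * sum(w^2) / (sum w)^2."""
    n = len(weights)
    total = sum(weights)
    return n * sum(w * w for w in weights) / (total * total)


# Extremes quoted in the text for the 2005 ACS.
smallest = base_weight(1 / 10)         # 1-in-10 rate, no CAPI subsampling
largest = base_weight(1 / 63, 1 / 3)   # 1-in-63 rate, CAPI subsample of 1 in 3
print(round(smallest), round(largest))  # 10 189

# Equal weights give no inflation; mixing weights of 10 and 30 inflates variance.
print(kish_design_effect([10, 10, 10, 10]))  # 1.0
print(kish_design_effect([10, 30]))          # 1.25
```

Under this approximation, an analysis that mixes units with weights of 10 and 30 behaves roughly like an equally weighted sample that is 20 percent smaller (1/1.25 = 0.8).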

5-B.2
Variation in Monthly Response Factor

The first adjustment to the base weights (step 2) arises because of the Census Bureau’s decision to process the ACS monthly samples by the tabulation month in which sampled units are resolved rather than by sample month. The variation in monthly response factor (VMS) is used to correct for the imbalance in the rate of resolving sampled units across months with the aim of producing a sample that is balanced across months of the year. To carry out the adjustment, the sum of the base weights for the units resolved in a given month (including nonresponding, vacant, and ineligible units and including the CAPI subsampling factor) is adjusted to conform to the sum of the base weights (but excluding the CAPI subsampling factor) of all units initially sampled for that month. The adjustment is made within estimation areas by applying the following simple ratio adjustment to the base weight for each resolved unit in the month in question:

VMS = (sum of base weights, excluding the CAPI subsampling factor, of all units initially sampled for the month) / (sum of base weights, including the CAPI subsampling factor, of all units resolved in the month)
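
The ratio adjustment just described can be sketched as follows for a single estimation area. The record layout (the keys `sample_month`, `resolved_month`, `bw_no_capi`, and `bw_with_capi`), the function name `vms_factor`, and the toy weights are all assumptions made for illustration; this is not the Census Bureau's production code.

```python
def vms_factor(units, month):
    """Ratio of the base weights of units sampled for `month` (excluding the
    CAPI subsampling factor) to the base weights of units resolved in `month`
    (including the CAPI subsampling factor)."""
    sampled = sum(u["bw_no_capi"] for u in units if u["sample_month"] == month)
    resolved = sum(u["bw_with_capi"] for u in units if u["resolved_month"] == month)
    if resolved == 0:
        raise ValueError("no units resolved in this month")
    return sampled / resolved


# Hypothetical units; each has a weight of 40 before CAPI subsampling.
units = [
    # Sampled for March.
    {"sample_month": 3, "resolved_month": 3, "bw_no_capi": 40, "bw_with_capi": 40},      # mail
    {"sample_month": 3, "resolved_month": 4, "bw_no_capi": 40, "bw_with_capi": 40},      # CATI
    {"sample_month": 3, "resolved_month": 4, "bw_no_capi": 40, "bw_with_capi": 40},      # CATI
    {"sample_month": 3, "resolved_month": 5, "bw_no_capi": 40, "bw_with_capi": 80},      # CAPI, 1 in 2
    {"sample_month": 3, "resolved_month": None, "bw_no_capi": 40, "bw_with_capi": None},  # not in CAPI subsample
    # Carryovers from earlier sample months, resolved in March.
    {"sample_month": 2, "resolved_month": 3, "bw_no_capi": 40, "bw_with_capi": 40},      # CATI
    {"sample_month": 1, "resolved_month": 3, "bw_no_capi": 40, "bw_with_capi": 80},      # CAPI, 1 in 2
]

factor = vms_factor(units, 3)  # 200 / 160
adjusted = [u["bw_with_capi"] * factor for u in units if u["resolved_month"] == 3]
print(factor, sum(adjusted))   # 1.25 200.0
```

After the adjustment, the weights of the units resolved in March sum to the weighted count of the units originally sampled for March, which is the stated aim of the factor.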

To see the effect of the VMS factor, it is necessary first to recall that the initial ACS sample (before CAPI subsampling) is selected each year in two parts (see Section 4-A.1). The main sample for a given year is drawn from the appropriate segment of the MAF existing in August–September of the previous year (main sample MAF), and the supplemental sample is drawn in January–February of the given year from a segment of the new addresses subsequently added to the MAF. (There is no attempt to sample MAF addresses added during the year.) The main sample is allocated evenly across the 12 months of the given year to produce the monthly samples. However, for timing reasons, the data collection for the supplemental sample is restricted to the April–December period. The small supplemental sample is spread evenly across only these 9 months.

Consider first just the main sample. Using the base weights before CAPI subsampling, the weighted count of all units originally sampled in a given month in an estimation area is approximately equal to the number of units on the main sample MAF divided by 12 (since the sample comprises one-twelfth of the annual sample). This equivalence also holds approximately when the base weights that include the CAPI subsampling component are applied to all units originally sampled in a given month that are resolved at some time in the 3-month fieldwork period for that month’s sample. However, the weighted count of all units resolved in a month does not necessarily equate to the number of units on the main sample MAF divided by 12 for two reasons: (1) there may be variations in the numbers of mail, CATI, and CAPI cases resolved by month; and (2) the cases resolved by CATI and CAPI in January and by CAPI in February are carryovers from the November and December monthly samples selected from the MAF for the previous year.
The VMS factor is introduced to compensate for this lack of equivalence between the resolved cases and the MAF count.

The restriction of the supplemental sample to the last 9 months of the year makes the situation more complicated. In the first 3 months, when only the main sample is fielded, the VMS factor aligns the resolved cases with the main sample MAF only. The complications here are that the adjustment does not cover the units on the supplemental frame and that a number of the resolved cases are carryovers from the previous year based on a sample from that year’s MAF. In the last 9 months of the year, the numerator of the VMS represents approximately one-twelfth of the main MAF plus one-ninth of the units on the supplemental frame. It thus exceeds one-twelfth of the full MAF population. In addition, a number of the cases resolved in April and May were selected only from the main sample MAF. As a consequence of these factors, the VMS adjustment does not provide a fully balanced representation over the months of the year.

Although the monthly balance is not fully achieved, the VMS does give the required representation both to units on the main MAF and to units on the supplemental frame. The failure to represent units on the supplemental frame in the first 3 months of the year is compensated by their overrepresentation in the last 9 months. In essence, over the 12 months, the VMS adjustment can be viewed as one in which 1/36 of the supplemental sample units in each later month (the difference between one-ninth and one-twelfth) are substituted for the supplemental sample units that were not surveyed in the first 3 months. This “replacement” scheme makes the assumption that the characteristics of the replacement units are the same at the time of data collection as they were earlier on. For some characteristics (for example, occupancy status), that assumption may be questionable. While in most areas the small numbers of units on the supplemental frame make these issues unimportant, the ACS estimates could be noticeably affected in growth areas that have large numbers of new units.

In practice, the variation in the VMS factor is not great. Over all months of the 2004 ACS test survey, the value of the VMS factor for the 5th percentile is 0.87 and that for the 95th percentile is 1.24. The effect of this additional variation in the resulting weights on sampling errors is likely to be small.

A limitation of the VMS factor is that it can distort the distribution of different types of sampled cases. For example, suppose that a larger-than-average number of CAPI households is resolved in a given month.
The global VMS factor compensates for this outcome by downweighting all the resolved housing units in that month, not just the CAPI housing units. To the extent that CAPI housing units have different characteristics from the rest, the monthly estimates will be biased. This limitation could be avoided by more complex adjustment factors that weight each type of resolved unit in the tabulation month (mail, CAPI, CATI, nonresponding, vacant, ineligible) separately to conform to the corresponding outcomes for the sample selected for that month. However, these factors would have greater variability than the VMS factors, and hence they would inflate sampling errors more.

As noted above, the VMS factor is introduced because of the decision to use the tabulation month as the basis for ACS analyses. The concern about the alternative of using the sampling month as the basis of the analyses is that responses provided about characteristics in the following 2 months will not accurately reflect those characteristics in the sample month. Moreover, the use of the sampling month would delay completion of all data collection for a given year.
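
The distortion that a single global factor can introduce, and the type-specific alternative mentioned above, can be illustrated with a toy calculation. Everything in this sketch is hypothetical: the weighted counts, the three-way split by mode, and the function names `global_factor` and `type_factors` are assumptions chosen only to make the arithmetic visible.

```python
def global_factor(resolved, target):
    """One factor for the whole month: total target over total resolved."""
    return sum(target.values()) / sum(resolved.values())


def type_factors(resolved, target):
    """One factor per type of resolved unit."""
    return {t: target[t] / resolved[t] for t in resolved}


# Weighted counts resolved in a tabulation month with a surplus of CAPI cases.
resolved = {"mail": 600.0, "CATI": 200.0, "CAPI": 400.0}
# Weighted counts by type for the sample selected for that month (hypothetical).
target = {"mail": 600.0, "CATI": 200.0, "CAPI": 200.0}

g = global_factor(resolved, target)
after_global = {t: w * g for t, w in resolved.items()}
capi_share_global = after_global["CAPI"] / sum(after_global.values())
capi_share_target = target["CAPI"] / sum(target.values())

print(round(g, 3))                  # 0.833
print(round(capi_share_global, 3))  # 0.333
print(round(capi_share_target, 3))  # 0.2
print(type_factors(resolved, target))
```

In this toy month the global factor restores the overall total but leaves CAPI cases at one-third of the weighted sample instead of one-fifth. The type-specific factors restore the mix, but they range from 0.5 to 1.0, which illustrates the greater variability noted in the text.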

Accepting the use of the tabulation month for the ACS, the utility of the VMS factor should still be investigated to assess how the current imperfect adjustment performs and whether its use warrants the (admittedly slight) increase in sampling error that it causes. In addition, the utility of this factor should be assessed in relation to the housing unit and population adjustments carried out in steps 5, 6, and 7, which all relate to a single point in time (July 1). When the population size of the area changes over the year, these adjustments are inconsistent with the VMS objective of representing the population across the year.

5-B.3
Noninterview Factors 1 and 2

The next step in the development of the weights (step 3) is to compensate for the fact that some sampled housing units do not respond to the ACS or the data collected for them are too scant to process. These housing units are dropped from the analytic data file, and the weights for responding housing units are inflated to provide representation for them. Since it is assumed that all the nonresponding units have been determined to be occupied, the adjustments are made only to the responding housing units that are occupied or temporarily occupied.

Three variables often related to response rates are used in the adjustments: census tract, single-unit versus multiunit structure (building type), and month of data collection. Since the cross-classification of these three variables would create cells with very small sample sizes, the noninterview weighting adjustments are carried out in two stages. Each stage is applied to all of the occupied housing units in the file for a given calendar year.

The first stage of the noninterview adjustment (NIF1) is carried out within cells created by the cross-classification of building type and census tract, with census tracts combined if the cells contain too few responding housing units.
Within each cell, the weights of responding occupied and temporarily occupied housing units are multiplied by an adjustment factor that makes the sum of the weights for responding housing units equal to the sum of the weights for responding and nonresponding housing units. The adjustments are not large: no adjustment was made for more than half the responding housing units in the 2004 ACS, and 95 percent of the adjustments were less than 1.20.

The second stage of the noninterview adjustment (NIF2) starts from the NIF1 adjusted weights and then adjusts them in the same manner within cells created by the cross-classification of building type and tabulation month, combining adjacent months if the responding sample size is too small. The 5th percentile of the NIF2 adjustments in 2004 was 0.99, and the 95th percentile was 1.10.

The two-stage noninterview adjustment amounts to the first iteration of a raking algorithm, in which the process would be repeated until convergence is reached. In view of the small sizes of the adjustments, no further iterations are performed. With the high weighted response rates achieved in the ACS, the noninterview adjustments should not lead to a substantial loss of precision.

5-B.4
Mode Bias Noninterview Factor

A significant drawback to the step 3 noninterview adjustments is that they increase the weights of all mail, CATI, and CAPI responding housing units to represent the nonresponding housing units, whereas the nonresponding housing units are virtually all a subset of the subsampled CAPI housing units. The Census Bureau has chosen to spread the adjustments over all responding occupied and temporarily occupied housing units because of the much smaller responding sample size that would have been available had the adjustment been confined to CAPI housing units. That smaller sample size would have severely limited the extent of control that could have been achieved on the census tract, building type, and month of data collection variables used in the step 3 adjustments. Also, restricting the adjustments to CAPI housing units would have concentrated the adjustments on responding housing units that already had larger weights because of the subsampling. Since CAPI responding housing units likely differ in some characteristics from other responding housing units, however, it seems likely that the step 3 noninterview adjustments create some bias in some estimates.

The Census Bureau introduces another adjustment (step 4), the mode bias noninterview factor (MBF), to address the bias concern with the step 3 weighting. The MBF adjustments are generally small, with only 5 percent being 0.96 or less and 5 percent being 1.03 or more, but the combined effects of their use, together with the noninterview adjustment factors in step 3, on the biases and sampling errors of ACS estimates are not transparent.

The MBF procedure, in essence, has three steps.
The first step is to develop an adjustment factor for the step 2 weights—mode noninterview factor (NIFM)—similar to the adjustment factors NIF1 and NIF2 under step 3 but applied only to CAPI occupied and temporarily occupied housing units. The second step is to produce some survey estimates based on the step 2 weights with these adjustments and also the corresponding estimates with the step 3 adjustments and to calculate the MBF as a ratio of the two quantities. The third step is to multiply the step 3 weights by the MBF so that the estimates with the resultant weights conform to those produced with the NIFM-adjusted weights. In view of the smaller sample size when adjusting only the weights of CAPI respondents, the NIFM adjustments take account of only building type and tabulation month, not census tract, within an estimation area.

When the respondent sample size in a cell of the cross-classification of these two variables is too small, adjacent months are combined. Within each cell, the NIFM is calculated as the ratio of the sum of the weights after step 2 for responding and nonresponding CAPI housing units to the equivalent sum for only the responding CAPI housing units. No adjustments are made to the step 2 weights for non-CAPI, vacant, or ineligible units. No adjustment is made for over 75 percent of housing units, and only 5 percent of the NIFM adjustments are 1.10 or larger.

The next step is to calibrate the estimates based on the weights after step 3 to those produced using the NIFM weighting adjustments. The calibration is performed within each estimation area for the cell totals of the cross-classification of tenure (owned, rented, or temporarily occupied), tabulation month, and marital status of the householder (married and widowed or single). When the sample size in a cell is deemed too small, the two marital status cells are combined. Estimates of the cell totals are produced for each cell, and then the mode bias noninterview factor for a cell is computed as the ratio of the estimated cell total using the step 2 NIFM-adjusted weights to the corresponding estimated total using the step 3 adjusted weights. In the final step, the MBF factors are applied to the step 3 weights for all occupied housing units. As noted above, the MBF adjustments are generally small.

The effects of the compensation for nonresponding housing units using the combination of steps 3 and 4 are not obvious and need to be carefully assessed. For example, it is not clear that the NIFM weighting adjustments, which are confined to CAPI cases but drop the tract-level control, lead to less biased estimates for the cells in the cross-classification of the control variables, let alone estimates for other variables.
Also, since the estimates of the cell totals for the control variables under the weights developed up to this point are equated to those using the NIFM-adjusted weights, their sampling errors are those of the latter estimates. These sampling errors are likely larger than those based on the NIF1 and NIF2 (step 3) adjustments alone, because the NIFM adjustments are applied only to CAPI cases and because CAPI cases start with higher base weights owing to the subsampling. Thus, the effect of the MBF step 4 adjustments, which derive from calibrating the step 3 weights to the NIFM-adjusted weights, on estimates of the control variables and on other ACS estimates needs examination.

Given the high response rates achieved in the ACS, the nonresponse adjustments mostly have a minor impact. However, for areas with lower response rates, the adjustments may be significant. Research to compare the current adjustments with other, more standard, adjustments is warranted. For example, in some estimation areas, a raking adjustment procedure applied to the marginal totals by census tract, building type, and month of data collection and confined to CAPI cases might be more effective.
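
The interplay of the step 3 and step 4 adjustments can be followed on a toy example. The sketch below is a deliberate simplification and rests on several assumptions: a single adjustment cell stands in for the NIF1 and NIF2 cells, tenure alone stands in for the tenure by month by marital status calibration cells, and the record keys (`w2`, `w3`, `w_nifm`, `w4`) and the helper `ratio_factors` are hypothetical names.

```python
from collections import defaultdict


def ratio_factors(units, cell_of, weight_key):
    """Per cell: sum of weights over respondents and nonrespondents,
    divided by the sum of weights over respondents only."""
    total, resp = defaultdict(float), defaultdict(float)
    for u in units:
        total[cell_of(u)] += u[weight_key]
        if u["responded"]:
            resp[cell_of(u)] += u[weight_key]
    # A cell with no respondents would have to be collapsed with a neighbor first.
    return {c: total[c] / resp[c] for c in total}


# Occupied units with their weights after step 2 (w2); the nonrespondent is a CAPI case.
units = [
    {"mode": "mail", "tenure": "owned", "responded": True, "w2": 10.0},
    {"mode": "CATI", "tenure": "owned", "responded": True, "w2": 10.0},
    {"mode": "CAPI", "tenure": "rented", "responded": True, "w2": 30.0},
    {"mode": "CAPI", "tenure": None, "responded": False, "w2": 30.0},
]
one_cell = lambda u: "all"

# Step 3 style factor: spread over all respondents (80 / 50).
nif = ratio_factors(units, one_cell, "w2")["all"]
# NIFM: confined to CAPI cases (60 / 30).
capi_units = [u for u in units if u["mode"] == "CAPI"]
nifm = ratio_factors(capi_units, one_cell, "w2")["all"]

respondents = [u for u in units if u["responded"]]
for u in respondents:
    u["w3"] = u["w2"] * nif
    u["w_nifm"] = u["w2"] * (nifm if u["mode"] == "CAPI" else 1.0)

# MBF per calibration cell: NIFM-based total over step 3 total.
num, den = defaultdict(float), defaultdict(float)
for u in respondents:
    num[u["tenure"]] += u["w_nifm"]
    den[u["tenure"]] += u["w3"]
mbf = {c: num[c] / den[c] for c in den}

# Step 4 weights: step 3 weights multiplied by the MBF of the unit's cell.
for u in respondents:
    u["w4"] = u["w3"] * mbf[u["tenure"]]

print(nif, nifm, mbf)
print([round(u["w4"], 6) for u in respondents])
```

The factors in this toy case (0.625 and 1.25) are far larger than the ACS values quoted above because one of only four units is a nonrespondent. Here the calibration cells happen to line up with response mode, so the step 4 weights reproduce the NIFM-adjusted weights unit by unit; in general only the calibration cell totals are forced to agree.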

5-B.5
Housing Unit Control Factor 1

After step 4, the sum of the weights for the combination of responding housing units, vacant housing units, and ineligible units in an estimation area over the year is approximately equal to the number of units on the MAF from which the sample was selected. The next step adjusts the weights within each estimation area so that the estimated number of housing units from the ACS conforms to the independent estimate of total housing units for July 1 of the year in question produced by the Census Bureau’s postcensal population estimates (PE) program. This adjustment (step 5) serves to compensate for under- or overcoverage in the MAF. The Census Bureau also argues that this step serves to make the housing unit counts consistent with the PE population controls employed in step 6. There are several issues concerning the use of the housing unit control factor, including the quality of the PE housing unit estimates. These issues are taken up in Section 5-C below.

5-B.6
Population Control Factor

After all the preceding adjustments have been applied, each housing unit has a weight, and that weight also applies to all the persons in the housing unit. The weighting adjustment in step 6, however, leads to different weights for persons in the same household. In step 6 the person weights are adjusted so that in each estimation area the weighted sums of persons in certain sex-age-race/ethnicity subgroups in occupied housing units conform to the subgroup estimates produced by the PE program for July 1 in the year in question. The aims of the population control factor adjustment are to compensate for person noncoverage (particularly for person noncoverage within some housing units) and to improve the precision of ACS person-based estimates. Section 5-D below discusses the use of the PE population estimates as controls in the ACS, including issues relating to the quality of these estimates.
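
A minimal sketch of the step 6 ratio adjustment shows why persons in the same household end up with different weights. The subgroup labels, the control totals, and the function name `population_control_factors` are hypothetical; the actual controls are sex-age-race/ethnicity subgroups within an estimation area.

```python
from collections import defaultdict


def population_control_factors(persons, controls):
    """Per subgroup: independent control total over the weighted ACS count."""
    estimated = defaultdict(float)
    for p in persons:
        estimated[p["cell"]] += p["weight"]
    return {cell: controls[cell] / estimated[cell] for cell in estimated}


# Every person starts with the weight of his or her housing unit (100 here).
persons = [
    {"household": 1, "cell": "female 30-44", "weight": 100.0},
    {"household": 1, "cell": "male 30-44", "weight": 100.0},
    {"household": 2, "cell": "male 30-44", "weight": 100.0},
]
# Hypothetical independent population estimates for the two subgroups.
controls = {"female 30-44": 110.0, "male 30-44": 250.0}

factors = population_control_factors(persons, controls)
for p in persons:
    p["weight"] *= factors[p["cell"]]

print(factors)
print([(p["household"], p["cell"], round(p["weight"], 6)) for p in persons])
```

In household 1 the two members began with the same weight but finish with weights of about 110 and 125, because men in this toy area are assumed to be covered less well than women. This is the situation that step 7 then has to resolve when a single household weight is needed.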

5-B.7
Housing Unit Control Factor 2

Step 6 results in variable weights for persons within a household, raising an issue of what weight to assign to a household. The solution adopted is to start by assigning a household the weight of one of its members. The person chosen for this purpose, termed the principal person, is identified as the wife in a household in which both husband and wife are present and otherwise one of the persons who rents or owns the housing unit. The

estimates. The utility of these controls depends on the quality of the postcensal housing unit estimates compared with the quality of the MAF. The Census Bureau has carried out an evaluation of the postcensal housing unit estimates for 2000 based on updating the 1990 census and comparing these estimates to the 2000 census housing unit counts (Devine and Coleman, 2003). Table 5-1 presents the mean absolute percentage errors (MAPEs) for the housing unit county estimates for 2000 obtained in this evaluation by 1990 size of the county. Overall the MAPE was 4.6 percent, but it varied from 1.9 percent for the largest counties to 7.3 percent for the smallest counties. The MAPE also varied by the amount of change in the number of housing units over the decade, with larger values for counties that had grown or declined considerably. The MAPEs were particularly large for small counties that had changed considerably in size (data not shown).

In assessing these results in relation to the use of the postcensal housing unit estimates in weighting adjustments in the ACS, two factors need to be borne in mind. First, the evaluation applies to estimates for 2000, 10 years after the 1990 census; as such it may be viewed as a worst-case evaluation. Second, the ACS weighting adjustments are performed in estimation areas that combine small counties, so, again, the above results may overestimate the amount of error. Nevertheless, the postcensal housing unit estimates are likely subject to appreciable error in some types of counties. Some of these errors may be random in nature, but some may be systematically biased upward or downward for certain types of counties.

An indication of the magnitude of the housing unit control factor for the 2004 ACS is presented in Table 5-2.
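The MAPE used in this evaluation takes, for each county, the difference between the census count and the postcensal estimate as a percentage of the census count, drops the sign, and averages the results across counties. A minimal Python sketch of that calculation follows; the function name and the three county values are hypothetical and serve only to show the arithmetic.

```python
def mape(census_counts, estimates):
    """Mean absolute percentage error of estimates relative to census counts."""
    if len(census_counts) != len(estimates) or not census_counts:
        raise ValueError("inputs must be non-empty and of equal length")
    abs_pct_errors = [
        abs(census - estimate) / census * 100.0
        for census, estimate in zip(census_counts, estimates)
    ]
    return sum(abs_pct_errors) / len(abs_pct_errors)


# Hypothetical counties: errors of 5 percent, 5 percent, and 0 percent.
census = [1000, 2000, 4000]
postcensal = [950, 2100, 4000]
result = mape(census, postcensal)
```

Because the signs are dropped before averaging, the measure does not distinguish random error from systematic over- or underestimation.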
TABLE 5-1 Mean Absolute Percentage Error (MAPE) of April 1, 2000, County Housing Unit Estimates Compared to April 1, 2000, Census Counts, by 1990 Size of County

1990 Housing Unit Count    Number of Counties    MAPE (%)
All Counties                            3,141         4.6
0–2,499 units                             336         7.3
2,500–4,999                               518         5.7
5,000–9,999                               749         5.0
10,000–19,999                             653         4.4
20,000–49,999                             507         3.1
50,000–99,999                             178         2.2
100,000 and over                          200         1.9

NOTE: The percentage error is calculated for each county as: (the 2000 census housing unit count – the 2000 postcensal housing unit estimate)/the 2000 census housing unit count. The signs of these errors are dropped, and then the absolute errors are averaged across counties.
SOURCE: Devine and Coleman (2003).

TABLE 5-2 Distribution of the Housing Unit Control Factor 1 Across Counties in the 2004 ACS

Housing Unit Control Factor 1    Number of Counties    Percentage of Counties
Under 0.90                                       64                       2.0
0.90–0.94                                       145                       4.6
0.95–0.99                                     1,173                      37.3
1.00–1.04                                     1,463                      46.6
1.05–1.09                                       219                       7.0
1.10 and over                                    77                       2.5
Total                                         3,141                     100.0

SOURCE: Based on data provided by the Census Bureau.

Most (84 percent) of the control factors fall between 0.95 and 1.05 and represent minor adjustments. There are, however, a number of counties for which the factors are substantial, with extremes as low as 0.7 and as high as 1.42. More than half of the factors are 1.0 or greater, consistent with a net undercoverage in the MAF in those counties, but the factors are less than 1.0 for 44 percent of counties, suggesting a net overcoverage in the MAF for those counties, under the assumption that the postcensal estimates are accurate. Undercoverage in the MAF arises because new housing units added after the January MAF update are not covered and also because of other housing units that are missed. Overcoverage arises if some housing units are listed more than once in the MAF, or if group quarters addresses are misclassified as housing units.¹ Both missed housing units and duplicate or misclassified listings can occur in a county. Under the assumption that the postcensal estimates are accurate, the housing unit control factors in Table 5-2 represent the net effects of overcoverage and undercoverage.

As with all weighting adjustments, the housing unit factors are based on certain assumptions that merit review by the Census Bureau. In the case of undercoverage, the factors increase the weights of MAF housing units to represent those not included in the MAF frame under an assumption that the missed housing units are missing at random. This assumption would be false if, for instance, the proportion of vacant units in the postcensal estimates is higher than in the MAF, as it might well be.
In the case of overcoverage, the factors decrease the weights of MAF housing units to address the problem of duplication. This procedure brings the MAF number of housing units in line with the postcensal estimates. However, in other regards it depends on an assumption that the listings of some housing units are randomly duplicated.

¹ Another potential source of coverage problems with the MAF is that some housing units are listed in the wrong county, leading to undercoverage in the county in which they should be listed and overcoverage in the county in which they are listed. However, county misclassification seems likely to be rare.

If the MAF were complete and not subject to duplicate listings and if the intercensal estimates were accurate, then the housing unit control factors would all be close to 1, simply adjusting the housing unit counts from the January MAF to the midyear intercensal estimates to allow for new and demolished units in the interim. (In this case the ACS housing units could be weighted to a midyear MAF count.) That many of the housing unit control factors are appreciably larger or smaller than 1 raises concerns about the quality of the MAF or the quality of the postcensal estimates or both.

Recent research by the Census Bureau (Reese, 2007) examined differences between the independent housing unit estimates for 2002–2005 and the housing unit addresses on the MAF used for the ACS in these years. The results show an increasing divergence between the two series, with the national MAF count exceeding the housing unit estimate by 2.6 percent in 2002 and rising to 4.0 percent in 2005. These results suggest a failure to completely identify and weed out duplicate, demolished, and nonresidential addresses from the MAF. The differences between the increase in the MAF and the increase in the housing unit estimates varied among counties as a function of county size, with larger differences occurring for counties with larger numbers of housing units. The Census Bureau plans to conduct more research to gain an understanding of large discrepancies between the MAF counts and the postcensal housing unit estimates.
It is possible, for example, that the quality of the MAF differs between urban and rural areas because the frame is updated differently, using the Delivery Sequence File in urban areas and the Community Address Updating System in rural areas (see Sections 4-A.4.a and 4-A.4.b).

As discussed in Section 4-A, a prime concern for the ACS is the continuous maintenance of a high-quality MAF for use each year throughout the decade and beyond. Weighting adjustments that attempt to compensate for deficiencies are necessarily an imperfect remedy. If this research identifies deficiencies in the MAF sampling frame, then steps should be taken to correct the frame.

At present the MAF and the postcensal estimates are developed independently. However, in the panel’s view, they should be integrated to the benefit of both. For example, building permit data could be used to improve the MAF, either collected on an individual permit basis within location or simply using the current aggregates that would indicate areas in which special MAF updating is needed. Similarly, the MAF (and the ACS) could provide valuable information in updating the postcensal estimates.

Another issue that needs examination is the level at which the housing
unit controls are applied. At present the controls are applied at the estimation area, but they could alternatively be applied at higher or lower levels. If the postcensal estimates are of high enough quality, the controls could be applied at the level of the county or a subcounty area, such as a census tract, thereby targeting the adjustments more directly to the areas where they are needed. The Census Bureau has carried out some initial research in this area (Starsinic, 2005), and the panel encourages further research along these lines.

Recommendation 5-2: The Census Bureau should evaluate the quality of the postcensal housing unit estimates and the MAF sampling frame in relation to one another. In the light of this evaluation, the Census Bureau should assess the suitability of the current housing unit control factor adjustment and modify it as necessary. The Census Bureau should attempt to identify areas in which improvements can be made to the postcensal housing unit estimates and to the MAF sampling frame. In particular, it should investigate an integrated approach for developing the postcensal housing unit estimates and for continuously updating the MAF that would benefit both and reduce the variability in the housing unit control factor.

5-D
POPULATION CONTROLS

After the application of the housing unit controls in step 5, each household has a weight that can be used for analysis, and that same weight could be used for each person in the household. Step 6 in the process is an adjustment to the person weights. This adjustment is used to compensate for person noncoverage in sampled households and to reduce the sampling errors for person-level estimates. The adjustments are based on the Census Bureau’s PE subnational resident population estimates by age, sex, race, and Hispanic origin for July 1 each year.
For these estimates the resident population in an area is defined using the decennial census “usually resident” rule as distinct from the ACS “2-month resident” rule. As with the PE housing unit estimates, the population estimates start from the 2000 census counts and adjust for estimated changes between April 1, 2000, and July 1 of the year in question. At the outset, the 2000 census population is divided into the household population and the group quarters population. For the 2005 ACS, the population controls are based on only the household population estimates, and only the methodology for developing those estimates will be reviewed here.

The Population Estimates and Projections Area of the Population Division at the Census Bureau produces county household population estimates using a cohort-component technique that adjusts the census counts to allow for births, deaths, net international migration, net domestic migration, and
net military movement during the intervening period.² Reliable estimates of births and deaths are obtained from vital records data. National estimates of net international migration are generated from the ACS for earlier years, and the numbers are then distributed across counties based on the distribution of noncitizen foreign-born persons in the 2000 census. Net domestic migration is estimated from Internal Revenue Service 1040 tax return records, which are matched to the Social Security Administration’s Numident file in order to obtain age, sex, race, and Hispanic origin data for the tax filers and their dependents. Data on the net movement of military personnel and their dependents are provided by the Department of Defense.

A complication in developing the population estimates by race is that race in the 2000 census and the ACS is classified into six race groups (white; black; American Indian and Alaska Native; Asian; Native Hawaiian and Pacific Islander; and Some Other Race), and individuals can report multiple races, whereas most administrative data employ only four race groups (white; black; American Indian, Eskimo, or Aleut; and Asian and Pacific Islander). To address this complication, the census race categories are reduced to the four categories by eliminating the Some Other Race category and proportionately allocating all individuals into one of the four categories. The estimates are produced for the four categories and are then reallocated to the 2000 census categories for publication. For the purposes of the ACS weighting adjustments, race and Hispanic origin are combined into six categories: (1) non-Hispanic white; (2) non-Hispanic black; (3) non-Hispanic American Indian or Alaska Native; (4) non-Hispanic Asian; (5) non-Hispanic Native Hawaiian or Pacific Islander; and (6) Hispanic.
The Census Bureau has conducted an evaluation of the 2000 county total population estimates by comparing them with the 2000 census counts, in a similar way to the evaluation of the housing unit estimates described in the previous section (Blumerman and Simon, 2006). The mean absolute percentage errors displayed in Table 5-3 indicate that the level of error for the smallest counties is appreciably larger than the overall average of 3.4 percent. However, since the ACS population control adjustments are applied at the level of estimation areas (which are combinations of counties in the case of smaller counties), the average MAPE for the estimation areas should be less than that for counties. The MAPEs are also larger for counties that experienced a growth of 19.5 percent or more. Note that this evaluation applies to estimates updated to 2000 from the 1990 census and that estimates updated over shorter intervals are likely to have smaller MAPEs.

² For further details, see http://www.census.gov/popest/topics/methodology/2004_co_char_meth.html.

TABLE 5-3 Mean Absolute Percentage Error (MAPE) of April 1, 2000, County Population Estimates (Official Series) Compared with April 1, 2000, Census Counts by County Population in 2000

County Population in 2000    Number of Counties    MAPE (%)
All Counties                              3,141         3.4
Under 2,500                                 115         6.9
2,500–4,999                                 177         4.3
5,000–9,999                                 405         3.7
10,000–19,999                               651         3.3
20,000–49,999                               879         3.2
50,000–99,999                               390         2.6
100,000 and over                            524         2.9

NOTE: The percentage error is calculated for each county as: (the 2000 census population count – the April 1, 2000, postcensal population estimate)/the 2000 census population count. The signs of these errors are dropped, and then the absolute errors are averaged across counties.
SOURCE: Blumerman and Simon (2006:Table 4).

The preceding results apply to the total population estimates at the county level. However, the ACS population controls are applied within subgroups defined by sex, age in 13 groups (0–4, 5–14, 15–17, 18–19, 20–24, 25–29, 30–34, 35–44, 45–49, 50–54, 55–64, 65–74, 75 and over), and race/ethnicity in 6 groups at the estimation area level. In practice the numerous cells in the sex by age by race/ethnicity cross-classification often have to be reduced by collapsing cells. A complex set of collapsing rules is employed, starting with collapsing categories of race/ethnicity to create collapsed cells with a minimum sample size of 10 persons and for which the weighting adjustment is less than 3.5. Subsequent collapsing of the sex by age cross-classification within collapsed race/ethnicity cells is then undertaken as necessary (U.S. Census Bureau, 2006:11-7).

The panel undertook a simple analysis in order to derive a rough indication of the level of error in the population controls being used in the ACS.
For this purpose we dropped the race classification because of the problems with the differences in that classification between the population estimates and the 2000 census, retaining only Hispanic versus non-Hispanic (equivalent to collapsing the first five of the six race/ethnicity categories listed above). We also collapsed the 15–17 and 18–19 age groups into a single 15–19-year-old group. We then compared the sex by age by ethnicity cross-tabulations produced from the population estimates for 1999 with the corresponding cross-tabulations from the 2000 census for each of 1,950 estimation areas (excluding one estimation area used in the 2005 ACS that was a new county formed in 2001). The 1999 estimates were used because 2000 estimates were not published.

In a number of the cells of the cross-tabulations, the census counts were small numbers and would have led to small ACS sample sizes and the use of the cell collapsing procedure. For simplicity, rather than attempting to apply the complex collapsing procedure, cells with census counts of fewer than 500 persons are excluded from the results presented below.

Table 5-4 summarizes the MAPEs resulting from this analysis, depending on which of the control variables are employed. The table also shows the MAPE for all 3,140 counties for comparison purposes. As with the MAPEs for the housing unit controls in Section 5-C above, the MAPEs for the population controls may include both random errors and systematic upward or downward biases. As expected, the MAPE for the overall population counts is somewhat lower for estimation areas (3.1 percent) than for counties (3.6 percent). The MAPE of 3.6 percent for counties in the panel’s analysis in Table 5-4 is slightly larger than the MAPE of 3.4 percent in the Census Bureau’s analysis reported in Table 5-3, which compared 2000 (not 1999) population estimates to 2000 census counts. Consequently, the MAPEs in the panel’s analysis for population groups are likely to be marginally larger than they would have been if 2000 population estimates could have been used.
TABLE 5-4 Mean Absolute Percentage Error (MAPE) of July 1, 1999, Estimation Area Population Estimates Compared with April 1, 2000, Census Counts, by Cells Based on Combinations of Sex, Ethnicity, and Age (Excludes Cells with Fewer Than 500 People in the 2000 Census)

                                     Number of Cells    Number of Cells Excluded
Population Classification                   Included    (Fewer than 500 People)     MAPE (%)
Total population (no classification)
  Estimation areas                             1,950                          0          3.1
  (All counties)                             (3,140)                        (0)        (3.6)
Sex                                            3,900                          0          3.2
Ethnicity                                      3,401                        499          9.7
Age group                                     23,397                          3          6.8
Sex by ethnicity                               6,130                      1,670         11.5
Sex by age group                              46,430                        370          7.2
Ethnicity by age group                        28,553                     18,247          9.7
Sex by ethnicity by age group                 53,020                     40,580          9.0

NOTE: Estimation areas are large counties and groups of smaller counties with at least 16,000 people; the median size is 55,000 people; the average size is 145,000 people; the District of Columbia is included as an estimation area (information from the U.S. Census Bureau). See note to Table 5-3 for the calculation of MAPEs.
SOURCE: Computations based on data provided by the U.S. Census Bureau.

Not surprisingly, the MAPE for estimation areas increases only slightly when the population is classified by sex only. However, the MAPEs are much larger when the population is classified by age, ethnicity, or a combination of characteristics. In particular, the population estimates classified by ethnicity are subject to appreciable error. The large number of excluded cells with the cross-classifications involving ethnicity and age groups should be noted: the MAPE values given in Table 5-4 represent only a part of the total population, and the values for the excluded part may be very different.

Table 5-5 gives the results of this analysis in a different form, presenting the distributions of the ratios of the population estimates to the census counts. When classified by sex only, the population estimates are within 10 percent of the census counts for around 97 percent of the cells. However, the corresponding percentage when the population is classified by age group falls to 77 percent, and when classified by ethnicity it falls to 67 percent. Of particular note is the 20 percent of cells with ratios less than 80 percent when the population is classified by ethnicity. Ratios of less than 80 percent occurred for non-Hispanic cells for only two estimation areas. However, the population estimates underestimated the number of Hispanics by 20 percent or more in 46 percent of estimation areas with 500 or more Hispanics. In a quarter of these areas the underestimation was at least 40 percent. The general underestimation of the Hispanic population in the 1990s is well recognized and may not be repeated in the current decade, but this analysis does bring out the problems associated with its concentration in certain geographic areas.
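The kind of tabulation reported in Table 5-5 can be sketched in Python as follows. The bin edges follow the table's column headings, but treating each bin as closed on the left (so that a ratio of exactly 80 percent falls in the "80%-" column) is an assumption, as are the function name and the example cell values.

```python
from bisect import bisect_right

# Column boundaries taken from the Table 5-5 headings (percent).
EDGES = [80.0, 90.0, 95.0, 100.0, 105.0, 110.0, 120.0]
LABELS = ["<80%", "80%-", "90%-", "95%-", "100%-", "105%-", "110%-", "120%+"]


def ratio_distribution(estimates, census_counts, min_count=500):
    """Percentage of cells in each ratio bin, skipping cells below min_count."""
    counts = dict.fromkeys(LABELS, 0)
    kept = 0
    for estimate, census in zip(estimates, census_counts):
        if census < min_count:
            continue  # mirrors the exclusion of cells with fewer than 500 people
        ratio = 100.0 * estimate / census
        counts[LABELS[bisect_right(EDGES, ratio)]] += 1
        kept += 1
    if kept == 0:
        raise ValueError("no cells meet the minimum census count")
    return {label: 100.0 * n / kept for label, n in counts.items()}


# Hypothetical cells; the last one is excluded because its census count is 200.
estimates = [700, 950, 1020, 1300, 100]
census = [1000, 1000, 1000, 1000, 200]
dist = ratio_distribution(estimates, census)

# Share of cells whose estimate is within 10 percent of the census count.
within_10 = dist["90%-"] + dist["95%-"] + dist["100%-"] + dist["105%-"]
```

Summing the four middle bins gives the "within 10 percent" share quoted in the text for each classification.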
If the race/ethnicity classification were dropped and only the sex by age group cross-classification used in the population weighting adjustments, there would still be a quarter of the cells in which the population control was in error (over or under) by at least 10 percent, and in 4 percent of the cells the error would exceed 20 percent. This finding, of course, applies to estimates that are 9 years out from the previous census; the estimates will likely be more accurate, on average, the closer they are to the census year.

The panel has identified several alternative strategies that may serve to reduce the effects of errors in the population estimates and to deal with the extent of cell collapsing that is used with the current scheme for applying population controls. One strategy is to apply the cross-classification controls at a higher level of geography than estimation areas, where the control totals should be subject to less error. A drawback of this strategy is that ACS population estimates for counties and cities would not be consistent with the PE estimates for those areas. To ameliorate this problem, application of the cross-classification controls at a higher level of geography could be combined with the use of total population controls at the estimation area
level. Indeed, as the Census Bureau is considering, the total controls might sometimes be applied at a lower level, such as individual cities with populations of over 65,000 within estimation areas. A strategy to reduce the amount of cell collapsing needed is to develop the weights through a raking algorithm that makes the ACS sample conform to each of the marginal distributions of the control variables, not to the joint distribution. The paper written for the panel by Jay Breidt and reproduced in Appendix B examines such alternatives.

TABLE 5-5 Percentage Ratio of July 1, 1999, Estimation Area Population Estimates to April 1, 2000, Census Counts, by Cells Based on Combinations of Sex, Ethnicity, and Age (Excludes Cells with Fewer Than 500 People in the 2000 Census)

Population Classification        Number of Cells    <80%    80%–    90%–    95%–   100%–   105%–   110%–   120%+   Total
No subclassification                       1,950     0.1     2.4    14.6    61.4    20.6     0.7     0.2     0.0   100.0
Sex                                        3,900     0.1     2.9    15.0    58.9    21.8     0.9     0.2     0.0   100.0
Ethnicity                                  3,401    19.7     7.5     9.1    36.8    18.3     3.1     3.0     2.4   100.0
Age group                                 23,397     2.0    13.9    20.0    26.4    20.6    10.2     5.8     1.1   100.0
Sex by ethnicity                           6,130    16.1     7.3     9.3    39.1    19.8     3.3     2.7     2.4   100.0
Sex by age group                          46,430     2.5    14.8    19.2    24.8    19.7    11.0     6.6     1.3   100.0
Ethnicity by age group                    28,553     8.5    13.1    16.9    22.4    19.2    10.3     7.0     2.5   100.0
Sex by ethnicity by age group             53,020     6.6    13.9    17.1    22.3    19.1    11.0     7.7     2.3   100.0

NOTE: Estimation areas are large counties and groups of smaller counties with at least 16,000 people; the median size is 55,000 people; the average size is 145,000 people; the District of Columbia is included as an estimation area (information from the U.S. Census Bureau).
SOURCE: Computations based on data provided by the U.S. Census Bureau.

On the issue of whether to use the race/ethnicity classification, the poor quality of the estimates in the 2000 comparison raises concerns about their comparability to what would have been obtained had the ACS interviewed the entire population. The population estimates start with the last census values and update them using administrative data. The reporting or recording of race/ethnicity in the census and in administrative data differ from each other and also from the ACS. As a result, the population estimates by race/ethnicity may not serve well as controls for the ACS sample.

The use of population controls for the population census and household surveys has a long history. It is instructive to contrast these uses with the use of population controls in the ACS. Although they appear similar, they are in fact very different. With the long-form sample, the data for the full census controls are collected for the same time and by essentially the same methods. Thus, the controls represent a poststratification adjustment, which improves the precision of the long-form estimates in a standard way. The long-form-sample controls achieve this improvement for small areas; they are applied for weighting areas, which are often as small as a block group or a census tract and never larger than a county (see National Research Council, 2004b:App. H).
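The raking strategy mentioned above can be illustrated with a minimal Python sketch of iterative proportional fitting, in which weights are scaled to match each marginal control in turn until the scaling factors settle at 1. This is a generic textbook form of raking, not the procedure examined in Appendix B or any Census Bureau implementation; the function name, variables, and control totals are hypothetical, and the sketch assumes that the margins sum to the same total and that every category has at least one sampled person.

```python
def rake(weights, categories, margins, max_iter=100, tol=1e-10):
    """Scale weights so weighted totals match each marginal control total.

    weights:    list of starting person weights
    categories: list of dicts mapping variable name -> level, one per person
    margins:    dict mapping variable name -> {level: control total}
    """
    raked = list(weights)
    for _ in range(max_iter):
        largest_change = 0.0
        for variable, targets in margins.items():
            # Current weighted total for each level of this variable.
            totals = {}
            for weight, person in zip(raked, categories):
                level = person[variable]
                totals[level] = totals.get(level, 0.0) + weight
            # Scale every person in a level by that level's ratio factor.
            for i, person in enumerate(categories):
                factor = targets[person[variable]] / totals[person[variable]]
                raked[i] *= factor
                largest_change = max(largest_change, abs(factor - 1.0))
        if largest_change < tol:
            break
    return raked


# Hypothetical sample of four persons and two marginal controls.
people = [
    {"sex": "F", "age": "young"},
    {"sex": "F", "age": "old"},
    {"sex": "M", "age": "young"},
    {"sex": "M", "age": "old"},
]
start = [10.0, 20.0, 15.0, 5.0]
controls = {"sex": {"F": 30.0, "M": 20.0}, "age": {"young": 25.0, "old": 25.0}}
raked = rake(start, people, controls)
```

Because only the margins are matched, no sex by age cell needs a control total of its own, which is why raking can reduce the need to collapse sparse cells.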
For household surveys, the controls are the population estimates and so subject to more error than the census counts, but for most household surveys other than the ACS the controls represent the same residence concept as the surveys; and the controls are applied at a high level of aggregation, which reduces the level of error in the population estimates. Generally, the controls for household surveys are applied for the nation as a whole by sex, age, and race/ethnicity groups and sometimes for total population by state (as, for example, in the Current Population Survey; Bureau of Labor Statistics and U.S. Census Bureau, 2002:Ch. 10).

In contrast, the population controls used in the ACS are midyear population estimates based on different residence rules and different sources than the yearly accumulation of ACS monthly samples. (For an illustrative example of the effect of population controls on areas with seasonal populations, see Section 3-C.3.) The ACS population controls should therefore not be treated as if they are poststratification controls, as is the current practice. It cannot be assumed that they necessarily improve the quality of the ACS estimates, particularly since they are applied at the estimation
area level and therefore are subject to appreciable error.

The panel views the use of the population controls in the ACS as a critical issue that requires major research. Concomitantly, the panel views research on methods to improve the postcensal population estimates as a priority area for the Census Bureau. An important component of that research should be to investigate using ACS data more fully than at present in producing national estimates of international migration and particularly in estimating domestic migration. This research is even more important when considering the time series of ACS estimates. That time series will be affected at each census by the differences between the postcensal controls and the actual census counts. Section 3-G discusses this problem in the context of using the ACS estimates; see also Section 6-D.

Recommendation 5-3: As a high priority, the Census Bureau should undertake research to evaluate the effect of the postcensal population controls on ACS estimates and to examine alternative methods of making an adjustment that may be superior to the one currently used (including dispensing with the population controls entirely). The Census Bureau should make users aware in ACS documentation that biases in the ACS estimates caused by errors in the population controls are not reflected in the margins of error reported with the estimates and should conduct research to examine the effects of these errors on ACS estimates. The Census Bureau should also give priority to research on ways to improve the postcensal population estimates at the county level, including estimates of internal migration and international immigration and the classification of race and ethnicity.