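Using the American Community Survey: Benefits and Challenges
Copyright © National Academy of Sciences. All rights reserved.

B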
Controlling the American Community Survey to Postcensal Population Estimates

F. Jay Breidt

Colorado State University

ABSTRACT

The U.S. Census Bureau has proposed the use of postcensal population estimates as population controls for the American Community Survey at a fine level of geographic and demographic stratification. These population estimates are known to be imperfect. Bias and variance of poststratification estimators with imperfect population controls at various levels of aggregation are considered. The bias and variance are computed with respect to the “model” (including data generation and postcensal population estimation) or with respect to the “design” (including coverage, sampling, response, and postcensal population estimation). Bias and variance depend in a complex way on the interactions of postcensal population estimation errors with undercoverage error and nonresponse. Numerical examples illustrate that in the presence of imperfect postcensal population estimates, control at higher levels of aggregation may be better in terms of bias than control at a fine level of poststratification.

B–1
INTRODUCTION

B–1.1
Background

The estimation procedures used with the American Community Survey (ACS) include a poststratification step that employs postcensal population estimates by demographic strata as controls. The controls are applied within estimation areas that consist of larger counties or combinations of smaller counties. The National Research Council’s Panel on the Functionality and Usability of Data from the American Community Survey (ACS) asked me to evaluate these plans, comparing them to direct estimates that forgo the use of these controls, and comparing them to estimates controlled at higher
levels of demographic and/or geographic aggregation. The panel’s charge for me was to evaluate these options from a theoretical, not an empirical, perspective. In particular, the charge was to look at the bias and variance of these estimators, for both population estimation and estimation of other characteristics, at different levels of aggregation.

This paper proposes a simple theoretical framework under which bias and variance can be computed. Cochran (1977), following Stephan (1941), discusses the effect of nonrandom, imperfect estimates of population controls on mean square error under repeated sampling, with full coverage and response. By contrast, I propose stochastic models for data generation (“model”); for coverage, sampling, and response (“design”); and for population estimation. Within this stochastic framework, I consider estimation with or without controls, the level of aggregation for those controls, and whether or not those controls have errors. In Section B–2, I compute bias and variance with respect to the model and population estimation, or with respect to the design and population estimation, and discuss the implications of those computations. Numerical examples of bias for a simple, artificial population are computed in Section B–3. Finally, I discuss the results and outline a few directions for further investigation in Section B–4.

I focus on bias and variance because the computations can then be performed without knowledge of distributional properties beyond second-order moments. In practice, second-moment properties might feasibly be estimated from data. Risks with respect to losses other than squared error loss might be computed under stochastic mechanisms similar to those considered here; for example, losses that weight bias more heavily might be of interest in certain funding formulas.
To begin, note that population controls are helpful in mitigating all three errors of nonobservation: coverage error, sampling error, and nonresponse. To illustrate this, I focus on a theoretical framework that includes a probability sample of elements from a finite population, with stochastic mechanisms for frame inclusion and for nonresponse, and with weighting that reflects the probability sampling and poststratification. (Nonresponse adjustments internal to the sample are not explicitly considered here.) This framework is simplified considerably from that of the ACS, which samples households (clusters) instead of people (elements) and includes up to 11 factors for weighting. This framework should, however, provide insight into the issues surrounding the use of imperfect population controls in the ACS.

B–1.2
Notation and Estimators

Let $U$ denote a finite population. Let $y_i$ denote a generic study variable of interest corresponding to the $i$th element. Let $F_i = 1$ if element $i$ is in the frame and 0 otherwise; $S_i = 1$ if element $i$ is sampled and 0 otherwise; and
$R_i = 1$ if element $i$ responds and 0 otherwise. Assume probability sampling of frame elements with inclusion probability $\pi_i F_i$ for element $i$, where (for notational convenience) $\pi_i > 0$ for all $i \in U$.

Let $\{U_{gh}\}$ denote a two-way stratification of the population of interest, $U = \bigcup_{g,h} U_{gh}$, with disjoint cells $U_{gh}$, and let $C_{gh}$ denote a perfect census count of the elements in cell (subpopulation) $g,h$. In practice, the counts $\{C_{gh}\}$ are unknown and are projected from past census data using techniques of demographic analysis. Let $D_{gh}$ denote the population estimate in cell $g,h$. Furthermore, define the row margin census counts and population estimates,
$$C_{g\bullet} = \sum_h C_{gh}, \qquad D_{g\bullet} = \sum_h D_{gh},$$
the column margin census counts and population estimates,
$$C_{\bullet h} = \sum_g C_{gh}, \qquad D_{\bullet h} = \sum_g D_{gh},$$
and the overall census count and population estimate,
$$C_{\bullet\bullet} = \sum_{g,h} C_{gh}, \qquad D_{\bullet\bullet} = \sum_{g,h} D_{gh}.$$

Consider estimation of the population total
$$T = T(y_i) = \sum_{i \in U} y_i.$$
Define the cell indicator
$$z_{kli} = \begin{cases} 1, & i \in U_{kl}, \\ 0, & \text{otherwise,} \end{cases}$$
and note that $T(z_{kli}) = C_{kl}$ is the count in cell $k,l$; $T(z_{kli} y_i)$ is the total of $y$ in cell $k,l$; $T(\sum_h z_{ghi}) = C_{g\bullet}$ is the count in row $g$; $T(\sum_h z_{ghi} y_i)$ is the total of $y$ in row $g$; and so on. Thus, both counts and totals at various levels of aggregation are implicitly included in the discussion that follows.

Three kinds of estimators of $T$ are of interest. The first is the poststratification estimator (PSE) with cell controls:
$$\hat{T}_{\text{cell}}(y_i) = \sum_{g,h} D_{gh} \, \frac{\sum_{i \in U} z_{ghi} y_i F_i S_i R_i / \pi_i}{\sum_{i \in U} z_{ghi} F_i S_i R_i / \pi_i}. \tag{B.1}$$
Plugging in $z_{kli}$ for $y_i$ in (B.1), we have that $\hat{T}_{\text{cell}}(z_{kli}) = D_{kl}$, so that the sample is controlled to the postcensal population estimates in every cell. The PSE with cell controls is therefore similar to the Census Bureau’s plan to control at a fine level of demographic stratification within estimation areas.
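A minimal sketch of (B.1) and its control property, assuming NumPy: the four cells are flattened to integer labels 0 to 3, and the weights $F_i S_i R_i/\pi_i$ come from made-up coverage, sampling, and response rates. The function name `cell_pse` and all numbers are hypothetical; this is an illustration of the formula, not the Census Bureau's weighting procedure.

```python
import numpy as np

def cell_pse(y, cell, w, D):
    """Poststratification estimator with cell controls, as in (B.1).

    y    : study variable for each population element
    cell : integer cell label (0, 1, ...) for each element
    w    : F_i * S_i * R_i / pi_i for each element (0 if not observed)
    D    : postcensal population estimate (control) for each cell
    """
    num = np.bincount(cell, weights=w * y, minlength=D.size)  # weighted y-sum per cell
    den = np.bincount(cell, weights=w, minlength=D.size)      # weighted count per cell
    return float(np.sum(D * num / den))                       # assumes den > 0 in every cell

# Hypothetical population: 4 cells of 250 elements each.
rng = np.random.default_rng(1)
cell = np.repeat(np.arange(4), 250)
pi = 0.4                                        # Poisson sampling probability
F = rng.random(cell.size) < 0.9                 # frame inclusion
S = F & (rng.random(cell.size) < pi)            # sampled frame elements
R = S & (rng.random(cell.size) < 0.8)           # respondents
w = R / pi                                      # equals F*S*R/pi because R implies S and F
D = np.array([240.0, 262.0, 255.0, 247.0])      # made-up controls D_gh
y = rng.gamma(2.0, 5.0, size=cell.size)         # made-up study variable

# Plugging the cell indicator z_kli in for y_i returns the control D_kl.
controlled_counts = [cell_pse((cell == k).astype(float), cell, w, D) for k in range(4)]
print(controlled_counts)
print(cell_pse(y, cell, w, D))
```

The first printed list reproduces the controls `D`, whatever the simulated sample, which is exactly the sense in which the sample is "controlled" in every cell.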
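The second estimator is the PSE with control on one margin. Without loss of generality, take this as the row margin:
$$\hat{T}_{\text{row}}(y_i) = \sum_{g} D_{g\bullet} \, \frac{\sum_{i \in U} \sum_h z_{ghi} y_i F_i S_i R_i / \pi_i}{\sum_{i \in U} \sum_h z_{ghi} F_i S_i R_i / \pi_i}. \tag{B.2}$$
This estimator is controlled to the postcensal population estimates for the row margins, $\hat{T}_{\text{row}}(\sum_h z_{ghi}) = D_{g\bullet}$. It represents the alternative of controlling to population estimates at a higher level of aggregation, such as controlling for age and sex within estimation areas but not controlling for race or Hispanic origin. The final estimator has control only to the overall population estimate, $D_{\bullet\bullet}$:
$$\hat{T}_{\text{overall}}(y_i) = D_{\bullet\bullet} \, \frac{\sum_{i \in U} y_i F_i S_i R_i / \pi_i}{\sum_{i \in U} F_i S_i R_i / \pi_i}. \tag{B.3}$$
This will be referred to as the overall-control estimator and represents the option of direct estimates that forgo the use of all but the overall control.

Some additional notation is useful in describing properties of the estimators. Denote the empirical mean, variance, and coefficient of variation of a variable $\{x_i\}$ on the set $A$ by
$$\bar{x}_A = \frac{1}{|A|}\sum_{i \in A} x_i, \qquad S_A^2(x_i) = \frac{1}{|A|}\sum_{i \in A} (x_i - \bar{x}_A)^2, \qquad \operatorname{cv}_A(x_i) = \frac{S_A(x_i)}{\bar{x}_A}.$$
Similarly, denote the empirical covariance and correlation between variables $x_i$ and $y_i$ on the set $A$ by
$$S_A(x_i, y_i) = \frac{1}{|A|}\sum_{i \in A} (x_i - \bar{x}_A)(y_i - \bar{y}_A), \qquad \operatorname{corr}_A(x_i, y_i) = \frac{S_A(x_i, y_i)}{S_A(x_i)\, S_A(y_i)}.$$
In the computations that follow, the variable $x_i$ is often piecewise constant over subsets of $A$. For example, consider a variable $x_i$ defined as $x_i = b_{gh}$ for $i \in U_{gh}$, under the two-way stratification $\{U_{gh}\}$. Then
$$\bar{x}_U = \frac{1}{C_{\bullet\bullet}}\sum_{g,h} C_{gh} b_{gh}, \qquad S_U^2(x_i) = \frac{1}{C_{\bullet\bullet}}\sum_{g,h} C_{gh} \left(b_{gh} - \bar{x}_U\right)^2,$$
and such quantities are written as functions of the cell values, for example $\operatorname{cv}_U(b_{gh})$.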
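The row-margin and overall-control estimators differ from the cell-level PSE only in how cells are grouped into poststrata. The sketch below (NumPy assumed; the names `pse`, `row_pse`, `overall_pse` and all numbers are hypothetical) implements a generic poststratification estimator over integer poststratum labels and checks the control properties stated for (B.2) and (B.3): the row-margin PSE reproduces $D_{g\bullet}$ for row counts, and the overall-control estimator reproduces $D_{\bullet\bullet}$ for the total count.

```python
import numpy as np

def pse(y, group, w, controls):
    """Generic poststratification estimator: poststrata are given by integer
    labels `group`, with one population control per poststratum."""
    num = np.bincount(group, weights=w * y, minlength=controls.size)
    den = np.bincount(group, weights=w, minlength=controls.size)
    return float(np.sum(controls * num / den))

# Hypothetical 2 x 2 layout: element i lies in cell (row[i], col[i]).
rng = np.random.default_rng(2)
row = np.repeat([0, 0, 1, 1], 250)
col = np.repeat([0, 1, 0, 1], 250)
pi = 0.4
# F_i * S_i * R_i drawn as independent Bernoulli(0.9), Bernoulli(pi), Bernoulli(0.8) factors.
observed = ((rng.random(row.size) < 0.9)
            & (rng.random(row.size) < pi)
            & (rng.random(row.size) < 0.8))
w = observed / pi
D = np.array([[240.0, 262.0], [255.0, 247.0]])    # made-up cell controls D_gh

def row_pse(y):
    """(B.2): one control per row, D_g. = sum over h of D_gh."""
    return pse(y, row, w, D.sum(axis=1))

def overall_pse(y):
    """(B.3): a single poststratum controlled to D_.."""
    return pse(y, np.zeros_like(row), w, np.array([D.sum()]))

row_counts = [row_pse((row == g).astype(float)) for g in range(2)]
total_count = overall_pse(np.ones(row.size))
cell_11 = row_pse(((row == 0) & (col == 0)).astype(float))  # not controlled to D_11
print(row_counts, total_count, cell_11)
```

A cell count estimated with `row_pse` is only a share of $D_{g\bullet}$ and generally differs from $D_{gh}$, which is the loss of control that the bias results below quantify.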
B–1.3
Assumptions

Assume that the postcensal population estimation errors $\{D_{gh} - C_{gh}\}$ are independent random variables, independent of the design variables $\{F_i, S_i, R_i\}$ and of the data $\{y_i\}$. These independence assumptions might be questionable in a real application. For example, it seems likely that a subpopulation with serious coverage issues might also suffer from large postcensal population estimation errors for many of the same reasons, so that frame variables and estimation errors would be correlated. Furthermore, estimation errors in different age categories of the same race/sex group might well be correlated. Nevertheless, computations under this simple stochastic model may provide some useful insights into the problems with imperfect controls.

Let $\delta_{gh} = E[D_{gh}] - C_{gh}$ and $\operatorname{Var}(D_{gh})$ denote the bias and variance of the postcensal population estimate in cell $g,h$, with row, column, and overall sums $\delta_{g\bullet}$, $\delta_{\bullet h}$, and $\delta_{\bullet\bullet}$ defined as for the counts. The special case of perfect population controls is obtained with $\delta_{gh} \equiv 0$ and $\operatorname{Var}(D_{gh}) \equiv 0$. Factors that may affect these biases and variances in the ACS application could include time since last census, demographic grouping (age/race/sex/Hispanic origin), geographic grouping, and interactions (e.g., young people in college-dominated estimation areas).

Properties of the estimators can be derived under assumptions on the errors in the population estimates, along with assumptions on the design or the data-generating model. Expectation with respect to the data-generating (superpopulation) model will be differentiated from expectation with respect to the design by adding a subscript $m$ to the expectation operator. The population estimation error distribution will be included in either design or model expectations. The data-generating model assumptions are that $\{y_i\}$ are independent with common mean $E_m[y_i] = \mu_{gh}$ and common variance $\operatorname{Var}_m(y_i) = \sigma_{gh}^2$ for $i \in U_{gh}$. Since $y_i$ is a generic study variable, this assumption would ideally hold for any choice of study variable $y_i$, so that the poststratification is appropriate for all study variables. In this case, no design assumptions are required.
The design assumptions are the following: $\{R_i\}$ are conditionally independent given $\{S_i, F_i\}$, with $E[R_i \mid S_i, F_i] = \rho_{gh} S_i F_i$ for $i \in U_{gh}$; $E[S_i \mid F_i] = \pi_i F_i$ for all $i$; and $\{F_i\}$ are independent, with $E[F_i] = \phi_{gh}$ for $i \in U_{gh}$. If these design assumptions hold, then the poststratification is appropriate for all study variables, regardless of their data-generating mechanisms.
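The design assumptions can be made concrete with a small simulation. The sketch below (NumPy assumed; the function name and the values are hypothetical) draws $F_i$, $S_i$, and $R_i$ for a single element of cell $g,h$ in the order the assumptions describe: frame inclusion, then Poisson sampling of frame elements, then response among sampled frame elements. It checks that the weighted observation indicator $F_i S_i R_i/\pi_i$ has mean equal to the product of the coverage and response probabilities (written `phi` and `rho` here) and variance $\phi\rho/\pi - (\phi\rho)^2$. The mean identity is what makes a weighted respondent count in a cell, divided by the cell count, behave like an estimated observation probability.

```python
import numpy as np

def simulate_weighted_indicator(phi, pi, rho, n_reps, rng):
    """Draw n_reps independent copies of F*S*R/pi for one element of cell g,h:
      F ~ Bernoulli(phi)                 (frame inclusion, E[F] = phi)
      S | F ~ Bernoulli(pi * F)          (Poisson sampling of frame elements)
      R | S, F ~ Bernoulli(rho * S * F)  (response among sampled frame elements)
    """
    F = rng.random(n_reps) < phi
    S = F & (rng.random(n_reps) < pi)
    R = S & (rng.random(n_reps) < rho)
    return R / pi          # R = 1 only if S = F = 1, so this is F*S*R/pi

rng = np.random.default_rng(3)
phi, pi, rho = 0.8, 0.25, 0.9     # hypothetical coverage, sampling, response probabilities
w = simulate_weighted_indicator(phi, pi, rho, 400_000, rng)
print("mean:", w.mean(), "theory:", phi * rho)
print("variance:", w.var(), "theory:", phi * rho / pi - (phi * rho) ** 2)
```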
Note that either of these sets of assumptions implies that the cell-level stratification is sufficient (fine enough) to account for differences in coverage and response, so it is an appropriate level of aggregation for poststratification adjustments. It is possible that the cell-level stratification is too fine, but if coverage and response in fact vary from cell to cell, then the cell-level stratification is necessary in the sense that estimators controlled at higher levels of aggregation are biased by that variation in coverage and response, as demonstrated in the next section. For simplicity in variance computations, I also assume that the probability sampling design is Poisson sampling, that is, that the $\{S_i\}$ are independent given $\{F_i\}$. This assumption is not used in bias computations. It yields compact, interpretable variance expressions, although computations could be done under other designs as well.

B–2
PROPERTIES OF THE ESTIMATORS

This section examines properties of the PSE with cell-level controls, margin-level controls, and overall population controls, looking at the bias and variance of each under the model assumptions and under the design assumptions. Computations under the model assumptions are exact, while those under the design require some large-sample approximations. Inference under the design is widely accepted by survey practitioners, although both model-based and design-based inference play important roles in official statistics. (Many survey texts, including Särndal, Swensson, and Wretman, 1992, and Brewer, 2002, discuss inference and illustrate computations under model and under design.) In this paper, the derived expressions are perfectly parallel under model or design, so the choice of model or design is not critical.
B–2.1
Model Properties of PSE with Cell-Level Controls

With respect to both the data-generating model and the distribution of the postcensal population estimation errors, the PSE with cell-level controls has bias
$$\sum_{g,h} \delta_{gh}\, \mu_{gh},$$
so that the estimator is unbiased only if postcensal population estimation biases are orthogonal to cell means. In general, this will not be the case, so
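the PSE with cell-level controls will be biased. It is useful to express this bias as
$$\sum_{g,h}\delta_{gh}\mu_{gh} = \sum_{i \in U}\frac{\delta_{gh}}{C_{gh}}\,\mu_{gh} = \delta_{\bullet\bullet}\,\bar{\mu}_U\left\{1 + \operatorname{corr}_U\!\left(\frac{\delta_{gh}}{C_{gh}}, \mu_{gh}\right)\operatorname{cv}_U\!\left(\frac{\delta_{gh}}{C_{gh}}\right)\operatorname{cv}_U(\mu_{gh})\right\},$$
where, inside the sum over $i \in U$, $g,h$ indexes the cell containing element $i$, $\delta_{\bullet\bullet} = \sum_{g,h}\delta_{gh}$, and $\bar{\mu}_U = C_{\bullet\bullet}^{-1}\sum_{g,h} C_{gh}\mu_{gh}$, using an extension of the argument in equation (15.6.3) of Särndal, Swensson, and Wretman (1992). The above expression assumes that $\delta_{\bullet\bullet} \neq 0$, $\bar{\mu}_U \neq 0$, and $S_U(\delta_{gh}/C_{gh})\,S_U(\mu_{gh}) \neq 0$, so that the correlation and coefficients of variation are well defined. In the rest of this paper, it is always assumed implicitly that any correlations and coefficients of variation are well defined. Then the relative bias is
$$\frac{\sum_{g,h}\delta_{gh}\mu_{gh}}{\sum_{g,h}C_{gh}\mu_{gh}} = \frac{\delta_{\bullet\bullet}}{C_{\bullet\bullet}}\left\{1 + \operatorname{corr}_U\!\left(\frac{\delta_{gh}}{C_{gh}}, \mu_{gh}\right)\operatorname{cv}_U\!\left(\frac{\delta_{gh}}{C_{gh}}\right)\operatorname{cv}_U(\mu_{gh})\right\}. \tag{B.4}$$
A key feature of this bias is the correlation term between postcensal population estimation biases and cell means. If the biases are nearly constant from cell to cell, then $\operatorname{cv}_U(\delta_{gh}/C_{gh}) \approx 0$, and the correlation term contributes little to the bias. If the biases differ from cell to cell, then the correlation term gives a signed contribution to the bias. On one hand, for example, suppose that the errors in the postcensal population estimates cross-classified by age, race/ethnicity, and sex result in an overestimate of the number of individuals with high income but do not overestimate elsewhere. Then $\operatorname{cv}_U(\delta_{gh}/C_{gh}) \neq 0$, and the correlation is positive (overestimate, high income), so a positive term is contributed to the bias. On the other hand, suppose that the postcensal population estimates overestimate the number of individuals with low income but do not overestimate elsewhere. Then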
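$\operatorname{cv}_U(\delta_{gh}/C_{gh}) \neq 0$, and the correlation is negative (overestimate, low income), so a negative term is contributed to the bias.

With respect to both the data-generating model and the distribution of the postcensal population estimation errors, the PSE with cell-level controls has variance
$$\sum_{g,h}\left[\mu_{gh}^2 \operatorname{Var}(D_{gh}) + \left\{(C_{gh} + \delta_{gh})^2 + \operatorname{Var}(D_{gh})\right\}\sigma_{gh}^2\, \frac{\sum_{i \in U_{gh}} (F_i S_i R_i/\pi_i)^2}{\left(\sum_{i \in U_{gh}} F_i S_i R_i/\pi_i\right)^2}\right], \tag{B.5}$$
where $\sigma_{gh}^2$ is the common model variance in cell $g,h$. This is not necessarily larger than the variance with perfect controls. For example, with $D_{gh} \equiv 0$, the estimator has zero variance but large bias. Note that, by the Cauchy-Schwarz inequality and because at most $C_{gh}$ of the weights in cell $g,h$ are nonzero,
$$\frac{\sum_{i \in U_{gh}} (F_i S_i R_i/\pi_i)^2}{\left(\sum_{i \in U_{gh}} F_i S_i R_i/\pi_i\right)^2} \ge \frac{1}{C_{gh}}.$$
Thus, in the case of perfect controls ($\delta_{gh} \equiv 0$ and $\operatorname{Var}(D_{gh}) \equiv 0$) we have from (B.5) that the variance is at least
$$\sum_{g,h} C_{gh}\,\sigma_{gh}^2.$$
The lower bound is obtained if $F_i \equiv 1$, $\pi_i \equiv 1$ (hence $S_i \equiv 1$), and $R_i \equiv 1$, that is, a census from a perfect frame with full response. In all other cases, undercoverage, sampling error, and nonresponse increase the variance.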
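The rewriting behind (B.4) is an exact identity when the empirical moments use divisor $N$ (the population size) rather than $N-1$. The sketch below (NumPy assumed; function names hypothetical) computes the relative bias of the cell-level PSE both directly, as $\sum_{g,h}\delta_{gh}\mu_{gh}/\sum_{g,h}C_{gh}\mu_{gh}$, and through the mean, coefficient-of-variation, and correlation form. It uses the cell counts and cell means of the artificial population of Section B–3 with equally and unequally biased controls.

```python
import numpy as np

def piecewise_moments(values, counts):
    """Divisor-N mean and standard deviation of a variable that equals
    values[k] on counts[k] elements."""
    n = counts.sum()
    mean = (counts * values).sum() / n
    sd = np.sqrt((counts * (values - mean) ** 2).sum() / n)
    return mean, sd

def relative_bias_direct(C, delta, mu):
    """sum_gh delta_gh mu_gh / sum_gh C_gh mu_gh."""
    return (delta * mu).sum() / (C * mu).sum()

def relative_bias_decomposed(C, delta, mu):
    """(delta_../C_..) * {1 + corr_U * cv_U(delta/C) * cv_U(mu)}, as in (B.4).
    Assumes delta_.. != 0 and a nonzero mean of mu, as the text requires."""
    ratio = delta / C
    m_r, s_r = piecewise_moments(ratio, C)
    m_mu, s_mu = piecewise_moments(mu, C)
    if s_r < 1e-12 or s_mu < 1e-12:       # a zero cv removes the correlation term
        return delta.sum() / C.sum()
    cov = (C * (ratio - m_r) * (mu - m_mu)).sum() / C.sum()
    corr = cov / (s_r * s_mu)
    return delta.sum() / C.sum() * (1.0 + corr * (s_r / m_r) * (s_mu / m_mu))

C = np.array([250.0, 250.0, 250.0, 250.0])   # cells 11, 12, 21, 22
mu = np.array([2.0, 1.0, 10.0, 2.0])
for rel in ([0.1, 0.1, 0.1, 0.1], [-0.1, 0.0, 0.2, 0.3]):
    delta = np.array(rel) * C
    print(rel, relative_bias_direct(C, delta, mu), relative_bias_decomposed(C, delta, mu))
```

With equal biases the correlation term drops out and the relative bias equals the common relative bias of the controls; with unequal biases the positive correlation between large overestimates and the high-mean cell pushes it above the average $\delta_{\bullet\bullet}/C_{\bullet\bullet}$.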
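B–2.2
Design Properties of PSE with Cell-Level Controls

The PSE is nonlinear in the design variables, so its expectation and variance under the design are approximated from the usual Taylor series linearization of the cell-level PSE in (B.1),
$$\hat{T}_{\text{cell}} \approx \sum_{g,h} D_{gh}\,\bar{y}_{U_{gh}} + \sum_{g,h}\frac{1}{\phi_{gh}\rho_{gh}}\sum_{i \in U_{gh}}\left(y_i - \bar{y}_{U_{gh}}\right)\frac{F_i S_i R_i}{\pi_i}, \tag{B.6}$$
where $\bar{y}_{U_{gh}}$ is the mean of $y$ over cell $U_{gh}$. The design bias is then approximately
$$\sum_{g,h}\delta_{gh}\,\bar{y}_{U_{gh}},$$
so that the PSE is biased under the stated design assumptions, unless postcensal population estimation biases happen to be orthogonal to cell means. By the same argument used for the relative model bias, the relative design bias is then approximated as
$$\frac{\delta_{\bullet\bullet}}{C_{\bullet\bullet}}\left\{1 + \operatorname{corr}_U\!\left(\frac{\delta_{gh}}{C_{gh}}, \bar{y}_{U_{gh}}\right)\operatorname{cv}_U\!\left(\frac{\delta_{gh}}{C_{gh}}\right)\operatorname{cv}_U\!\left(\bar{y}_{U_{gh}}\right)\right\}, \tag{B.7}$$
which has the same interpretation as the relative bias under the model. From (B.6), the design variance of the PSE is approximately
$$\sum_{g,h}\sum_{i \in U_{gh}}\left(\frac{1}{\phi_{gh}\rho_{gh}\pi_i} - 1\right)\left(y_i - \bar{y}_{U_{gh}}\right)^2 + \sum_{g,h}\bar{y}_{U_{gh}}^2\operatorname{Var}(D_{gh}), \tag{B.8}$$
which parallels the model variance in (B.5). A key part of this expression involves squared deviations from subpopulation means, so that if the subpopulations are more homogeneous than the population as a whole, the design variance will tend to be smaller than that of a direct estimator. In addition, the first component of this variance is zero if $\phi_{gh} \equiv 1$, $\pi_i \equiv 1$, and $\rho_{gh} \equiv 1$; that is, a census from a perfect frame with full response would have zero design variance in the case of perfect controls. In all other cases, undercoverage, sampling error, and nonresponse increase the variance because the corresponding probabilities appear in the denominator of the first term.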
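The linearization-based approximation to the design bias can be checked by brute force. The following sketch (NumPy assumed) fixes a hypothetical population of four cells with the cell means of Section B–3, repeatedly draws coverage, Poisson sampling, response, and imperfect controls, and compares the Monte Carlo bias of the cell-level PSE with $\sum_{g,h}\delta_{gh}\bar{y}_{U_{gh}}$. The coverage, response, and sampling probabilities, the control-error standard deviation, and the normal distribution of the control errors are assumptions made only for this illustration; the approximation itself uses only the mean of the control errors.

```python
import numpy as np

rng = np.random.default_rng(7)

# Hypothetical fixed population: 4 cells of 250 elements with the Section B-3 cell means.
C = np.array([250, 250, 250, 250])
cell_means = np.array([2.0, 1.0, 10.0, 2.0])
cell = np.repeat(np.arange(4), C)
noise = rng.normal(size=cell.size)
noise -= (np.bincount(cell, weights=noise) / C)[cell]   # center the noise within each cell
y = cell_means[cell] + noise                            # cell means of y equal cell_means
T = y.sum()                                             # population total being estimated

# Hypothetical design and control-error settings.
phi = np.array([0.9, 0.9, 0.95, 1.0])[cell]             # coverage probability E[F_i]
rho = np.array([0.8, 0.9, 0.95, 1.0])[cell]             # response probability
pi = 0.5                                                # Poisson sampling probability
delta = np.array([-0.1, 0.0, 0.2, 0.3]) * C             # biases of the controls D_gh
control_sd = 5.0                                        # sd of each control error

def cell_pse(y, cell, w, D):
    """Cell-level PSE (B.1) with element weights w = F*S*R/pi."""
    num = np.bincount(cell, weights=w * y, minlength=D.size)
    den = np.bincount(cell, weights=w, minlength=D.size)
    return np.sum(D * num / den)

n_reps = 4000
estimates = np.empty(n_reps)
for r in range(n_reps):
    observed = ((rng.random(cell.size) < phi)           # F_i
                & (rng.random(cell.size) < pi)         # S_i given F_i (Poisson sampling)
                & (rng.random(cell.size) < rho))       # R_i given S_i, F_i
    D = C + delta + rng.normal(scale=control_sd, size=C.size)
    estimates[r] = cell_pse(y, cell, observed / pi, D)

mc_bias = estimates.mean() - T
approx_bias = np.sum(delta * cell_means)                # sum over cells of delta_gh * ybar_gh
print(f"Monte Carlo bias: {mc_bias:.1f}, approximation: {approx_bias:.1f}")
```

Under these settings the two numbers should agree to within Monte Carlo error, because the undercoverage and nonresponse are uniform within cells and the only systematic error left is the bias of the controls.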
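B–2.3
Model Properties of PSE with Margin-Level Controls

Define
$$A_{gh} = \frac{1}{C_{gh}}\sum_{i \in U_{gh}}\frac{F_i S_i R_i}{\pi_i},$$
the estimated observation probability in cell $g,h$. Let $U_{g\bullet} = \bigcup_h U_{gh}$ denote row $g$, and let $\lambda_g = \sum_h C_{gh}\mu_{gh} / \sum_{g,h} C_{gh}\mu_{gh}$ denote its share of the expected total. The model bias of the PSE with margin-level controls is
$$\sum_g \left(C_{g\bullet} + \delta_{g\bullet}\right)\frac{\sum_h C_{gh}A_{gh}\mu_{gh}}{\sum_h C_{gh}A_{gh}} - \sum_{g,h}C_{gh}\mu_{gh} = \sum_g\Big(\sum_h C_{gh}\mu_{gh}\Big)\left[\left(1 + \frac{\delta_{g\bullet}}{C_{g\bullet}}\right)\left\{1 + \operatorname{corr}_{U_{g\bullet}}(A_{gh}, \mu_{gh})\operatorname{cv}_{U_{g\bullet}}(A_{gh})\operatorname{cv}_{U_{g\bullet}}(\mu_{gh})\right\} - 1\right],$$
where it is understood that $\{A_{gh}\}$ on $U_{g\bullet}$ is a data set of $C_{g\bullet}$ values: $A_{g1}$ repeated $C_{g1}$ times, $A_{g2}$ repeated $C_{g2}$ times, and so forth. Similarly for $\{\mu_{gh}\}$. The relative bias is then computed as
$$\begin{aligned}
&\sum_g \lambda_g\,\frac{\delta_{g\bullet}}{C_{g\bullet}} + \sum_g \lambda_g \operatorname{corr}_{U_{g\bullet}}(A_{gh}, \mu_{gh})\operatorname{cv}_{U_{g\bullet}}(A_{gh})\operatorname{cv}_{U_{g\bullet}}(\mu_{gh}) \\
&\quad + \sum_g \lambda_g\,\frac{\delta_{g\bullet}}{C_{g\bullet}}\operatorname{corr}_{U_{g\bullet}}(A_{gh}, \mu_{gh})\operatorname{cv}_{U_{g\bullet}}(A_{gh})\operatorname{cv}_{U_{g\bullet}}(\mu_{gh}).
\end{aligned} \tag{B.9}$$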
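The three-term split of the row-margin relative model bias into a population-estimation term, a coverage/nonresponse term, and an interaction term can be verified against a direct computation of the expected estimator. In the sketch below (NumPy assumed; function names hypothetical), each row of the arrays is a row margin $g$, `lam` is row $g$'s share $\sum_h C_{gh}\mu_{gh}/\sum_{g,h}C_{gh}\mu_{gh}$ of the expected total, the correlation and coefficients of variation use divisor-$N$ moments over the row, and the inputs are those of the artificial population in Section B–3 with unequally biased controls. Passing a single row that contains every cell gives the overall-control estimator.

```python
import numpy as np

def corr_cv_cv(a, b, counts):
    """corr(a, b) * cv(a) * cv(b) for variables equal to a[k] and b[k] on
    counts[k] elements, using divisor-N moments."""
    n = counts.sum()
    ma, mb = (counts * a).sum() / n, (counts * b).sum() / n
    sa = np.sqrt((counts * (a - ma) ** 2).sum() / n)
    sb = np.sqrt((counts * (b - mb) ** 2).sum() / n)
    if sa < 1e-12 or sb < 1e-12:          # no variation within the row: the product is zero
        return 0.0
    corr = (counts * (a - ma) * (b - mb)).sum() / n / (sa * sb)
    return corr * (sa / ma) * (sb / mb)

def row_pse_bias_terms(C, A, mu, delta):
    """(population-estimation term, coverage/nonresponse term, interaction term)."""
    lam = (C * mu).sum(axis=1) / (C * mu).sum()      # row share of the expected total
    e = delta.sum(axis=1) / C.sum(axis=1)            # delta_g. / C_g.
    k = np.array([corr_cv_cv(A[g], mu[g], C[g]) for g in range(C.shape[0])])
    return (lam * e).sum(), (lam * k).sum(), (lam * e * k).sum()

def row_pse_relative_bias(C, A, mu, delta):
    """Exact relative model bias from E[D_g.] * (sum_h C A mu) / (sum_h C A)."""
    expected = ((C + delta).sum(axis=1)
                * (C * A * mu).sum(axis=1) / (C * A).sum(axis=1)).sum()
    return expected / (C * mu).sum() - 1.0

C = np.full((2, 2), 250.0)
A = np.array([[0.7, 0.8], [0.9, 1.0]])
mu = np.array([[2.0, 1.0], [10.0, 2.0]])
delta = np.array([[-0.1, 0.0], [0.2, 0.3]]) * C

terms = row_pse_bias_terms(C, A, mu, delta)
print("terms:", terms, "sum:", sum(terms))
print("direct:", row_pse_relative_bias(C, A, mu, delta))

# A single row containing all four cells gives the overall-control estimator.
one_row = [x.reshape(1, -1) for x in (C, A, mu, 0.1 * C)]
print("overall control, equal biases:", row_pse_relative_bias(*one_row))
```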
The first term is bias attributable solely to population estimation error. It would appear even in the absence of any bias due to undercoverage or nonresponse. The second term is bias due only to undercoverage and nonresponse; this term appears even in the absence of postcensal population estimation errors. The final term represents the contribution to the bias from the interaction between coverage/nonresponse bias and postcensal population estimation bias. The last two bias terms reflect the fact that nonresponse bias and undercoverage bias are not adequately adjusted for at the row margin level if the response and coverage actually vary from cell to cell within the row. If there is no variation in the estimated observation probabilities $\{A_{gh}\}$ within a particular row, then $\operatorname{cv}_{U_{g\bullet}}(A_{gh}) = 0$, and undercoverage/nonresponse in that particular row contributes no bias because controlling at the row margin was an appropriate adjustment. If there is variation within a particular row, then the bias is determined by the amount of correlation between estimated observation probabilities and cell means. For example, if response and coverage are high in cells that have higher average incomes than other cells in the row, then the correlation between probabilities and cell means is positive and the bias is positive. If response and coverage are high in low-income cells compared with other cells, then the correlation between probabilities and cell means is negative and the bias is negative.

The model variance of the PSE with imperfect margin controls is

(B.10)

B–2.4
Design Properties of PSE with Margin-Level Controls

The PSE with margin-level controls is nonlinear in the design variables, so its expectation and variance under the design are approximated from the usual Taylor series linearization,

(B.11)
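The relative design bias is then given approximately by
$$\begin{aligned}
&\sum_g \frac{T_{g\bullet}}{T}\,\frac{\delta_{g\bullet}}{C_{g\bullet}} + \sum_g \frac{T_{g\bullet}}{T}\operatorname{corr}_{U_{g\bullet}}\!\left(\phi_{gh}\rho_{gh}, \bar{y}_{U_{gh}}\right)\operatorname{cv}_{U_{g\bullet}}\!\left(\phi_{gh}\rho_{gh}\right)\operatorname{cv}_{U_{g\bullet}}\!\left(\bar{y}_{U_{gh}}\right) \\
&\quad + \sum_g \frac{T_{g\bullet}}{T}\,\frac{\delta_{g\bullet}}{C_{g\bullet}}\operatorname{corr}_{U_{g\bullet}}\!\left(\phi_{gh}\rho_{gh}, \bar{y}_{U_{gh}}\right)\operatorname{cv}_{U_{g\bullet}}\!\left(\phi_{gh}\rho_{gh}\right)\operatorname{cv}_{U_{g\bullet}}\!\left(\bar{y}_{U_{gh}}\right),
\end{aligned} \tag{B.12}$$
where $U_{g\bullet} = \bigcup_h U_{gh}$ is row $g$ and $T_{g\bullet} = T(\sum_h z_{ghi} y_i)$ is the total of $y$ in row $g$. The interpretation of this relative design bias directly parallels the relative model bias described above. The first term is bias attributable solely to population estimation error. It would appear even in the absence of any bias due to undercoverage or nonresponse. The second term is bias due only to undercoverage and nonresponse; this term appears even in the absence of postcensal population estimation errors. The final term represents the contribution to the bias from the interaction between undercoverage/nonresponse bias and postcensal population estimation bias. If $\{\phi_{gh}\rho_{gh}\}$ does not depend on $h$ within row $g$, then a single control for row $g$ is appropriate, $\operatorname{cv}_{U_{g\bullet}}(\phi_{gh}\rho_{gh}) = 0$, and undercoverage/nonresponse within row $g$ contributes no bias to the overall estimator. From (B.11), the design variance of the estimator is approximately

(B.13)

B–2.5
Model Properties of PSE with Overall-Level Control

The properties of the PSE with overall-level control can be derived from the previous results for the PSE with margin control by noting that controlling on the overall count is analogous to controlling on the row margin when there is a single row. Thus the overall-control estimator has relative model bias given approximately by
$$\frac{\delta_{\bullet\bullet}}{C_{\bullet\bullet}} + \operatorname{corr}_U(A_{gh}, \mu_{gh})\operatorname{cv}_U(A_{gh})\operatorname{cv}_U(\mu_{gh}) + \frac{\delta_{\bullet\bullet}}{C_{\bullet\bullet}}\operatorname{corr}_U(A_{gh}, \mu_{gh})\operatorname{cv}_U(A_{gh})\operatorname{cv}_U(\mu_{gh}). \tag{B.14}$$
The first term in the relative bias is present even in the absence of any undercoverage/nonresponse bias. The second and third terms reflect the fact that nonresponse bias and undercoverage bias are not adequately adjusted for at the population level if the response and coverage actually vary from cell to cell. If there is no variation in the estimated observation probabilities $\{A_{gh}\}$, then $\operatorname{cv}_U(A_{gh}) = 0$, and there is no undercoverage/nonresponse bias because controlling at the population level was in fact the appropriate adjustment. In the more likely scenario of some variation, however, there is bias determined by the amount of correlation between estimated observation probabilities and cell means. The model variance of the overall-control estimator is

(B.15)

B–2.6
Design Properties of the Overall-Control Estimator

Again using the fact that the overall-control estimator is exactly like an estimator with row control and a single row, the overall-control estimator has relative design bias given approximately by
$$\frac{\delta_{\bullet\bullet}}{C_{\bullet\bullet}} + \operatorname{corr}_U\!\left(\phi_{gh}\rho_{gh}, \bar{y}_{U_{gh}}\right)\operatorname{cv}_U\!\left(\phi_{gh}\rho_{gh}\right)\operatorname{cv}_U\!\left(\bar{y}_{U_{gh}}\right) + \frac{\delta_{\bullet\bullet}}{C_{\bullet\bullet}}\operatorname{corr}_U\!\left(\phi_{gh}\rho_{gh}, \bar{y}_{U_{gh}}\right)\operatorname{cv}_U\!\left(\phi_{gh}\rho_{gh}\right)\operatorname{cv}_U\!\left(\bar{y}_{U_{gh}}\right). \tag{B.16}$$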
The interpretation of this relative design bias directly parallels the relative model bias described above. The first term is bias attributable solely to estimation error. It would appear even in the absence of any bias due to undercoverage or nonresponse. The second term is bias due only to undercoverage and nonresponse; this term appears even in the absence of postcensal population estimation errors. The final term represents the contribution to the bias from the interaction between undercoverage/nonresponse bias and postcensal population estimation bias. The design variance of the overall-control estimator is approximately

(B.17)

B–3
NUMERICAL EXAMPLES

Consider a simple artificial population of size $C_{\bullet\bullet} = 1{,}000$ that is poststratified into a two-way table of four equally sized cells with the following characteristics:

                 Column 1                      Column 2                      Row Totals
Row 1            C11 = 250                     C12 = 250                     C1• = 500
                 A11 = 0.7 or φ11ρ11 = 0.7     A12 = 0.8 or φ12ρ12 = 0.8
                 µ11 = 2 or ȳ11 = 2            µ12 = 1 or ȳ12 = 1
Row 2            C21 = 250                     C22 = 250                     C2• = 500
                 A21 = 0.9 or φ21ρ21 = 0.9     A22 = 1.0 or φ22ρ22 = 1.0
                 µ21 = 10 or ȳ21 = 10          µ22 = 2 or ȳ22 = 2
Column Totals    C•1 = 500                     C•2 = 500                     C•• = 1,000

In each cell, $A_{gh}$ and $\mu_{gh}$ are the values used under the model, and $\phi_{gh}\rho_{gh}$ and $\bar{y}_{U_{gh}}$ (written ȳgh in the table) are the values used under the design.

I focus exclusively on bias in these numerical examples for two main reasons. First, relative bias can be computed without any assumptions on the covariance structure of the postcensal population estimation errors. The independence assumptions under which the variances are computed in Section B–2 may not be plausible in the ACS application, as discussed earlier. Second, mitigation of bias due to undercoverage errors is usually the most important reason for poststratification.
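Because the model expectation of each estimator reduces to sums of $C_{gh}$, $A_{gh}$, $\mu_{gh}$, and $\delta_{gh}$ over poststrata, the exact relative biases for this population can be computed in a few lines. The sketch below (NumPy assumed; names hypothetical) treats each level of control as a labeling of the four cells into poststrata, uses $E[D_P] = \sum_{(g,h) \in P}(C_{gh} + \delta_{gh})$ for each poststratum $P$, and prints the relative bias for the overall total under the three settings of $\delta_{gh}/C_{gh}$ considered in this section. Other entries of Tables B.1 to B.3 follow by changing `target` (which cells are summed) and `totals` (counts versus totals); under the design, read `A` as $\phi_{gh}\rho_{gh}$ and `MU` as the cell means of $y$.

```python
import numpy as np

# Artificial population of Section B-3 (cells indexed [g, h]).
C = np.full((2, 2), 250.0)
A = np.array([[0.7, 0.8], [0.9, 1.0]])       # A_gh (model) or phi_gh*rho_gh (design)
MU = np.array([[2.0, 1.0], [10.0, 2.0]])     # mu_gh (model) or cell means (design)

# Poststratum label of each cell under each level of control.
LEVELS = {
    "cell": np.array([[0, 1], [2, 3]]),
    "row": np.array([[0, 0], [1, 1]]),
    "column": np.array([[0, 1], [0, 1]]),
    "overall": np.zeros((2, 2), dtype=int),
}
SETTINGS = {
    "unbiased": np.zeros((2, 2)),
    "equal": np.full((2, 2), 0.1),
    "unequal": np.array([[-0.1, 0.0], [0.2, 0.3]]),
}

def relative_bias(level, rel_delta, target, totals=True):
    """Exact relative bias for the count or total over the cells flagged in `target`.
    Each poststratum P contributes E[D_P] * sum_{P, target} C*A*y / sum_P C*A."""
    y = MU if totals else np.ones((2, 2))
    expected_D = C * (1.0 + rel_delta)       # E[D_gh] = C_gh + delta_gh
    labels = LEVELS[level]
    estimate = 0.0
    for p in np.unique(labels):
        in_p = labels == p
        estimate += (expected_D[in_p].sum()
                     * (C * A * y * target)[in_p].sum() / (C * A)[in_p].sum())
    truth = (C * y * target).sum()
    return estimate / truth - 1.0

everything = np.ones((2, 2), dtype=bool)     # estimand: the overall total (or count)
for name, rel_delta in SETTINGS.items():
    summary = {lev: round(float(relative_bias(lev, rel_delta, everything)), 3)
               for lev in LEVELS}
    print(name, summary)
```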
For given levels of postcensal population estimation biases $\{\delta_{gh}\}$, the exact biases and relative biases for the various estimators, under either the model or design assumptions, can be computed directly from these values for cell counts and totals, margin (row or column) counts and totals, and the overall counts and totals. That is, by choosing $A_{gh}$ and $\mu_{gh}$ from the above table, we obtain the relative model bias, and by choosing $\phi_{gh}\rho_{gh}$ and $\bar{y}_{U_{gh}}$, we obtain the relative design bias. The computation is identical in either case. For example, the exact bias of the overall-control estimator as an estimator of the overall total, given 10 percent relative bias in all of the postcensal population estimates, is
$$1{,}100 \times \frac{250\,(0.7 \times 2 + 0.8 \times 1 + 0.9 \times 10 + 1.0 \times 2)}{250\,(0.7 + 0.8 + 0.9 + 1.0)} - 250\,(2 + 1 + 10 + 2) = 1{,}100 \times \frac{3{,}300}{850} - 3{,}750 \approx 520.6,$$
so that the exact relative bias is
$$\frac{520.6}{3{,}750} \approx 0.139.$$
The corresponding approximate relative biases can then be computed from equations (B.14) under the model or (B.16) under the design.

In the numerical examples that follow, I consider three different settings for the postcensal population estimation biases: unbiased ($\delta_{gh}/C_{gh} \equiv 0$), equally biased across cells ($\delta_{gh}/C_{gh} \equiv 0.1$), and unequally biased across cells ($\delta_{11}/C_{11} = -0.1$, $\delta_{12}/C_{12} = 0.0$, $\delta_{21}/C_{21} = 0.2$, $\delta_{22}/C_{22} = 0.3$). Exact relative biases are tabled; approximate relative biases are accurate to three decimal places in this particular example and are not tabled. Note also that relative bias is scale-invariant in the sense that
$$\frac{E[\hat{T}(c\,y_i)] - T(c\,y_i)}{T(c\,y_i)} = \frac{E[\hat{T}(y_i)] - T(y_i)}{T(y_i)}$$
for $c \neq 0$. Since the bias for estimation of a cell total can be computed with study variable $\mu_{kl} z_{kli}$ or $\bar{y}_{U_{kl}} z_{kli}$ in place of $y_i z_{kli}$, the relative bias for estimation of cell totals is the same as the relative bias for estimation of cell counts in every cell, as shown in the following tables. This does not hold for rows, columns, or the overall total.

The unbiased case corresponds to perfect population controls, for which the cell-level PSE is unbiased under the model and approximately unbiased under the design, as shown in Table B.1.
The PSEs with margin-level controls and with no controls are generally biased, however, because there is
variation in the coverage/response from cell to cell. Exceptions are the unbiased estimates of row counts with the row-margin PSE, column counts with the column-margin PSE, and the overall population count with any estimator.

For the equally biased case, Table B.2 shows that the cell-level PSE has the same relative bias for both counts and totals in all cells and margins. This bias arises solely from the postcensal population estimation bias, since $\operatorname{cv}_U(\delta_{gh}/C_{gh}) = 0$ in equations (B.4) and (B.7). The margin-level PSEs and the overall-control estimator are all biased, due both to the postcensal population estimation biases and to the variability of coverage and response from cell to cell. The relative biases are equal for estimation of row counts with the row-margin PSE, column counts with the column-margin PSE, and the overall population count with any estimator, but in general the relative biases vary. It is interesting to note that for estimation of the overall total, the estimator with smallest relative bias is the row-margin PSE (0.064), followed by the cell-level PSE (0.1), the overall-control estimator (0.14), and finally the column-margin PSE (0.18). Thus, controlling at a higher level of aggregation can be better than controlling at the cell level, or it can be worse than not controlling at all, depending on which margin is chosen. This example illustrates the complexities in deciding among various levels of control.

The final example, in Table B.3, has postcensal population estimation biases that vary from cell to cell. In this case, there are some large relative biases throughout the table. Each of the estimators outperforms the others for some estimand, so no approach dominates. For estimation of the overall total, the best estimator uses only the overall control (0.14), and the worst is the cell-level PSE (0.16). This example indicates that controlling at a higher level of aggregation (margins) can be better than controlling at the cell level. In this particular case, neither margin-level PSE beats the overall-control estimator.
This example illustrates once again the complexities in deciding among various levels of control. B–4 DISCUSSION The derivations and numerical examples described in earlier sections illustrate some important points that may be relevant to the use of population estimates as controls in weighting for the American Community Survey. First, the poststratification estimator that is controlled at the cell level is unbiased only if the postcensal population estimation biases (in correctly specified cells) are orthogonal to cell means, a situation that is unlikely to occur in practice. The cell-level PSE is likely to be biased for many ACS estimates. Second, the margin-level PSE has biases arising from both nonobservation errors (undercoverage and nonresponse) and postcensal population estimation errors, as does the overall-control estimator. These different sources

OCR for page 269
Using the American Community Survey: Benefits and Challenges TABLE B.1 ExactRelativeBiasesforCaseofUnbiasedPostcensalPopulationEstimates   Column 1       Column 2       Row Sums         cell row column overall cell row column overall cell row column overall Row 1                         Count 0 −0.067 −0.13 −0.18 0 0.067 −0.11 −0.059 0 0 −0.12 −0.12 Total 0 −0.067 −0.13 −0.18 0 0.067 −0.11 −0.059 0 −0.022 −0.12 −0.14 Row 2                         Count 0 −0.053 0.13 0.059 0 0.053 0.11 0.18 0 0 0.12 0.12 Total 0 −0.053 0.13 0.059 0 0.053 0.11 0.18 0 −0.035 0.13 0.078 Column Sums Count 0 −0.06 0 0 −0.059 0 0.060 0 0.059 0 0 0 Total 0 −0.055 0.083 0.020 0 0.057 0.037 0.098 0 −0.033 0.074 0.035 NOTE: Shown are exact relative biases for estimators controlled to postcensal population estimates at cell or margin (row or column), or controlled only to overall postcensal population estimate (“overall”), in the case of unbiased postcensal population estimates (δgh/Cgh ≡ 0). Approximate relative biases from formulae in text are equivalent to three decimal places with exact relative biases.

OCR for page 269
Using the American Community Survey: Benefits and Challenges TABLE B.2 Exact Relative Biases for Case of Equally Biased PostcensalPopulation Estimates   Column 1 Column 2 Row Sums   cell row column overall cell row column overall cell row column overall Row 1 Count 0.1 0.027 −0.038 −0.094 0.1 0.17 −0.022 0.035 0.1 0.1 −0.030 −0.029 Total 0.1 0.027 −0.038 −0.094 0.1 0.17 −0.022 0.035 0.1 0.076 −0.032 −0.051 Row 2 Count 0.1 0.042 0.24 0.16 0.1 0.16 0.22 0.29 0.1 0.1 0.23 0.23 Total 0.1 0.042 0.24 0.16 0.1 0.16 0.22 0.29 0.1 0.061 0.24 0.19 Column Sums Count 0.1 0.034 0.1 0.035 0.1 0.17 0.1 0.16 0.1 0.1 0.1 0.1 Total 0.1 0.040 0.19 0.12 0.1 0.16 0.14 0.21 0.1 0.064 0.18 0.14 NOTE: Shown are exact relative biases for estimators controlled to postcensal population estimates at cell or margin (row or column), or controlled only to overall postcensal population estimate (“overall”), in the case of equally biased postcensal population estimates (δgh/Cgh ≡ 0.1). Approximate relative biases from formulae in text are equivalent to three decimal places with exact relative biases.

TABLE B.3 Exact Relative Biases for Case of Unequally Biased Postcensal Population Estimates

                    Column 1                          Column 2                          Row Sums
Control level       cell    row     column  overall   cell    row     column  overall   cell    row     column  overall
Row 1        Count  −0.1    −0.11   −0.081  −0.094    0       0.013   0.022   0.035     −0.05   −0.05   −0.03   −0.029
             Total  −0.1    −0.11   −0.081  −0.094    0       0.013   0.022   0.035     −0.067  −0.071  −0.047  −0.051
Row 2        Count  0.2     0.18    0.18    0.16      0.3     0.32    0.28    0.29      0.25    0.25    0.23    0.23
             Total  0.2     0.18    0.18    0.16      0.3     0.32    0.28    0.29      0.22    0.21    0.20    0.19
Column Sums  Count  0.05    0.035   0.05    0.035     0.15    0.16    0.15    0.16      0.1     0.1     0.1     0.1
             Total  0.15    0.13    0.14    0.12      0.2     0.22    0.19    0.21      0.16    0.15    0.15    0.14

NOTE: Shown are exact relative biases for estimators controlled to postcensal population estimates at the cell level (“cell”), at one margin (“row” or “column”), or only to the overall postcensal population estimate (“overall”), in the case of unequally biased postcensal population estimates ($\delta_{11}/C_{11} = -0.1$, $\delta_{12}/C_{12} = 0.0$, $\delta_{21}/C_{21} = 0.2$, $\delta_{22}/C_{22} = 0.3$). Approximate relative biases from the formulae in the text agree with the exact relative biases to three decimal places.
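To make the mechanism behind Tables B.1 to B.3 concrete, the sketch below computes first-order (ratio-of-expectations) relative biases for a hypothetical 2 × 2 population under the four control levels. It is not the author's model or code: the function names, the single probability `PHI` of being covered and responding in each cell, and all parameter values are assumptions chosen for illustration, with cell means varying by a factor of 10 as in the paper's artificial population. In each cell, the expected weighted respondent count $C_{gh}\phi_{gh}$ is ratio-adjusted so that it sums to the postcensal estimate $C_{gh}(1 + \delta_{gh}/C_{gh})$ over the control group (the cell, its row, its column, or the whole table). `UNEQUAL` reuses the error pattern listed in the note to Table B.3.

```python
import numpy as np

# Hypothetical 2 x 2 population (rows g, columns h). These values are NOT the
# paper's parameters; they are made up for illustration only.
C = np.array([[100.0, 100.0], [100.0, 100.0]])   # true cell counts C_gh
YBAR = np.array([[1.0, 2.0], [5.0, 10.0]])       # cell means, varying by a factor of 10
PHI = np.array([[0.9, 0.6], [0.5, 0.8]])         # assumed P(covered and responds)

LEVELS = ("cell", "row", "column", "overall")


def controlled_counts(e, level, c=C, phi=PHI):
    """First-order (ratio-of-expectations) controlled count estimate per cell.

    e holds the relative postcensal errors delta_gh / C_gh. The expected
    weighted respondent count c * phi is ratio-adjusted so that it sums to the
    postcensal estimate c * (1 + e) over the chosen control group.
    """
    control = c * (1.0 + np.asarray(e, dtype=float))  # postcensal estimates
    raw = c * phi                                     # expected weighted respondents
    if level == "cell":
        factor = control / raw
    elif level == "row":
        factor = control.sum(axis=1, keepdims=True) / raw.sum(axis=1, keepdims=True)
    elif level == "column":
        factor = control.sum(axis=0, keepdims=True) / raw.sum(axis=0, keepdims=True)
    elif level == "overall":
        factor = control.sum() / raw.sum()
    else:
        raise ValueError(f"unknown control level: {level!r}")
    return factor * raw


def relative_biases(e, level, c=C, ybar=YBAR, phi=PHI):
    """Approximate relative biases of count and total estimators.

    Within-cell response is assumed unrelated to y, so an estimated cell total
    is the estimated cell count times the cell mean.
    """
    count = controlled_counts(e, level, c, phi)
    result = {}
    for name, est, truth in (("count", count, c), ("total", count * ybar, c * ybar)):
        result[name] = {
            "cells": est / truth - 1,
            "row sums": est.sum(axis=1) / truth.sum(axis=1) - 1,
            "column sums": est.sum(axis=0) / truth.sum(axis=0) - 1,
            "grand": est.sum() / truth.sum() - 1,
        }
    return result


UNBIASED = np.zeros((2, 2))
UNEQUAL = np.array([[-0.1, 0.0], [0.2, 0.3]])  # error pattern of Table B.3

if __name__ == "__main__":
    for level in LEVELS:
        rb = relative_biases(UNEQUAL, level)["total"]
        print(f"{level:>8}: row sums {np.round(rb['row sums'], 3)}, grand {rb['grand']:+.3f}")
```

Because the parameters are invented, the printed numbers will not match the tables; the sketch only shows how each control level enters the calculation. The ratio-of-expectations form was chosen because it is closed form; exact biases would also reflect the randomness of the adjustment ratios.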

of bias interact in complex ways, so that it is difficult to characterize the appropriate level of aggregation for poststratification. Numerical examples show that cell control, row control, column control, or no control may be best, depending on the parameter settings and the population quantity of interest. Control to both margins simultaneously, through some form of raking procedure, was not treated here but deserves further consideration.

The results in this paper indicate that the Census Bureau’s plan for control at a fine level of demographic stratification within estimation areas may be problematic: it may yield estimators whose bias properties are worse than those obtained with no controls at all.

This paper is only a first step in evaluating the possible effects of errors in postcensal population controls on ACS estimates. Research is needed in a number of directions. First, the numerical results are limited, and the parameter values in that study were chosen to illustrate potential problems that may or may not occur in real ACS data. For example, the artificial population has cell means that vary by a factor of 10, which may or may not be realistic in ACS applications. To determine whether the potential problems identified in this paper are real problems for the ACS, it is necessary to explore a range of parameter values (response probabilities, coverage probabilities, cell means, postcensal population estimation bias, and so on) that are plausible in real ACS applications.

Second, the numerical experiments focus exclusively on bias, because reducing bias is a major reason for poststratification and because the independence assumptions under which variances are derived in this paper are possibly unrealistic. Bias is certainly critical, and in many applications it dominates variance. Ultimately, however, the quantity of interest is the mean squared error, which is the sum of the squared bias and the variance.
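For reference, the mean squared error decomposition follows from adding and subtracting $E\hat t$, where $\hat t$ is any estimator of a population quantity $t$ (this notation is introduced here for illustration and is not taken from the paper). The relative form on the last line is on the same scale as the relative biases in Tables B.1 to B.3.

```latex
\begin{aligned}
\operatorname{MSE}(\hat t)
  &= E\big[(\hat t - t)^2\big]
   = E\big[(\hat t - E\hat t)^2\big]
     + 2\,E\big[\hat t - E\hat t\big]\,(E\hat t - t)
     + (E\hat t - t)^2 \\
  &= \operatorname{Var}(\hat t) + \operatorname{Bias}(\hat t)^2,
  \qquad \text{since } E\big[\hat t - E\hat t\big] = 0, \\[4pt]
\frac{\operatorname{MSE}(\hat t)}{t^2}
  &= \left(\frac{\operatorname{SD}(\hat t)}{t}\right)^2 + \operatorname{RB}(\hat t)^2,
  \qquad \operatorname{RB}(\hat t) = \frac{E\hat t - t}{t}.
\end{aligned}
```

For example, a relative bias of 0.1 contributes $0.1^2 = 0.01$ to the relative mean squared error, as much as a relative standard error of 0.1 would; squared bias outweighs variance whenever $|\operatorname{RB}(\hat t)|$ exceeds $\operatorname{SD}(\hat t)/t$.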
To study variance analytically, it is necessary to make some assumptions about the covariance structure of the various types of errors (for example, assumptions about correlations among postcensal population estimation errors in different cells, or between postcensal population estimation errors and frame imperfections). These assumptions should be guided by careful consideration of the ACS and of the methods of postcensal population estimation. Analytic computations under these assumptions could be supplemented or replaced by simulations. Finally, this paper does not explore the full complexity of the weighting factors used for the ACS, so the issue of bias needs further study, both analytical and empirical, in this more complex setting.