This appendix describes the operations of the 2000 Accuracy and Coverage Evaluation (A.C.E.) Program. Differences from the analogous 1990 Post-Enumeration Survey (PES) are summarized in Chapter 6, which also describes the dual-systems estimation (DSE) method used to develop population estimates for post-strata from the A.C.E. results. This appendix covers six topics:
sampling, address listing, and housing unit match;
P-sample interviewing;
initial matching and targeted extended search;
field follow-up and final matching;
weighting and imputation; and
post-stratification.
SAMPLING, ADDRESS LISTING, AND HOUSING UNIT MATCH
The 2000 A.C.E. process began in spring 1999 with the selection of a sample of block clusters for which an independent listing of addresses was carried out in fall 1999. The selection process was designed to balance the desired precision of the DSE estimates, for minority groups as well as for the total population, against the cost of field operations for address listing and subsequent interviewing. In addition, the A.C.E. selection process had to work within the constraints of the design originally developed for integrated coverage measurement (ICM).
First-Stage Sampling and Address Listing of Block Clusters
More than 3.7 million block clusters were formed, covering the entire United States except remote Alaska. Each cluster consisted of one census collection block or a group of geographically contiguous blocks that were expected to be enumerated using the same procedure (e.g., mailout/mailback) and to contain, on average, about 30 housing units, on the basis of housing unit counts from an early version of the 2000 Master Address File (MAF). The average cluster size was 1.9 blocks.
Next, clusters were grouped into four sampling strata: small (0–2 housing units), medium (3–79 housing units), large (80 or more housing units), and American Indian reservations (in states with sufficient numbers of American Indians living on reservations). Systematic samples of block clusters were selected from each stratum using equal probabilities, yielding about 29,000 block clusters containing about 2 million housing units, which were then visited by Census Bureau field staff to develop address lists.
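The stratification and equal-probability systematic selection described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the Census Bureau's production code: the function names, the tuple layout of a cluster record, and the per-stratum sampling intervals passed in are all hypothetical. Only the stratum boundaries (0 to 2, 3 to 79, and 80 or more housing units, plus a separate American Indian reservation stratum) come from the text.

```python
import random


def assign_stratum(housing_units, on_reservation=False):
    """Return the first-stage sampling stratum for one block cluster.

    Boundaries follow the text: small (0-2 housing units), medium (3-79),
    large (80 or more); reservation clusters form their own stratum.
    """
    if on_reservation:
        return "american_indian"
    if housing_units <= 2:
        return "small"
    if housing_units <= 79:
        return "medium"
    return "large"


def systematic_sample(units, interval, start):
    """Equal-probability systematic sample of `units`.

    Takes the units at positions start, start + interval, start + 2*interval, ...
    With start drawn uniformly from [0, interval) and interval >= 1, every unit
    has selection probability 1 / interval (the interval may be fractional).
    """
    if interval < 1 or start < 0:
        raise ValueError("need interval >= 1 and start >= 0")
    picks = []
    position = start
    while position < len(units):
        picks.append(units[int(position)])
        position += interval
    return picks


def first_stage_sample(clusters, intervals, seed=0):
    """clusters: iterable of (cluster_id, housing_units, on_reservation).
    intervals: hypothetical sampling interval for each stratum."""
    rng = random.Random(seed)
    by_stratum = {}
    for cluster_id, housing_units, on_reservation in clusters:
        stratum = assign_stratum(housing_units, on_reservation)
        by_stratum.setdefault(stratum, []).append(cluster_id)
    return {
        stratum: systematic_sample(ids, intervals[stratum],
                                   rng.uniform(0, intervals[stratum]))
        for stratum, ids in by_stratum.items()
    }
```

Allowing a fractional interval lets each stratum hit a target sample size exactly while keeping every cluster's selection probability equal within the stratum.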
The sample at this stage was considerably larger than that needed for the A.C.E. The Census Bureau had originally planned to field a P-sample of 750,000 housing units for use in ICM, and there was not time to develop a separate design for the planned A.C.E. size of about 300,000 housing units. The ICM block cluster sample design was therefore implemented first, and block clusters were then subsampled for the A.C.E., using updated housing unit counts from the address listing.
Sample Reduction for Medium and Large Block Clusters
After completion of the address listing and an update of the MAF, the number of medium and large block clusters was reduced, using differential sampling rates within each state. Specifically, medium and large clusters classified as minority on the basis of 1990 data were oversampled to improve the precision of the DSE estimates for minority groups. Also, clusters with large differences in housing unit counts between the P-sample address list and the January 2000 version of the MAF were oversampled in order to minimize their effect on the variance of the DSE estimates.
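The effect of differential sampling rates on a cluster's base weight can be illustrated with a short sketch. The retention rates below are invented for illustration (the text says only that minority clusters and clusters with large housing unit count discrepancies were oversampled, at rates that varied by state), and the class and function names are hypothetical. The weight is the inverse of the product of the first-stage probability and the reduction-stage retention rate.

```python
from dataclasses import dataclass


@dataclass
class BlockCluster:
    cluster_id: str
    first_stage_prob: float  # probability of selection in the listing sample
    minority: bool           # classified as minority on the basis of 1990 data
    hu_discrepancy: bool     # large P-list vs. January 2000 MAF count difference


def retention_rate(cluster, base=0.3, minority_rate=0.6, discrepancy_rate=0.9):
    """Hypothetical reduction-stage retention rate for a medium/large cluster.

    Oversampled groups get a higher rate; if both flags apply, the higher
    of the two rates is used (an assumption made for this sketch).
    """
    rate = base
    if cluster.minority:
        rate = max(rate, minority_rate)
    if cluster.hu_discrepancy:
        rate = max(rate, discrepancy_rate)
    return rate


def base_weight(cluster, **rates):
    """Inverse of the overall two-phase selection probability."""
    return 1.0 / (cluster.first_stage_prob * retention_rate(cluster, **rates))
```

Under these made-up rates, a minority cluster selected at the first stage with probability 0.5 carries a weight of about 3.33 rather than about 6.67, so oversampled clusters contribute more cases but each represents fewer clusters in the population.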
Sample Reduction for Small Block Clusters
The next step was to stratify small block clusters by size, based on the current version of the MAF, and to sample them systematically with equal probability at a rate of 1 in 10. However, all small block clusters found to have 10 or more housing units were retained, as were all small block clusters on American Indian reservations, in other American Indian areas, or in list/enumerate areas. After completion of the cluster subsampling operations, the A.C.E. sample totaled about 11,000 block clusters.
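A minimal sketch of the small-cluster rule, assuming a hypothetical record layout (dicts with keys `id`, `hu`, `ai_area`, `list_enum`). Sorting by current MAF size before taking every tenth cluster is used here as a stand-in for "stratify by size, then sample systematically"; in practice the start position would be chosen at random.

```python
def retained_with_certainty(cluster):
    """Small clusters kept regardless of the 1-in-10 subsample (per the text).

    Hypothetical keys: hu (current MAF housing units), ai_area (reservation or
    other American Indian area), list_enum (list/enumerate area).
    """
    return cluster["hu"] >= 10 or cluster["ai_area"] or cluster["list_enum"]


def subsample_small_clusters(clusters, start):
    """Return (cluster_id, subsampling weight) pairs.

    Certainty clusters keep weight 1. The rest are sorted by size (implicit
    stratification) and sampled 1 in 10 from the given start position 0..9;
    each sampled cluster gets weight 10.
    """
    if not 0 <= start < 10:
        raise ValueError("start must be in 0..9")
    certain = [c for c in clusters if retained_with_certainty(c)]
    others = sorted((c for c in clusters if not retained_with_certainty(c)),
                    key=lambda c: (c["hu"], c["id"]))
    sampled = others[start::10]
    return [(c["id"], 1) for c in certain] + [(c["id"], 10) for c in sampled]
```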
Initial Housing Unit Match
The addresses on the P-sample address listing were matched with the MAF addresses in the sampled block clusters. The purpose of this match was to permit automated subsampling of housing units in large blocks for both the P-sample and the E-sample and to identify nonmatched P-sample and E-sample housing units for field follow-up to confirm their existence. Possible duplicate housing units in the P-sample or E-sample were also followed up in the field. When there were large discrepancies between the housing units on the two samples, indicative of possible geocoding errors, the block clusters were relisted for the P-sample.
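The three outputs of the housing unit match (matched addresses, nonmatches on each side for field follow-up, and possible duplicates) can be sketched with simple set operations. This is a hypothetical illustration: real address matching involved unit designations, rural-style addresses, and clerical review, and the normalization rule and the 50 percent relisting threshold below are assumptions, not Census Bureau parameters.

```python
import re
from collections import Counter


def normalize(address):
    """Crude normalization: uppercase, punctuation to spaces, collapse whitespace."""
    return " ".join(re.sub(r"[^\w\s]", " ", address.upper()).split())


def housing_unit_match(p_addresses, census_addresses):
    """Classify the addresses in one block cluster as matched or nonmatched."""
    p = {normalize(a) for a in p_addresses}
    e = {normalize(a) for a in census_addresses}
    return {
        "matched": sorted(p & e),
        "p_only": sorted(p - e),       # P-sample nonmatches: confirm in the field
        "census_only": sorted(e - p),  # census (MAF) nonmatches: confirm in the field
    }


def possible_duplicates(addresses):
    """Addresses that appear more than once after normalization."""
    counts = Counter(normalize(a) for a in addresses)
    return sorted(a for a, n in counts.items() if n > 1)


def needs_relisting(result, threshold=0.5):
    """Flag a cluster whose nonmatch share is large (hypothetical threshold)."""
    n_non = len(result["p_only"]) + len(result["census_only"])
    total = n_non + len(result["matched"])
    return total > 0 and n_non / total > threshold
```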
Last Step in Sampling: Reduce Housing Units in Large Block Clusters
After completion of housing unit matching and follow-up, the final step in developing the P-sample was to subsample segments of housing units on the P-sample address list in large block clusters in order to reduce the interviewing workload. The resulting P-sample contained about 301,000 housing units. Subsequently, segments of housing units in the census were similarly subsampled from large block clusters in order to reduce the E-sample follow-up workload. For cost reasons, the subsampling was done so as to maximize the overlap between the P-sample and the E-sample. Table C-1 shows the distribution of the P-sample by sampling stratum, number of block clusters, number of housing units, and number of people.
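The overlap design can be illustrated by selecting segments once for the P-sample and then reusing the same segments for the E-sample wherever the census has a corresponding segment. This is a sketch under assumptions: it supposes that P-sample and census segments share identifiers and that segments are chosen by simple random sampling; the text does not state either, and the function names are hypothetical.

```python
import random


def subsample_segments(segment_ids, keep, seed=0):
    """Select `keep` segments at random from a large block cluster.

    Returns the selected segment IDs (sorted) and the within-cluster weight
    factor, i.e. the inverse of the segment selection probability.
    """
    if not 0 < keep <= len(segment_ids):
        raise ValueError("keep must be between 1 and the number of segments")
    rng = random.Random(seed)
    selected = sorted(rng.sample(sorted(segment_ids), keep))
    return selected, len(segment_ids) / keep


def e_sample_segments(p_selected, census_segment_ids):
    """Reuse the P-sample segments for the E-sample wherever the census has a
    matching segment, so the two samples overlap as much as possible."""
    census = set(census_segment_ids)
    return [s for s in p_selected if s in census]
```

Reusing the P-sample selection keeps field work and matching concentrated in the same housing units, which is the cost argument the text gives for maximizing overlap.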
P-SAMPLE INTERVIEWING
The goal of the A.C.E. interviewing of P-sample households was to determine who lived at each sampled address on Census Day, April 1. This required obtaining information not only about nonmovers (people living at the address on both Census Day and the A.C.E. interview day) but also about people who had lived at the address on Census Day but no longer lived there (outmovers). In addition, the P-sample interviewing ascertained the characteristics of people who were living at the address at the time of the interview but had not lived there on Census Day (inmovers).
The reason for including both inmovers and outmovers was to implement a procedure called PES-C, in which the P-sample match rate for movers was estimated from the data obtained for outmovers but then applied to the weighted number of inmovers. The assumption was that inmovers would be missed less often in the interviewing than outmovers, so that the number of inmovers would provide a better estimate of the number of movers. PES-C differed from the procedure used in the 1990 PES (see Chapter 6).
TABLE C-1 Distribution of the 2000 A.C.E. P-Sample Block Clusters, Households, and People, by Sampling Stratum (unweighted)
It was important to conduct the P-sample interviewing as soon as possible after Census Day, both to minimize respondent errors in reporting household composition as of April 1 and to complete the interviewing in a timely manner. However, the independence of the P-sample and E-sample could be compromised if A.C.E. interviewers were in the field at the same time as census nonresponse follow-up interviewers. An innovative solution for 2000 was to conduct the first wave of interviewing by telephone, using a computerized questionnaire. Units eligible for telephone interviewing were occupied housing units that had a city-style address, were either single-family homes or in large multi-unit structures, and had a captured census questionnaire (either a mail return or an enumerator-obtained return) that included a telephone number. Units in small multi-unit structures or with no house number or street name in the address were not eligible. Telephone interviewing began on April 23, 2000, and continued through June 11. Fully 29 percent of the P-sample household interviews were obtained by telephone, a higher percentage than expected.
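The telephone eligibility rule reads naturally as a single predicate. In this sketch the field names are hypothetical, and the cutoff between "small" and "large" multi-unit structures is not given in the text, so structure type is simply passed in as a label.

```python
from dataclasses import dataclass


@dataclass
class HousingUnit:
    occupied: bool
    questionnaire_captured: bool  # mail or enumerator-obtained return captured
    phone_on_questionnaire: bool
    city_style_address: bool      # has a house number and street name
    structure: str                # "single_family", "small_multi", or "large_multi"


def telephone_eligible(unit):
    """Eligibility for the first (telephone) wave of P-sample interviewing."""
    return (unit.occupied
            and unit.questionnaire_captured
            and unit.phone_on_questionnaire
            and unit.city_style_address
            and unit.structure in {"single_family", "large_multi"})
```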
Interviewing began in the field the week of June 18, using laptop computers. Interviewers were to ascertain who lived at the address currently and who had lived there on Census Day, April 1. The computerized interview—an innovation for 2000—was intended to reduce interviewer variance and to speed up data capture and processing by having interviewers send their completed interviews each evening over secure telephone lines to the Bureau’s main computer center, in Bowie, MD.
For the first three weeks, interviewers were instructed to speak only with a household resident; after that, they could obtain a proxy interview from a nonhousehold member, such as a neighbor or landlord. (Most outmover interviews were by proxy.) During the last two weeks of interviewing, the best interviewers were sent to the remaining nonrespondents to try to obtain an interview with a household member or proxy. Of all P-sample interviews, 99 percent were completed by August 6; the remaining 1 percent were obtained by September 10 (Farber, 2001b:Table 4.1).
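The respondent rule can be expressed as a date check. This sketch assumes field interviewing started on Sunday, June 18, 2000 (the text says only "the week of June 18"); the constant and function names are hypothetical.

```python
from datetime import date, timedelta

# Assumption: field interviewing is taken to start on June 18, 2000.
FIELD_START = date(2000, 6, 18)


def proxy_allowed(interview_date, field_start=FIELD_START):
    """Nonhousehold proxies (e.g., a neighbor or landlord) are allowed only
    after the first three weeks of field interviewing."""
    return interview_date >= field_start + timedelta(weeks=3)


def allowed_respondents(interview_date, field_start=FIELD_START):
    allowed = ["household member"]
    if proxy_allowed(interview_date, field_start):
        allowed.append("proxy")
    return allowed
```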
INITIAL MATCHING AND TARGETED EXTENDED SEARCH
After the P-sample interviews were completed, census records for households in the E-sample block clusters were drawn from the census unedited file; census enumerations in group quarters (e.g., college dormitories, nursing homes) were not part of the E-sample. Also excluded from the E-sample were people with insufficient information (IIs), as they could not be matched, and late additions to the census whose records were not available in time for matching. People with insufficient information had fewer than two reported characteristics (among name, age, sex, race, ethnicity, and household relationship); computer imputation routines were used to complete their census records. Census terms for these people are “non-data-defined” and “whole person imputation”; we refer to them in this report as “people requiring imputation.” In 2000, there were 5.8 million people requiring imputation, as well as 2.4 million late additions resulting from the special operation in summer 2000 to reduce duplication in the MAF (see Chapter 8).
For the P-sample, nonmovers and outmovers were retained for matching, as were people whose residence status was not determined. Inmovers, and people clearly identified from the interview as not belonging in the sample (e.g., because they resided in group quarters on Census Day), were not matched.
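The inclusion rules for the two samples can be summarized as two predicates. The dictionary keys and status labels are hypothetical; the logic follows the exclusions listed in the two preceding paragraphs.

```python
def e_sample_eligible(person):
    """E-sample excludes group quarters enumerations, people requiring
    imputation, and late additions not available in time for matching."""
    return not (person["group_quarters"]
                or person["requires_imputation"]
                or person["late_addition"])


def p_sample_eligible(person):
    """P-sample matching keeps nonmovers and outmovers, including those whose
    residence status is undetermined; inmovers and people clearly identified
    as nonresidents (e.g., in group quarters on Census Day) are not matched.

    Hypothetical labels: mover_status in {"nonmover", "outmover", "inmover"};
    residence in {"resident", "nonresident", "unresolved"}.
    """
    if person["mover_status"] == "inmover":
        return False
    return person["residence"] in {"resident", "unresolved"}
```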
E-Sample and P-Sample Matching Within Block Cluster
Matching was initially performed by a computer algorithm, which searched within each block cluster and identified clear matches, possible matches, nonmatches, and P-sample or E-sample people lacking enough reported data for matching and follow-up. (For the A.C.E., in addition to meeting the census definition of data defined, each person had to have a complete name and at least two other characteristics.) Clerical staff next reviewed possible matches and nonmatches, converting some to matches and classifying others as lacking enough reported data, as erroneous (e.g., duplicates within the P-sample or E-sample, fictitious people in the E-sample), or (when the case was unclear or unusual) as requiring higher-level review. The work of the clerical staff was greatly facilitated by the use of a computerized system for searching and coding (see Childers et al., 2001).
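The two data-sufficiency tests, the census "data defined" rule and the stricter A.C.E. matching rule, can be sketched as follows. The record keys are hypothetical, and treating a "complete name" as both a first and a last name is an assumption.

```python
OTHER_ITEMS = ("age", "sex", "race", "ethnicity", "relationship")


def _present(value):
    """A characteristic counts as reported if it is not missing or blank."""
    return value not in (None, "")


def name_reported(p):
    return _present(p.get("first_name")) or _present(p.get("last_name"))


def name_complete(p):
    # Assumption: a "complete name" means both first and last name present.
    return _present(p.get("first_name")) and _present(p.get("last_name"))


def n_other_items(p):
    return sum(_present(p.get(k)) for k in OTHER_ITEMS)


def census_data_defined(p):
    """At least two of the six characteristics reported (name counts as one)."""
    return int(name_reported(p)) + n_other_items(p) >= 2


def sufficient_for_ace_matching(p):
    """Data defined, plus a complete name and at least two other characteristics."""
    return census_data_defined(p) and name_complete(p) and n_other_items(p) >= 2
```

A record with a surname, age, and sex would therefore count as data defined for the census but would still lack enough reported data for A.C.E. matching.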
On the P-sample side, the clerks searched for matches within a block cluster not only with E-sample people, but also with non-E-sample census people. Such people may have been in group quarters or in enumerated housing units in the cluster that were excluded when large block clusters were subsampled.
Targeted Extended Search
In selected block clusters, the clerks performed a targeted extended search (TES) for certain kinds of P-sample and E-sample households (see Navarro and Olson, 2001). The search looked for P-sample matches to census enumerations in the ring of blocks adjacent to the block cluster; it also looked for E-sample correct enumerations in the adjacent ring of blocks. The clerks searched only for cases that were whole-household nonmatches in certain types of housing units. The purpose was to reduce the variance of the DSE estimates due to geocoding errors (when a housing unit is assigned to the wrong census block). When geocoding errors occur, additional P-sample matches and E-sample correct enumerations are likely to be found by extending the search area to the blocks surrounding the A.C.E.-defined block cluster.
Three kinds of clusters were included in TES with certainty: clusters for which the P-sample address list was relisted; 5 percent of clusters with the most census geocoding errors and P-sample address nonmatches; and 5 percent of clusters with the most weighted census geocoding errors and P-sample address nonmatches. Clusters were also selected at random from among those clusters with P-sample housing unit nonmatches and census housing units identified as geocoding errors. About 20 percent of block clusters were included in the TES sample. Prior to matching, field work was conducted in the TES clusters to identify census housing units in the surrounding ring of blocks.
Only some cases in TES block clusters were included in the extended clerical search. These cases were P-sample nonmatched households for which there was no match to an E-sample housing unit address and E-sample cases identified as geocoding errors. When an E-sample geocoding error case was found in an adjacent block, there was a further search to determine if it duplicated another housing unit or was a correct enumeration.
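The two-level TES selection (clusters, then cases within clusters) can be sketched as below. Several details are assumptions: the ranking scores (counts of census geocoding errors plus P-sample address nonmatches, and a weighted version), independent Bernoulli selection at a fixed rate for the noncertainty clusters, and all key and function names are hypothetical.

```python
import random


def select_tes_clusters(clusters, random_rate, seed=0):
    """Return (certainty cluster IDs, randomly selected cluster IDs).

    Each cluster is a dict with hypothetical keys: id, relisted, score,
    weighted_score, and eligible (has P-sample housing unit nonmatches and
    census housing units identified as geocoding errors).
    """
    n_top = max(1, round(0.05 * len(clusters)))  # top 5 percent

    def top(key):
        ranked = sorted(clusters, key=lambda c: c[key], reverse=True)
        return {c["id"] for c in ranked[:n_top]}

    certain = {c["id"] for c in clusters if c["relisted"]}
    certain |= top("score") | top("weighted_score")
    rng = random.Random(seed)
    sampled = {c["id"] for c in clusters
               if c["id"] not in certain and c["eligible"]
               and rng.random() < random_rate}
    return certain, sampled


def tes_cases(p_households, e_cases):
    """Cases searched in the surrounding ring of blocks: P-sample whole-household
    nonmatches with no E-sample address match, and E-sample geocoding errors."""
    p = [h["id"] for h in p_households
         if h["whole_household_nonmatch"] and not h["hu_address_matched"]]
    e = [c["id"] for c in e_cases if c["geocoding_error"]]
    return p, e
```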
Following the clerical matching and targeted extended search, a small, highly experienced staff of technicians reviewed difficult cases and other cases for quality assurance. Then a yet smaller analyst staff reviewed the cases the technicians could not resolve.
FIELD FOLLOW-UP AND FINAL MATCHING
Matching and correct enumeration rates would be biased if there were not a further step of follow-up in the field to check certain types of cases. On the E-sample side, almost all cases that were assigned a nonmatch or unresolved code by the computer and clerical matchers were followed up, as were people at addresses that were added to the MAF subsequent to the housing unit match. The purpose of the follow-up was to determine if these cases were correct (nonmatching) enumerations or erroneous.
On the P-sample side, about half of the cases that were assigned a nonmatch code and most cases that were assigned an unresolved code were followed up in the field. The purpose was to determine whether they were residents on Census Day and whether they were genuine nonmatches. Specifically, P-sample nonmatches were followed up when they occurred in: a partially matched household; a whole household that did not match a census address, when the interview was conducted with a proxy respondent; a whole household that matched an address with no census person records, when the interview was conducted with a proxy; or a whole household that did not match the people in the E-sample for that household. In addition, P-sample whole-household nonmatches were followed up when an analyst recommended follow-up; when the cluster had a high rate of P-sample person nonmatches (greater than 45 percent); when the original interviewer had changed the address for the household; or when the cluster was not included in the initial housing unit match (e.g., list/enumerate clusters, relisted clusters).
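The follow-up rules for nonmatches can be collected into predicates. The field names are hypothetical, and "census person records at the matched address" is used as a proxy for "E-sample people for that household". The sketch covers only the nonmatch rules; the text says that most (not all) unresolved cases were also followed up, which is not modeled here.

```python
from dataclasses import dataclass


@dataclass
class PNonmatch:
    partial_household: bool          # nonmatch within a partially matched household
    address_matched: bool            # household address matched a census address
    census_persons_at_address: int   # census person records at the matched address
    proxy_interview: bool
    analyst_recommended: bool = False
    cluster_person_nonmatch_rate: float = 0.0
    address_changed_by_interviewer: bool = False
    cluster_in_initial_hu_match: bool = True


def p_followup(case):
    """Field follow-up rules for P-sample nonmatches, as listed in the text."""
    if case.partial_household:
        return True
    # The remaining rules apply to whole-household nonmatches.
    if not case.address_matched and case.proxy_interview:
        return True
    if case.address_matched and case.census_persons_at_address == 0 and case.proxy_interview:
        return True
    if case.address_matched and case.census_persons_at_address > 0:
        return True  # household did not match the census people at that address
    return (case.analyst_recommended
            or case.cluster_person_nonmatch_rate > 0.45
            or case.address_changed_by_interviewer
            or not case.cluster_in_initial_hu_match)


def e_followup(nonmatch_or_unresolved, added_after_hu_match):
    """E-sample: (almost) all nonmatched or unresolved cases, plus people at
    addresses added to the MAF after the housing unit match."""
    return nonmatch_or_unresolved or added_after_hu_match
```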
The field follow-up interviews were conducted with a paper questionnaire, and interviewers were instructed to try even harder than in the original interview to speak with a household member. After field follow-up, each P-sample and E-sample case was assigned a final match and residence status code by clerks and, in some cases, technicians or analysts.
WEIGHTING AND IMPUTATION
The last steps prior to estimation were to:
weight the P-sample and E-sample cases to reflect their probabilities of selection;
adjust the P-sample weights for household noninterviews;
impute missing characteristics for P-sample persons that were needed to define post-strata (e.g., age, sex, race); and
impute residence and/or match status to unresolved P-sample cases; impute enumeration status to unresolved E-sample cases.
Weighting is necessary to account for different probabilities of selection at various stages of sampling. Applying a weight adjustment to account for household noninterviews is standard survey procedure, as is imputation for individual characteristics. The assumption is that weighting and imputation procedures for missing data reduce the variance of the estimates, compared with estimates that do not include cases with missing data, and that such procedures may also reduce bias, or at least not increase it.
For the P-sample weighting, an initial weight was constructed for housing units that took account of the probabilities of selection at each phase of sampling. A weighting adjustment was then performed to account for household noninterviews. Two such adjustments were performed, one for households occupied as of the interview day and the other for households occupied as of Census Day. The adjusted interview day weight was used for inmovers; the adjusted Census Day weight, with a further adjustment for the targeted extended search sampling, was used for nonmovers and outmovers. E-sample weighting was similar but did not require a household noninterview adjustment.
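A minimal sketch of the weighting chain, assuming a standard weighting-class noninterview adjustment; the adjustment cells, the TES factor, and all names are hypothetical, since the text does not specify them. The same adjustment function is run twice, once with an interview-day occupancy flag and once with a Census Day occupancy flag.

```python
from collections import defaultdict
from math import prod


def initial_weight(stage_probs):
    """Inverse of the product of selection probabilities at each sampling phase."""
    return 1.0 / prod(stage_probs)


def noninterview_adjust(units, occupied_key):
    """Weighting-class noninterview adjustment.

    units: dicts with 'cell', 'weight', 'interviewed', and a boolean occupancy
    flag named by occupied_key. Within each cell, interviewed occupied units
    absorb the weight of noninterviewed occupied units.
    Returns {unit index: adjusted weight}.
    """
    total = defaultdict(float)
    responding = defaultdict(float)
    for u in units:
        if u[occupied_key]:
            total[u["cell"]] += u["weight"]
            if u["interviewed"]:
                responding[u["cell"]] += u["weight"]
    return {
        i: u["weight"] * total[u["cell"]] / responding[u["cell"]]
        for i, u in enumerate(units)
        if u[occupied_key] and u["interviewed"]
    }


def person_weight(mover_status, interview_day_weight, census_day_weight, tes_factor=1.0):
    """Inmovers use the interview-day weight; nonmovers and outmovers use the
    Census Day weight times a (hypothetical) TES sampling factor."""
    if mover_status == "inmover":
        return interview_day_weight
    return census_day_weight * tes_factor
```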
Item imputation was performed separately for each missing characteristic on a P-sample record. The census editing and imputation process provided imputations for missing characteristics on the E-sample records (see Appendix A). Finally, probabilities of being a Census Day resident and of matching the census were assigned to P-sample people with unresolved status, and probabilities of being a correct enumeration were assigned to E-sample people with unresolved enumeration status (see Chapter 6).
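One common way to assign such probabilities is weighted cell-mean imputation: an unresolved case receives the weighted share of resolved cases in its imputation cell that have the status in question. The sketch below shows that generic technique only; the Census Bureau's actual cells and estimators are described in Chapter 6 and are not reproduced here, and the record keys are hypothetical.

```python
from collections import defaultdict


def impute_status_probabilities(cases):
    """Assign a probability to unresolved cases from resolved cases in the
    same imputation cell.

    cases: dicts with keys 'cell', 'weight', and 'status', where status is
    1 (e.g., resident, match, or correct enumeration), 0, or None (unresolved).
    Returns one probability per case; resolved cases keep 1.0 or 0.0.
    """
    num = defaultdict(float)
    den = defaultdict(float)
    for c in cases:
        if c["status"] is not None:
            num[c["cell"]] += c["weight"] * c["status"]
            den[c["cell"]] += c["weight"]
    probs = []
    for c in cases:
        if c["status"] is not None:
            probs.append(float(c["status"]))
        elif den[c["cell"]] > 0:
            probs.append(num[c["cell"]] / den[c["cell"]])
        else:
            raise ValueError(f"no resolved cases in cell {c['cell']!r}")
    return probs
```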
POST-STRATIFICATION
Estimation of the DSE for post-strata, and of the variance associated with the estimates, was the final step in the A.C.E. process. The post-strata were specified in advance on the basis of research with 1990 census data (see Griffin and Haines, 2000), and each E-sample and P-sample record was assigned to a post-stratum as applicable. Post-strata with fewer than 100 nonmover and outmover cases were combined with other post-strata for estimation. In all, the 448 originally defined post-strata, consisting of 64 groups defined by race/ethnicity, housing tenure, and other characteristics cross-classified by 7 age/sex groups (see Figure 6-2 in Chapter 6), were reduced to 416 by combining age/sex groups, as needed, within the same post-stratum group.
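The collapsing step can be sketched as a greedy merge of small age/sex cells within one post-stratum group. The merge rule used here (combine a small cell with the next cell, or with the previous one if it is last) is an assumption for illustration; the text says only that age/sex groups were combined as needed until each post-stratum had at least 100 nonmover and outmover cases.

```python
def collapse_age_sex_cells(counts, min_cases=100):
    """Greedily merge age/sex cells within one post-stratum group.

    counts: nonmover + outmover case counts for the group's age/sex cells,
    in a fixed order. Any cell below min_cases is merged with its next
    neighbor (or its previous one, if it is the last cell). Returns the
    resulting post-strata as lists of original cell indices.
    """
    groups = [[i] for i in range(len(counts))]
    sizes = list(counts)
    while len(groups) > 1:
        small = [j for j, size in enumerate(sizes) if size < min_cases]
        if not small:
            break
        j = small[0]
        k = j + 1 if j + 1 < len(groups) else j - 1
        lo, hi = min(j, k), max(j, k)
        groups[lo] += groups[hi]
        sizes[lo] += sizes[hi]
        del groups[hi], sizes[hi]
    return groups
```

With made-up counts of 150, 40, 70, 300, 20, 200, and 120 for the seven age/sex cells, this rule yields five post-strata instead of seven.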
Weighted estimates were prepared for each of the 416 post-strata for the following:
P-sample total nonmover cases (NON), total outmover cases (OUT), and total inmover cases (IN) (including multiplication of the weights for nonmovers and outmovers by residence status probability, which was 1 for known Census Day residents and 0 for confirmed nonresidents);
P-sample matched nonmover cases (MNON) and matched outmover cases (MOUT) (including multiplication of the weights by match status probability, which was 1 for known matches and 0 for known nonmatches);
E-sample total cases (E); and
E-sample correct enumeration cases (CE) (including multiplication of the weights by correct enumeration status probability).
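The tabulations listed above can be sketched as weighted sums over person records. The record keys are hypothetical. Following the text literally, the matched totals multiply the weight by the match status probability, which is taken here to be the probability of being a matched Census Day resident (so it is 0 for known nonresidents).

```python
from collections import defaultdict


def tabulate(p_records, e_records):
    """Weighted post-stratum totals NON, OUT, IN, MNON, MOUT, E, and CE."""
    t = defaultdict(float)
    for r in p_records:
        w = r["weight"]
        if r["mover"] == "inmover":
            t["IN"] += w
            continue
        key = "NON" if r["mover"] == "nonmover" else "OUT"
        t[key] += w * r["p_resident"]  # residence status probability
        t["M" + key] += w * r["p_match"]  # match status probability
    for r in e_records:
        t["E"] += r["weight"]
        t["CE"] += r["weight"] * r["p_correct"]  # correct enumeration probability
    return dict(t)
```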
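Also tabulated for each post-stratum were the census count (C) and the count of IIs (people with insufficient information, including people requiring imputation and late additions). The DSE for each post-stratum was calculated as the census count minus IIs, times the correct enumeration rate (CE/E), times the inverse of the match rate (M/P), or
$$\mathit{DSE} = (C - \mathit{II}) \times \frac{\mathit{CE}}{E} \times \frac{1}{M/P}.$$
The match rate (M/P) was calculated for most post-strata by applying the outmover match rate (MOUT/OUT) to the weighted number of inmovers (IN) to obtain an estimate of matched inmovers (MIN), and then solving for
$$\frac{M}{P} = \frac{\mathit{MNON} + \mathit{MIN}}{\mathit{NON} + \mathit{IN}}, \qquad \text{where } \mathit{MIN} = \frac{\mathit{MOUT}}{\mathit{OUT}} \times \mathit{IN}.$$
However, for post-strata with fewer than 10 outmovers (63 of the 416), the match rate was calculated as
$$\frac{M}{P} = \frac{\mathit{MNON} + \mathit{MOUT}}{\mathit{NON} + \mathit{OUT}}.$$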
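The DSE and match rate calculations for one post-stratum can be written directly from the formulas. Uppercase parameter names mirror the document's notation rather than Python naming convention. Passing the unweighted number of outmover cases separately from the weighted total OUT is an assumption about how the "fewer than 10 outmovers" test was applied.

```python
def match_rate(NON, MNON, OUT, MOUT, IN, n_outmover_cases, min_outmovers=10):
    """P-sample match rate M/P for one post-stratum."""
    if n_outmover_cases < min_outmovers:
        # Too few outmovers: use outmovers' own weighted totals directly.
        return (MNON + MOUT) / (NON + OUT)
    # PES-C: apply the outmover match rate to the weighted number of inmovers.
    MIN = (MOUT / OUT) * IN
    return (MNON + MIN) / (NON + IN)


def dse(C, II, CE, E, NON, MNON, OUT, MOUT, IN, n_outmover_cases):
    """Dual-systems estimate: (C - II) * (CE / E) / (M / P)."""
    return (C - II) * (CE / E) / match_rate(NON, MNON, OUT, MOUT, IN, n_outmover_cases)
```

Because the estimate divides by M/P, any downward bias in the match rate (for example, from outmovers who were missed or poorly reported) raises the DSE, which is the motivation for estimating the mover match rate from outmovers but scaling it by the better-measured inmover count.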
Procedures were implemented to estimate the variance of the DSE estimates for post-strata. Direct variance estimates were developed for the collapsed post-strata DSEs that took account of the error due to sampling variability from the initial listing sample, the A.C.E. reduction and small block subsampling, and the targeted extended search sampling. The variance estimates also took account of the variability from imputation of correct enumeration, match, and residence probabilities for unresolved cases. Nonsampling errors, other than the error introduced by the imputation models, were not included in the variance estimation. In particular, there was no allowance for synthetic or model error; the variance calculations assumed that the probabilities of being included in the census were uniform across all areas in a post-stratum (see Starsinic et al., 2001).