SAMPLING AND STATISTICAL ESTIMATION
This chapter focuses on uses of sampling and statistical estimation for improving the 2000 census. These techniques are particularly relevant for achieving two main goals of 2000 census research and development: reducing differential coverage and controlling operational costs.
Although the Census Bureau has ruled out a sample census for 2000, several uses of sampling are being considered:
Sampling households that do not respond to the census mailing could substantially reduce the cost of nonresponse follow-up. (Because of its close interaction with sampling for nonresponse follow-up, we also discuss truncation of the nonresponse follow-up period.)
Any coverage measurement program designed to reduce the differential undercount would require sampling. Each of the three coverage measurement methods under consideration would involve field operations in a sample of blocks.
Sampling for content is also very likely in 2000—involving either a single long form assigned to a sample of households, as in 1990, or multiple sample forms (matrix sampling).
The three major sections of this chapter discuss these topics in turn.
The use of sampling necessarily entails the use of statistical estimation as well, because information from sampled units must be used to generate estimates for units omitted from the sample. The nature of these methods is partly, but not entirely, determined by the type of sampling performed. (Statistical estimation is discussed in the last section of the chapter.)
In addition to the obvious relationships between sampling and statistical estimation, the various uses of sampling interact in important ways. Thus, decisions about any of the particular aspects should not be made without attention to the others. For example, truncation or sampling of nonresponse follow-up could affect coverage measurement in terms of the sample size required, the resources available, and the schedule for implementation.
Nonresponse follow-up was a very costly and troublesome part of the 1990
census. The Census Bureau estimates that nonresponse follow-up operations cost approximately $378 million in direct costs, 15 percent of the $2.6 billion 10-year cost of the census (Bureau of the Census, 1992f). Field follow-up activities—consisting mainly of rechecking housing units initially reported as vacant or nonexistent in nonresponse follow-up operations—cost an additional $115 million. Each 1 percent of nonresponse to the mailed questionnaire is estimated to have added approximately $17 million to the cost of the census.
Perhaps just as important, nonresponse follow-up took much longer than anticipated in some sites (particularly, New York City), pushing back the schedule for completion of the census and therefore for the Post-Enumeration Survey (PES) as well. Even without delays in schedule, the latter stages of census operations typically suffer degradation of data quality. Ericksen et al. (1991) report that, for the 1990 census, the rate of erroneous enumeration on mailout-mailback was 3.1 percent. On nonresponse follow-up, the rate was 11.3 percent; on field follow-up, the rate was 19.4 percent.
Much of the problem in 1990 resulted from mailback response rates that were lower than expected. Item nonresponse also contributed to the follow-up work because additional contacts are required to complete missing items. A variety of response improvement programs can be expected to improve mailback rates and, perhaps, to speed nonresponse follow-up operations. Even so, a 1990-style nonresponse follow-up operation is sure to be very expensive. Thus, methods that reduce the scope of nonresponse follow-up without undue sacrifices in the accuracy of the count or the content deserve serious consideration for the year 2000.
Major Nonresponse Follow-Up Innovations
The Census Bureau is considering three main techniques that would greatly reduce the cost of nonresponse follow-up: truncation, shortening the time period for field operations; sampling, carrying out nonresponse follow-up operations for a sample of households or blocks; and use of administrative records to replace some of the field data collection.
Simple truncation of nonresponse follow-up means stopping nonresponse follow-up operations in all areas at an earlier date than in a 1990-style census. The operations would be curtailed either at a predetermined time or when some predetermined percentage of nonresponse cases had been resolved. A range of truncation dates has been considered in the Census Bureau's research on this topic,
including April 21 (at the beginning of nonresponse follow-up operations, 3 weeks after census day), June 2 (after 6 weeks of nonresponse follow-up), and June 30 (near the end of nonresponse follow-up work). Of the population included in the final census total, the Census Bureau counted approximately 70 percent, 90 percent, and 98 percent, respectively, by these dates in 1990.
Sampling for nonresponse follow-up implies that enumerators would follow up only a sample of mailout addresses for which a mail form had not been returned. Current research at the Census Bureau is considering using either the individual address or the block as the sampling unit. If responses from mailback nonrespondents are correlated within blocks, then sampling by blocks would increase the variance of block-level population estimates relative to those derived from a sample of housing units. However, sampling by blocks appears to offer three advantages that might make it preferable. First, it seems likely that only block sampling would permit adding persons who do not correspond to an original address. Second, potential cost savings in field operations might accrue from block clustering. Third, coverage measurement methods that compare their own enumeration to the regular census enumeration in the same blocks would work much better in blocks where all nonresponding households are followed up (see descriptions of CensusPlus and PES in the next section).
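The variance penalty from within-block correlation can be sketched with the conventional design-effect approximation, deff = 1 + (m - 1) * rho. The values of m and rho below are purely illustrative assumptions, not estimates from the Census Bureau's research:

```python
# Purely illustrative sketch (values assumed, not from the report): the
# variance penalty of block sampling relative to household sampling is
# commonly approximated by the design effect deff = 1 + (m - 1) * rho,
# where m is the average number of followed-up households per block and
# rho is the intraclass correlation of household characteristics.

def design_effect(m: float, rho: float) -> float:
    """Approximate variance inflation factor for cluster (block) sampling."""
    return 1.0 + (m - 1.0) * rho

# Hypothetical values: ~20 nonresponding households per block, modest
# within-block correlation.
m, rho = 20, 0.05
deff = design_effect(m, rho)
n_households = 10_000                     # household sample achieving a target variance
n_block_equivalent = n_households * deff  # block sample needed for the same variance
print(f"deff = {deff:.2f}; equivalent block-sample size = {n_block_equivalent:,.0f}")
```

Under these assumed values, a block sample would need nearly twice as many households as a simple household sample to match its precision, which is the variance cost that the operational advantages of clustering must offset.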
Sampling and truncation may also play a role in controlling costs associated with follow-up for item nonresponse, although the design issues are somewhat different than those for unit nonresponse, and the design need not be the same. Truncation of this aspect of follow-up may control costs and speed completion of the follow-up operation. Sampling may be useful by providing information that can be used to impute missing items for nonsampled units. Costs per household for completing missing items for responding households by telephone interview are certainly lower than those for completing full forms for nonresponding households. Yet there is potentially much more information available to guide item imputation (from completed items for the same household or from external sources such as administrative records) than there is for imputing a whole nonresponding household. As McKenney and Cresce (1992) note, decisions on follow-up procedures can have a substantial effect on the amount of information that must be allocated by models versus the amount that is obtained by telephone or field follow-up.
Pros and Cons of the Innovations
Both truncation and sampling offer the opportunity for substantial cost savings of hundreds of millions of dollars, in part by shortening the time of the nonresponse follow-up period. Compressing nonresponse follow-up offers several benefits. Two of the coverage measurement methods under consideration for the 2000 census, CensusPlus and PES, would benefit from starting earlier because fewer people would have moved between April 1 and the time of the coverage measurement operation, and there would be more time for data processing and analysis before counts were
due. These considerations have implications for a truncated nonresponse follow-up because the full benefits of accelerating census schedules would only be obtained if coverage improvement programs (such as the recheck of vacant housing units) were also truncated, so that CensusPlus or PES operations commenced at an early date. Sampling might also improve the accuracy of the nonresponse follow-up operation because the reduction in workload could permit hiring more qualified personnel with smaller caseloads, resulting in earlier contact attempts and a lower incidence of moves between April 1 and the time of contact.
In contrast to these cost and operational advantages, both truncation and sampling have negative implications for the precision of small-area enumeration. Truncation of nonresponse follow-up could be expected to increase both the undercount and differential coverage of regular census enumeration. That result would increase the coverage measurement sample size necessary to achieve the same degree of precision in the final one-number census population totals that would be achieved in conjunction with a full nonresponse follow-up operation. Simple truncation would also adversely affect the quality of long-form data by creating nonresponse bias with no obvious method for correction.
Sampling for nonresponse follow-up would have somewhat different consequences for the precision of counts. If block sampling is used and coverage measurement is done in a subsample of those blocks, there would not be a direct increase in the size requirements for the coverage measurement sample. However, the precision of block-level counts for nonsample blocks would suffer due to the need for indirect estimation of all mailout nonrespondents in those blocks. At 1990 mail response rates, estimation would be required to complete about 30 percent of the population for those blocks, rather than the 1.5 percent or so that would be added for undercount in blocks with complete nonresponse follow-up. Two points are relevant to this discussion. First, count estimates requiring imputation for nonresponse before nonresponse follow-up would contain more error than estimates based on 1990-type nonresponse follow-up. Second, the additional error would also appear in counts for aggregations of blocks.
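The contrast between estimating roughly 30 percent of a block's population and roughly 1.5 percent can be made concrete with a back-of-the-envelope error model. The block size and relative standard error below are invented for illustration only:

```python
# Rough illustration with assumed numbers (not estimates from the report):
# if a fraction f of a block's population must be estimated indirectly, and
# the indirect estimate carries relative standard error s for that portion,
# the standard error of the block total is roughly true_count * f * s.

def block_count_se(true_count: int, f: float, s: float) -> float:
    """Approximate standard error of a block count when fraction f is estimated."""
    return true_count * f * s

true_count = 400   # hypothetical block population
s = 0.10           # assumed 10% relative SE of the indirect estimate

se_nonsample = block_count_se(true_count, 0.30, s)   # nonsample block, ~30% estimated
se_full = block_count_se(true_count, 0.015, s)       # full follow-up, ~1.5% estimated
print(f"nonsample block SE: {se_nonsample:.1f} persons; "
      f"full follow-up SE: {se_full:.1f} persons "
      f"({se_nonsample / se_full:.0f}x larger)")
```

Whatever the true relative error of the estimation procedure, the error in the block count scales with the fraction of the population that must be estimated, which is why nonsample blocks bear most of the precision cost.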
Two-Stage Nonresponse Follow-Up
Although both truncation and sampling have merit on their own, they may work even better in combination. A two-stage nonresponse follow-up operation would consist of a truncated first stage carried out in 100 percent of blocks—which would try to quickly resolve a substantial fraction of cases using some combination of telephone interviews, field work, and assignment from administrative records—followed by extended nonresponse follow-up in a sample of blocks.
A combination could bring together some of the best features of each strategy. The fraction of the population that would have to be estimated indirectly would be smaller than with a single-stage sample nonresponse follow-up strategy, because the status of many of the nonresponding addresses could be resolved in a brief first-stage
follow-up (of perhaps 2 or 3 weeks); therefore, uncertainties due to sampling and estimation would be smaller as well. At the same time, the sample at the second stage of nonresponse follow-up would be sufficiently dense to capture local variations in response rates and follow-up operations, and it could make use of the most skilled of the enumerators from the first-stage operation. This would mitigate one of the potential defects of truncation: that local variations in mailback response and nonresponse follow-up operations are so large that differential coverage cannot be measured with acceptable precision by the coverage measurement survey. We recognize, however, that a combined strategy may limit cost savings in comparison with a single-stage approach because of inefficiencies in administration and staffing. Both the statistical properties of a two-stage nonresponse follow-up and the operational costs and benefits in terms of quality and staffing costs require further investigation. We note that the final report of a previous National Research Council panel (Citro and Cohen, 1985:27) also considered the merits of combining truncation with sampling for nonresponse follow-up:
Recommendation 6.2. We recommend that the Census Bureau include the testing of sampling in follow-up as part of the 1987 pretest program. We recommend that in its research the Census Bureau emphasize tests of sampling for the later stages of follow-up.
Nonresponse follow-up truncation, sampling, or a combination of the two would affect accuracy of counts and of content. These effects would be most noticeable at the small-area levels. The loss of precision for counts relative to a full nonresponse follow-up (no sampling, truncation at a relatively late date) would depend on details of the design, including the length of follow-up before truncation, the rate for posttruncation follow-up, if any, and sampling rates in the coverage measurement survey. In addition, the loss of precision, and therefore the eventual choice of designs, would depend on the availability of supplementary information, the amount of local variation in response and coverage rates, and the appropriateness of estimation procedures. The tradeoff between cost and variance, at various levels of geographical detail, is an important aspect of nonresponse follow-up design that could be investigated with a combination of algebraic models and simulations using 1990 data.
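The cost-variance tradeoff described above, which could be investigated with algebraic models and simulations, can be sketched in miniature as follows. Every parameter here is invented for illustration; none is an estimate from the report:

```python
# Toy algebraic model (all numbers assumed for illustration) of the tradeoff
# between nonresponse follow-up cost and sampling variance.  At follow-up
# sampling rate f, field cost scales with f, while the sampling variance of
# the estimated nonrespondent total scales roughly with (1/f - 1).

N_NONRESP = 34_000_000   # hypothetical national count of nonresponding households
COST_PER_CASE = 11.0     # assumed average field cost per followed-up case ($)

def followup_cost(f: float) -> float:
    """Field cost of following up a fraction f of nonresponding households."""
    return N_NONRESP * f * COST_PER_CASE

def relative_variance(f: float) -> float:
    """Sampling variance relative to full follow-up (f = 1 gives 0)."""
    return (1.0 / f) - 1.0

for f in (1.0, 0.5, 0.25, 0.1):
    print(f"rate {f:4.2f}: cost ${followup_cost(f) / 1e6:6.0f}M, "
          f"relative variance {relative_variance(f):5.1f}")
```

Even this crude model shows the essential shape of the decision: cost falls linearly with the sampling rate while variance grows hyperbolically, so the attractiveness of low rates depends heavily on how much estimation error is tolerable at each geographic level.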
We endorse the recent recommendations of the Panel on Census Requirements (Committee on National Statistics, 1993) that sampling for nonresponse follow-up be investigated in the 1995 census test. In addition, we suggest specific topics for consideration during research and evaluation.
Recommendation 2.1: The Census Bureau should continue research on nonresponse follow-up sampling and truncation, including consideration of a combined strategy with a truncated first stage and sampling during a second stage of follow-up. Evaluation should consider effects of the nonresponse follow-up design on costs and on variance at a variety of geographic levels,
from states to small areas.
Use of Administrative Records during Nonresponse Follow-Up
Administrative records may play an important role in making nonresponse follow-up more efficient. Possible uses of administrative records include the following:
to support nonresponse follow-up by facilitating identification of residents of nonresponding addresses and making it easier to reach them by telephone;
to complete information for nonresponding addresses, if an administrative record that can be identified with that address is sufficiently complete and reliable;
to complete missing items for responding households, using information in the completed items to verify that the administrative record and the census questionnaire refer to the same household; and
as background information to make possible more accurate estimation of persons in nonsample blocks with a sampled nonresponse follow-up.
If these uses prove effective, they could dramatically shorten the period needed for adequate truncated nonresponse follow-up or for the first stage in a two-stage nonresponse follow-up. (Administrative records have similar uses with regard to coverage measurement; see Chapter 4.)
Recommendation 2.2: The Census Bureau should study in the 1995 census test the use of administrative records during nonresponse follow-up as a way to reduce the need for conventional follow-up approaches.
COVERAGE MEASUREMENT METHODS
Previous Coverage Measurement Programs
The Census Bureau has attempted to measure coverage of the census systematically since 1950 (Coale, 1955; Himes and Clogg, 1992). The 1980 Post-Enumeration Program was designed as an evaluation program for the 1980 census. Following that census, a number of states, local governments, and other plaintiffs sued to have the Census Bureau adjust population totals for differential undercoverage; of the several lawsuits filed, two went to trial. In one of these, the New York case, the judge ruled against the plaintiffs in December 1987.
The 1990 Post-Enumeration Survey (PES) was designed to measure undercoverage and overcoverage, with a view to adjusting for undercount if the PES was found to support sufficiently credible, precise, and reliable population estimates. The PES was designed as two surveys based on identical samples of 5,000 block
clusters, one to measure undercoverage and one to measure erroneous enumerations in the census. The fundamental statistical assumption of the PES methodology is independence between capture in the census and in the PES, conditional on adjustment cell or poststratum. The 1990 PES has been extensively documented (Hogan, 1992, 1993; Mulry and Spencer, 1991).
After a period of intensive evaluation, the Census Bureau recommended adjustment of the 1990 census, but the Secretary of Commerce did not accept the recommendation. In one of several lawsuits filed after the 1990 census, the presiding judge observed that "Plaintiffs have made a powerful case that discretion would have been more wisely employed in favor of adjustment . . . ," but he also found that the Secretary's decision was not "arbitrary or capricious" and ruled that, therefore, the decision could stand (see McLaughlin, 1993).
The Census Bureau's coverage evaluation efforts have demonstrated that certain groups are systematically undercounted relative to the rest of the population. Although response improvement programs—such as questionnaire simplification and reminder postcards—show promise for improving the mailback rate compared with 1990, early tests (see Chapter 3) suggest that they will have little effect, if any, on differential coverage. Other programs targeted at improving coverage of hard-to-reach populations may help to reduce differential coverage (see Chapter 3), but past experience suggests that they are unlikely to close the gap, especially at acceptable cost levels. For example, during the 1990 census, the recheck of vacant or nonexistent housing units added about 1.5 million persons, but over 30 percent of the additions were subsequently estimated to be erroneous. Similarly, the parolee-probationer check added between 400,000 and 500,000 persons to the final census counts, but about half of those persons were later estimated to have been enumerated erroneously (U.S. General Accounting Office, 1992; see Citro and Cohen, 1985:Ch. 5, for additional discussion of coverage improvement programs).
The One-Number Census
Current plans for the 2000 census are predicated on a strategy of integrated coverage measurement leading to a one-number census. In this approach, coverage measurement and statistical estimation are integrated into census methodology, rather than being regarded as separate operations that might be used to adjust the census. A one-number census offers several advantages over the dual strategy that was adopted for the 1990 census. First, it allows for the most cost-effective design because the cost and quality improvements that coverage measurement makes possible—particularly with regard to closing the differential coverage gap—can be planned for from the start. Second, decisions about whether to implement response improvement programs aimed at special populations can be made on the basis of improving accuracy rather than on the basis of which groups would be helped or hurt. If the decision to use estimation and the basic estimation strategy (although not necessarily all the details of the procedure) are specified at the beginning of the census process, concerns that
decisions have been influenced by a desire to benefit certain geographic or demographic groups will be forestalled. Finally, a one-number census that enjoys the support of the scientific community will have greater credibility with the American public. At the same time, it requires that there be sufficient confidence in the success of the coverage measurement aspect of the program to proceed with plans based on it. We believe that the Census Bureau's experience with the Post-Enumeration Survey in 1990 indicates that a coverage measurement method can be designed that would reduce differential coverage in the 2000 census (Mulry and Spencer, 1993; Bureau of the Census, 1992b).
We argue above that reduction of census costs and reduction of differentials in coverage cannot occur simultaneously without coverage measurement and the use of statistical estimation. In this section, we argue that the strategy of integrated coverage measurement possesses several important advantages over the dual-strategy approach that was adopted for the 1990 census. Together, these arguments lead us to an endorsement of the Census Bureau's position regarding coverage measurement for the 2000 census.
Recommendation 2.3: We endorse the Census Bureau's stated goal of achieving a one-number census in 2000 that incorporates the results from coverage measurement programs, including programs involving sampling and statistical estimation, into the official census population totals. We recommend that research on alternative methodologies continue in pursuit of this goal.
Alternative Coverage Measurement Methods
The Census Bureau has identified three candidate methods for coverage measurement: the PES (at least conceptually similar to the 1990 methodology) and two new methods, CensusPlus and SuperCensus (Bureau of the Census, 1993e).
The Post-Enumeration Survey (PES) is an independent survey conducted after the census in a sample of blocks for the purpose of measuring census coverage. The respondents are matched to the original enumeration on a case-by-case basis. Statistical methods for ratio estimation can then be applied to obtain an estimate of the population size. The Census Bureau's PES methodology involves two overlapping sample surveys: a sample of census enumerations that measures erroneous census enumerations (the E sample) and a sample of the population that measures census omissions (the P sample) (see Mulry, 1992).
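Under the independence assumption, the population of a poststratum can be estimated with the classic dual-system (capture-recapture) estimator. The sketch below illustrates only that basic logic; it omits the many refinements of the Census Bureau's actual procedure (poststratification detail, missing-data adjustments), and every count in it is hypothetical:

```python
# Simplified dual-system (capture-recapture) estimator of the kind underlying
# PES estimation.  This is an illustrative sketch, not the Census Bureau's
# full methodology; all counts below are invented.

def dual_system_estimate(census_count: float,
                         erroneous_rate: float,
                         p_sample_count: float,
                         matched_count: float) -> float:
    """Estimate the true population, assuming census and P-sample independence.

    census_count    -- raw census enumerations in the poststratum
    erroneous_rate  -- E-sample estimate of the erroneous-enumeration fraction
    p_sample_count  -- persons found by the P sample
    matched_count   -- P-sample persons matched to a census enumeration
    """
    # Remove estimated erroneous enumerations from the census count.
    correct_enumerations = census_count * (1.0 - erroneous_rate)
    # Under independence, the P-sample match rate estimates census coverage.
    match_rate = matched_count / p_sample_count
    return correct_enumerations / match_rate

# Hypothetical poststratum: 100,000 census records, 2% erroneous; the P sample
# finds 1,000 people, of whom 930 match a census record.
estimate = dual_system_estimate(100_000, 0.02, 1_000, 930)
print(f"Estimated population: {estimate:,.0f}")
```

The estimate moves in opposite directions with its two inputs: a higher erroneous-enumeration rate lowers it, while a lower match rate (more census omissions) raises it, which is why both the E sample and the P sample are needed.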
CensusPlus selects a sample of blocks and, using the census enumeration as a starting point, continues enumeration efforts in these blocks after the regular census is completed in an attempt to achieve a complete count. The count is improved by using special methods—such as administrative lists and highly trained interviewers—that are too expensive to use everywhere. The additional enumerations in the CensusPlus sample areas are used to develop population estimates for the nonsample areas by using statistical methods, such as ratio estimation (Mulry, 1992).
SuperCensus also selects a sample of blocks and conducts the enumeration with special methods similar to those described in CensusPlus. Like CensusPlus, SuperCensus attempts to conduct a complete census in the sampled blocks. The key difference is that no regular census takes place in these blocks: SuperCensus operations begin at the same time as, or even earlier than, the census in other blocks. This timing avoids most of the problems faced by PES or CensusPlus for people who move in the months following census day. Population estimates are based on applying the ratio of people to housing units observed in the sample blocks—or some other measure that is available for every block prior to the census—to the total number of housing units (Mulry, 1992).
The distinguishing feature of the PES is that the coverage measurement survey takes place after the basic census enumeration and is intended to be operationally and statistically independent of that enumeration. The PES methodology has no necessary connection to other features of the 1990 coverage measurement effort, such as the use of a dual strategy in which separate adjusted and unadjusted counts were calculated and the choice between them was made after the fact.
CensusPlus, like the PES, takes place after the basic census enumeration. Coverage estimation for CensusPlus involves techniques similar to those for the PES. For both coverage measurement methods, the ratio used in estimation is that of the correct population total to the census enumeration for that block. The key difference between CensusPlus and PES is that CensusPlus does not require independence between the basic enumeration and coverage measurement operations. This characteristic allows CensusPlus to use and build on the information collected during the regular census operations. However, the independence requirement is replaced by one of complete coverage by CensusPlus in sample blocks.
At the time of this report, the CensusPlus and SuperCensus methods for coverage measurement are not mature designs. Work is needed to define these options more fully so that they can be subjected to simulation studies, field testing, and comparative evaluation against the post-enumeration survey.
Four criteria are critical to selection of a coverage measurement methodology for the 2000 census:
Acceptable degree of bias. In practice, no coverage measurement method will completely eliminate differential coverage. However, the bias should be substantially smaller than the differential coverage that the method is intended to correct.
Adequate precision of population estimates for various levels of geography achievable at a fixed cost. Of course, the actual variance is determined by the size of the coverage measurement budget and the resulting sample size.
Operational and scheduling feasibility. The method needs to be
operationally feasible and able to meet reporting deadlines (see Chapter 1).
Ability to demonstrate that the method meets the other criteria. In practice, this means that there must be a satisfactory methodology for evaluating the chosen coverage measurement methodology, in tests as well as during the census. This is especially important for the assumptions that determine whether the coverage estimates will be biased.
We have identified specific major hurdles that apply to the three major contenders for alternative coverage measurement methods. For the PES, there are three: (1) the assumption of independence, conditional on poststratification variables, between inclusion in the census and the PES samples, (2) the need for accurate matching of names reported in the census and PES, and (3) the ability to meet reporting deadlines (after the extensive matching and verification operations).
For CensusPlus, there are five major hurdles: (1) the ability to obtain complete coverage (i.e., find everyone by the end of CensusPlus), (2) the ability to eliminate duplication of names in order to avoid overcounting during CensusPlus, (3) resolving place-of-residence problems (see Chapter 1), (4) development of a way to estimate the number of erroneous enumerations that occurred during the regular census, and (5) the ability to meet reporting deadlines (probably less of a concern than for the PES, because CensusPlus would build on census enumerations, while the PES starts anew).
For SuperCensus, there are four major hurdles: (1) the ability to obtain complete coverage (i.e., find everyone by the end of SuperCensus), (2) the ability to eliminate duplicated names in order to avoid overcounting during SuperCensus, (3) resolving place-of-residence problems, and (4) achieving sufficient accuracy within reasonable cost limits. There is reason to believe the last hurdle is particularly severe for the SuperCensus approach. The SuperCensus would require estimating ratios of person counts (e.g., black males aged 18–24) to housing unit counts. The variance of one of these estimated ratios is roughly proportional to the between-block variance of the ratio, and the Census Bureau's research on a sample census (Isaki et al., 1993) seems to indicate high between-block variation in these ratios. Therefore, SuperCensus may require a much larger sample size than CensusPlus or PES to achieve the same level of precision. It should be possible to resolve this issue, at least preliminarily, using 1990 PES counts (or regular census counts) to simulate a sample of SuperCensus blocks.
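A simulation of the kind suggested above could be prototyped along the following lines. This sketch uses synthetic block data rather than 1990 PES counts, so its numbers illustrate only how the precision of the ratio estimate scales with between-block variation:

```python
import random
import statistics

# Illustrative simulation with synthetic block data (not 1990 counts): the
# precision of a SuperCensus-style estimate is driven by the between-block
# variation of the person-to-housing-unit ratio.

random.seed(2000)

def simulate_ratio_se(n_blocks: int, mean_ratio: float,
                      between_sd: float) -> float:
    """Relative standard error of the estimated persons-per-housing-unit ratio."""
    # Draw a persons-per-housing-unit ratio for each sampled block.
    ratios = [random.gauss(mean_ratio, between_sd) for _ in range(n_blocks)]
    est_ratio = statistics.mean(ratios)
    se = statistics.stdev(ratios) / (n_blocks ** 0.5)
    return se / est_ratio

# Low vs. high between-block variation in the ratio (assumed values):
for sd in (0.1, 0.5):
    rel_se = simulate_ratio_se(n_blocks=2_000, mean_ratio=2.5, between_sd=sd)
    print(f"between-block sd {sd}: relative SE of estimated ratio = {rel_se:.4f}")
```

Because the relative standard error grows in proportion to the between-block standard deviation of the ratio, high between-block variation of the sort reported by Isaki et al. (1993) would translate directly into a larger required sample for SuperCensus.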
Recommendation 2.4: Before final design of the 1995 census test, the Census Bureau should critically evaluate the SuperCensus method of coverage measurement by using 1990 data to learn whether adequately precise coverage estimates are possible using ratios to the housing base.
As noted in Chapter 1, definitions of residency in a particular location are an issue for every coverage measurement method, complicated by people who move shortly after the census or whose residency is transient around census day. The
nature of the issue, however, differs somewhat among methods. With the PES, it is only necessary to determine whether persons located in the PES were enumerated in the census anywhere, or in practice, within some search area determined by the practicalities of the search operation. With CensusPlus it would be necessary to determine which of the persons found in the survey several months later were resident in sample blocks on census day according to census rules, and conversely, to achieve near-complete coverage of those persons who were resident in the sample blocks. Finally, SuperCensus, like CensusPlus, would have to locate and geographically assign all persons, and only those persons, who were officially resident in sample blocks on census day. However, because SuperCensus would be carried out on an earlier schedule than CensusPlus, it might be somewhat easier to deal with place of residency.
Development and Testing
The Census Bureau has correctly identified that either of the new coverage measurement methods would require more research and development than would the PES before it could become the method used in 2000. However, we are concerned by the description of the PES approach ". . . partly as a fallback position in the event the SuperCensus/CensusPlus approach does not prove viable based on our research or on results of the 1995 Census Test" (Bureau of the Census, 1993d:24). Because both CensusPlus and SuperCensus involve an assumption of near-complete coverage that has not been demonstrated by any census method to date, the importance of maintaining the PES as a tested alternative should not be minimized.
We are concerned that exclusion of the PES from the 1995 census test will jeopardize its status as a candidate method for coverage measurement in the 2000 census. Because the PES is the method that is best supported by past experience, we believe it should be tested in combination with new features that will be tried in 1995, possibly including integrated coverage measurement, multiple response modes, and nonresponse follow-up truncation. In particular, the timeliness of the PES as part of an integrated coverage measurement strategy can only be demonstrated in a census test environment. Even if SuperCensus or CensusPlus proves possible on the basis of testing in 1995, the PES may still be the best choice. However, unless a fair, comparative evaluation of all candidate methods is undertaken, it may be difficult to justify choosing the PES over a method tested in 1995.
Recommendation 2.5: Development and testing of methodology for the Post-Enumeration Survey (PES) should continue in parallel with other methods until another method proves superior in operational tests. All methods still under consideration—including the PES—should be evaluated critically against common criteria.
Note that the selection criteria outlined in the preceding section can be applied in common to all candidate methods—that is, the criteria are independent of any particular feature of the candidate methods. To produce the information required, coverage measurement will need to be tested in a substantial proportion of the 1995 census test blocks. We note that this will also provide valuable information about the success of response improvement programs in reducing differential coverage.
Interaction between coverage measurement and other design features needs to be studied. The best choice of coverage measurement method may depend on other factors in the 2000 census. For example, we believe that speeding up nonresponse follow-up may improve the accuracy of the PES and CensusPlus. Thus, it would be valuable to learn about the interaction between coverage measurement methodology and nonresponse follow-up design, including truncation date and the use of sampling.
Coverage Measurement Sample in the 2000 Census
We anticipate that estimates that meet the demands of integrated coverage measurement in the year 2000 will require a much larger sample size than that used in the 1990 PES, although the precise determination of this size will depend on the coverage measurement and estimation methods used. The precision of coverage measurement depends critically on sample size and the poststratification design (choice of adjustment cells). For a given sample size, if adjustment cells are too large, they will fail to capture important heterogeneity in census coverage rates, which will affect the accuracy of corrections for small areas. The issue of heterogeneity within adjustment cells was cited as an argument against adjustment of the population base for intercensal estimates (Bryant, 1993). If adjustment cells are too small, there will be inadequate sample size to obtain estimates of acceptable precision, although modeling may be of some assistance in this situation; again, the accuracy of estimates for small areas will be most affected. Only with adequate sample size can both of these problems be mitigated. This sample size calculation should be based on analysis of data from the 1990 census and the 1995 census test.
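The tradeoff between cell size and precision can be illustrated with a small simulation. Every number below—coverage rates, populations, block-level noise—is invented for illustration; the sketch is not based on actual census or PES data and is not the Bureau's sample-size methodology. Coarse cells pool heterogeneous coverage rates and bias the corrections for individual areas, while fine cells rest on few sample blocks and have high variance; which error dominates depends on the sample size per cell.

```python
import random

random.seed(1)

# Invented setup: 8 areas with heterogeneous true coverage rates and
# equal true populations (not actual census figures).
true_rate = [0.90, 0.92, 0.94, 0.95, 0.96, 0.97, 0.98, 0.99]
true_pop = [10_000] * 8
counted = [p * r for p, r in zip(true_pop, true_rate)]

def coverage_obs(rate, n_blocks, noise=0.10):
    """Noisy block-level observations of an area's coverage rate."""
    return [random.gauss(rate, noise) for _ in range(n_blocks)]

def one_rep(n_blocks, pooled):
    """One simulated coverage survey; returns the mean squared error of
    the adjusted area counts under the chosen poststratification."""
    obs = [coverage_obs(r, n_blocks) for r in true_rate]
    if pooled:   # one coarse adjustment cell spanning all areas
        f = sum(sum(o) for o in obs) / (len(obs) * n_blocks)
        factors = [f] * len(obs)
    else:        # one fine adjustment cell per area
        factors = [sum(o) / n_blocks for o in obs]
    adjusted = [c / f for c, f in zip(counted, factors)]
    return sum((a - p) ** 2 for a, p in zip(adjusted, true_pop)) / len(true_pop)

reps = 1000
for n_blocks in (2, 50):
    for pooled in (True, False):
        rmse = (sum(one_rep(n_blocks, pooled) for _ in range(reps)) / reps) ** 0.5
        print(f"blocks per cell={n_blocks:3d}  pooled={pooled}:  RMSE={rmse:7.0f}")
```

With only 2 sample blocks per cell, the pooled (coarse) design wins despite its bias; with 50 blocks per cell, the fine cells win. Only a larger overall sample permits cells fine enough to capture heterogeneity while keeping variance acceptable, which is the point of the recommendation that follows.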
Coverage measurement sample size also interacts with other features discussed in this chapter, including truncation or nonresponse follow-up sampling: there is a tradeoff between cost savings through these methods and a corresponding requirement for expansion of the coverage measurement sample in order to obtain sufficiently precise estimates of the larger undercoverage that would result. These tradeoffs may favor increasing the size of coverage measurement relative to nonresponse follow-up. The cost of the PES, $50–60 million in 1990, was small relative to that of nonresponse follow-up. In addition to cost considerations, it would be important to demonstrate that field management and data processing operations for a larger coverage measurement program could be administered.
Recommendation 2.6: Whatever coverage measurement method is used in 2000, the Census Bureau should ensure that a sufficiently large sample is taken so that the single set of counts provides the accuracy needed by data users at
pertinent levels of geography.
SAMPLING FOR CONTENT: MATRIX SAMPLING
Census methodology has involved sampling for content since 1940. In recent censuses, sampling for content has been accomplished through the use of short and long forms. The long form has been distributed to a sample of households, with the sampling rate depending on the type of place in order to produce good estimates of population characteristics for small as well as large civil divisions.
Matrix sampling is a more complex content sampling scheme, in which several different long forms are used, each sent to a subsample of the census universe. Each form includes a different subset of the full set of content items. The subsets are designed in such a way that there is adequate sample size for each item and so that combinations of items for which cross-tabulations are important appear together on some subset of the forms. The use of matrix sampling in the decennial census is not new: it was used in both the 1950 and 1960 censuses for the housing sample items (see Bureau of the Census, 1965; Goldfield, 1992).
Matrix sampling was also used extensively in the 1970 census for both population and housing items. There were two long forms: one was sent to 5 percent of households, and the other was sent to 15 percent of households. Some questions appeared on both forms and thus were asked of 20 percent of households. Matrix sampling was quite important in 1970 because this was the first census from which data tapes were provided to users to do their own analysis. (In 1950 and 1960, only printed reports prepared by the Census Bureau were generally available.) Users found matrix sampling in 1970 to be a nuisance because of the limitations on cross-tabulations and the need to access multiple data products (e.g., there were two PUMS (public-use microdata sample) files, one for each long form).
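The arithmetic of such a design can be sketched in a few lines. The form contents below are invented for illustration, but the 5 and 15 percent rates echo the 1970 scheme: an item's overall sampling rate is the sum of the rates of the forms that carry it, while a cross-tabulation is directly observable only at the rate of forms carrying both items.

```python
# Hypothetical two-form design (item assignments invented; rates echo 1970).
forms = {
    "A": {"rate": 0.05, "items": {"income", "occupation", "plumbing"}},
    "B": {"rate": 0.15, "items": {"income", "education", "vehicles"}},
}

def item_rate(item):
    """Overall sampling rate for one item: sum over forms carrying it."""
    return sum(f["rate"] for f in forms.values() if item in f["items"])

def pair_rate(i, j):
    """Rate at which a cross-tabulation of items i and j is directly
    observable: both items must appear on the same form."""
    return sum(f["rate"] for f in forms.values()
               if i in f["items"] and j in f["items"])

print(item_rate("income"))                   # on both forms: 0.05 + 0.15
print(pair_rate("income", "education"))      # together only on form B: 0.15
print(pair_rate("occupation", "education"))  # never on the same form: 0
```

The last case is exactly the nuisance 1970 users encountered: a cross-tabulation of items that never share a form cannot be produced directly from any single file.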
Matrix sampling for content, the "multiple sample forms" census design element, is attractive because it can reduce respondent burden while yielding high overall content. Content here refers to statistical information on various joint, conditional, and marginal distributions of census items (Bureau of the Census, 1993c, especially p. 20; Griffin and Cresce, 1993; Miskura, 1993a).
The specific advantages of matrix sampling for long-form items are that respondent burden can be more evenly distributed; each item may be included only as often as necessary (it is not necessary to have the same sampling rate for every item), and there is an opportunity to expand content by asking more questions without making any single form excessively long. Matrix sampling also has some liabilities that will have to be addressed in order to determine whether its use in the census will be beneficial: sample size for some cross-tabulations may be decreased; the best estimates for tabulations and cross-tabulations for some items will require use of statistical estimation methods; analysis of the resulting data products will be more complicated than with a single sample; and costs may increase due to increased content and operational complexity. The latter two problems were also noted by
Citro and Cohen (1985:261).
Current plans appear to provide for tests of a method that would have four or five sample forms, each with a lesser level of respondent burden than the previous long form. Specific content—items to be sampled—will be determined at a later date. Current plans also call for simulation tests based on the 1990 census long-form data. Marginal tabulations for each item are available on the basis of every response containing that item. Similarly, cross-tabulations for pairs or larger combinations of items may be based on forms on which those items appear together. More efficient estimates of tabulations and cross-tabulations require statistical estimation to combine marginal and cross-tabulated information from different samples. Optimal design for the proposed implementation must take into account the correlations among items to be collected on a sample basis. Items that are independent can be placed on separate forms. Items that are almost perfectly correlated require little overlapping or splicing across forms. But for the many items that are neither independent nor perfectly correlated in the total population or major population subgroups, these design issues warrant investigation.
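One standard technique for combining marginal information from all forms with association information from the forms that carry both items is iterative proportional fitting (raking). The sketch below uses invented counts and is offered only as an illustration of the technique, not as the estimator the Census Bureau would adopt: a cross-tabulation observed in the smaller overlap sample is rescaled so its margins match the more precise totals estimated from the larger single-item samples.

```python
def rake(table, row_targets, col_targets, iters=100):
    """Iterative proportional fitting: rescale a seed cross-tabulation
    until its margins match externally estimated totals, preserving
    the seed's association (odds-ratio) structure."""
    t = [row[:] for row in table]
    for _ in range(iters):
        for i, target in enumerate(row_targets):   # match row margins
            s = sum(t[i])
            t[i] = [x * target / s for x in t[i]]
        for j, target in enumerate(col_targets):   # match column margins
            s = sum(row[j] for row in t)
            for row in t:
                row[j] *= target / s
    return t

# Invented counts: the overlap sample (both items on one form) supplies
# the seed association; larger single-item samples supply the margins.
seed = [[30.0, 10.0],
        [20.0, 40.0]]
fitted = rake(seed, row_targets=[45.0, 55.0], col_targets=[52.0, 48.0])
for row in fitted:
    print([round(x, 1) for x in row])
```

The fitted table agrees with the full-sample margins while retaining the cross-classification pattern observed in the overlap sample; the efficiency of such estimators is one of the questions the proposed simulations would address.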
Matrix sampling may prove valuable as a method for obtaining additional information for a given level of cost and respondent burden, although the ability to cross-classify this information would be somewhat diminished. Many of the questions that must be answered in order to plan and evaluate a matrix sampling scheme can best be attacked through simulations based on 1990 data, because those data provide a reasonable approximation to the universe of items from which matrix sampling will draw.
Both item content and pertinent levels of geography for this content must be examined simultaneously in simulation experiments. In addition to statistical design for item selection and overlapping or splicing (by pertinent geographical units), sampling protocols that stratify by levels of geography (e.g., by state) must be considered. Sample stratification must also be considered in relation to subnational needs for census content.
The optimal design of multiple forms has to be investigated carefully, using the 1990 census long-form data to simulate likely outcomes for the 2000 census. Such simulations need to take account of interactions among items used (not just pairwise correlation or association). Because almost all census content items (occupation, labor force status, etc.) are most useful for levels of geography, such as states, counties, cities, and minor civil divisions, the correlation structure and conditional interactions must be examined within pertinent levels of geography. The relevant variance comparisons can also be simulated reasonably well with analyses based on the 1990 census long-form data. The 1995 census test cannot provide decisive information on variances of interest because the correlation structure within and across pertinent geographic areas cannot be simulated from a test census conducted at only four sites.
Research must also be carried out to examine the operational feasibility, cost, and effects on bias and variance of the implementation of matrix sampling in combination with other basic census designs under consideration. For example, if
nonresponse follow-up is conducted partly on a sample basis, it is not obvious how to select sample forms for the target samples to be used in sample nonresponse follow-up. In fact, these interactions may be problematic even if only a single sample form is used. While simulations with 1990 data may partly answer questions of this sort, testing in 1995 will be necessary to fully explore these issues, especially those related to operations and costs.
Recommendation 2.7: The Census Bureau should continue research on possible matrix sampling designs, using the 1990 census data to simulate tabulations and cross-tabulations. Design(s) that appear most promising should be tested in 1995 to permit evaluation of their performance in combination with other census design features under test.
Finally, matrix sampling is only one of the methods under consideration that are designed to increase or retain census content. The other main method is continuous data collection through samples on monthly or annual bases throughout the decade, perhaps using matrix sampling. Research on variance and bias, imputation for nonresponse, and other statistical issues should be carried out to make comparisons between matrix sampling and continuous measurement. However, we are not sure that it is possible to simulate such factors with 1990 census data. (Proposals for continuous data collection are discussed more fully in Chapter 5.)
STATISTICAL ESTIMATION
Census design features being considered for 2000 will create new demands for statistical estimation methods. Each of the methods described previously in this chapter—sampling or truncation of nonresponse follow-up, coverage estimation, and matrix sampling—requires a corresponding estimation strategy and research on particular aspects of implementation.
Nonresponse follow-up sampling. Estimates must be obtained of numbers and characteristics of persons and households who would have been found in each block during nonresponse follow-up had all households been included in the nonresponse follow-up sample. The information that can be used in this estimation process includes the number and characteristics of persons found in nonresponse follow-up in sample blocks or households, the number of unresolved nonresponse addresses in the nonsample blocks or households, and the number and characteristics of persons found during unsampled census operations (mailback and other presampling responses).
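In its simplest form, this estimation is inverse-probability weighting: persons found in the follow-up sample are weighted by the reciprocal of the sampling fraction. The sketch below is a textbook illustration with invented counts; estimators actually used would be considerably more refined, exploiting block-level covariates and the characteristics of sampled nonrespondents.

```python
def estimate_total(mailback_persons, followup_sample_persons,
                   sampling_fraction):
    """Estimate total persons in an area when nonresponse follow-up is
    conducted only in a sample: mailback responses are a complete
    count, and persons found in the follow-up sample are weighted by
    the reciprocal of the sampling fraction."""
    return mailback_persons + followup_sample_persons / sampling_fraction

# Invented counts: 6,000 persons enumerated by mailback; a 1-in-3
# sample of nonresponding households yields 800 persons in follow-up.
total = estimate_total(6_000, 800, 1 / 3)   # 6,000 + 3 * 800 = 8,400
print(round(total))
```

The weighted-up term is also where sampling error enters; the precision of such estimates for small areas is one reason nonresponse follow-up sampling interacts with coverage measurement sample size.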
Truncation. Truncation increases the number of persons who are not directly enumerated and therefore must be statistically estimated or assigned. As this number becomes larger, the demands on estimation for accuracy become more stringent.
Coverage estimation. Methods such as models or poststratification schemes must be developed for describing patterns of undercoverage. Sample sizes will
probably be such that direct estimates are possible only for fairly large aggregates. Estimation of population and characteristics for smaller aggregates will require development of such indirect estimation methods as synthetic estimation and empirical Bayes smoothing. At the most detailed level, methods will be required for incorporating estimated persons and households into individual blocks, creating units with realistic characteristics in such a way that additivity is maintained across levels of geography.
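Synthetic estimation, one of the indirect methods mentioned, carries adjustment factors estimated for large poststrata down to individual small areas. The example below is schematic, with invented factors and counts; it is not the Bureau's procedure, and it makes explicit the method's key assumption.

```python
# Invented adjustment factors, estimated from the coverage survey at a
# level of aggregation (poststrata) with adequate sample size.
factors = {"owner-occupied": 1.01, "renter-occupied": 1.06}

# Census counts for one block, classified into the same poststrata.
block_counts = {"owner-occupied": 120, "renter-occupied": 80}

# Synthetic estimate: apply each poststratum's factor to the block's
# count in that stratum.  The method assumes coverage is homogeneous
# within poststrata, which is exactly the heterogeneity concern raised
# earlier in the chapter.
synthetic = sum(factors[s] * n for s, n in block_counts.items())
print(round(synthetic, 1))   # 1.01 * 120 + 1.06 * 80 = 206.0
```

Controlled rounding or similar devices would then be needed to turn such fractional block-level estimates into whole persons while maintaining additivity across levels of geography.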
Additional information sources. Inclusion of new information sources into the census, such as administrative records and multiple response modes, may create new demands on estimation methodologies.
Matrix sampling. As noted above, the use of matrix sampling implies the use of estimation to combine information from various sample forms efficiently and to estimate cross-tabulations for which there is little or no information in the sample data.
Necessary research on statistical estimation divides roughly into three phases. In the first phase, which is now underway and continues until the major design decisions have been made for the 1995 census test, estimation research focuses on broadening the range of possibilities for the use of sampling and other statistically based techniques. In this phase, preliminary assessments can be obtained of the expected precision for various designs. In the second phase, roughly coinciding with the planning, execution, and processing of the 1995 census test, the emphasis shifts to developing methods needed for the selected designs and methodological features. In the final phase, beginning with assessment of the 1995 census test and continuing through the decade, the selected estimation methods will have to be consolidated, optimized, validated, and made both theoretically and operationally robust. This last process will ensure that they can stand up to critical scrutiny and to problems that may arise in the course of the 2000 census. In this phase, work will also continue on selecting estimation procedures required for the production of all census products.
Recommendation 2.8: The Census Bureau should vigorously pursue research on statistical estimation now and throughout the decade. Topics should include nonresponse follow-up sampling, coverage estimation, incorporation of varied information sources (including administrative records), indirect estimation for small areas, and matrix sampling.
Official statistics have progressed over the century from a narrow focus on simple tabulations of population characteristics to provision of a range of census products, including complex tabulations and sample microdata files. Analytical uses of these data require availability of both point estimates and measures of uncertainty. When complex statistical methods, such as complex sampling schemes, indirect estimation, and imputation, are used in creating census products, users will not be able to derive valid measures of uncertainty by elementary methods, and they may not have adequate information in the published or available products to derive these
measures. It therefore becomes the responsibility of the data producers to facilitate estimation of uncertainty.
"Total error models" have been used by the Census Bureau to measure uncertainty in the outcomes of the census and the contributions of the various sources of error to this uncertainty (Hansen et al., 1961). More recently, a total error model was developed for estimation of uncertainty in adjusted estimates based on the 1990 census and PES (Mulry and Spencer, 1993). Such models take into account both sampling errors in the estimates and potential biases stemming from the regular census and from coverage estimation. Bias can arise, for example, from use of several response modes or from differences among response times. Similar models may be a useful tool for evaluating uncertainty in integrated estimates from a complex census in the year 2000.
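Schematically, such a model combines a variance term with squared bias to yield a mean squared error. The component values below are invented, and the simple additive treatment of bias components is a simplification of the published framework, which estimates each component from evaluation studies.

```python
# Invented component values; a real total error model would estimate
# these from evaluation studies rather than assume them.
sampling_variance = 40_000.0        # variance of the coverage estimate
bias_components = {                 # net biases from nonsampling sources
    "matching error": 120.0,
    "correlation bias": 150.0,
    "data processing error": 60.0,
}

# Simplified accounting: net bias is the sum of the (signed) components,
# and mean squared error combines variance with squared net bias.
net_bias = sum(bias_components.values())
mse = sampling_variance + net_bias ** 2
print(net_bias, mse, mse ** 0.5)
```

Decompositions of this kind make visible how much of the total uncertainty is attributable to sampling, which a larger coverage measurement sample can reduce, versus bias, which it cannot.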
Recommendation 2.9: The Census Bureau should develop methods for measuring and modeling all sources of error in the census and for showing uncertainty in published tabulations or otherwise enabling users to estimate uncertainty.