Suggested Citation:"2 Application of Sampling Procedures." National Research Council. 1997. Preparing For the 2000 Census: Interim Report II. Washington, DC: The National Academies Press. doi: 10.17226/5886.

2
Application of Sampling Procedures

NEED FOR SAMPLING

In its first interim report, the panel reviewed the possible uses of sampling procedures in the census, both as a means of completing the enumeration of households (sampling for nonresponse follow-up) and as a means of increasing the accuracy of the count by identifying people missed in the census enumeration and adjusting population figures accordingly (integrated coverage measurement). In that report the panel discussed the potential benefits of sampling for both uses and concluded that, in both cases, there is clear potential for improvements in census data through sampling. The report also concluded that it appeared at that time that the Census Bureau would be able to implement those sampling procedures for 2000 in a way that would realize the potential benefits. However, the panel noted that the Census Bureau would need to apply substantial thought, planning, and diligence between then and 2000 if sampling procedures are to be implemented in a way that will realize their full potential to produce a census of higher quality with controlled cost.

At this time, the panel finds that the Census Bureau has made substantial progress in developing procedures for conducting nonresponse follow-up on a sample basis and in developing a large-scale sample survey for integrated coverage measurement (see Chapters 5 and 6). It is clear, however, that the Bureau faces additional work to guarantee that the 2000 census will be a well-managed and cost-effective census that meets its constitutional mandate and is of uniformly high quality throughout the nation. The fact that sizable challenges remain in the development of a census that uses sampling methods does not change the panel's assessment that a census without sampling will almost certainly be unacceptable in terms of both quality and cost. The panel concludes that there is no reasonable "fall-back position" for the 2000 census. As we indicated in our previous report, echoing the message of the National Research Council's two previous panels on the 2000 census (Steffey and Bradburn, 1994; Edmonston and Schultze, 1995), we do not believe that a census of acceptable accuracy and cost is possible without the use of sampling procedures, for both nonresponse follow-up and integrated coverage measurement.

The potential for improved data quality through the use of sampling for nonresponse follow-up derives from two main features. First, the use of sampling will reduce the field workload and may result in more timely completion of the nonresponse follow-up procedures in the field. This increased timeliness will improve data quality because respondents will typically be giving information to enumerators closer to census day than they would if nonresponse follow-up without sampling were implemented. There will be fewer recall errors and less use of poor-quality "last-resort" information obtained from indirect sources in the final stages of field data collection. In addition, the more timely completion of nonresponse follow-up will permit the coverage measurement survey to be implemented in the field closer to census day, thereby reducing both recall errors and the effect that people who move have on coverage measurement and correction. Second, the use of sampling will make it possible to use better qualified and more highly trained personnel to conduct the nonresponse follow-up work.

We stress again, as we did in our first interim report, that it would not be feasible to implement the intensive procedures that would be needed to significantly improve census coverage without the use of sampling. The initial census process itself will use the best methods possible to identify every household in the country and the best techniques available so that householders can provide information about all residents of the household. Yet experience in recent censuses clearly shows that the number of people missed as a result of missed dwellings and of people missed in enumerated dwellings is too great and, in particular, too inequitably distributed to be ignored if a census is to be of adequate quality. Thus, to conduct an adequate census, sampling procedures must be used and the results integrated into the final population counts.

Although sampling procedures to "complete the count" have not been used previously in a decennial census (either for nonresponse follow-up or for coverage measurement), the procedures proposed are well established in the production of official statistics and in the conduct of scientific research. Indeed, there is widespread public confidence in the data collected by the census long form, which was administered on a sample basis to 1 in 6 households in 1990. Such sampling for expanded information began in the 1940 census. Probability sampling procedures are acknowledged to be an objective method of collecting data from which it is possible to obtain valid measures of the level of variability introduced as a result of using a sample.

The exact procedures for implementing sampling and its associated estimation procedures, both for nonresponse follow-up and for coverage measurement, must be scientifically established on the basis of available evidence. The panel finds that the Census Bureau is using such an approach to develop its procedures. The final procedures to be used must be shared with knowledgeable data users and other interested parties and must be clearly established prior to the conduct of the census. With this approach, the use of sampling in enumerating the population will be demonstrably free from influence aimed at achieving a particular result in a given geographic area.

CONCERNS ABOUT THE USE OF SAMPLING

Several objections have been raised to the use of sampling as part of census procedures. Some concerns are legal; we do not attempt to address those, which are largely outside the panel's area of expertise. We do note, however, that, like a previous panel (Edmonston and Schultze, 1995), we have not seen evidence of any prevailing or significant legal opinion that the use of sampling, in the manner contemplated by the Census Bureau, would result in a census that did not fulfill constitutional requirements.

In the remainder of this chapter, we discuss three concerns about the quality of results from a census that uses sampling:

Will uncertainties in the population counts, due to sampling variability, undermine public confidence in the results? Will state and local participation in the census process decline as a result?
Will people assume that they do not need to respond by mail because the use of sampling means that their participation makes no difference to the results?
Will the use of sampling for nonresponse follow-up compromise the accuracy of small-area data used for redistricting, since estimates for small areas may have substantial sampling variability?

To ensure clarity, a brief review of sampling terminology precedes the discussion.

The Terminology of Sampling Variability

Throughout this report, the panel refers to "sampling variability," "sampling error," "confidence interval," "standard error," and "coefficient of variation." These terms all refer to the measurable uncertainty introduced by sampling. For example, the media may report that 58 percent of the population drink coffee and qualify the statement by adding "plus or minus 3 percent." The "plus or minus 3 percent" is one way to express that there is sampling variability, or sampling error, inherent in the estimate from the particular sample design used to gather the data. The two terms, sampling variability and sampling error, have a general connotation and are used to express the fact that estimates made from samples have some known and measurable uncertainty (or variation). In contrast, "standard error," "confidence interval," and "coefficient of variation" refer to mathematically precise expressions of this variability for a particular estimate from a particular sample design. Both the sample size and other aspects of the sample design (respondent selection scheme) affect the amount of uncertainty, which can be expressed as a standard error, coefficient of variation, or a confidence interval. When an estimate is made from sample data, whether it is a percentage, a count, or some other measure, the standard error of the estimate is expressed in the same units of measurement (percentages, counts, etc.).

As a general rule in statistics, one can be 95 percent confident that the process of drawing a sample and computing a range, defined by subtracting twice the standard error of the estimate from the estimate itself and then adding twice the standard error to it, will yield a range that includes the true value. The coffee drinking example expressed one such 95 percent confidence interval: that 58 percent plus or minus 3 percent of the population drink coffee and, thus, that one can be confident at the 95 percent level that the true value lies between 55 and 61 percent. The standard error is roughly half of 3 percent, or 1.5 percent. Or consider an example of estimating the population size of a small city instead of the percentage of coffee drinkers. A sample might yield an estimate of, say, 100,000 people. The standard error of this estimate, dictated by sample design parameters, will be expressed in the same unit. If the standard error is 2,300 people, a 95 percent confidence interval would be 100,000 plus or minus 4,600 (2 times 2,300), or from 95,400 to 104,600 people.
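The two-standard-error rule described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic in the text; the function name `confidence_interval_95` is a hypothetical label, and the factor of 2 is the rule of thumb used in this chapter rather than an exact normal quantile.

```python
def confidence_interval_95(estimate, standard_error):
    """Approximate 95 percent confidence interval: estimate +/- 2 standard errors."""
    margin = 2 * standard_error
    return estimate - margin, estimate + margin


# Coffee example: 58 percent with a standard error of 1.5 percentage points.
coffee_low, coffee_high = confidence_interval_95(58.0, 1.5)
print(coffee_low, coffee_high)   # 55.0 61.0

# Small-city example: 100,000 people with a standard error of 2,300 people.
city_low, city_high = confidence_interval_95(100_000, 2_300)
print(city_low, city_high)       # 95400 104600
```

In both cases the interval is expressed in the same units as the estimate itself (percentage points for the coffee example, people for the city example).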

In order to compare the relative effect of sample designs on different kinds of estimates, one has to express the sampling variability in a standardized or comparable manner, the "coefficient of variation." In contrast to the standard error, the coefficient of variation is a relative measure: it is the standard error expressed as a percentage of the estimate itself. Thus, in the coffee example, the standard error of 1.5 percent becomes a coefficient of variation of .026 (1.5/58) or 2.6 percent. Coefficients of variation are always expressed as a percentage and are therefore directly comparable. A coefficient of variation can be converted to a 95 percent confidence interval by subtracting from and adding to an estimate a quantity equal to twice the coefficient of variation (percentage) of the estimate itself. For example, in the small-city case the coefficient of variation is 2.3 percent (2,300/100,000); 2.3 percent of 100,000 is 2,300, so twice that, or 4,600, is subtracted from and added to 100,000 to get an interval of 95,400 to 104,600 people.
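As a rough illustration of the conversion between standard error, coefficient of variation, and confidence interval, the following Python sketch works through the small-city figures (an estimate of 100,000 people with a standard error of 2,300). The function names are hypothetical and chosen only for this example.

```python
def coefficient_of_variation(estimate, standard_error):
    """Standard error expressed as a fraction of the estimate (unit-free)."""
    return standard_error / estimate


def interval_from_cv(estimate, cv):
    """Approximate 95 percent interval: estimate +/- twice (cv * estimate)."""
    margin = 2 * cv * estimate
    return estimate - margin, estimate + margin


cv = coefficient_of_variation(100_000, 2_300)
print(f"coefficient of variation: {cv:.1%}")

low, high = interval_from_cv(100_000, cv)
print(f"95 percent interval: {low:,.0f} to {high:,.0f} people")
```

Because the coefficient of variation is unit-free, the same two functions apply unchanged whether the estimate is a count of people or a percentage.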

When comparing the costs and other relative advantages and disadvantages of alternative sample designs, it is useful either to hold the expected coefficients of variation constant and look at the differences in cost or to hold the cost constant and look at the difference in coefficients of variation.
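The comparison described above can be sketched as follows. Everything specific here is an assumption made for illustration and does not come from the panel or the Census Bureau: the two designs "A" and "B", their design effects and per-case costs, and the use of the simple-random-sampling variance of a proportion inflated by a design effect.

```python
import math


def cv_of_proportion(p, n, design_effect=1.0):
    """Approximate CV of an estimated proportion p based on n cases.

    Assumes the simple-random-sampling variance p(1-p)/n, multiplied by a
    design effect (1.0 means plain simple random sampling).
    """
    standard_error = math.sqrt(design_effect * p * (1.0 - p) / n)
    return standard_error / p


def cases_needed(p, target_cv, design_effect=1.0):
    """A sample size whose CV does not exceed target_cv under the same assumptions."""
    n = math.ceil(design_effect * (1.0 - p) / (p * target_cv ** 2))
    while cv_of_proportion(p, n, design_effect) > target_cv:
        n += 1  # guard against floating-point rounding at the boundary
    return n


# Hypothetical designs: name -> (design effect, cost per completed case).
DESIGNS = {"A": (1.0, 30.0), "B": (1.5, 18.0)}


def cost_at_fixed_cv(p, target_cv):
    """Hold the CV constant and compare total cost across designs."""
    return {name: cases_needed(p, target_cv, deff) * unit_cost
            for name, (deff, unit_cost) in DESIGNS.items()}


def cv_at_fixed_cost(p, budget):
    """Hold the budget constant and compare the achievable CV across designs."""
    return {name: cv_of_proportion(p, int(budget // unit_cost), deff)
            for name, (deff, unit_cost) in DESIGNS.items()}


print(cost_at_fixed_cv(0.58, 0.025))
print(cv_at_fixed_cost(0.58, 30_000.0))
```

Either view can be used to rank designs; which one is more natural depends on whether the precision target or the budget is the fixed constraint.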

In sum, while sampling variability and sampling error are general terms expressing the fact that sample estimates have some measurable variability, standard error and confidence interval are specific measures of the variability of a specific estimate, and the coefficient of variation is a unit-free relative measure that expresses the standard error as a percentage of the estimate. This unit-free measure is easy to compare across different types of estimates and for alternative sample designs.

Sampling Error and Public Confidence

Knowledgeable census data users, especially state and local officials, are aware that the 1990 and earlier census data were not error-free at the local level. In fact, in some cases state and local officials found the results of these past censuses very much lacking in credibility. This problem was exacerbated by the fact that there was no procedure for quantifying the possible size of the error in a given jurisdiction.

Under these circumstances, a census that combines sampling with other procedures to improve the quality of the count has the potential to increase public confidence in the result. The fact that sampling error can be measured will provide confirmation that sampling errors for relatively large geographic areas are actually small. Furthermore, the panel believes that sampling will actually reduce nonsampling error because census resources can be strategically targeted to improve quality.

The Census Bureau must do a careful job in communicating to census users what sampling involves and what the resulting numbers will and will not mean. The idea of sampling as part of the procedure for obtaining population counts is so novel that it is important that its role be explained carefully and often. If this is done successfully, however, there is good reason to think that, through the use of sampling, together with other enhancements to census procedures, public confidence in census results can be increased.

Sampling and the Mail Return Rate

The issue of whether the use of sampling for nonresponse follow-up might have an adverse effect on mail return rates is an important one. The Census Bureau has undertaken a number of enhancements aimed at increasing the rate of return by mail. It would certainly be counterproductive if the introduction of sampling were to have the unintended consequence of reducing the mail response rate.

The arguments suggesting that sampling for nonresponse follow-up might reduce mail response take two lines. The first is that households will make the calculation that if they fail to return the form by mail, there may well be no in-person follow-up to collect a form. Thus, the household can minimize its expected total burden by failing to complete and return the form that it receives in the mail. The logic of this argument assumes that (in the absence of nonresponse follow-up sampling) households know that failing to return their forms by mail is certain to result in a visit from an enumerator to obtain them.

The second line of argument that mail response will go down as a result of sampling has the reverse underlying assumption. This argument is that, in the absence of sampling, householders will assume that an accurate count depends on their returning the form by mail; that if they do not return the form, their household will not be counted. If there is widespread public awareness of the use of sampling, however, then some householders will determine that the result of the count is unaffected by whether they return the form by mail or not, and so will not bother. Ironically, it is actually when no sampling is used that the result of the census is largely unaffected by whether a household returns its form by mail (setting aside prohibitive logistical and cost barriers to contacting every single nonresponding household). Thus, in the absence of any research about householder behavior in a real census, one might just as well argue that the absence of any sampling for nonresponse follow-up has traditionally held down mail return rates, since respondents expected in-person follow-up and might not have perceived added benefit from returning the form by mail. The panel has seen no evidence that households go through the kind of logical calculations hypothesized above when deciding whether to respond by mail.

The concern that the introduction of sampling procedures will have a noticeable effect on the rate of mail return is speculative at this point, with no evidence that such an effect will occur. There is an absence of empirical research on this topic. However, sampling was used in the 1995 census test, and although there was no control group, the observed mail return rates were not below those that would have been expected without sampling. Frankly, in the absence of an organized negative publicity campaign, we consider it unlikely that any significant proportion of households will make any connection between their decision whether or not to return the form by mail and the use of sampling for nonresponse follow-up. Other factors, some under the control of the Census Bureau and others not, are much more likely to significantly affect the mail return rate than is the use of sampling for nonresponse follow-up: public perception of the importance of the census, trust in and respect for government, the clarity of the instructions, and the use of reminders and replacement forms.

However, the panel does believe that it is important for the Census Bureau to communicate clearly to local authorities and knowledgeable users of census data what the plans for nonresponse follow-up sampling are, how they might vary from area to area (depending on the rate of mail return, for example), and the importance of obtaining a high rate of mail return for a successful census. A constant and clear message on these points, which does not oversimplify the issues involved, needs to be an important component of the Census Bureau's plans for 2000.

These issues highlight a crucial aspect of the sample design for nonresponse follow-up. As we discuss in detail in Chapter 5, it is important that the sample design does not have the consequence that areas with high mail return rates have counts subject to greater levels of sampling variability than areas with low mail return rates. An increase in the mail return rate should (at the least) lead to no increase in the level of sampling error. In fact, a design in which there is a modest improvement in the level of sampling error as the mail return rate increases might give incentives to local governments and interest groups to increase mail return rates. Such a plan might counter the notion that the use of sampling will lead to a decrease in public participation in the census.

The Accuracy of Small-Area Data

The mathematics of sampling and estimation are such that, for a given sample design, the level of sampling error, relative to the size of the population to be counted, will increase as one moves to smaller and smaller geographic units. In general, other sources of census error tend to remain constant on average across units, relative to population, as one considers increasingly smaller geographic units.

Consider, for example, the errors that result from housing units being missed from the Master Address File. If they are missing at an average rate of 1 percent per block, then for a large geographic area that consists of perhaps 10,000 blocks (such as a congressional district), the rate of missing addresses will also be about 1 percent. But the coefficient of variation due to sampling error will only be one-hundredth the size for the congressional district that it is for the blocks in that district.
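The one-hundredth figure follows from the square-root behavior of sampling error. A minimal sketch, under simplifying assumptions that are not spelled out in the text (all blocks have the same expected count, the same standard error, and independent sampling errors): the standard error of a sum of N block estimates grows like the square root of N, while the total grows like N, so the coefficient of variation shrinks by a factor of the square root of N. The block-level numbers below are hypothetical.

```python
import math


def cv_of_total(block_mean, block_se, n_blocks):
    """CV of the sum of n_blocks independent block estimates.

    Assumes every block has the same expected count (block_mean) and the
    same standard error (block_se), with independent sampling errors.
    """
    total = n_blocks * block_mean
    se_total = block_se * math.sqrt(n_blocks)  # variances add, so SEs add in quadrature
    return se_total / total


# Hypothetical block: 50 people on average, standard error of 15 people.
block_cv = cv_of_total(50.0, 15.0, 1)
district_cv = cv_of_total(50.0, 15.0, 10_000)

print(f"block CV:    {block_cv:.1%}")
print(f"district CV: {district_cv:.2%}")
print(f"ratio:       {district_cv / block_cv:.4f}")
```

By contrast, an error that occurs at a constant average rate per block, such as the 1 percent of addresses missing from the Master Address File, does not shrink under aggregation in this way.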

With any reasonable sampling strategy that the Census Bureau might adopt, the level of sampling error would be very small for a large geographic area, such as an urban county. In fact, as pointed out in the panel's first interim report and above, at such a geographic level there is a good case to be made that the introduction of sampling procedures can lead to an overall net reduction in errors because the errors that can be reduced when sampling is used outweigh the sampling error introduced. Inevitably, however, this does not hold true at very small levels of geography. With any reasonable sampling scheme that might be used nationally, there will be some levels of aggregation (for example, census blocks) for which the census count will be less precise on average and would arguably not be an improvement over what could be obtained without the use of sampling.

The important point to note here is that for the counts for census blocks, the level of sampling error is, relatively speaking, not an appropriate criterion for judging the quality of the census. Although block counts may contribute to the congressional redistricting process, for example, it is important to keep in mind that the results in a redistricting process are the counts for the congressional districts that are eventually created (and to a lesser extent, the counts for districts that were, or conceptually might have been, considered but were discarded). For these kinds of counts, the level of sampling error will be modest because the larger the number of observations used for an estimate, the smaller its sampling error will be.

Thus, in the panel's view, the important considerations for evaluating whether the amount of sampling error present in the census process is acceptable are not those that relate to counts for very small units, such as blocks. It is clear that at that level, sampling error may be substantial in some cases (again, relative to the size of the block). The evaluation of sampling error should take place for the geographic level counts that have important legal, political, or financial implications. For such levels, a census that uses sampling can achieve results that are at least as good as those from a more time-consuming and expensive effort to obtain a completed form for every household.

In summary, then, as we have stated before, the panel concludes that sampling and statistical estimation are important components of the plans for the 2000 census. Both sampling for nonresponse follow-up and sampling for integrated coverage measurement are key to the successful conduct of an affordable enumeration of adequate quality in all parts of the country. Although each type of sampling improves both efficiency and quality, sampling for nonresponse follow-up will make the greatest contribution to cost savings, while integrated coverage measurement contributes more to improved accuracy. The Census Bureau needs to carry out further research to develop the specific details of how each of these components is to be conducted and how they are to be integrated. The Census Bureau must also be careful to inform knowledgeable users of the methods to be used and the reliability of the counts that will be obtained.

If sound procedures are developed by the Census Bureau and communicated to users, the panel believes that it will be possible for the Bureau to address all reasonable potential objections to the use of sampling and to satisfy users that the use of sampling has added to the soundness and quality of the 2000 census, rather than detracting from it.