3
Reconsideration of Important Census Bureau Decisions

The planning of a decennial census begins at least 10 years before the first questionnaire is mailed out. Some decisions must be made relatively early in the decade, for example, because of the need to procure equipment or because of limited testing opportunities. While the panel supports the fundamental decisions that the Census Bureau has made in planning for the 2000 census regarding the use of sampling for nonresponse follow-up and integrated coverage measurement, various decisions that the Census Bureau was required to make early in the 1990s that cannot be changed until the 2010 census planning cycle—some supported and some not supported by this panel—need to be revisited for 2010. (One important reason for reconsidering many of these decisions is technical and methodological advances that are either likely or expected before the next census.) This should be done with the benefit of evaluation results and data collected from 2000.

This chapter discusses the following features involved in these decisions, some of which will be used in 2000 and some not, in roughly the chronological order of their appearance in the census process:

  • the decision to carry out a full master address file canvass prior to the census;
  • the decision not to move census day;
  • the use of multiple response opportunities;
  • the use of blanket replacement questionnaires;
  • the use of four sampling rates for the long form (assuming use of the long form in 2010);






  • the use of the sampling rate chosen for undeliverable-as-addressed vacant housing units;
  • the obligation to use nonresponse follow-up to directly enumerate at least 90 percent of households in combination with mail response;
  • the restriction to use at least a 1-in-3 sampling rate in areas of high mail response;
  • the use of hot deck imputation for nonresponse follow-up and vacant households;
  • the use of computer-assisted personal interviewing for integrated coverage measurement;
  • the treatment of missing data in integrated coverage measurement;
  • issues involving use of dual-system estimation;
  • the decision not to combine demographic analysis with integrated coverage measurement;
  • the prohibition against using integrated coverage measurement estimates that borrow information across states;
  • the use of "raking" rather than more complex modeling for small-area estimation from dual-system estimation; and
  • the creation of a transparent household file.

A Full Master Address File Canvass
A complete master address file is crucial to a 2000 census that produces reliable small-area tabulations. In addition, the MAF needs to be referenced to the correct geographic location in the computerized census feature maps, referred to as the Topologically Integrated Geographic Encoding and Referencing (TIGER) system. The completeness and accuracy of the geographically referenced address list (MAF-TIGER) is important to provide adequate support for key data collection operations planned for the 2000 census:

  • mailout and postal delivery of the census questionnaires for mailback return;
  • census delivery of questionnaires for mailback return in rural areas;
  • unduplication of multiple questionnaire responses from the same household, which results from multiple response options and mailout of replacement questionnaires; and
  • enumerator field follow-up for nonresponse, including accurate sampling to achieve 90 percent direct enumeration in each census tract.

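
The 90 percent direct-enumeration target and the 1-in-3 minimum sampling rate listed above can be made concrete with a little arithmetic. The sketch below is an illustration only, not the Census Bureau's operational rule. It assumes that every mail return counts as a direct enumeration and that nonresponding households in a tract are sampled at a single fraction f, so that the directly enumerated share is r + f(1 - r) for a mail response rate r. The function and parameter names are hypothetical.

```python
def nrfu_sampling_fraction(mail_response_rate, target=0.90, floor=1 / 3):
    """Fraction of nonresponding households to follow up in one tract.

    Assumption (not stated in the report): the directly enumerated share
    is r + f * (1 - r), where r is the mail response rate and f is the
    sampling fraction applied to nonrespondents.
    """
    r = mail_response_rate
    if not 0.0 <= r <= 1.0:
        raise ValueError("mail_response_rate must be between 0 and 1")
    if r == 1.0:
        # No nonrespondents are left, so the rate is moot; return the floor.
        return floor
    # Solve r + f * (1 - r) = target for f.
    needed = (target - r) / (1.0 - r)
    # Never sample below the floor, and never above a full follow-up.
    return min(1.0, max(floor, needed))


if __name__ == "__main__":
    for r in (0.50, 0.70, 0.85, 0.95):
        f = nrfu_sampling_fraction(r)
        print(f"mail response {r:.2f}: follow up {f:.3f} of nonrespondents")
```

Under these assumptions, a tract with 50 percent mail response needs f = 0.8 to reach 90 percent, and at 85 percent mail response the required fraction coincides with the 1-in-3 floor; above that point the floor alone determines the rate.
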
To assure a high-quality MAF-TIGER, the Census Bureau has undertaken initiatives throughout the 1990s to keep these files up to date. At the national level, the Bureau has partnered with the U.S. Postal Service (USPS) to make regular updates to the MAF based on the USPS delivery sequence file (DSF). The DSF provides a nationwide source for identifying new addresses to add to the MAF as originally developed for the 1990 census, including both addresses not included on the 1990 address list and addresses new since 1990. At the subnational level, the Census Bureau has partnered with local governments through the TIGER Improvement Program and the Program for Address List Supplementation (PALS) to identify the street location of new addresses added to the MAF and to supplement MAF improvements based on the DSF.
As late as spring 1997, the Census Bureau anticipated that these efforts, combined with targeted field canvassing in urban areas¹ and blanket field canvassing of rural areas, would be sufficient to build a high-quality MAF-TIGER for conducting the 2000 census. In its second interim report (National Research Council, 1997b), the panel stated that the Census Bureau had not demonstrated that it could effectively identify where the MAF-TIGER was deficient and then correct the deficiencies through targeted updating checks. By late summer 1997 the Bureau's internal evaluation determined that the national partnership with the USPS using the DSF and the local partnership programs, in conjunction with targeted canvassing, would not produce the high-quality MAF-TIGER needed to conduct the 2000 census. Specifically, the DSF missed too many addresses for new construction and was not updated at the same rate across all areas of the country (for details, see U.S. General Accounting Office, 1998). The PALS local government partnership program failed because of the poor response of local governments (often based on lack of resources) and the Census Bureau's inability to anticipate, make sense of, and process the various submissions (both paper and electronic) received from local governments.
In September 1997 the Bureau announced a revised plan for achieving the needed quality in the MAF-TIGER. The new plan called for expanded field canvassing operations in 1998 and 1999 in a manner similar to the traditional, blanket canvassing operations used in prior censuses. This effort was to be in combination with an opportunity for local governments to review the Bureau's address list under the Local Update of Census Addresses (LUCA) program. The addition of expanded USPS operations (e.g., casing checks to update the address list after the census canvassing and LUCA operations) will then provide the final updates to MAF-TIGER just prior to mailout of the census questionnaires.
¹ The urban canvassing was aimed largely at multi-unit structures and areas with conversions from single to multi-unit structures.

A dual canvassing strategy is being implemented in 1998 and 1999, making a distinction between areas that are largely urban, with mail delivery based on city-style addresses, and areas that are largely rural (which may or may not have city-style addresses), where USPS delivery is generally not based on a city-style address.
For largely urban areas, the Bureau assumed that the existing MAF-TIGER was of sufficiently high quality to enable local governments to review the current address list, that is, to initiate the LUCA program, starting in the summer of 1998. Following the receipt of LUCA feedback from local governments, the Bureau will conduct a full field canvass of all urban blocks in 1999. The canvass will deal with all LUCA challenges, will check every third address in each block, and will check all addresses in multi-unit structures and structures where conversions from single to multi-unit status were occurring.
For largely rural areas, the Bureau planned to conduct a full field canvass of housing units (listing all addresses and noting the location of housing units on census maps), starting no later than fall 1998. The resulting updates to the MAF-TIGER for rural areas will be distributed by spring 1999 to local governments for review under the LUCA program.
The ability to carry out the above operations with little error depends on (1) the recruitment and supervision of a high-quality field staff to carry out the expanded field canvassing operations in the limited time available, (2) the technical ability to manage the field data and its proper entry into the MAF-TIGER files, and (3) the ability to work in partnership with local governments to make the most of the local review of addresses and the resulting incorporation of verifiable corrections into the MAF-TIGER files. There is no time in the schedule of events leading up to the 2000 census to alter current plans concerning MAF-TIGER improvements.
Given the Bureau's recognition of the uneven quality of the decade-long efforts to keep the MAF-TIGER up to date, it has returned to intensive field operations as the best way to achieve the level of uniformity, accuracy, and completeness needed to conduct the census enumeration as planned. The dependence of this strategy on securing the people and technical resources needed to make it succeed entails a risk: while the canvassing procedures to be used are not new, mounting a far larger and more expensive field effort than was originally planned and implementing it successfully in the time that remains may be difficult. Some important implications of the new MAF-TIGER improvement plan with respect to the issues, concerns, and recommendations presented in the panel's second interim report are discussed below.

MAF-TIGER Updating: Urban Areas
The strategy of having LUCA precede the full-block canvassing is based on the desire to give local governments the best opportunity to participate in LUCA and to hold off on field canvassing until a date closer to the actual census. While this may be the best operational way to proceed, it does present some concerns. First, it assumes that the acknowledged limitations of the procedures used so far to keep MAF-TIGER up to date were not extensive, so that it is worthwhile for local governments to review the existing list before field canvassing. The MAF-TIGER is not likely to be as accurate and up to date as desirable. This is particularly true with regard to multi-unit addresses. These limitations need to be clearly communicated to and understood by local governments reviewing these lists. Second, even under this plan, local governments participating in LUCA will have only 3 months to review the MAF-TIGER. This, coupled with limited local resources and capacity for systematically reviewing the Census Bureau's list, may result in targeted local review of areas that are thought to be especially incorrect or incomplete. This targeting may result in some address list deficiencies remaining undiscovered. Third, the format in which MAF-TIGER files are provided for local government review and the requirements for how local officials report challenges to the address list may limit local participation in the review process. Only 34 of 60 jurisdictions participated in LUCA in the 1998 census dress rehearsal (U.S. General Accounting Office, 1998). Finally, the Census Bureau must incorporate the local challenges under LUCA into its field canvassing operations and then provide feedback to local governments on their challenges. It is extremely important that local governments are assured that the needed MAF-TIGER quality has been achieved in order to garner their support for the enumeration operations that follow.

LUCA Program
The Census Bureau is soliciting participation in the LUCA program by sending letters of invitation to the highest elected officials of all units of local government. The response to this program, even with intensive Bureau telephone follow-up, may well be uneven in terms of the geography and population covered by participating governments. Those who elect to participate are invited to local LUCA training workshops (multiple sites in each state): the workshops were conducted in the spring of 1998 for jurisdictions with urban areas, and workshops for rural areas were planned for early 1999. These steps are necessary, but they may not suffice to ensure a significant impact of the LUCA program on the final quality of the MAF-TIGER. As noted in the panel's second interim report (National Research Council, 1997b), the effectiveness of the LUCA program depends on the rate of participation by local governments, the extent and quality of the changes they propose, and the Census Bureau's ability to incorporate the needed changes and corrections and convey them back to the local governments. Ultimately, the quality of the expanded canvassing operations in both urban and rural areas will determine the major quality improvements to the MAF-TIGER. The LUCA program will be a contributor, although its impact may be as much one of perception as of making improvements to MAF-TIGER.

Multi-Unit Structures
The panel's observations in its second interim report regarding multi-unit structures remain relevant. The panel suggested that LUCA pay special attention to structures that have units either without clear or unique labels or units that are not clearly distinguishable. The panel also suggested that it might even be preferable in some cases to treat an entire structure as the "dwelling unit" for purposes of the MAF, nonresponse follow-up, and integrated coverage measurement (ICM). At this time, this idea should be seen only as a possibility for 2010, since it would create complications with respect to the sample design for nonresponse follow-up and for ICM matching rules; it should nonetheless be considered for 2010. Clearly, the expanded canvassing operations must address the problem of enumerating households in multi-unit structures. It is likely that many of the LUCA challenges from local government will involve multi-unit structures. This is where the current MAF-TIGER is likely to be weakest, in the absence of prior field canvassing before LUCA review in the urban areas.
If the current MAF-TIGER is weak for multi-unit structures, this contributes to the risk that is taken in having local governments review the MAF-TIGER before final field canvassing. The revised Census Bureau plan, with its emphasis on a full precensus field check, removes from the Bureau the burden of predicting where its current files are inaccurate and then performing only targeted field checks to achieve the required level of quality. Instead, the Bureau is substituting a proven procedure, albeit one that is more expensive, which requires a large field implementation to succeed and that makes certain assumptions about the ability of urban areas to participate in LUCA before this field canvass will take place.

Recommendation 3.1: The panel endorses the Census Bureau's plan to conduct a full canvass of the areas covered by the MAF-TIGER, which began in the fall of 1998 and will continue through 1999. In addition, the panel recommends that the Bureau investigate the usefulness of other data sources for updating MAF-TIGER during the coming decade, including address lists and maps from private companies and residential housing data from property tax records and maps.

Date of Census Day
In preparing for the 2000 census the Census Bureau considered pursuing legislation to move the date of the census from April 1 to mid-March (while retaining the mandated delivery dates of state counts by December 31 and counts for redistricting by the following April 1). The panel regrets that this change was not pursued. The proposed date change would have had two major advantages: (1) a likely improvement in the quality of coverage and (2) some additional time to complete critical coverage studies, data processing, and analysis of results prior to the December 31 deadline for releasing state counts. Improvement in the quality of coverage would result from moving the census date away from the end of the month, when changes of address are most common. The concentration of moves at the time of the census leads to both a greater likelihood of households failing to complete and return a questionnaire and increased chances of duplicate reporting. Such reporting problems increase the volume and complexity of the workload associated with both nonresponse follow-up and coverage measurement efforts. A reduction in these workloads would result in both improved data quality and reduced costs. While the extent of improvements in the quality of coverage or reductions in costs is difficult to estimate, experience with the most recent census of Canada suggests that the benefits would be significant.
Statistics Canada has not estimated the exact benefits, but agency officials attribute both lower undercoverage and reduced costs to a change in the 1996 census date from the beginning of June to mid-May. The two weeks or more of additional time that would be gained by moving census day from April 1 to mid-March would benefit census operations. Current schedules for completing the census data collection, post-enumeration coverage data collection, data processing, and review of results are extremely tight. There is little time to resolve unforeseen problems or to extend schedules where workloads have been underestimated. The added time could reduce the likelihood of errors of commission or omission, and improve data quality by extended nonresponse

follow-up, more thorough coverage studies, and more extensive quality checks.
Recommendation 3.2: The panel recommends that Congress enact legislation to move the date of the 2010 census to mid-March.
The panel is concerned that the Census Bureau has attempted to accommodate the failure to move census day to earlier than April 1 by mailing out census questionnaires earlier. However, this does not gain as much time as might be thought, since residents are not required to return the questionnaire until census day, and it is not likely to lessen the difficulties posed by movers at the end of the month. Depending on how much earlier questionnaires are mailed, it could also exacerbate data quality and coverage problems, since many people may complete and return their questionnaire well in advance of census day. This would increase the number of changes caused by moves, births, and deaths, which would have to be revised in coverage measurement studies and, with the use of a blanket second mailout, would likely increase the degree of multiple responses. Finally, it makes the census reference date somewhat ambiguous and may increase the number of situations in which people who move and are in the integrated coverage measurement survey have a different residence for April 1 or later but had a March residence for the census.

Use of Multiple Response Opportunities
For the past few censuses there have been complaints from individuals who thought either that their residence had been left off the census mailing list or that they had been omitted from the questionnaire returned for their household of residence. While the great majority of these complaints proved to be unfounded, the "Were You Counted?" programs of the 1980 and 1990 censuses provided an opportunity for those who believed they were missed to be included and thus were a useful public relations tool. The public relations value of the 2000 census analog, the "Be Counted" program, is therefore incontestable.
"Be Counted" provides an easy way for residents who do not receive (or believe they did not receive) a census form or believe they were otherwise not counted to return a census questionnaire (available in various public locations) or to telephone in their response. This easy access reduces respondent burden and could target historically undercounted groups. This program was adopted from a suggestion presented to Congress by the U.S. General Accounting Office (1992). The forms are available in foreign languages, and individuals can request by telephone a census form in a large number of languages. A key difference between the "Were You Counted?" and "Be Counted" programs is that the former was concurrent with nonresponse follow-up, and the latter is concurrent with the mailout/mailback portion of the census.
One area worthy of more research is that of assessing the extent to which additional people are whole-household versus within-household additions and whether the households are included on the MAF. For whole-household additions that are from addresses on the MAF, this program would simply reduce the nonresponse follow-up workload. For whole-household additions not on the MAF and additions to MAF households that were partially enumerated, this program would reduce undercoverage. In the 1998 dress rehearsal, nonresponse follow-up was conducted on those forms that indicated that responses were for a partial household (and for "Be Counted" forms that were received late from MAF addresses). In addition, it might be useful to conduct field follow-up of some whole-household additions for addresses both on and not on the MAF. For addresses not on the MAF, this would be helpful in understanding how the MAF was deficient and in verifying additional households. For addresses on the MAF, this would be useful in determining whether the response of whole-household additions was accurate.
Another area worth examining is whether respondents should continue to be required to report on the census form whether their response is for a whole or a partial household. The response to this question currently determines whether the household is included in nonresponse follow-up. There is evidence to suggest that this information may be inaccurate. Therefore, as mentioned above, the validity of this information should be examined.
Along with the public relations and modest enumeration benefits, the "Be Counted" program raises one primary concern: the potential to have many households in the 2000 census for which more than one questionnaire is returned (representing either the same household or a partial household).² The frequency of this duplication in the 1995 test census (15 percent of the responses were for individuals who were already enumerated) was not high enough to produce an unfeasible amount of "unduplication." However, it might be a far greater problem in the 2000 census, because of either increased amounts of undiscovered duplication or a more compressed time schedule, making unduplication either much more time consuming or more error prone. Therefore, the results from the 1998 census dress rehearsal in evaluating the primary selection algorithm, which determines which forms are considered to be duplicates, should be used to better understand the problems that this program will raise and to assess modifications that will make it more effective.
² Another concern is the number of fictitious or incorrect enumerations that are received.

Use of Blanket Replacement Questionnaires
The panel has been enthusiastic about targeted mailing of replacement questionnaires to reduce nonresponse (see National Research Council, 1997a). Testing in a variety of situations indicated that this could have been one of the most important innovations in the 2000 census. Unfortunately, the size and time constraints of the 2000 census seem to require that replacement questionnaires be mailed to all census addresses, not only the nonresponding ones. While there is still likely to be a substantial increase in response among mail nonrespondents with a blanket mailing of replacement questionnaires, the panel foresees the potential for large numbers of duplicate responses. In addition, if what is being done is not well understood by the public, there is a possibility of a public relations problem if people feel unnecessarily bothered by the Census Bureau after responding promptly to the first questionnaire they received. Furthermore, the cost and environmental impact of a blanket second mailing are likely to elicit some negative comments from the public. Clearly, more analysis and experience with this technique are needed before firm recommendations can be given. It would be useful to have a direct analysis of the costs and benefits of the use of a blanket second mailing of questionnaires sometime early in the next planning cycle. The 1998 dress rehearsal will provide some important evidence as to the value of blanket replacement questionnaires, and final decisions about their use in the 2000 census should not be made until the dress rehearsal experience is evaluated. In addition, the Census Bureau should determine early in the next decade whether it will be technologically feasible to use a targeted replacement questionnaire in the 2010 census, since that is the strongly preferred procedure.
Recommendation 3.3: If the 1998 census dress rehearsal gives any indication that there are substantial problems (of extensive duplication of returned forms or public dissatisfaction) associated with the use of a blanket replacement form mailing, this procedure should be dropped and only a reminder postcard sent to each household. Furthermore, the Census Bureau should explore all possible approaches to having available, for the 2010 census, technology that will permit targeted mailing of second forms only to households that did not return their first forms by a specific date.

Long-Form Sampling Rates
The Census Bureau will use four long-form sampling rates for governmental units³ in the 2000 census (and used them in the 1998 census dress rehearsal): 1 in 2, 1 in 4, 1 in 6, and 1 in 8, depending on the number of housing units in a jurisdiction. This represents a change from the 1990 census: the addition of a fourth, intermediate step, the 1-in-4 sampling rate. The panel endorses this and related decisions affecting the design of long-form sampling for the 2000 census. The Census Bureau is making one other change of note with respect to the assignment of long-form sampling rates: the sampling rate cutoffs will be based solely on the counts of addresses or housing units. In contrast, in 1990 the Bureau used a combination of population and housing unit counts to define the size of units for purposes of sampling. With these changes, the geographical units to which the long-form sampling will be applied will be configured somewhat differently than if the 1990 design were being used in 2000. The addition of a fourth sampling rate provides the Bureau with greater flexibility in achieving the goal of an overall long-form sampling rate of approximately 17 percent (1 in 6) of all addresses nationally (the same as in 1990) with more nearly equal precision for small areas. The fourth rate will reduce the disparity in coefficients of variation between areas of similar size that would have fallen on opposite sides of a threshold and as a result be sampled at 1-in-2 and 1-in-6 rates, respectively. The change might also help ease the transition from the census long form to the American Community Survey,⁴ which will also use four sampling rates. The 1-in-2 rate will be applied to governmental units (including school districts) with fewer than 800 housing units, while the new 1-in-4 rate will be applied to governmental units with 800 to 1,200 housing units. Governmental units that exceed this size will have sampling rates set by census tract. The 1-in-6 rate will be applied to census tracts with fewer than 2,000 housing units (that do not satisfy the above conditions for higher sampling rates), while the minimum 1-in-8 rate will be applied to census tracts of 2,000 or more housing units. These cutoffs were selected from simulation studies that considered a variety of factors affecting the overall response rate and the resulting coefficients of variation. With the addition of a fourth sampling rate, some units will be sampled at a lower
³ A governmental unit is a county, town, township, specified unincorporated area, school district, etc.
⁴ The American Community Survey is a proposed mailout/mailback survey of 3 million households annually using a so-called rolling sample design. The content will be similar to that of the decennial census long form.
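
The four-rate assignment rule described in this section can be written as a short decision function. This is a minimal sketch under stated assumptions, not the Bureau's implementation: the function and parameter names are hypothetical, the 1,200 boundary is treated as inclusive ("800 to 1,200"), and the caller is assumed to pass the housing-unit count of the smallest governmental unit that contains the address.

```python
from fractions import Fraction


def long_form_sampling_rate(gov_unit_housing_units, tract_housing_units):
    """Return the long-form sampling rate for an address.

    gov_unit_housing_units: housing units in the smallest governmental
        unit containing the address (assumption: the smallest unit governs).
    tract_housing_units: housing units in the address's census tract.
    """
    if gov_unit_housing_units < 0 or tract_housing_units < 0:
        raise ValueError("housing unit counts must be non-negative")
    # Small governmental units get the highest rates.
    if gov_unit_housing_units < 800:
        return Fraction(1, 2)
    if gov_unit_housing_units <= 1200:  # boundary assumed inclusive
        return Fraction(1, 4)
    # Larger governmental units: the rate is set by census tract size.
    if tract_housing_units < 2000:
        return Fraction(1, 6)
    return Fraction(1, 8)


if __name__ == "__main__":
    examples = [(500, 1500), (1000, 1500), (5000, 1500), (5000, 3000)]
    for gov, tract in examples:
        rate = long_form_sampling_rate(gov, tract)
        print(f"gov unit {gov:>5}, tract {tract:>5}: 1 in {rate.denominator}")
```

Using exact fractions keeps the rates readable as "1 in k" and avoids floating-point noise if the rates are later combined into an overall sampling fraction.
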

pendent data-collection systems, the initial enumeration and the PES, are combined to obtain the required factors. (The precise formula for dual-system estimation is provided in Chapter 2.) As in the PES, in the Census Plus coverage measurement survey interviewers go to a sample of housing units and first ask the residents who lived there on census day. Census Plus adds a second phase to the interview in which the interviewer attempts to reconcile the roster from the first phase of the interview with the initial census enumeration that has been loaded into the interviewer's laptop computer, to obtain a "resolved roster." Little or no follow-up is conducted after this two-phase interview. For estimation purposes the resolved roster is regarded as the truth, so the adjustment factor is essentially the ratio of the count in the resolved roster to the count in the initial enumeration. Because Census Plus treats the resolved roster as final, the quality of its estimated adjustment factors is critically dependent on the completeness of that roster. Dual-system estimation, on the other hand, requires that the PES be statistically independent of the initial enumeration but not necessarily complete. It also requires a matching operation. Even if the independence assumption is not entirely correct (because people in various poststrata who are missed by the PES are also more likely than others to be omitted from the initial enumeration), dual-system estimation will usually be intermediate to the census and the true counts. (This argument is developed more fully in Chapter 4.) The completeness of the final roster was listed as an essential requirement for use of the Census Plus methodology in 2000 (see National Research Council, 1994, especially Recommendation 4.3). In the 1995 test census, however, the Census Plus resolved rosters omitted many residents.
Although this was due in part to processing delays in the test, which caused many interviews to be conducted without an initial enumeration roster for use in reconciliation, the 1995 experience suggests that the problems with undercoverage by Census Plus are unlikely to be overcome before 2000. The panel's independent analysis of the test evaluations (National Research Council, 1997b) pointed in this direction, and the panel supports the decision of the Census Bureau not to use Census Plus in the 2000 census. The PES methodology has been used before, notably in the 1990 census, in a form similar to that planned for the 2000 census, although the resulting counts were not used for apportionment or redistricting in 1990. The primary difficulty with the PES for 2000 concerns scheduling because of the additional follow-up operations that are required. A decennial census schedule that allows time for the PES is very tight, and the panel looks forward to the results of efforts by the Census Bureau to accelerate the PES by allowing it to partially overlap the initial enumeration time schedule.
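The contrast drawn above between the Census Plus ratio and dual-system estimation can be illustrated with a short Python sketch. It does not reproduce the report's formula (which is given in Chapter 2): the dual-system function uses the textbook capture-recapture form, census count times PES count divided by the matched count, and ignores refinements such as erroneous enumerations. All function names and numbers are hypothetical.

```python
def census_plus_factor(resolved_roster_count, initial_count):
    """Census Plus: the resolved roster is treated as the truth."""
    return resolved_roster_count / initial_count


def dual_system_estimate(initial_count, pes_count, matched_count):
    """Textbook capture-recapture estimate (an assumed, simplified form)."""
    return initial_count * pes_count / matched_count


# Hypothetical area. The true population is never known in practice.
TRUE_POPULATION = 1000
initial_count = 900          # people found by the initial enumeration

# Census Plus: the resolved roster still misses some residents.
resolved_roster_count = 940
cp_factor = census_plus_factor(resolved_roster_count, initial_count)
cp_estimate = cp_factor * initial_count

# Dual-system: the PES finds 850 people, 780 of whom match census records.
# Under exact independence about 765 matches would be expected; 780 mimics
# people who are hard to count in both systems.
dse = dual_system_estimate(initial_count, 850, 780)

print(f"Census Plus estimate: {cp_estimate:.1f}")
print(f"Dual-system estimate: {dse:.1f}")
```

In this made-up case the Census Plus estimate can be no better than the resolved roster itself, while the dual-system estimate lies between the initial count and the true population even though independence fails.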

The PES also must be adapted to be used jointly with nonresponse follow-up sampling. This is an important issue for people who move between census day and the PES enumeration. In 1990 the PES sample consisted of people resident in sample blocks at the time of the PES survey ("PES-B"), but in the 2000 census some of those people will have moved in from other blocks where only a sample of the households were included in nonresponse follow-up. Therefore, under the PES-B design, it is sometimes not just difficult (as in 1990), but impossible to determine who would have been included in the original enumeration if they had not responded by mail and not been sampled in nonresponse follow-up. For this reason, procedures for the 1995 and 1996 test censuses defined the PES sample as consisting of people resident in PES sample housing units on census day ("PES-A"); when a household moves shortly after census day, the PES requires finding and interviewing the family that moved out.

The Census Bureau's plan for the 1998 census dress rehearsal called for use of a hybrid, third method ("PES-C") in which the match rate is estimated either through the use of proxy information collected from the people moving in or at times by reinterviewing the family that moved out, but the number of people in "mover" households (where "mover" means the broad population of households that move at this time) is estimated from the residents found in the PES. It is thought that this estimated number is superior to that obtained from outmovers (for details, see Bureau of the Census, 1997). The panel believes that the mover problem can be solved, and it urges the Census Bureau to act more quickly to develop and test methodology for the treatment of movers. (The problem may be reduced through more expedited nonresponse follow-up.)
Borrowing Information Across States

One of the arguments against adjusting the 1990 census was that the empirical Bayes regression smoothing, which was used to borrow information across the 1,392 (original) poststrata, used information from other states to produce estimated counts for a given state. The empirical Bayes regression was needed for the following reason. The use of nearly 1,400 poststrata, defined using demographic characteristics, owner/renter, and geography, produced aggregate information on census undercoverage that had generally reduced bias compared with aggregates using fewer poststrata. However, the resulting estimates had relatively high variances. To reduce these variances, smoothing across poststrata was conducted. Using some assumptions (e.g., that undercoverage for, say, black men aged 18 to 40 in metropolitan areas in Louisiana is apt to be very similar to undercoverage for black men aged 18 to 40 in metropolitan areas in South Carolina, and that shifting from one age group to another should have similar effects on undercoverage across poststrata), information from different poststrata was used to reduce the variance of the estimated undercoverage for a given stratum, possibly without appreciably increasing the bias. Smoothing or blending information from similar or easily related situations, known as "borrowing strength," can produce estimates with less overall error. Methods such as empirical and hierarchical Bayes regression modeling and closely related variance component estimates have been shown to have very desirable properties in a variety of applications (see, e.g., Gelman et al., 1995).

The need for smoothing could have been reduced had the 1990 PES been as large as initially planned. It was originally planned to have 300,000 housing units, a sample size chosen so that most direct state estimates would have been of marginally acceptable precision, though substate estimates would still have had considerable variability. However, the ultimate sample size was only about 160,000 housing units, which necessitated borrowing of information across states to obtain state-level estimates of marginally acceptable precision. Given the 750,000 housing unit PES currently planned for the 2000 census, which will permit useful estimates at lower geographic levels than the planned PES in 1990, there should be less need for smoothing.

Undercoverage of individuals in the United States is geographically determined to some extent (e.g., undercoverage is related to whether an area is urban, suburban, or rural). (Hengartner and Speed, 1993, examine the extent to which undercoverage is geographically based.) Factors other than geography also strongly affect census undercoverage. For example, the likelihood of undercoverage is higher for people whose residence is in a multi-unit structure than for those whose residence is a detached house, a factor that can vary within a single city block.
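The idea of "borrowing strength" described above can be sketched with a very small shrinkage estimator. This is not the 1990 empirical Bayes regression model, which used covariates and estimated its variance components; it is a simplified illustration in which each direct estimate is pulled toward a precision-weighted pooled value, and the between-poststratum variance `model_var` is simply assumed to be known. All names and numbers are hypothetical.

```python
def shrink(direct, sampling_var, model_var):
    """Shrink direct estimates toward a pooled value.

    direct[i]       direct adjustment factor for poststratum i
    sampling_var[i] sampling variance of that direct estimate
    model_var       assumed variance of true factors around the pooled value
    """
    weights = [1.0 / (v + model_var) for v in sampling_var]
    pooled = sum(w * y for w, y in zip(weights, direct)) / sum(weights)
    smoothed = []
    for y, v in zip(direct, sampling_var):
        b = v / (v + model_var)      # share of weight given to the pooled value
        smoothed.append(b * pooled + (1.0 - b) * y)
    return pooled, smoothed


# Three similar poststrata in different states (made-up values).
direct = [1.02, 1.08, 1.05]
sampling_var = [0.0001, 0.0016, 0.0004]   # the second estimate is the noisiest
pooled, smoothed = shrink(direct, sampling_var, model_var=0.0004)

for y, s in zip(direct, smoothed):
    print(f"direct {y:.3f} -> smoothed {s:.3f}")
```

The noisiest direct estimate moves furthest toward the pooled value, which is the sense in which variance is reduced at the cost of some possible bias if the poststrata are not in fact similar.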
It seems plausible to expect that undercoverage is similar for individuals in areas with otherwise similar characteristics that fall into states in the same region that are also generally similar. (However, there are at least two exceptions that are discussed below.) Therefore, aggregating information on census undercoverage across states is likely to improve estimated counts by reducing variance and not substantially increasing bias. Although there are clear technical benefits from this blending of information, there is an important political concern that the responses of people outside a state could affect another state's estimated count and hence its congressional apportionment. (The advantages from a public acceptance standpoint of constraining each state's estimated counts to derive directly from information collected in that state are discussed by Fay and Thompson, 1993.) In addition, there are some difficulties in explaining these methods to nonexperts, and the ease of communication of methods for such an important purpose has its advantages. Finally, it is possible that some state effects may not be negligible; that is, areas in one state may have coverage rates that differ from those in similar areas in nearby states. An example could be when one state's census offices are more effectively run than another's, due to local economic conditions. Another example might be when the political climate in one state results in substantially different refusal rates.

A same-state constraint was not adhered to in the 1970 census. In that census the vacant/delete check was carried out on a sample basis, and state estimates used information on the residency status of "vacant" dwellings from other states. In the case of fund allocation, the distribution of Title I funds (under the Elementary and Secondary Education Act) is based on estimates of the number of poor children in counties, and regression models that blend information across counties and states are used to allocate considerable amounts of federal funds to counties and states (for a description of the method, see National Research Council, 1998). Also, with the use of sequential hot deck imputation, it is clear that at some small level of geography, the analogue of the same-state constraint is not adhered to. So, the principal rule that an area's political and monetary allocations are to be derived from information collected only from that area has not been consistently applied in past censuses or for important fund allocation programs. Also, as is relatively clear from the Title I fund allocation example, it is not even a principle that should be adhered to in all circumstances. Any use of demographic analysis, such as the promising methods examined by Bell (1993), would fail to meet the same-state principle.
The same-state constraint has strong sample design implications in that the need for direct state estimates of some threshold accuracy implies less accuracy for substate areas in large states, which has implications for the accuracy of counts of demographic groups. Furthermore, this constraint would dramatically reduce the demographic detail that could be used in the poststrata, which would result in greater heterogeneity in poststrata with respect to census undercoverage. Given the planned size of the PES (750,000 housing units), the Census Bureau is limited to about 1,000 or fewer poststrata, based on the 1990 experience. Given state-level estimates, this would mean 20 or fewer poststrata per state. One could argue that undercoverage differs substantially with respect to age, sex, race/ethnicity, and owner/renter status, possibly requiring as many as five age categories, two sex categories, three or more race/ethnicity categories, and an owner/renter dichotomous category, which results in many more than 20 poststrata in a state (5 x 2 x 3 x 2 = 60 at a minimum). As a result, a good deal of collapsing of poststrata will be required. (The Census Bureau is examining the use of "raking" to counts from aggregate poststrata to enable the use of more factors. See below for a discussion.) Of course, if state-level effects are substantial, this collapsing would be justified. However, if state-level effects are small, this collapsing will result in poststrata that join people with relatively more heterogeneous rates of census undercoverage.

Given the constraint that each state's counts be based on information collected only from that state, it is still possible to share information for substate allocation of population, as well as for congressional redistricting, when there is evidence of common patterns across groups of states. In 1990, information was used across states to produce both state counts and substate shares. Given that distributing counts within states does not violate the above constraint, its use should be considered separately.

First, consider restricting a state's estimated total count to be based only on information collected from individuals in that state. There are certainly real advantages to this restriction. It could reduce or eliminate a source of bias as discussed above: for example, if Maine's undercount is systematically greater than Vermont's within the same poststrata (groups defined by demographic characteristics, owners/renters, etc.), Maine's estimated count could be lower than it should be if this constraint were not observed. Also, given the highly political role of state counts, not blending information is easy to understand and has great face validity. Furthermore, it could be required on legal grounds given its mention in a previous case before the Supreme Court (94–1614, 94–1631, and 94–1985; March 20, 1996). Yet the associated cost of observing this constraint could be substantial. As mentioned above, this restriction makes inefficient use of the information collected, so that sampling error is larger than it would be through efficient use of information across states.
As the National Research Council (1994:125–126) notes:

    At one extreme, a criterion of equal coefficient of variation of direct population estimates in every state (equal standard error of estimated ICM adjustment factors) would imply roughly equal sample sizes in every state, despite the 100-fold ratio of populations between the most and least populous states. Such a design might be drastically inefficient for estimation of adjustment factors for domains other than states. At the other extreme, a criterion of equal variance of direct population estimates for every state would imply larger sampling rates (and therefore disproportionately larger sample sizes) in larger states.

In order that all state estimates have a small coefficient of variation, the prohibition on borrowing information requires that the PES sample be concentrated in small states, thereby increasing the variance of estimated counts for larger states. Also, observing this constraint does not permit, for the smallest states, substate estimation with low coefficients of variation at any level of detail.
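The two extremes in the quoted passage can be checked with a little arithmetic. The sketch below is an illustration under simplifying assumptions that are not in the report: the variance of a state's estimated adjustment factor is taken to be `unit_var / n` with the same `unit_var` in every state, the factor is close to 1, and the state's population estimate is the factor times its census count. Function names and populations are hypothetical; only the 750,000 total comes from the text.

```python
from math import sqrt


def allocate(total_sample, populations, criterion):
    """Split a fixed sample across states under one of two criteria."""
    if criterion == "equal_cv":
        # CV of the estimate ~ sqrt(unit_var / n): equal n gives equal CV.
        shares = [1.0 for _ in populations]
    elif criterion == "equal_variance":
        # Var of the estimate ~ pop**2 * unit_var / n: n must grow as pop**2.
        shares = [float(p) ** 2 for p in populations]
    else:
        raise ValueError(f"unknown criterion: {criterion}")
    scale = total_sample / sum(shares)
    return [scale * s for s in shares]


def estimate_cv(n, unit_var=1.0):
    return sqrt(unit_var / n)


def estimate_variance(pop, n, unit_var=1.0):
    return pop ** 2 * unit_var / n


populations = [500_000, 5_000_000, 30_000_000]   # hypothetical states
for criterion in ("equal_cv", "equal_variance"):
    sizes = allocate(750_000, populations, criterion)
    rates = [n / p for n, p in zip(sizes, populations)]
    print(criterion, [round(n) for n in sizes], [f"{r:.5f}" for r in rates])
```

Under these assumptions the equal-CV criterion gives every state the same sample size regardless of population, while the equal-variance criterion makes the sampling rate itself rise with population, which matches the qualitative statements in the quotation.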

The panel recognizes the reason for this decision, understanding that it is based on legal and political factors that are beyond the expertise and scope of a technical panel. However, there is no reason to exclude this procedure in all future censuses. A key issue is the extent to which undercoverage is related to state effects. Research on this issue would be very important to help understand the advantages obtained from observing this constraint, looking toward 2010.

The second form of this constraint is restricting the allocation of state population shares to substate areas based only on information from that state. Similar to the argument above, assuming there are consistent patterns to substate variation in adjustment factors, accepting this constraint increases the sampling variance for estimates of substate population shares. Models that allocate substate shares need to be considered separately from the methods used to estimate state population counts because substate estimates can always be controlled to add up to a given state estimate.

Two final points are important to mention. First, besides congressional apportionment, census counts are used for official purposes at various levels of aggregation, some relatively low. The constraint that an estimate be determined only using data from its geographic region is a constraint that, when viewed in its absolute form, could be extended to assert that the estimates at any level of geographic aggregation should be constructed from information collected directly from those areas. At some level of aggregation this has not been true of any modern census, is clearly unnecessarily restrictive, and therefore should not be instituted. Second, since congressional apportionment involves allocating a fixed pie of 435 representatives to the 50 states, every state's estimated count, directly estimated or not, affects every other state's apportionment.
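The second point above, that one state's count affects every other state's seats, can be seen in a toy apportionment. The text does not describe the apportionment formula; the sketch below assumes the method of equal proportions (Huntington-Hill), in which each state first receives one seat and each remaining seat goes to the state with the highest priority value, population divided by the square root of n(n + 1) for a state currently holding n seats. The three states, their counts, and the six-seat house are hypothetical.

```python
import heapq
from math import sqrt


def apportion(populations, seats=435):
    """Method of equal proportions (assumed here; not stated in the text)."""
    alloc = {state: 1 for state in populations}      # one seat each to start
    # heapq is a min-heap, so priorities are stored as negatives.
    heap = [(-pop / sqrt(2), state) for state, pop in populations.items()]
    heapq.heapify(heap)
    for _ in range(seats - len(populations)):
        _, state = heapq.heappop(heap)
        alloc[state] += 1
        n = alloc[state]
        heapq.heappush(heap, (-populations[state] / sqrt(n * (n + 1)), state))
    return alloc


before = apportion({"A": 100, "B": 200, "C": 300}, seats=6)
# Only state A's estimated count changes...
after = apportion({"A": 180, "B": 200, "C": 300}, seats=6)

print(before)
print(after)   # ...yet state C's allocation changes as well
```

Because the number of seats is fixed, the seat gained by A has to come from some other state, here C, whose own count did not change.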
Recommendation 3.8: The panel supports the decision of the Census Bureau to produce state total estimates using the 2000 census that are derived only from data collected within a given state. For the 2000 census, models across states should be examined for use in allocating populations within states. Both forms of the constraint on estimates that are based solely on data from a given state should be reexamined with respect to the 2010 census.

The Use of Raking

The PES in the 2000 census is designed to support direct estimation for each state, that is, calculation of population estimates based only on data from a given state. PES samples will not be large enough, however, to support high-quality direct estimates for many important substate areas, such as counties, cities, and congressional districts. To generate estimates for those areas, the Census Bureau has decided to use synthetic estimation (see below), possibly combined with raking (iterative fitting of tables of counts to marginal totals).

Synthetic estimation (see Cohen, 1989) is a method of distributing estimated additional counts over those from the initial enumeration to small areas. Suppose that adjustment factors have been estimated for each of several population groups making up the population of some relatively large area, such as poststrata defined by race, age, and tenure (owner/renter) in a section of a state. The synthetic estimate of population within a poststratum for a smaller area, even a single census tract or block,12 is obtained by applying the same adjustment factor to all people in the smaller area from the poststratum. Then the sum of adjusted population counts from each of the poststrata represented in the smaller area gives the adjusted count for that area. Synthetic estimation is therefore a simple method that assumes that for each poststratum, the undercoverage rate is constant across the area for which the adjustment factor is estimated. Although this assumption can only be approximately true, the synthetic estimates still should be more accurate than direct estimates at low levels of aggregation, since direct estimates would be based on very small samples. Synthetic estimation also has the somewhat conservative property that the adjustment for a poststratum in a small area is never more extreme than that estimated for the poststratum in a larger area, unlike some regression methods that can extrapolate beyond the range of values estimated directly.

To smooth the adjustment factors for poststrata defined within substate regions, the Census Bureau (see Farber et al., 1998) is considering use of a raking ratio adjustment. This methodology is described here as it might be applied at the state level in 2000.
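Before the raking step is described, the synthetic estimation calculation set out above can be made concrete. The sketch below simply applies each poststratum's adjustment factor to the enumerated count for that poststratum in a small area and sums the results. The poststratum labels, factors, and block counts are invented for illustration.

```python
def synthetic_estimate(area_counts, factors):
    """Adjusted count for a small area.

    area_counts  enumerated persons in the area, keyed by poststratum
    factors      adjustment factor per poststratum, estimated for a larger area
    """
    return sum(count * factors[ps] for ps, count in area_counts.items())


# Factors estimated for a section of a state (hypothetical values).
factors = {
    "owner_18_29": 1.02,
    "renter_18_29": 1.08,
    "owner_30_plus": 1.00,
    "renter_30_plus": 1.03,
}

# Enumerated counts for one block (hypothetical values).
block_counts = {
    "owner_18_29": 10,
    "renter_18_29": 40,
    "owner_30_plus": 30,
    "renter_30_plus": 20,
}

adjusted = synthetic_estimate(block_counts, factors)
print(f"enumerated {sum(block_counts.values())}, adjusted {adjusted:.1f}")
```

The block's overall adjustment (about 4 percent here) depends only on its poststratum mix, and no poststratum in the block is adjusted by more than the factor estimated for the larger area.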
(This approach was tested in the 1998 census dress rehearsal, though it differed in some details from the decennial application because of the small geographic areas of the dress rehearsal sites.) Each state is divided into several geographical subregions and several sociodemographic population groups (defined by such variables as race, age, and tenure). For each poststratum, defined as the intersection of a population group and a subregion, a separate dual-system estimate is calculated. Because these direct estimates are based on small samples and therefore have high sampling variability, they are not used without modification to adjust population estimates within the poststrata. Instead, they would be combined to obtain direct estimates for subregions and for statewide population groups. Next, a model is fitted that calculates an adjustment factor that is the product of a factor for the subregion and one for the population group. These factors are calculated so that the total population for each subregion and for each (statewide) population group agrees with the corresponding directly estimated total. (This is often referred to as iterative proportional fitting. Technically, a log linear model for adjustment factors is fit to the population data.) The poststratum adjustment factors are used in synthetic estimation as described above, which preserves consistency with direct estimates for substate regions and statewide sociodemographic groups.

12   This report does not address details of the necessity for producing integral counts for blocks. To do this the Census Bureau has historically made use of a linear programming routine for rounding, and the plans are to repeat this in 2000. The panel did not examine this procedure.

The combination of synthetic estimation and raking is a reasonable approach to substate estimation, especially under the constraint that there can be no sharing of information across states. Alternatives to this approach include empirical Bayes regression models, which were used in 1990 when there was no effort to have state-only estimates. (Observation of this constraint in this case would require 51 different regression models.) The empirical Bayes approach required the use of variance smoothing models that were the subject of some debate. One advantage of that approach was the easy incorporation of additional covariates (possibly) predictive of census undercoverage. While this and other alternative methods have some advantages over the current planned approach, the panel agrees with the decision of the Census Bureau to use the relatively well-understood set of methods for the 2000 census.

Estimates (especially for small areas) can be affected to some extent by the details of the approach to modeling adopted by the Census Bureau, including the definition of substate geographic areas and demographic groups.
The panel urges the Census Bureau to give high priority to research that will permit an early statistically based decision on these detailed issues in time for the 2000 census to conform with prespecification to the extent possible. In the long run, a wider range of models should be considered for use in future censuses. This research should consider the possibility that, even if state estimates are required to be direct, estimation for substate areas might be improved by using models that pool some information across states, as discussed above.

Recommendation 3.9: The panel endorses the proposal to use raking ratio estimation to obtain substate estimates. Research should continue to define the poststrata and geographic regions as quickly as possible for the 2000 census and to examine alternative modeling options for use in 2010.
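The raking step discussed in this section can be sketched as iterative proportional fitting of a two-way table. The code below is an illustration, not the Census Bureau's procedure: it takes enumerated counts by subregion (rows) and population group (columns), plus directly estimated totals for each subregion and each statewide group, and alternately rescales a row factor and a column factor until both sets of totals are matched. The resulting poststratum adjustment factor is the product of the two. All names and numbers are hypothetical, and the two sets of target totals are assumed to sum to the same state total.

```python
def rake_factors(counts, row_targets, col_targets, max_iter=200, tol=1e-9):
    """Return adjustment factors a[r] * b[g] matching both sets of margins."""
    n_rows, n_cols = len(counts), len(counts[0])
    a = [1.0] * n_rows   # subregion factors
    b = [1.0] * n_cols   # population-group factors
    for _ in range(max_iter):
        for r in range(n_rows):
            a[r] = row_targets[r] / sum(b[g] * counts[r][g] for g in range(n_cols))
        for g in range(n_cols):
            b[g] = col_targets[g] / sum(a[r] * counts[r][g] for r in range(n_rows))
        # Columns now match exactly; stop once the rows match as well.
        row_error = max(
            abs(a[r] * sum(b[g] * counts[r][g] for g in range(n_cols)) - row_targets[r])
            for r in range(n_rows)
        )
        if row_error < tol:
            break
    return [[a[r] * b[g] for g in range(n_cols)] for r in range(n_rows)]


# Enumerated counts: 2 subregions x 2 population groups (made-up values).
counts = [[100, 50],
          [80, 120]]
row_targets = [156, 212]   # direct estimates for subregions
col_targets = [186, 182]   # direct estimates for statewide groups

factors = rake_factors(counts, row_targets, col_targets)
adjusted = [[f * c for f, c in zip(f_row, c_row)]
            for f_row, c_row in zip(factors, counts)]
for f_row in factors:
    print([round(f, 4) for f in f_row])
```

The factors in `factors` would then be used in synthetic estimation for small areas, so that small-area estimates add up to the direct subregion and statewide group totals.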

A Transparent Household File

In 1990 there was no attempt to estimate the household characteristics of those persons added based on the PES; they were instead assigned to special quarters of unrelated persons. This procedure presents two problems. First, census data users need to know population characteristics on a household basis. Second, a household file then does not reflect the fact that many households are counted incorrectly in the initial census enumeration. Consequently, the Census Bureau has worked to develop methods to produce a household data file that is consistent with the official person counts produced by integrated coverage measurement, while assigning all persons to realistic households.

Isaki et al. (1997) propose a method for adjusting the frequency of household types in an area to produce person counts consistent with those obtained from dual-system estimation. The procedure uses two inputs: estimated counts by an age/race/ethnicity/sex/tenure category for a state (or substate area) and frequency of household types defined by the number and characteristics of the inhabitants within the census enumeration for the same area. This method produces adjustment factors for each household type such that the distribution of person characteristics for the adjusted household file matches almost exactly those from dual-system estimation. Because many sets of household adjustment factors could duplicate the estimated person counts, an additional criterion is needed. Isaki et al. selected a set of factors that essentially minimize the change in the distribution of household types according to a simple mathematical criterion. They evaluated their methodology on 1995 test census data from both the Paterson, New Jersey, and the Oakland, California, sites using 42 person categories and about 350 household types.
This model "weights up" households at a rate that depends on the number of members of undercounted groups that are in a household, which does not necessarily correspond to underestimation of the number of households of that detailed type.

There are two potential problems with this methodology. First, it might distort the distribution of household types at low levels of aggregation. For example, if dual-system estimation demonstrates that young adult males were substantially undercounted, the method might increase the number of households consisting of several young men even if the undercount resulted from missed enumerations in other types of households. Similarly, the method might need to delete some households consisting of person types that were counted accurately (e.g., elderly females) to compensate for similar persons added in other household types. Second, even if the correct adjustment factors are known, using them might have unintended effects that would disturb the person counts for small areas. For example, if a certain type of person (i.e., in a particular poststratum) is heavily undercounted, the households with large numbers of that person type will tend to be weighted up and those with few or none will tend to be weighted down. If the households with multiple persons of that type are concentrated in certain areas, those areas would be given additional population at the expense of others where households have few members of that type. This would not agree with the synthetic estimates for those areas. A similar situation could arise if the people in the undercounted group fall into the same households as people of an overcounted group in one area but are in separate households in another area. While there is no way to know for sure whether one set of estimates is superior at this level of aggregation, the synthetic estimates derived directly from estimated person counts are based on assumptions that the panel thinks are more plausible.

This potential discrepancy for small-area counts could be eliminated or reduced greatly by controlling person-type counts for areas much smaller than a state. However, that modification might inordinately increase the range of household-type adjustment factors and, consequently, have a greater potential to distort the distribution of household types. The difficulty in deciding between various approaches is mainly a result of the limited amount of information used from dual-system estimation about attachment of person types to households.

Because there is no guarantee that these problems can be addressed adequately by 2000, the Census Bureau has decided not to produce a household data file as part of the official census 2000 products. Instead, imputations (in all census data products) used to account for the undercounted population will have a special nonhousehold category designation used for household characteristics.
The panel concurs with this decision but also believes the Bureau should continue to address this important issue. Accurate assignment of persons to households would benefit from direct evidence about the types of assignment errors made in the basic census enumeration and about the true distribution of household types. The panel encourages the Census Bureau to conduct research on methods for using integrated coverage measurement to estimate the frequency of household-type assignment errors. The panel is aware of recent work on the transparent file that addresses many of the above concerns and strongly supports research in this direction. The panel also urges that a transparent file be produced from the 2000 census for research use. Finally, placement of integrated coverage measurement additions into a special nonhousehold category serves to compromise the goals of a "one-number" census, since it makes unadjusted counts easy to construct. This is additional strong motivation to address this problem before the next census.

Recommendation 3.10: The panel concurs with the decision of the Census Bureau not to use a transparent file to provide household assignments for persons added through use of integrated coverage measurement in the 2000 census. However, the Census Bureau should continue research on production of public-use files that are consistent for persons, housing units, and households, along the lines of current research on a transparent file. Considerable effort should be taken to avoid use of a special nonhousehold category in the 2010 census.
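The household reweighting idea discussed in this section, and the first distortion the panel describes, can be illustrated with a toy calculation. The criterion used by Isaki et al. is described in the text only as "a simple mathematical criterion," so the sketch below substitutes an assumed one: choose household-type factors that reproduce the target person counts while minimizing the frequency-weighted squared change from 1. To keep the algebra short it handles exactly two person categories, for which the constrained minimum has a closed form obtained by solving a 2 x 2 linear system. All household types, counts, and targets are hypothetical.

```python
def min_change_factors(freq, comp, targets):
    """Household-type factors w[h] = 1 + mu1*a[h] + mu2*b[h].

    freq[h]  enumerated number of households of type h
    comp[h]  (a, b): persons of category A and category B in one such household
    targets  (tA, tB): person totals from dual-system estimation
    Assumed criterion: minimize sum(freq[h] * (w[h] - 1)**2) subject to the
    adjusted file reproducing both person totals.
    """
    m11 = sum(n * a * a for n, (a, b) in zip(freq, comp))
    m12 = sum(n * a * b for n, (a, b) in zip(freq, comp))
    m22 = sum(n * b * b for n, (a, b) in zip(freq, comp))
    d1 = targets[0] - sum(n * a for n, (a, b) in zip(freq, comp))
    d2 = targets[1] - sum(n * b for n, (a, b) in zip(freq, comp))
    det = m11 * m22 - m12 * m12
    mu1 = (d1 * m22 - m12 * d2) / det      # Cramer's rule for the 2 x 2 system
    mu2 = (m11 * d2 - m12 * d1) / det
    return [1.0 + mu1 * a + mu2 * b for a, b in comp]


# Category A = young men, category B = everyone else (made-up household types).
labels = ["one other person", "two other persons",
          "one young man + two others", "three young men"]
comp = [(0, 1), (0, 2), (1, 2), (3, 0)]
freq = [300, 400, 200, 50]
targets = (385, 1500)    # young men undercounted by 10 percent; others correct

weights = min_change_factors(freq, comp, targets)
for label, w in zip(labels, weights):
    print(f"{label:28s} factor {w:.4f}")
```

With these invented inputs the households made up entirely of young men receive the largest factor, and households containing no young men are weighted slightly below 1 even though their members were counted correctly, which is the kind of distortion of household types the panel warns about.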