Reengineering the 2010 Census: Risks and Challenges

CHAPTER 5
Enumeration and Data-Processing Methods

BUSINESSES WILL NOT TAKE A HOLIDAY, travelers will not cut short their trips, citizens will not simply stand still and stay put—in short, the collective life of the nation will not take pause on April 1, 2010, simply because it will be Census Day. Nor has the population taken pains to make itself easy to count on any previous Census Day. Instead, when 2010 arrives, the Census Bureau will confront what it has always faced: an increasingly dynamic and diverse population, in which each person and household varies in both willingness and ability to be enumerated in the census. As a result, census-taking involves a continual search for methods to maximize participation in order to collect information on as many willing respondents as possible. It also involves the need for strategies to do everything possible to count those whose economic and living circumstances make it difficult for them to be enumerated by standard means.

For 2010, the Census Bureau proposes to make a significant change to its tool kit of enumeration methods. Relying on a short-form-only census and improvements to its geographic resources, the Bureau hopes to make use of a new generation of portable computing devices (PCDs) to enhance both the ease



The National Academies | 500 Fifth St. N.W. | Washington, D.C. 20001
Copyright © National Academy of Sciences. All rights reserved.


and accuracy of interactions between census enumerators and follow-up respondents. These devices have great potential to improve census-taking, but their development process involves considerable risk and uncertainty (Section 5-A). In Section 5-B, we comment on several areas—including group quarters and residence rules—where both the 2000 census experience and continually changing societal influences suggest the need for redefinition and recalibration of census-taking approaches. We then turn in Section 5-C to two particular enumeration challenges—representing different extremes of urbanicity—that we believe deserve attention in 2010 census planning. Section 5-D comments on the Census Bureau’s plans to expand the means by which respondents can return their census information, including wider use of the Internet and telephone. Finally, the 2000 census experience focused attention on two processes—unduplication of person records and imputation for nonresponse—that are more commonly thought of as late-stage, data-processing functions. However, lessons learned from the 2000 census, coupled with new technology, will make these processes—and management of the critical trade-offs in cost and accuracy associated with them—a fundamental part of the enumeration strategy of the 2010 census (Section 5-E).

As we discuss PCD plans, it is important to note that the term “portable computing device” (PCD) is the panel’s, and not currently the Census Bureau’s, usage. The Census Bureau uses the term “mobile computing device” or, more frequently, simply “MCD” to refer to the computers. However, the choice of MCD as a label is unfortunate because it conflicts with the abbreviation for “minor civil division,” a long-standing concept of census geography that refers to the subcounty (township) divisions that are functioning governmental units in several Midwestern and Northeastern states.
For this report, we have adopted the compromise label of PCD.

5–A PORTABLE COMPUTING DEVICES

Since the 1970s, computer-assisted interviewing (CAI) has emerged as a major element of modern survey methodology. Development began with computer-assisted telephone interviewing

(CATI), allowing interviewers to administer a questionnaire via telephone and capture responses in electronic form. In the late 1980s, the emergence of portable laptop computers initiated a second wave of development, computer-assisted personal interviewing (CAPI), in which interviewers conduct face-to-face interviews with respondents using a computerized version of the questionnaire on the laptop. The automated data capture that follows from CAI methods, along with the capacity to tailor questionnaires to individual respondents through “skip” sequences, jumping to different parts of the questionnaire or customizing question text based on information already collected in the interview, has proven enormously advantageous, even though cost savings have proven elusive (see National Research Council, 2003b). In the 2000 census, field staff used laptop computers to collect data as part of the Accuracy and Coverage Evaluation Program. Hence, it is natural that plans for 2010 revisit—and try to improve—CAI implementation in the census.

During the past decade, a class of miniature electronic devices has entered the marketplace and continued to mature: the handheld computers commonly known as personal digital assistants (PDAs). They are also sometimes called pen-based computers, since the principal means of interacting with many of the devices is through handwriting on the screen.1 Today, most of the devices use the Palm operating system or a Pocket PC (Windows CE) version of the Microsoft Windows operating system. As the technology continues to mature (and to get cheaper, faster, and more powerful), survey organizations have increasingly tested the potential use of these devices for their work.
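The routing behind such “skip” sequences can be sketched as a function that picks the next question from the answers collected so far. The question names and routing rule below are illustrative only, not drawn from any actual census instrument.

```python
# Sketch of CAI "skip" logic: the next question depends on answers already
# collected. Question keys and the routing rule are hypothetical.

def next_question(answers):
    """Return the key of the next question, given answers collected so far."""
    if "tenure" not in answers:
        return "tenure"  # Is this unit owned or rented?
    if answers["tenure"] == "rented" and "monthly_rent" not in answers:
        return "monthly_rent"  # asked only of renters; owners skip it
    if "household_size" not in answers:
        return "household_size"
    return None  # interview complete

# A renter is routed through the rent question; an owner skips past it.
renter_next = next_question({"tenure": "rented"})
owner_next = next_question({"tenure": "owned"})
```

The same mechanism supports customized question text: the interviewing software consults earlier responses before rendering each screen, which is what makes automated data capture and tailored instruments possible in one step.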
More recently, tablet computers (roughly, a hybrid device with the computing power and screen size of a laptop computer but in a one-piece, keyboardless design using handwriting recognition) and “smart phones” (combining PDA and cellular phone functions) have arrived on the market, and survey organizations have considered those devices, too, in the quest to outfit interviewers with easily portable but survey-capable computers.

1 As text pagers and handheld e-mail devices have become more common, several newer PDAs use a miniature keypad in lieu of a pen/stylus and handwriting recognition.

For the 2010 census, the Census Bureau proposes to use portable computing devices “that will enable enumerators to locate and update address information for housing units, conduct interviews, transmit the data directly to headquarters for processing, and receive regularly updated field assignments” (Angueira, 2003b:3). In doing so, the Bureau hopes to exploit another increasingly common technology—global positioning system (GPS) receivers that can be embedded in portable computers. The devices that the Census Bureau is testing are of the Pocket PC class and, though no decision has been formally made, the information available to the panel through discussions with Bureau staff suggests that the current vision for the 2010 device is the same size as the current Palm/Pocket PC models.

In the panel’s earliest discussions with the Census Bureau about the prospects of PCD use, the principal arguments raised by the Bureau in support of the plan were: that PCDs would save field costs by helping field staff complete their work more efficiently and without getting lost; that PCDs would save costs on paper and forms; and that PCDs would achieve familiar CAPI benefits such as automated data capture. In more recent interactions, though, the savings-through-better-navigation argument has been downplayed, while much more emphasis has been put on savings in paper costs. Indeed, the Bureau’s draft baseline design for 2010 maintains that “through the use of automation and [PCDs], we will significantly reduce the amount of paper used in the field (including questionnaires, address registers, maps, assignment sheets, and payroll forms) and the large number of staff and amount of office space required to handle that paper” (Angueira, 2003b:3).
5–A.1 Testing PCDs: Pretests and the 2004 Census Test

As portable computing devices began to emerge as a focus of the 2010 census plan, the Census Bureau initiated small pilot tests involving basic skills. For instance, in a pilot test in Gloucester County, Virginia, small numbers of field staff with different levels of local familiarity were assigned to find a set of addresses using TIGER-based maps on a Pocket PC-class device. This test concentrated only on locating features using a small-screen map and

not on using the computer to calculate a route to those features. In addition, the devices used in the test were not equipped with GPS receivers, so the test was not meant to assess the locational accuracy of the designated addresses (U.S. Census Bureau, Mobile Computing Device Working Group, 2002).

The 2004 census test is intended to serve as the first major proving ground for portable computing device usage in the census, and to that end is more comprehensive than the earlier pilot tests. A Pocket PC-class device equipped with a GPS receiver has been selected for the test. In addition to continuing to gauge enumerator reaction and ability to use the devices with a short amount of training, the primary thrust of the test is to assess the performance of a basic workflow for the devices. In 2000 and previous censuses, assignment of enumerators’ workloads was quite hierarchical in nature, filtering from local offices down to crew leaders and finally to the enumerators. The workflow being used in the 2004 test centralizes control to a great degree at Census Bureau headquarters. Though local census office and crew leader input is sought in making initial assignments and elsewhere in the process, all information is channeled directly through headquarters; each enumerator’s PCD communicates with headquarters in order to receive new assignments. Likewise, hierarchical processing when completed questionnaires are gathered is also replaced. Rather than having completed questionnaires undergo visual inspection by enumerator crew leaders and by local census office staff, the PCD transmits each enumerator’s completed questionnaires directly to headquarters and on to data processing.

Members of the panel saw a demonstration of the device to be used in the 2004 test, and our understanding is that software development is still under way.
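The centralized workflow just described, with assignments flowing down from headquarters and completed questionnaires flowing straight back, can be simulated in miniature. All class and field names here are the panel’s hypothetical illustration, not Bureau software.

```python
# Minimal simulation of a centralized assignment workflow: the device pulls
# new case assignments from headquarters and transmits each completed
# questionnaire directly back, with no crew-leader paper hand-off in between.

from collections import deque

class Headquarters:
    def __init__(self, cases):
        self.pending = deque(cases)   # cases awaiting assignment
        self.completed = []           # questionnaires received from devices

    def issue_assignments(self, n):
        # hand out up to n cases from the central queue
        return [self.pending.popleft() for _ in range(min(n, len(self.pending)))]

    def receive(self, questionnaire):
        self.completed.append(questionnaire)

class EnumeratorDevice:
    def __init__(self, hq):
        self.hq = hq
        self.workload = []

    def sync(self, batch_size=2):
        # one round trip with headquarters: pull new assignments
        self.workload.extend(self.hq.issue_assignments(batch_size))

    def complete_case(self, case_id, data):
        self.workload.remove(case_id)
        self.hq.receive({"case": case_id, "data": data})  # direct transmission

hq = Headquarters(["A1", "A2", "A3"])
pcd = EnumeratorDevice(hq)
pcd.sync()
pcd.complete_case("A1", {"pop": 3})
```

The design point the sketch illustrates is that every state change passes through headquarters, which is what replaces the hierarchical local-office and crew-leader processing of earlier censuses.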
At this time, the devices are able to provide enumerators with listings of their workload assignments and with maps of the blocks they will be working in, but there is no connection between the two. That is, enumerators cannot highlight one or more of their assigned cases, request that they be plotted on a map, and thus decide on an optimal route. In its current form, the map information appears to serve a purely reference purpose. The 2004 test implementation is also preliminary in nature because paper—and not

electronic transmission—will be used for many of the progress reports and initial assignment rosters that will circulate between census headquarters, local census offices, and crew leaders.

5–A.2 Assessment

The Bureau’s plans for portable computing devices are a particularly exciting part of a reengineered census, but we are concerned about various aspects of their development, among them the following. First, much remains to be done to bolster the argument that use of PCDs will create major cost savings. Second, we are concerned that the Bureau’s current approach to testing the devices may be based primarily on devices currently available on the market. Hemmed in by that limitation, the Bureau runs the risk of not placing enough emphasis on the establishment of requirements and specifications for these devices and of not adequately accounting for human factors.

PCD Cost Savings

PCDs are critical to the Census Bureau’s plans to achieve cost savings in the 2010 census. Indeed, Bureau staff at the panel’s last public meeting in September 2003 identified PCDs as the centerpiece of savings for the short-form data collection in 2010. The basic claim is that huge costs associated with current census field operations can be directly linked to the use of paper, including the cost of rental space to store the paper, the cost of the paper and printing itself, and the cost of distribution, transportation, and handling of the paper. The expectation is that the use of PCDs will produce sufficient savings through the reduction of paper to pay for itself. In addition to the savings associated with the reduction in paper, Census Bureau cost documents have asserted that PCDs will reduce the equipment and staff needed in local census offices to produce maps, reduce costs of data capture, and improve productivity (U.S. Census Bureau, 2001a).
The problem is that, at present, the panel knows of no empirical evidence for any of these potential cost savings. It therefore appears that the Bureau is proposing to spend a large amount of money on PCD procurement in the hope that the efficiency and

paper-reduction gains will be achieved. This is too radical a departure from current procedures and too risky an undertaking for the decision to be made without careful testing and the accumulation of evidence that such cost savings could realistically be achieved, or that the use of PCDs will pay for itself without negatively affecting data quality (or, alternatively, that the expected gains in quality offset the additional costs).

For example, much has been made of the use of global positioning system (GPS) receivers attached to these devices to reduce duplication through correct map spotting of dwelling units. To support this contention, we have heard many anecdotes about enumerators who cannot read maps, but have been shown no hard evidence of the extent of the problem. (It is also far from clear that the same enumerators who experience difficulty working from and reading paper maps will, with minimal training, be able to use the maps on a handheld computer with any greater efficiency or any less error.) Moreover, the use of GPS will do little to solve the problem of correctly identifying individual units within multiunit structures. The device may be able to indicate when the enumerator has reached the correct structure, but readings will likely be impossible in the structure’s interior. It is also not clear that GPS receivers will be of much utility in high-density urban areas, where the map spot associated with an address may be based on interpolation from address ranges associated with streetline segments, which may not match precise structure locations. Since it is difficult to fully articulate or confirm the benefit of having a GPS receiver built into every PCD, it is even more difficult to contrast that benefit with the associated cost and decide whether the expense is justifiable. This is not to argue against technological advancement in conducting the census.
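The address-range interpolation just mentioned can be illustrated with a minimal sketch, assuming a straight street segment with evenly spaced addresses along one side; real TIGER geocoding involves considerably more detail, but the sketch shows why an interpolated map spot need not coincide with the actual structure location.

```python
# TIGER-style geocoding places an address proportionally along a street
# segment's address range. Simplifying assumptions: straight segment,
# evenly spaced addresses, one side of the street, planar coordinates.

def interpolate_address(house_no, lo_addr, hi_addr, start_xy, end_xy):
    """Map spot for house_no, interpolated along the segment's address range."""
    frac = (house_no - lo_addr) / (hi_addr - lo_addr)
    x = start_xy[0] + frac * (end_xy[0] - start_xy[0])
    y = start_xy[1] + frac * (end_xy[1] - start_xy[1])
    return (x, y)

# 150 falls roughly midway through the 100-198 range, so it is spotted
# near mid-segment even if the actual building sits at one end of the block.
spot = interpolate_address(150, 100, 198, (0.0, 0.0), (98.0, 0.0))
```

Because the interpolated spot is an arithmetic fiction, a GPS fix on the enumerator’s position can legitimately disagree with it, which is the mismatch the text cautions about in dense urban areas.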
What we do argue for is better explication of the costs and benefits; the Census Bureau’s experience with converting its survey programs to computer-assisted interviewing methods amply demonstrates that new technology does not automatically translate to cost savings. The introduction of laptops for ongoing survey data collection increased the number of staff at Census Bureau headquarters, while not reducing (initially, at least) the number of processing staff at the Bureau’s national processing facility in Jeffersonville, Indiana. No evidence

has been forthcoming on how much the transition cost the Bureau, but the arrival of computer-assisted interviewing does not seem to have saved the Bureau any money (National Research Council, 2003b).

The decennial census experience with laptop computers for Accuracy and Coverage Evaluation (ACE) interviewing also suggests the difficulty of assuming automatic cost savings with the implementation of technology. Titan Systems Corporation (2002) discusses the procurement of approximately 9,700 laptops for ACE from a vendor the Bureau had worked with since 1996. There were procurement and delivery problems, and a 10 percent overage was needed to cover failures. Specifically, the report notes that “the 9,639 laptop kits had to be assembled before shipping and this required the contractor to make BIOS configuration settings, load the software, and bundle the various accessories (adapters, manuals, batteries, etc.). The contractor had problems ensuring that each unit was configured as required” (Titan Systems Corporation, 2002:9). If the procurement of 9,700 laptop computers occasioned such problems, there seems to be legitimate concern about the procurement process for (as we understand it) some 500,000 PCDs.

The 2004 and 2006 census tests will be critical to establishing the validity of the cost-saving assumptions associated with PCDs. However, we are not confident that the 2004 census test is capable of providing the basic data needed to make return-on-investment decisions for 2010. That test is posed more as a wide-ranging but ultimately tentative first-use test to establish basic feasibility. Accordingly, as we note throughout this report and particularly in Chapter 9, the onus is that much greater on the 2006 test as a proof of concept.
The Bureau must build into that test appropriate measures and metrics to make a cost-quality assessment of the effectiveness of PCDs, and these measures need to include a realistic assessment of training costs, failure rates and associated maintenance and support costs, accuracy rates, efficiency improvements, and so on.

Testing, Requirements, and Human Factors

It would be a mistake to make assumptions at an early stage that unnecessarily limit the functionality or constrain the human

factors of these devices. Given the rate of technological development, it is not unreasonable to expect that a tablet-size PCD with a full-blown operating system, adequate memory, a 20-gigabyte hard drive, a GPS receiver, a modem, encryption facilities, and an 8-inch full-color screen display will be available on the market by 2007 at a price of $500 or less in the quantities required by the Bureau. Prototyping systems and putting too much emphasis on usability tests that use devices of considerably less capability—rather than using early testing to further refine the basic logical and informational requirements that the final device must satisfy—is probably too conservative and will result in the acquisition and use of devices that are less effective than they could be. We therefore strongly suggest that the Census Bureau not focus on the particular limitations and capabilities of the 2- or 2.5-inch-screen devices currently available on the market. In terms of the capability of the devices likely to be available for 2010, it is almost certain that some testing using high-end devices (e.g., tablet PCs) would provide a more realistic test.

The Bureau’s most pressing need regarding PCD development is the definition of specifications and requirements—clear statements of exactly what the devices are intended to do. In Section 6-B.3, we suggest the designation of a subsystem architect with responsibility for PCD and field systems to address this need. A key part of establishing the specifications and requirements for the devices will be articulation of the other census operations besides nonresponse follow-up for which the devices may be used; it is unclear, for instance, to what extent PCDs might be used in American Community Survey operations or in block canvassing.

As the Bureau further develops its plans for PCDs, it will be essential to keep human factors in mind.
The utility of the devices will depend on their effective use by a corps of temporary workers with relatively little training. While smaller devices of the current Palm/Pocket PC class may have advantages in terms of sheer size or weight, it is quite possible that working with a tablet-sized device will be much easier for census workers than repetitive pecking at a 2.5-inch screen. It is very important that the application software developed for the PCDs be tested by end users for its usability and accessibility, in addition to testing for computational bugs and flaws.

Recommendation 5.1: The Census Bureau should develop and perform a rigorous test of its plans for use of portable computing devices, and this test should compare the performance and outcomes of data collection using:

devices of the current (Pocket PC) class being developed for use in the 2004 census test;

high-end devices (e.g., tablet computers) of classes that are very likely to be available at reasonable cost by the time of procurement for 2010; and

traditional paper instruments.

Such a test is intended to provide fuller information about the costs and benefits of portable computing devices, using paper as a point of comparison. The test should also provide the opportunity to review specifications and requirements for the PCDs, using devices of the caliber likely to be available by 2010.

Recommendation 5.2: By the end of 2004, the Census Bureau should complete requirements design for its portable computing devices, building from the results of the 2004 census test and in anticipation of the 2006 proof-of-concept test. The requirements and specifications for portable computing devices must include full integration with the census system architecture and should include suitability for other, related Census Bureau applications. The Bureau’s requirements design for PCDs must devote particular attention to the human factors underlying use of the devices.

Recommendation 5.3: The Census Bureau must develop a complete engineering and testing plan for the software components of the portable computing devices, with particular attention to the computer-assisted personal interviewing interface, data capture systems, and communication/synchronization capabilities (including assignment of enumerator workload).

5–B CHALLENGING DEFINITIONS FOR A MODERN CENSUS

While PCDs offer the potential to improve the mechanics of census-taking, the panel believes that it is essential that 2010 census planners also take the opportunity to reexamine and change some of the basic definitional concepts of the census.

5–B.1 Housing Units

First, and consistent with our recommendations in Chapter 3, the very notion of what constitutes a housing unit deserves a fresh assessment. It is largely for this reason that we recommend the creation of a Master Address File (MAF) coordinator position within the Census Bureau. For census purposes, the MAF’s most fundamental purpose should be to serve as a complete register of housing units. Accordingly, an important step in enhancing the MAF is an examination of the definition, identification, and systematic coding of housing units (and, by extension, group quarters). (See Sections 3-E.1 and 3-E.2 for additional discussion of housing unit identification and coding.)

The current MAF/TIGER Enhancements Program may impart some benefit to MAF entries by virtue of their linkage to TIGER, but it does little to address two fundamental problems that hindered the MAF’s effectiveness as a housing unit roster in the 2000 census. The first of these is multiunit structures—physical buildings that contain more than one housing unit. A realigned TIGER database may offer a precise location for a structure—an aerial photograph may confirm a structure’s existence or point to the construction of a new one—but that added precision is ultimately of little use if the address roster of subunits within the structure is unknown or inaccurate. Multiunit structures pose problems both conceptually (e.g., if the finished basement of a house is sometimes offered for rent, should it be counted as a unit?)
and technically (e.g., do different data sources code an apartment as 3, 3A, or 3-A?), and deserve research and clarification during the intercensal decade. We further discuss the particular problem of

[…]

in large and small cities in neighborhoods where housing has been vacated by non-Hispanic whites. These immigrant settlements are frequently characterized by extensive family networks, higher-than-average fertility, and larger-than-average household sizes, all of which greatly complicate the census enumeration. Many immigrant families also have limited English proficiency, are fearful of government, and have occupancy characteristics that may violate local ordinances. All of these factors decrease the likelihood of their completing and returning a census questionnaire by mail, requiring high levels of nonresponse follow-up (NRFU). NRFU may be further compromised in these neighborhoods because their housing stock violates some of the basic tenets of the mailout/mailback method of data collection, the most important of which is the clear demarcation of housing units, especially in small multiunit structures. Small buildings that were once occupied by a single family are now home to multiple families in all kinds of configurations in many of the nation’s cities, large and small. As the presence of immigrants and their children becomes more widespread, this problem will become more pronounced, threatening the most elementary assumption of a census enumeration—that it is possible to uniquely identify a housing unit for the purposes of mailing questionnaires and conducting nonresponse follow-up. In some neighborhoods, questionnaires can no longer be linked to housing units in any exact way, creating confusion about the delivery points for questionnaires and the completion of nonresponse follow-up.
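One narrow slice of the unit-identification problem, reconciling designators coded differently across data sources (the chapter’s earlier example of “3,” “3A,” and “3-A”), can be sketched as a normalization step. The rules below are illustrative only; real matching would need far more care (rear units, basement codes, half-addresses, and so on).

```python
# Sketch: collapse superficially different apartment designators to one key
# so that records from different address sources can be compared. The prefix
# list and rules are hypothetical illustrations, not a production standard.

import re

def normalize_unit(designator):
    d = designator.upper()
    d = re.sub(r"^\s*(APT|UNIT|NO)\.?\s*", "", d)  # drop unit-type prefixes
    return re.sub(r"[^0-9A-Z]", "", d)             # strip punctuation/spacing

variants = ["3A", "3-A", "3 a", "apt 3A"]
keys = {normalize_unit(v) for v in variants}  # all four collapse to one key
```

Normalization like this helps only when a designator exists at all; as the text emphasizes, the harder problem is units that carry no label in the first place, which no string matching can repair.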
When a questionnaire fails to be returned by mail, the NRFU enumerator does not have a clear apartment designator in follow-up because such designators do not exist—mail is sorted by tenants of separate apartments out of a single mailbox, or, where multiple mailboxes exist, the use of apartment designators is inconsistent or nonexistent. This situation leads to underenumeration and/or erroneous enumeration in the very places where effective counts are most needed for program planning and targeting. But although there have been calls for the Census Bureau to address this problem (see, e.g., National Research Council, 1999), the Bureau has failed to develop methods to deal with it, continuing instead to rely on a haphazard NRFU effort in immigrant communities

that, in 2000, likely contributed to considerable undercount and erroneous enumeration.

The Bureau can no longer ignore this situation. The 100 percent block canvass and the actual census enumeration must employ new methods. No block canvass, regardless of the effort, will work if the rules regarding the listing of housing units do not take into account the occupancy and labeling problems that frequently characterize immigrant communities. The address listing operation assumes the existence of unit labels that are not present, so the very premise of the operation is faulty. The Bureau needs to create ways to label units and carry these labels into the enumeration. Mailout to these units may be impossible, so strategies need to be developed to take this into account. The Bureau needs to use the 2006 test as an opportunity to:

create labels for housing units in multiunit structures where no labels exist;

test methods for incorporating this labeling into a block canvass operation;

determine whether mailout can be conducted to these units; and

test an enumeration strategy that does not use the standard mailout/mailback method of data collection.

Alternate enumeration strategies might include urban update/leave, in which questionnaires are delivered to apartments by enumerators with a request that they be mailed back. This approach needs to include a component that labels the apartment, so that the questionnaire-apartment assignment is correct and so that follow-up can steer the enumerator to the correct location. Other options the Bureau should explore include more extensive use of face-to-face enumeration, in cooperation with local community leaders.

5–C.2 Rural Enumeration

The Census Bureau has historically been challenged by rural enumeration. Problems range from the absence of city-style address formats to physical barriers in remote, isolated places such

as Alaska and the desert Southwest. Other challenges arise from a subset of rural respondents who live in these remote places because they are seeking to escape the intrusions of modern life, and especially the intrusions of the federal government. Finally, there is an often underappreciated diversity of places in rural areas—American Indian reservations, Hispanic colonias, and religious communities such as the Amish, to name only a few.

The Census Bureau should be mindful of two considerations in the enumeration of rural areas. First, it should avoid treating such areas as essentially homogeneous regions. Enumeration methods that work well on Indian reservations may not work well in rural Appalachia, and vice versa; housing arrangements may vary from one rural area to another. Second, the partnerships formed for the 2000 census were instrumental in ensuring the cooperation of many rural communities. The partnership program and the many efforts made during the 2000 enumeration to “localize” the census and make it attuned to the interests of diverse communities should be carefully examined, both to build upon the successes of 2000 and to rectify any problems that may have arisen during the 2000 count.

5–D ALTERNATIVE RESPONSE MODES AND CONTACT STRATEGIES

Following up on limited experience in the 2000 census, the Census Bureau plans for alternative modes of response to the questionnaire to play a larger role in the 2010 census than they did in 2000. In particular, the response modes that have been proposed (in addition to mailback of the paper census form) are submission via the Internet and answers given through an automated telephone system known as interactive voice response (IVR). As is the case with PCDs, the Census Bureau has suggested that increased usage of these response modes—both of which feature automated data capture, as data are collected in digital form—will achieve significant cost savings.
As is also the case with PCDs, much remains to be demonstrated regarding the accuracy of these cost-saving assumptions; research is also needed to address the possible effects of alternative response modes on potential duplication in the census (see Section 5-E) and the potential for respondents to answer questions differently if the structure and wording of questions under the different response modes vary.

In addition to expanding the possible response modes, the Census Bureau has suggested that it plans to revise some of its respondent contact strategies. In 2000, advance letters were mailed before questionnaires were delivered and reminder postcards to send in the form were sent a few weeks after questionnaire mailout. The Census Bureau has expressed interest in sending a second questionnaire to nonresponding households, reviving an idea that had to be abandoned in the 2000 planning cycle. We discuss both the response mode and contact strategy proposals in the balance of this section.

5–D.1 Response Modes in 2000 and Early 2010 Testing

Internet data collection was conducted in the 2000 census, albeit in a limited and unpublicized manner. The Bureau’s evaluation report on 2000 census Internet collection notes that there were 89,123 Census ID submissions on the Web site, of which 16.7 percent were failures (thus 0.07 percent of eligible households—63,053 out of 89,536,424 short-form households—successfully availed themselves of the opportunity to complete the form on the Internet) (Whitworth, 2002). This seems like a relatively high failure rate, although the report notes that “many, if not most, of the submission failures were associated with a Census ID representing a long form” (Whitworth, 2002:5).

Given the Bureau’s plans to expand the use of Internet reporting for the 2010 census, it is important to examine the data from the 2000 Internet responses, as well as from the 2003 National Census Test, to identify and correct problems such as those relating to entering the ID or other security or usability issues. We urge the Bureau to examine the data already in hand with a view to improving the design of the Internet response option.
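The 0.07 percent take-up figure quoted above follows directly from the counts reported by Whitworth (2002) and can be checked in a couple of lines:

```python
# Counts reported in the 2000 census Internet evaluation (Whitworth, 2002)
successful_submissions = 63_053    # successful Internet form completions
eligible_households = 89_536_424   # short-form households eligible to respond

take_up = 100 * successful_submissions / eligible_households
print(f"Internet take-up: {take_up:.2f}% of eligible households")
# Internet take-up: 0.07% of eligible households
```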
The 2000 census also included in its evaluations and experiments program a Response Mode and Incentive Experiment (RMIE), testing whether respondents would be more likely to submit a census form via the Internet or IVR if offered an incentive (specifically, a telephone calling card). Among those assigned to Internet (i.e., encouraged to complete the form on the Web), only 3.9 percent did so when given an incentive to do so, and 3.4 percent did so with no incentive. Extrapolating these numbers to the entire set of eligible households in 2000, providing an incentive to use the Internet option would have resulted in just over 3 million returns by this mode. The summary of the RMIE work suggests a potential saving of between $1 million (assuming a 3 percent Internet response) and $6 million (assuming a 15 percent response) in postage costs (Caspar, 2003).

The Bureau has argued that savings in paper, printing, data capture, and warehousing costs resulting from converting many mail responders to alternative electronic response modes such as the Internet would help offset the costs of acquiring PCDs. Given the above numbers, we do not see large potential savings from alternative response modes, and we urge the Bureau to develop realistic cost models for such approaches.

Caspar (2003), summarizing the various RMIE reports, offers other insights into the potential effectiveness of alternative response modes in 2010:

• The calling card incentive moved some people to use the alternative mode, but did not increase overall response, as these are people who would respond by mail anyway. The impact of the calling card incentive may not justify its cost.

• Among respondents to the Internet Usage Survey who were aware of the Internet option, 35 percent reported that they believed the paper form would be easier to complete. While Internet completions may be beneficial for the Census Bureau, the argument needs to be made for its benefits to the respondents before large numbers of them are likely to switch to Internet completion.
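The extrapolation from RMIE take-up rates to the full set of eligible 2000 households can be reproduced directly from the figures given in the text (the projection, like the report's, assumes the experimental rates carry over to the whole mailout universe):

```python
eligible_households = 89_536_424   # short-form households in 2000

# RMIE Internet take-up rates from the text above
rmie_rates = {
    "with calling-card incentive": 0.039,  # 3.9 percent completed on the Web
    "no incentive": 0.034,                 # 3.4 percent completed on the Web
}

# Project each rate onto the full 2000 mailout universe; both land
# a bit over 3 million returns, consistent with the report's estimate.
for label, rate in rmie_rates.items():
    print(f"{label}: ~{rate * eligible_households:,.0f} Internet returns")
```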
IVR does not look promising: “Without significant improvements in the voice-user interface, the IVR technology is probably not a viable option for Census 2010.”

The results from the alternative mode part of the 2003 National Census Test support the view that alternative modes are unlikely to account for a large proportion of census responses or to decrease mail nonresponse in 2010: “Offering a choice of alternative modes did NOT increase or decrease the cooperation rates. Instead it simply shifted response to the alternative modes. However, this shift was relatively small…. Pushing respondents to respond by electronic modes [IVR or Internet, by not providing an initial paper questionnaire] was found to decrease overall response” (Treat et al., 2003:8). The response mode component of the 2003 test is also discussed in Box 9.1 in Chapter 9.

5–D.2 Response Mode Effects

It is well known in survey research that respondent answers may differ due to variations in the precise wording, format, and structure of the questions, and may differ based on the mode in which the survey is rendered (e.g., self-response versus interviewer-administered). Reporting the results of the 2003 National Census Test, Treat et al. (2003:9) recommend that the Census Bureau “develop content suitable for each mode; we need to first develop an ideal instrument for each mode, then test against other modes for differences.” As the Bureau report notes further, redesigns of the questionnaires used under different response modes should be sensitive to the possibility of mode effects on respondent answers; specifically, the report recommended “research[ing] the design of the instruments so as not to compromise data quality, while maximizing the advantages of each mode” (Treat et al., 2003:9).

Assuming a short-form-only decennial census, concern over mode effects is eased somewhat due to the nature of the questions; several of the basic data items such as gender and housing tenure are not likely to be hurt by nuance in wording and format. The race and Hispanic questions, however, are a key area of possible concern for response by mode.
This is particularly true given the multiple-response form of the race question and the demonstrated sensitivity of the Hispanic origin question to the number of groups mentioned as examples, as suggested by the 2003 National Census Test (Martin et al., 2003; see also Box 9.1). The ACS, consisting of the current long-form-sample data items, is more sensitive to mode effect concerns, though the extent to which alternative response modes may be added to the ACS plan is unknown. Of greater concern with respect to the ACS are potential differences in response that may arise from different question structuring between the ACS instrument and the current census long form.

5–D.3 Replacement Questionnaires

Prior to the 2000 census, the Census Bureau’s initial plans to send a replacement questionnaire to nonresponding households had to be abandoned after it was determined that the operation could not be completed in a timely manner. While the address list for targeted nonrespondents could be developed quickly, the Bureau learned from contractors that the actual printing, addressing, and mailout of questionnaires would take several weeks, delaying any nonresponse follow-up effort by an unacceptable amount.

The results of the 2003 National Census Test were consistent with previous results in the survey literature, showing that targeted replacement questionnaires had a significant effect on cooperation rates—a 10.3 percentage point increase at the national level. The panel is convinced that the potential effect of replacement questionnaires on mail response rates has been well demonstrated and that implementation of this contact strategy in 2010 would be beneficial. What is needed now is a specific operational plan in order to actually deliver the replacement questionnaires. Treat et al. (2003:11) comment that, “in the Census, the largest obstacle for a targeted replacement questionnaire to nonresponding households is how to operationalize it.” It is certainly not the only obstacle; both the replacement questionnaires and the greater use of alternative response modes increase the potential risk of duplicate enumerations, and so development of strategies for unduplication becomes increasingly important (see Section 5-E). Furthermore, as Treat et al. (2003:11) note, research also remains to be done on the optimal time lag after the initial questionnaire mailout to compile the list of nonrespondents and send replacement questionnaires.
Recommendation 5.6: The Census Bureau must quickly determine ways to implement a second questionnaire mailing to nonresponding households in the 2010 census, in order to improve mail response rates. Such determination should be done in a cost-effective manner that minimizes duplicate enumerations, but must be made early enough to avoid the late problems that precluded such a mailing in the 2000 census.

In the panel’s assessment, research consideration of possible effects of response mode and questionnaire design on respondent answers is certainly warranted and should be pursued. That said, a more pressing concern is development of plans for dealing with census duplication and nonresponse, as we describe in the next section.

5–E DATA-PROCESSING METHODOLOGIES: UNDUPLICATION AND IMPUTATION

Two basic data-processing stages became very prominent in the 2000 census experience, and are likely to remain so in 2010. Unduplication (referring here to person records) became a major focus of the follow-up research informing the various decisions on possible statistical adjustment of the 2000 census totals. Specifically, advances in unduplication were made possible by a reasonably simple innovation—name and date of birth were captured for the first time in the 2000 census, as a byproduct of the use of optical character recognition technology. Based on work by Fay (2001), Bureau staff continue to use and enhance the capacity to search the nation for individuals matching on name and date of birth. Especially for very common names, some of these matches are false, but weighting procedures have been developed to account for false matches. There is the real possibility of using some variant of this procedure to substantially reduce the frequency of duplicates in the 2010 census and coverage measurement program.

Likewise, imputation for nonresponse emerged as a major focus in the wake of the 2000 census. The Census Bureau’s basic methodology for imputing missing data items—so-called “hot-deck” imputation—has been in use for some 30 years. And, though it has certain key advantages—among them that it can be performed in one pass through the data set—it is a methodological area ripe for new research and approaches. Imputation gained considerable attention in the 2000 census when the state of Utah questioned its use in the second of the state’s major legal challenges against the census counts, arguing that imputation constituted statistical sampling (which is prohibited from use in generating apportionment totals). The U.S. Supreme Court rejected the argument, ruling that “imputation differs from sampling in respect to the nature of the enterprise, the methodology used, and the immediate objective sought” and that use of imputation is not inconsistent with the “actual enumeration” clause of the U.S. Constitution (Utah v. Evans, 536 U.S. 452, 2002). Though imputation methods withstood legal scrutiny in this instance, their use and the potential implications they bring will likely be the subject of continued debate.

National Research Council (2004) offers three recommendations related, generally, to the Census Bureau’s plans for unduplication and imputation in the 2010 census. We endorse and restate them here.

Recommendation 5.7: The Census Bureau must develop comprehensive plans for unduplication in the 2010 census, in terms of both housing units and person records. Housing unit unduplication research and efforts should be conducted consistent with objectives outlined in the panel’s recommendations related to the Master Address File. Person-level unduplication efforts should focus on improvements to the methodology developed for the 2000 Accuracy and Coverage Evaluation Program, including national-level matching of records by person name.
It is essential that changes in unduplication methodology be tested and evaluated using extant data from the 2000 census and that unduplication methods be factored into the 2006 proof-of-concept test and 2008 dress rehearsal.
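To illustrate the core of the person-level matching idea (exact agreement on name and date of birth, in the spirit of Fay, 2001), a minimal sketch follows. The record layout and name normalization here are our own assumptions for illustration; the Bureau's actual procedure also applies weighting to discount false matches on very common names.

```python
from collections import defaultdict

def find_candidate_duplicates(records):
    """Group person records by (normalized name, date of birth) and
    flag any group with more than one record as candidate duplicates.

    `records` is a list of dicts with illustrative keys 'id', 'name',
    and 'dob'; real matching would add weighting for common names.
    """
    groups = defaultdict(list)
    for rec in records:
        key = (rec["name"].strip().upper(), rec["dob"])
        groups[key].append(rec["id"])
    return {key: ids for key, ids in groups.items() if len(ids) > 1}

# Toy example: two records for the same person plus one unique record.
records = [
    {"id": 1, "name": "Jane Doe", "dob": "1970-04-01"},
    {"id": 2, "name": "jane doe ", "dob": "1970-04-01"},
    {"id": 3, "name": "John Roe", "dob": "1985-12-31"},
]
print(find_candidate_duplicates(records))
# {('JANE DOE', '1970-04-01'): [1, 2]}
```

Note that exact matching on a single key is the simplest possible variant; any production system would also need to tolerate spelling variation and transcription error in names.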
Recommendation 5.8: The Census Bureau must pursue research on the trade-off in costs and accuracy between field (enumerator) work and imputation routines for missing data. Such research should be included in the 2006 proof-of-concept test, and census imputation routines should be evaluated and redefined prior to the 2008 dress rehearsal. As appropriate, the American Community Survey research effort should also address the trade-off between imputation and field work.

Recommendation 5.9: The Census Bureau should conduct research into the effects of imputation on the distributions of characteristics, and routines for imputation of specific data items should be completely evaluated and revised as appropriate for use in the American Community Survey.
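For readers unfamiliar with the hot-deck method discussed in Section 5–E, a minimal sketch of sequential (one-pass) hot-deck imputation for a single item follows. The tenure values and the cold-start default are illustrative assumptions, not the Bureau's production rules; in practice, donors are drawn from within classes of records with similar characteristics.

```python
def sequential_hot_deck(values, default="OWNER"):
    """One-pass sequential hot-deck imputation for a single item.

    Each missing entry (None) is replaced by the value of the most
    recently seen respondent (the 'donor'). `default` is an
    illustrative cold-start value used before any donor appears.
    """
    last_donor = default
    imputed = []
    for v in values:
        if v is None:
            imputed.append(last_donor)   # fill from the last donor seen
        else:
            last_donor = v               # reported value becomes new donor
            imputed.append(v)
    return imputed

# Toy housing-tenure column with two gaps.
responses = ["RENTER", None, "OWNER", None, None]
print(sequential_hot_deck(responses))
# ['RENTER', 'RENTER', 'OWNER', 'OWNER', 'OWNER']
```

The single pass through the data is exactly the efficiency advantage the text notes; the cost is that imputed values depend on record ordering, one reason the method remains, as the text says, ripe for new research.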