Census Bureau Research, Past and Present
HAVING CONCLUDED IN CHAPTER 2 that serious attention to cost and quality must drive the planning for the 2020 census, we describe our recommendations in the following two chapters. In this chapter, we critique the Census Bureau’s existing program for research—exemplified by the 2010 Census Program of Experiments and Evaluations (CPEX)—both by comparison with the Bureau’s past efforts and through articulation of the gaps in its current strategies toward research. This chapter then provides general guidance for rethinking census research (and, with it, the approach to the 2020 census); Chapter 4 turns to the practical issues of structuring and scheduling research in advance of the decennial census.
Section 3–A presents our critique of current and recent trends by the Census Bureau in its research programs; it builds from the context provided by two detailed appendices at the end of this report. Appendix A describes the precensus testing programs and formal research (experimentation and evaluation) programs of the 1950–2000 censuses. It also describes the testing earlier in this decade related to the 2010 census. Appendix B then describes the 2010 CPEX program in detail. Section 3–B rounds out this chapter by laying out what we think are key steps in improving the substance of Census Bureau operational research.
It is important to note two caveats about the historical review of research in Appendix A (and, indeed, throughout this report). First, in summarizing research activities in the appendix, our concern is more with inventorying the types and varieties of activities that have characterized Census Bureau research than with completely documenting their results. This is partly a matter of the availability of information—particularly for the earlier censuses, methodological details are scant in the literature—and partly because documenting the full “results” of major field tests such as census dress rehearsals is simply beyond the scope of our project. Our interest in the appendix and this chapter is in describing the general contours and features of census research, not in assessing the merits (or lack thereof) of each specific activity.
Second—and more fundamentally—we deliberately do not delve into the details of the coverage measurement programs that have accompanied the decennial censuses (save for formative developments in the earliest decades). We concur with our predecessor National Research Council panels that have found the Census Bureau’s coverage measurement programs to be generally of high quality. In particular, the Panel to Review the 2000 Census concluded that coverage measurement work in 2000 “exhibited an outstanding level of creativity and productivity devoted to a very complex problem” and that it showed “praiseworthy thoroughness of documentation and explanation for every step of the effort” (National Research Council, 2004a:244, 245). We also direct interested readers to the final report of another National Research Council (2008a) panel that had coverage measurement research as its explicit charge, which is not the case for our panel. Given our charge, we have focused our own analysis principally on research and development related to census operations; accordingly, it is important to note that our comments on “census research” in what follows should not be interpreted as applying to the Census Bureau’s extensive body of coverage measurement research.
CURRENT RESEARCH: UNFOCUSED AND INEFFECTIVE
In our assessment, the Census Bureau’s current approach to research and development (R&D), as it applies to decennial census operations, is unfocused and ineffective. The Census Bureau’s most recent research programs suffer from what one of our predecessor panels described as a “serious disconnect between research and operations in the census processes” (National Research Council, 2004b:45):
Put another way, the Census Bureau’s planning and research entities operate too often at either a very high level of focus (e.g., articulation of the “three-legged stool” concept for the 2010 census) or at a microlevel that tends toward detailed accounting without much analysis…. What is lacking is research, evaluation, and planning that bridges these two levels, synthesizing the detailed results in order to determine their implications for planning while structuring high-level operations in order to facilitate meaningful detailed analysis. Justifying and sustaining the 2010 census plan requires both research that is forward-looking and
strongly tied to planning objectives, and rigorous evaluation that plays a central role in operations rather than being relegated to a peripheral, post hoc role.
In our assessment, the 2010 CPEX exemplifies these problems, although the problem is broader than the specific experiments, evaluations, and assessments outlined in Appendix B. As the quotation from the predecessor panel suggests, our critique applies to Census Bureau research writ large, including the census tests conducted between 2000 and 2010 and the evaluations and experiments of the 2000 census.
Legacy of Research
In Appendix A, we outline the precensus testing activities and the formal research, experimentation, and evaluation programs of the 1950–2000 censuses. Together, they provide a picture of the Census Bureau’s major research programs on the decennial census, from the buildup to the 1950 census through the 2008 dress rehearsal for the 2010 census. We refer to specific points in these narratives throughout this chapter and also note some general impressions from the flow of census research over the years. The first such observation is that the lack of focus evident to us in the Bureau’s current strategy was not always present. Indeed, the Census Bureau has in the past been a place where major technological improvements and major data collection changes were successfully executed through careful (but innovative) research and pathbreaking theoretical work.
Arguably the best example of R&D driving change in the process of census-taking—a string of related research projects building toward a major operational goal—is the switch from enumerator-conducted personal visits to mailed, self-response questionnaires as the primary mode of data collection. Now that mailout-mailback methodology has become so ingrained, it can be difficult to fully grasp how seismic a shift in methodology the change to mail was for the census. However, the magnitude of the shift can be inferred from chronicling the careful, deliberate program of testing and experimentation that preceded the change. As the Census Bureau’s procedural history of the 1970 census (U.S. Census Bureau, 1976) notes, mail was used for some Census Bureau activities as early as 1890—predating the establishment of the Bureau as a permanent office. In 1890, questionnaires concerning residential finance were mailed to households with a request for mail return; the same was repeated in 1920, and a similar mail-based program of income and finance questions was also used in 1950. Supplemental information on the blind and the deaf was requested by mail in the 1910, 1920, and 1930 censuses, and a mail-based “Absent Family Schedule” was used for some follow-up work in 1910, 1930, and 1940. The direct path
to mail as the primary mode for census collection probably begins with the “Advance Schedule of Population” that was delivered to households in 1910; this form was meant to acquaint households with the topics of the census, not to be completed by the householders (a similar advance form was used in the agriculture census conducted in 1910). Following World War II, and often in conjunction with special censuses requested by cities and towns, the Census Bureau initiated a set of experiments and tests of mailout and mailback methods. One such test was conducted in 1948 (Little Rock, AR; Section A–1.a); another was a formal experiment of the 1950 census, in which households in Columbus, OH, and Lansing, MI, received questionnaires from enumerators prior to the 1950 census with instructions to complete and mail them on Census Day (U.S. Census Bureau, 1966:292; see also U.S. Census Bureau, 1955:5). Similar tests were performed in 1957, 1958, and 1959, with the January 1958 test in Memphis, TN, adding field follow-up of mailed census returns as a check on quality (Section A–2.a).
In the 1960 census, households were mailed an “Advance Census Report,” which they were asked to fill out but not return by mail. Instead, enumerators visited the household to collect the forms and transcribe the information onto forms more conducive to the optical film reader then used to process census data. If a household had not completed the advance form, the residents were interviewed directly by the enumerator. Based on the successful use of the mailout questionnaire in 1960, Congress enacted a brief but powerful amendment to census law in 1964: P.L. 88-530 struck the requirement that decennial census enumerators must personally visit every census household. Even though mail methods had been tested and the required legal authorization had been obtained, mailout-mailback methods were subjected to further testing prior to the 1970 census, as Section A–3.a describes. These designed tests of mail procedures escalated in size and complexity from a relatively small community (Fort Smith, AR) to a large central city (Louisville, KY), to known hard-to-count areas (parts of Cleveland, OH, that experienced enumeration problems in 1960 and ethnic communities in Minnesota and New York). In the 1970 census, questionnaires were mailed to about 60 percent of all housing units, focusing on major urbanized areas; a formal experiment conducted during the 1970 census (Section A–3.b) expanded mailout-mailback methods to more rural areas in 10 local offices, anticipating wider use of mail methods. The percentage of the population in the mailout-mailback universe has grown in subsequent censuses to include 81 percent of the population in 2000, with another 16 percent receiving questionnaires from census enumerators to be mailed back (see below).
Other notable examples in which R&D (in the form of census tests, experiments, and evaluations) drove important developments in the census process include:
Refinement of residence rules for college students: Prior to the 1950 census, the results of test questions in Current Population Survey (CPS) supplements and special censuses of cities contributed to the Bureau’s reversing its rule on counting college students. As described by the National Research Council (2006:Sec. 3–B.1), census practice since 1880 had favored counting college students at their parental homes. The test results prior to 1950 contributed to the conclusion that college students enrolled at schools away from home were frequently omitted in parental household listings, so the Census Bureau reverted to the 1850 census approach of counting college students at their school location.
Development of enumeration strategies for nonmail areas: In the 2000 census, blocks and other geographic regions were designated to be covered by one of nine types of enumeration areas (TEAs)—essentially, the method for making initial contact with census respondents—with mailout-mailback being the most common TEA. Update-leave enumeration for areas without city-style addresses, in which enumerators checked and updated address list entries during their visits but simply left a census questionnaire for respondents to return by mail, was first tested in a significant way in one of the experiments of the 1980 census; five matched pairs of district offices were selected for comparison using this technique (Section A–4.b). The March 1988 dress rehearsal in St. Louis, MO, introduced a variant of the strategy—urban update/leave, targeting hard-to-enumerate areas for personal enumerator visit—that was carried into the 1990 census, evaluated, and judged to be a valuable technique (Sections A–5.a and A–5.b). The Bureau’s response to an unforeseen problem in the same 1988 dress rehearsal also led to enduring changes in practice. Nine counties in the east central Missouri test site were initially thought to be amenable to mailout-mailback, but a precensus check suggested high levels of undeliverable addresses; hence, the Bureau swapped strategies for such areas. The same flexibility applied in the 2000 census, in which mailout-mailback conversion to update-leave was one of the TEAs.
Flaws in Current Census Research and the 2010 CPEX
In this section, we briefly describe the principal deficiencies that we observe in the Census Bureau’s current approach to research and in the 2010 CPEX in particular. In our assessment, shortcomings in the Census Bureau’s research strategy need to be overcome immediately in order to foster an effective research program for the 2020 census.
However, at the outset of this discussion, it is important to note—and commend—revisions to 2010 census research that were announced as this
report was in the late stages of preparation. In congressional testimony in October 2009, Census Bureau Director Robert Groves (2009:9) described three research programs that he initiated following his own evaluation of 2010 census preparations:
We will develop and implement a Master Trace Project to follow cases throughout the decennial census cycle from address listing through tabulation so that we have a better research base for planning the 2020 Census. We also will be conducting an Internet measurement reinterview study, focused on how differently people answer questions on a web instrument from a paper questionnaire. Finally, we will mount a post-hoc administrative records census, using administrative records available to the Census Bureau. All of this will better position us for the developmental work we must conduct to improve future decennial census operations.
In committing to retain 2010 census operational data and to more aggressively evaluate the quality of administrative records data relative to census returns, the director’s proposed programs are responsive to recommendations made in our letter report (Part III of this volume). The proposed Internet reinterview study stops short of testing Internet response in the census itself, but it does at least put the Bureau in the position of testing Internet response to a census instrument. We commend these developments and look forward to their completion; they are also partially responsive to the recommendations and guidance in the balance of this chapter. That said, important problems remain in the Bureau’s general approach to research, as we now describe.
Lack of Relevance to Cost and Quality Issues
Although effects on cost and quality were listed by the Bureau as primary criteria for choosing studies for the 2010 CPEX, the final slate of experiments and evaluations described in Appendix B and analyzed in our letter report (Part III of this volume) seems ill-suited to inform choices that would meaningfully affect either cost or quality. Of the experiments:
Only the nonresponse follow-up (NRFU) Contact Strategy Experiment appears clearly motivated by an attempt to reduce the cost of a high-cost operation without impairing census quality. Even so, that experiment will stop short of providing comprehensive information on the cost-benefit trade-offs of suspending follow-up contacts after some number (more or fewer than the usual 4–6) of attempts. Because enumerators will know in advance how many attempts are possible for a household, the experiment presents the opportunity for enumerators to game the system: to try particularly hard in early approaches at a 4-contact household or to be slightly more casual in early attempts at a 6-contact household. The experiment may provide insight into how enumerators deal with preset rules and conditions but not a true measure of NRFU yields at each possible contact opportunity.
The Deadline Messaging/Compressed Schedule Experiment may rightly be said to have some bearing on census costs, to the extent that it helps provide mailing package cues and prompts that may boost the mail return rate (and thus reduce the more costly NRFU workload). The compressed schedule portion of the experiment could be argued to promote higher quality by pushing data collection closer to the actual census reference date of April 1, but the impact of a one-week shift on the quality of resulting data is most likely negligible.1
The Alternative Questionnaire Experiment (AQE) is heavily focused on refinements to the measurement of race and Hispanic origin—important data items to be sure, but ones for which an objective truth is both unknown and unknowable, subject as it is to individual concepts of self-identity. Hence, the experiment may suggest whether different treatments yield different levels of reporting in specific categories, but it is impossible to say whether “different” is the same as “higher quality.” By comparison, only a single panel in the AQE focuses on the quality of information about residence and household count—the information that represents the constitutional mandate for the census.
As we noted in our letter report, the Confidentiality/Privacy Notification Experiment is, if anything, contrary to the goals of reducing cost and improving quality. Its paragraph treatment raises the possibility of mixing census information with data from other government agencies—i.e., the use of administrative records in census processes—in an ominous manner. Since the experiment includes only the single alternative wording, it creates a situation where respondents may react negatively but relatively little is learned about public sensitivity to records use.
1 A treatment group in the 2006 Short Form Mail Experiment (see Section A–7) that used a compressed mailing schedule showed levels of item nonresponse on housing tenure, age, Hispanic origin, race, and sex similar to those of a control group. The compressed schedule group was significantly less likely to leave the total household count blank than the control, and the compressed schedule also appeared to increase reporting of new babies and reduce the tendency of respondents to omit themselves from the questionnaire (Martin, 2007), although how a one-week difference in questionnaire mailout specifically generates these effects is unclear.

Undue Focus on “Omnibus” Testing Slots

A further observation from the review of past research programs is that the Bureau used to be considerably more flexible in the forms of research studies it undertook. Small, targeted tests in selected sites used to be more frequent; again, the example of the final gear-up to mailout-mailback methodology in 1970 (Section A–3.b) is instructive, with a series of tests escalating in size and scope from small communities to dense urban centers. The Census Bureau also made considerable use of special censuses commissioned by individual localities as experimental test beds; costs of the tests were thus shared by the Census Bureau and the sponsoring locality, and the locality had a tangible product—fresh population data—as an incentive for cooperating with the experimental measures. The use of special censuses for such purposes seems to have ended—perhaps understandably so—when the city of Camden, NJ, sued the Census Bureau over the results of a September 1976 test census in that city, occasioning several years of legal wrangling (Section A–4.a). In the past, the Census Bureau was also more willing to use other surveys for testing purposes—particularly the Bureau-conducted Current Population Survey, which was used to test items for the 1950 and 1960 censuses.
In the most recent rounds of research and experimentation, studies seem to have been chosen based more on the availability of testing “slots” (e.g., something that could be tested using only the mail as part of a single, omnibus, mail-only experiment in a year ending in 3) than on pressing questions and operational interests. The recent cycle of mail-only tests in years ending in 3 or 5, tests involving a field component in years 4 and 6, and a dress rehearsal in year 8 has the advantage of keeping the various parts of census processing in fairly constant operation, so that there is no need to completely rebuild field operations from scratch. But a too-strong focus on these single-shot testing slots has led to poor design choices. For example:
The National Research Council (2004b:227) argued that the Census Bureau’s decision to fuse a test of alternative response technologies (i.e., paper, Internet, or telephone) to a mail questionnaire with numerous modules on race and Hispanic origin question wording in the 2003 National Census Test “was likely one of convenience” rather than one intended to produce meaningful results. The availability of a nationally representative sample seemed to have trumped attention to “power analysis to determine the optimal sample sizes needed to measure effects to desired precision” or “more refined targeting of predominantly minority and Hispanic neighborhoods” where the revised race questions would provide the most information.
Early plans for the mail-only 2005 National Census Test included experimental panels of different presentations of the instructions and wording of census Question 1 (household count). However, those early plans failed to include any relevant control group—either the question as presented in the 2000 census or the modified question used in a 2004 test—making it impossible to judge effectiveness of the experimental treatments compared with a baseline. After this deficiency was identified at a meeting of the Panel on Residence Rules in the Decennial Census, the test plan was altered to include a control (National Research Council, 2006:205).
In a regime of large omnibus tests, topics that might better be handled in a series of smaller, focused tests are forced into a larger design, with no promise that the omnibus test will be able to distinguish between fine-grained alternatives. Omnibus census tests that involve a field component also suffer from an important limitation. They are meant to be as census-like as possible in order to exercise the complete census machinery. But an explicit proviso of the modern census tests is that no data products are released, most likely to maintain consistency with the Bureau’s reluctance in recent decades to use locally sponsored special censuses as experimental opportunities. The result is a major operational test that is “census”-like save for the fact that it is not actually a census: such trials provide participating localities with no tangible product or benefit. However hard the tests try to create census-type conditions, localities have little incentive, beyond a sense of civic duty, to provide unfettered support.
Finally, the shift in recent years to omnibus tests has created another fundamental flaw in Census Bureau research: almost of necessity, the tests cannot build on each other. In previous decades, “chains” of related tests could be seen. For instance, the major census tests in Yonkers, Indianapolis, and Memphis in 1957–1958 (Section A–2.a) all involved a two-stage interview process, with another enumerator or supervisor rechecking results; based on experience in one test, approaches in the later tests were varied. By comparison, the large-scale tests of recent years take longer to design, longer to field, and longer to analyze—and leave few resources for subsequent tests. With few exceptions, the results of recent census tests have been unable to build directly on the experience of their predecessors simply because the results of the earlier tests had not yet been processed.
Failure to Utilize Current Methods in Experimental Design
A criticism related to the increased reliance on a smaller number of large, omnibus tests is a lack of attention to some fundamentals of experimental design. In an attempt to be all-inclusive, the experimental designs of recent
decennial census experiments—including those of the 2010 CPEX—do not take proper account of factors affecting the response and strength of the expected treatment effect, and, as a result, the findings from some experiments have been inconclusive. As we have already noted, the 2003 National Census Test fused together two broad topics—alternative response methodologies and variants on race and Hispanic origin questions—mainly to fill topic “slots” on the planned mailout-only test. Combining the two and distributing the sample led not only to the comparison of extremely subtle treatments in the race and Hispanic-origin segments, but also to the omission of relevant treatment groups on the response method portion (i.e., a treatment to “push” Internet use exclusively, instead of a group encouraging either Internet or telephone response).
There are several commonly used techniques that the Census Bureau does not typically employ in its experiments and tests that could provide important advantages over the methods currently used. For example, fractional factorial designs are extremely useful in simultaneously testing a number of innovations while (often) maintaining the capability of separately identifying the individual contributions to a response of interest. Such a methodology would be well suited to the problem of census innovation, since there are typically a small number of replications and a relatively large number of factors being modified. This problem is very typical of the large census tests that often need to examine a number of simultaneous changes due to limited testing opportunities.
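To illustrate the technique, the following sketch constructs a 2^(4−1) fractional factorial design in Python. The factor names are hypothetical, chosen only to echo the mailing-package examples in this chapter; they are not taken from any actual Census Bureau test plan.

```python
from itertools import product

# Four illustrative two-level factors (names are invented for this sketch).
factors = ["deadline_msg", "compressed_schedule",
           "reminder_postcard", "envelope_advisory"]

# Half-fraction design: take a full 2^3 factorial on the first three
# factors and generate the fourth as D = A*B*C (defining relation
# I = ABCD).  This halves the runs from 16 to 8 while keeping all main
# effects estimable, aliased only with three-factor interactions.
runs = []
for a, b, c in product([-1, 1], repeat=3):
    runs.append({"deadline_msg": a,
                 "compressed_schedule": b,
                 "reminder_postcard": c,
                 "envelope_advisory": a * b * c})

for r in runs:
    print(r)
```

Because the four main-effect columns remain mutually orthogonal, each factor’s effect on (say) mail return rate could be estimated separately from only eight treatment combinations, which is the economy the text describes for settings with few testing opportunities and many candidate changes.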
The problem of confounding is well known, yet experiments carried out by the Census Bureau, either during the decennial census or during large-scale census tests, have sometimes generated very uncertain results because design factors beyond those of central interest were varied simultaneously. For example, the ad hoc Short Form Mail Experiment in 2006 (see Section A–7) took as a main objective determining whether a compressed mailing schedule and specification of a “due date” hastened response, but—by design—the experiment treated the deadline and the compressed schedule as a single, combined factor and so could not provide insight as to which change was more effective.2 The proposed Deadline Messaging/Compressed Schedule experiment in the 2010 CPEX shows similar features and flaws. The message treatments shown in Table B-1 test subtle variations of an appeal made in a short paragraph in a cover letter and reminder postcard, with blunter changes made to other parts of the mailing package. Whether a quicker response is due to the appeal to save taxpayer funds by mailing the questionnaire or to the explicit “Mail by
April 5” advisory—printed on the outside envelope of all the experimental treatments—is a question that the experiment will not be able to answer.
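To make the confounding point concrete, the sketch below simulates a 2×2 full factorial that crosses a deadline message and a compressed schedule as separate factors, so each main effect is estimable on its own. All response rates here are invented for illustration; they are not data from the 2006 experiment or the 2010 CPEX.

```python
import random

random.seed(1)

# Assumed (invented) true lifts in mail return rate for each factor.
BASE = 0.60
EFFECT = {"deadline": 0.03, "schedule": 0.01}

def simulate_cell(deadline, schedule, n=5000):
    """Simulated mail return rate for one treatment cell of n households."""
    p = BASE + deadline * EFFECT["deadline"] + schedule * EFFECT["schedule"]
    return sum(random.random() < p for _ in range(n)) / n

# Crossing the factors (instead of bundling them into one combined
# treatment) yields four cells rather than two.
cells = {(d, s): simulate_cell(d, s) for d in (0, 1) for s in (0, 1)}

# Each main effect averages over the levels of the other factor.
deadline_effect = ((cells[(1, 0)] + cells[(1, 1)])
                   - (cells[(0, 0)] + cells[(0, 1)])) / 2
schedule_effect = ((cells[(0, 1)] + cells[(1, 1)])
                   - (cells[(0, 0)] + cells[(1, 0)])) / 2
print(round(deadline_effect, 3), round(schedule_effect, 3))
```

Under a combined-factor design, only the sum of the two effects would be observable; the crossed design recovers each lift separately, which is exactly the information the 2006 experiment could not provide.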
More fundamentally, experiments and census tests are rarely sized through arguments based on the power needed in support of the statistical tests used to compare alternatives. As noted in Section B–1.c, the critique in our letter report of the lack of a power analysis for the 2010 CPEX experiments—particularly the Deadline Messaging/Compressed Schedule experiment—was answered by the Census Bureau with an appeal to two internal memoranda and an arbitrary doubling of the sample size, with no insight as to how either the original or doubled sample sizes had been derived (U.S. Census Bureau, 2009a). All of this argues for greater attention to standard techniques of statistical experimental design in the planning of census experiments and intercensal tests.
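The kind of power argument the text calls for can be made with standard formulas. The sketch below computes the per-treatment sample size needed to detect a difference between two mail-return proportions with a two-sided z-test; the rates used are purely illustrative and have no connection to actual CPEX parameters.

```python
from math import ceil, sqrt
from statistics import NormalDist

def two_proportion_sample_size(p1, p2, alpha=0.05, power=0.80):
    """Per-arm n to detect p1 vs. p2 with a two-sided z-test
    (normal approximation, equal allocation)."""
    z_a = NormalDist().inv_cdf(1 - alpha / 2)
    z_b = NormalDist().inv_cdf(power)
    pbar = (p1 + p2) / 2
    num = (z_a * sqrt(2 * pbar * (1 - pbar))
           + z_b * sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return ceil(num / (p1 - p2) ** 2)

# Illustrative: households per arm needed to detect a 2-point lift in
# mail return rate (hypothetical baseline and treatment rates).
n = two_proportion_sample_size(0.65, 0.67)
print(n)
```

A calculation of this kind, run before fixing the design, would replace the arbitrary sample-size doubling the panel criticized with a defensible, reproducible derivation.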
It follows that making improvements in the Bureau’s experimental design and testing areas depends on bolstering the technical capability and research leadership of its staff; see Chapter 4 for further discussion of such organizational features.
Lack of Strategy in Selecting and Specifying Tests, Experiments, and Evaluations
In addition to not providing direct information on cost and quality, the experiments and evaluations in the 2010 CPEX show little sign of anticipating or furthering future methodology that could yield a 2020 census that is reflective of contemporary attitudes, technologies, and available information sources. The choices in the 2010 CPEX seem more suggestive of a “bottom-up” approach—looking at highly specific parts of the census process and making small adjustments—than a more visionary “top-down” approach that takes major improvement in the cost-effectiveness of the census (and such wholesale change in operations as is necessary to achieve that improvement) as a guiding principle.3 Such a top-down approach would be predicated on alternative visions for the conduct of a census—general directions that might be capable of significant effects on census cost or quality. Then, even though an experiment in the 2010 census would not be capable of fully assessing such a vision, the topics for experimentation would relate to those visions: chosen strategically, they could provide bits of preliminary information to guide later work over the course of the subsequent decade (or decades).
Of the 2000 and 2010 research programs, the only experiments that seem to have taken this kind of strategic approach are the Census 2000 Supplementary Survey (C2SS) and the Administrative Records 2000 (AREX 2000) Experiment, although both certainly had limits. The C2SS envisioned the major change of shifting long-form content to the ongoing American Community Survey (ACS). It significantly scaled up collection of the prototype ACS, although not to a degree large enough to provide extensive information on the estimation challenges now awaiting users of 3- and 5-year moving-average estimates; the (weak) goal of the C2SS as an experiment was simply to demonstrate that the Bureau could field the decennial census and a large survey simultaneously. The AREX 2000 experiment was a very useful first step in suggesting the use of administrative records in the census process but, arguably, was focused too heavily on the potential of administrative records as a replacement for the census (i.e., do counts and distributions match?) rather than as a supplement or an input source to a variety of operations.4
Two other points related to the strategy in the selection and execution of census research are worthy of mention. First, research activities are sometimes specified (or misspecified) so that the “next step”—the next key insight or possible outcome—is not taken. For example, the telephone-based Coverage Follow-Up (CFU) operation planned for the 2010 census is a key part of the Census Bureau’s coverage improvement activities. The full scope of 2010 CPEX evaluations and assessments relative to CFU is not known to the panel, but based on past Census Bureau history it is virtually certain that the evaluations will provide detail on the number of cases processed in CFU, on the breakdown of cases by incident type (e.g., a household count that conflicts with the number of people reported in the household), and on the number of CFU cases that yielded different responses. However, it is also virtually certain that the telephone-based CFU operation will not include a significant field interview component with a sample of eligible cases; hence, it will not be known how many CFU-eligible cases might have been reached by means other than telephone, nor will it be known how data from the less expensive telephone interviews compare with the “ground truth” established in a face-to-face interview. Likewise, the Census Bureau’s two principal geographic resources—the Master Address File (MAF) and the TIGER geographic database—are both examples of cases in which a vibrant research
program should yield regular estimates of geographic accuracy (through random spot-checks and small-scale collection of geographic coordinates). However, current metrics of MAF quality and completeness are generally limited to counts of addresses in the file and rough comparisons with other measures (e.g., independent estimates of the number of housing units).
Second, the Census Bureau has shown an unfortunate tendency to terminate some promising research and development leads too early. To achieve fundamental change, an organization cannot give up on important visions based on initial problems; the Census Bureau’s approach is too often to stop at version 1.1 of a promising approach rather than going on to develop version 1.2. Arguably, the most prominent example of this tendency in recent experience is the Census Bureau’s abandonment of Internet response to the census. Aside from network security and the propagation of “phishing” sites masquerading as the census, the Census Bureau’s primary stated reason for its 2006 decision against Internet response in 2010 was lower-than-hoped response via the Internet in the 2003 and 2005 omnibus tests. Rather than continue work on constructive ways to bolster awareness of the Internet response option (and acknowledge the shortcomings in the design of the 2003 and 2005 tests), the Census Bureau opted to abandon the Internet response option and—worse—to eliminate it from its 2010 CPEX research plans. Another significant casualty of the 2005 test in this regard was alternative structures for the basic residence question (and supporting instructions), including a question-based worksheet approach suggested by the National Research Council (2006) Panel on Residence Rules in the Decennial Census. Based on perceived problems in cognitive testing interviews and less-than-expected performance as one small part of a too-large test, promising ideas in these alternative panels were set aside, whereas a larger number of small, focused experiments could have allowed the approaches to mature.
Inadequate Attention to the Use of Technology
Past decennial censuses have had to incorporate new technology in important ways. In the 1950–1970 censuses, the gradual shift to a mail-based census and self-response by individuals was also accompanied by development of questionnaires that were machine-readable, reducing the need to key information directly from paper forms. Optical mark recognition (computer parsing of check box and similar information) was complemented in the 2000 census by the use of optical character recognition of handwritten responses; indeed, major pieces of postcensus coverage evaluation work made use of the first-time automated capture of handwritten name information.
Envisioning the development of handheld computers for use in major census operations, the 2010 census promised to make major advances in the
use of technology in the U.S. census. In particular, cost savings were projected based on the use of handheld computers in nonresponse follow-up (NRFU) interviewing. The 2010 census is still likely to show technical improvements over its predecessors but—even before the count begins—suffers from the costly and highly publicized breakdown of the handheld computer development. Failures in the Field Data Collection Automation (FDCA) contract between the Census Bureau and the Harris Corporation led to a late “replan” of the 2010 census, a scaling back of the handhelds to cover only the address canvassing operation, an expensive switch back to paper-based NRFU, and a late scramble to complete the operational control systems that govern information flows through the entire census process. The causes of the failure of the full-blown handheld development contract are numerous and beyond the scope of this panel to determine, but we do suggest that a failure to make best use of research and testing played a significant role.
Our predecessor Panel on Research on Future Census Methods reviewed the early plans for the 2010 census and devoted considerable attention to technical infrastructure issues and the incorporation of new technology in the census process (National Research Council, 2004b). The panel recognized that the 2010 census plan included many major system overhauls—not only the development of the handheld computers, but also the establishment of a parallel data system with the American Community Survey and retooling of the Census Bureau’s geographic resources. Hence, that panel suggested that:
Serious institutional commitment was needed to map the logical architecture of the 2000 census, revise that architecture “map” for 2010 census assumptions, and use the resulting model to compare costs of alternative system designs and as a blueprint for final technical systems. In particular, the panel argued that effective systems development would founder without strong “champions” in high management and the establishment of a system architect office to oversee technical development.
A common pitfall in system redesign is locking into specific physical architectures too quickly. With specific regard to the handheld computers, the panel argued that “the most pressing need regarding [handheld] development is the definition of specifications and requirements—clear statements of exactly what the devices are intended to do” (National Research Council, 2004b:147). Furthermore, the “most important product of [early] testing is … a clearly articulated plan of the workflows and information flows that must be satisfied by [the handhelds], as they fit into the broader technical infrastructure of the census” (National Research Council, 2004b:189).
To facilitate this early focus on requirements, the panel encouraged the Census Bureau to focus more on function than form. “In terms of the capability of the devices likely to be available for 2010, it is almost certain that some testing using high-end devices (e.g., tablet PCs) would provide a more realistic test”—and better sense of requirements—than restricting focus too early on specific palm-size forms (National Research Council, 2004b:147).
On all of these points, the Bureau’s development process failed. No system architect position—either for the census as a whole or for the handheld computer development in particular—was created, and the logical architecture modeling was little used. In particular, such modeling played no role in the testing of handheld devices in pilot work in 2002 and in the field tests of 2004 and 2006, all of which used devices cobbled together from commercial off-the-shelf components using various palm-size pocket PC-class devices as a base. As described in Section A–7, the 2002 and 2004 activities focused less on the requirements of the devices than on basic reactions to them—for example, would enumerators (with different degrees of experience and familiarity with an area) be comfortable using the maps on the handheld as a reference? As is now clear from accounts of the 2008 census “replan,” the development process for the handhelds following the award of the FDCA contract (in 2006) was not based on a set of requirements developed from the 2004 and 2006 tests; consequently, it missed even basic requirements such as the ability to perform operations in large blocks with hundreds (or thousands) of housing units. Indeed, a final set of requirements for the devices was developed only between November 2007 and January 2008, and the contractor’s resulting estimate of how expensive it would be to meet those requirements precipitated the “replan.”
The failures of the handheld development—coupled with the Census Bureau’s decision to forbid Internet response to the 2010 census, despite having offered online response (albeit unadvertised) in 2000—have contributed to the strong perception that the Bureau is not adept at incorporating the use of technology. This is an unfortunate situation, because we think it unlikely that cost can be greatly reduced in the 2020 census without more effective use of technology.
Overreliance on the “Special” Nature of the U.S. Census
A factor that looms large in the Census Bureau’s decisions to include particular topics in its operational trials and major experiments is whether the topic needs the “census context.” That is, the question is whether the topic can best, or only, be tested with full census trappings such as advisories of mandatory response, publicity campaigns, and large sample sizes. To a
considerable degree, emphasis on the census context is appropriate because there are features of the U.S. census that make it more than simply a massive household survey. These features include the sheer size and pace of the census enterprise, the reliance of critical procedures on the mobilization of a large corps of temporary enumerators with relatively little training, and the firm constitutional mandate of the decennial count. Still, we think that the Census Bureau frequently exhibits an overreliance on the special nature of the census as it frames its research—a problem that has become part of its culture and attitude toward research.
Put simply, the Bureau’s research activities seem premised on the argument that the U.S. census experience is so special—large, complex and unique—that findings can be trusted only if they have been tested in the census context. As we have already noted, our review of the testing and experimentation programs of preceding decennial censuses makes it clear that the Census Bureau used to make much greater use of smaller, focused testing activities, and also used to make greater use of other survey vehicles to test changes that might ultimately be adopted for the census. By comparison, the more recent rounds of census research seem to assume that lessons from small experiments, from general survey research, and from foreign censuses and surveys are somehow inapplicable.
Thus, for example, even though the positive effect of a second, replacement questionnaire on response rates had long been documented in general survey research, the Census Bureau determined that it needed to test the approach extensively prior to the 2000 census. Ultimately, the 2000 census did not include replacement questionnaires because the Census Bureau did not determine their practical requirements until late in the process; there was not sufficient time to work with vendors to print and generate the physical forms in a very short time frame. Although the effect on response rates remained well known, testing of the general concept of a replacement questionnaire continued in the 2010 testing round.
A second example of this culture is that, in part, an appeal to the special nature of the U.S. census underlies the Bureau’s decision not to allow online response in 2010. In this case, the basic argument is that the unique security demands of the U.S. census are such that Internet response in the United States creates too great a vulnerability. While we do not minimize the importance of computer security, the implicit argument that foreign censuses that have implemented Internet response are somehow less focused, or inadequately focused, on security discounts the efforts of those national statistical offices to ward off hackers and other Internet threats.
It is worth noting that threads of the special nature of the U.S. census experience have been part of census culture for a very long time. In fact, we comment in Chapter 1 on what is arguably the first census experiment, the use of advance census forms in the 1910 census. Census Director E. Dana Durand (1910:83–84) described the experiment as “by far the most important method adopted at this census” to increase public awareness of and participation in the census. However, he went on to comment:
The use of this advance schedule is a partial adoption of the practice of the leading foreign countries in which the larger part of the census work is done by the people themselves, so that the enumerators have little to do in most cases except to distribute and collect the schedules. It is not expected that the same results will be secured by the use of the advance schedule in this country. The novelty of the method, the mixed character of our population, and the complexity of the questions asked—much greater than in foreign censuses—are circumstances which render it likely that a much smaller proportion of the schedules will be properly filled out by families in this country than in countries like England and Germany.
Even from the beginning—the first census after establishment of the permanent Census Bureau—the notion that the complexity of the U.S. census (and population) requires wholly separate tools and methodologies was advanced. Overcoming this insularity—and more effectively building from external researchers and international peers—is a key part of improving Census Bureau research.
KEY STEPS IN RETHINKING THE CENSUS BY RETHINKING RESEARCH
Having critiqued the current state of Census Bureau research, we now turn to suggestions for improvement over the coming decade. We begin by discussing some broad overview strategies before suggesting selected specific ideas with respect to key strategic issues in Section 3–B.4.
Identify Visions for Next Census and Focus on a Limited Set of Goals
At the panel’s November 2008 meeting, Census Bureau staff discussed a preliminary set of goals and objectives for the 2020 census; they are listed in Box 3-1. It is worth noting that the three labeled “goals” for 2020 in Box 3-1 are essentially identical to those put forward for the 2010 census, save that the Bureau’s 2010 goals included a fourth point to “increase the relevance and timeliness of census long-form data” through the ACS (Angueira, 2003:2).5

BOX 3-1 Census Bureau’s Tentative Goals and Objectives for the 2020 Census
SOURCE: Weinberg (2008).
We accept this list for what it is—a preliminary first cut—but the first point we make on restructuring census research is related to this listing. The list of objectives contains many good points (although they do sometimes confuse true objectives with the specific tools or procedures intended to achieve those objectives). But it is the sheer length of the list of objectives that we find troubling. In our assessment, organizational success in attaining goals is harmed when the number of objectives being pursued simultaneously is too large. Even a large organization like the Census Bureau, and the
management thereof, can focus on only so many large tasks at once. Goals are both useful and necessary; our concern is simply that having too long a list of primary objectives will lead to only incremental progress in meeting any one of them.
We think that a better, research-based path to the 2020 census begins by identifying a small set of alternate “visions” for the 2020 census and then evaluating their possible implications for census cost and quality. “Vision” is necessarily a difficult term to define precisely. By the term we mean a rough articulation of plans for or revisions to each of the major steps of census-taking (e.g., initial contact with respondents, response mode, secondary or follow-up contact with respondents, and information and management infrastructures). The reason for thinking of visions as models for the whole census process is to try to generate ideas that are not so vague or hypothetical as to be unhelpful, yet not so completely worked out as to lock in specific approaches or technologies too early. The “three-legged stool” concept that drove 2010 census planning falls short of a vision in the sense we describe here; although it included revisions of the support infrastructure of the census (geographic resources), it lacked specificity in how the short-form-only census would actually play out. Likewise, the phrase “administrative records census”—in itself—is not really a vision; at least an additional level of detail on how (and how well) administrative data might apply to census operations would be necessary to flesh out the idea and make it a tractable model to consider. Specifically, we recommend:
Recommendation 3.1: The Census Bureau should immediately develop a limited number of strategic visions for the 2020 census that are likely to meet its announced goals for costs and quality. By strategic visions, we mean start-to-finish strategies for conducting all major census operations in order to confront looming threats and implement new technologies. In addition, evaluation of each vision should include thorough review of the costs and benefits of individual census operations relative to the announced cost and quality goals, to determine whether operations that are not demonstrably cost-effective may be eliminated or scaled back or whether new technologies or practices might usefully be introduced.
This finite set of visions would provide a starting point for discussion and debate in the early years of the 2010–2020 planning cycle. It is then important that research early in the decade be able to shed light on advantages or disadvantages of the competing visions.
Recommendation 3.2: The Census Bureau should develop its 2020 census research and development program by identifying a handful of research questions whose resolution will determine
which of the visions of the next census are feasible and cost-effective and which are not. Priorities for evaluations of the 2010 census should be arrived at consistent with this research program, as should priorities for experiments and tests in the 2011–2018 period.
Build Capacity to Evaluate Costs of Alternative Visions
A second major step in a research-based strategy for 2020 relates to the evaluation and comparison of competing visions for the 2020 census. As discussed in Chapter 2, it is remarkable how little is known about the costs of the 2010 census even at this late stage of development. Clearly, the Census Bureau has in place cost models that it uses to develop budget estimates and allocate resources. What is fundamentally unclear is how good those cost models are—how sensitive they are to varying assumptions, how transparent they are in breaking down costs by component operations, and how flexible they are to estimating the costs of major changes to census operations. For example, the Census Bureau acknowledged in October 2009 that—for the address canvassing operation—the Bureau’s cost models “did not forecast accurately total costs, and we experienced a cost overrun in components of that operation” (Groves, 2009:7). Specifically, the U.S. Government Accountability Office (2009a:10–11) determined that about $75 million of the $88 million overrun (in an operation originally budgeted for $356 million) was attributable to flawed estimates of workload, both the initial workload for the operation and quality control checks. The remainder of the cost overrun was attributable to the costs of the training and fingerprinting of more temporary staff than needed.
Because we think that census cost and quality are the two central factors that must be addressed in thinking about the 2020 census, it naturally follows that it is of the highest priority that the Census Bureau be able to reliably estimate how much changes in census approaches will affect costs. Accordingly, we recommend:
Recommendation 3.3: In order to provide early indications of the costs of competing visions for the 2020 census and to support effective planning throughout the decade, the Census Bureau should develop and validate a detailed cost model that not only represents the 2010 census, but also accommodates novel approaches to census-taking, including the use of data capture via the Internet and automated telephone systems, the use of handheld devices in nonresponse follow-up, the use of administrative record information for some types of nonresponse follow-up cases, and innovative mechanisms for reducing the costs of updating the Master Address File between 2010 and 2020. This
cost model should be able to assess the implications of introducing specific changes to an existing design, singly and in combination, and to distinguish between the direct and indirect cost effects of specific changes. This cost model should be thoroughly documented and transparent so that the Census Bureau can obtain the benefit of expert advice on cost-effective improvements to census operations.
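To make the idea concrete, a component-based cost model of the kind Recommendation 3.3 envisions can be sketched in a few lines. All unit costs, workloads, and response-rate figures below are hypothetical placeholders, not Census Bureau estimates; the point is that a transparent, component-based model makes the indirect cost effects of a design change (here, a drop in self-response) immediately visible.

```python
# Minimal sketch of a component-based census cost model.
# All unit costs and workloads are hypothetical illustrations,
# not actual Census Bureau figures.

HOUSING_UNITS = 130_000_000  # assumed national workload

def census_cost(mail_response_rate, internet_share=0.0,
                cost_per_mail_return=1.50, cost_per_internet_return=0.25,
                cost_per_nrfu_case=45.00, fixed_costs=2_500_000_000):
    """Total cost as a function of key design assumptions.

    Self-response (mail or Internet) is cheap; every nonresponding
    unit falls into the expensive NRFU workload, so small shifts in
    the response rate have large indirect effects on field costs.
    """
    responders = HOUSING_UNITS * mail_response_rate
    internet = responders * internet_share
    mail = responders - internet
    nrfu = HOUSING_UNITS - responders
    return (fixed_costs
            + mail * cost_per_mail_return
            + internet * cost_per_internet_return
            + nrfu * cost_per_nrfu_case)

# A one-point drop in self-response moves ~1.3M cases into NRFU:
baseline = census_cost(mail_response_rate=0.67)
lower = census_cost(mail_response_rate=0.66)
print(f"marginal cost of 1-point response drop: ${lower - baseline:,.0f}")
```

Because every parameter is an explicit argument, the sensitivity of total cost to any single assumption can be probed directly, which is precisely the transparency the recommendation calls for.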
Build from 2010 Experience and Data (If Not the 2010 CPEX)
As is clear from the preceding critique in this chapter and the comments in our letter report, it is our assessment that the experiments chosen for inclusion in the 2010 CPEX largely squander a valuable testing opportunity. Save for what might be learned from the small residence piece of the Alternative Questionnaire Experiment (in conjunction with the experimental Group Quarters form; see Appendix B) and the differing number of NRFU contact attempts, the CPEX experiments are not likely to inform changes that will substantially affect either census cost or quality.
We are more optimistic about evaluation work generally (if not the specific evaluation studies currently envisioned in the CPEX framework), contingent on the retention of adequate operational and procedural data as the 2010 census unfolds. Like previous National Research Council panels, we think that a master trace sample that saves and links data for a sample of addresses, respondents, and cases through all steps of census processing would be an invaluable tool for providing empirical insight for intercensal testing. The broader notion advanced by the Panel on Research on Future Census Methods of a master trace system—a technical information infrastructure designed in such a way as to automatically and naturally retain virtually all operational data for later reanalysis—is a particularly attractive one.
Due to the 2008 replan of census operations and the resulting crunch to finalize operational control systems (authority for those systems having reverted to the Census Bureau rather than the outside contractor), we recognize that the time and resources to save a designed trace sample simply do not exist. However, as we argue in our letter report, it is absolutely essential that the operational control systems and other census information systems be designed to facilitate data retention—that is, that they include “spigots” or archival outlets to save operational data and facilitate an audit trail. A fully designed sample of cases up front would be ideal, but an archival snapshot of information from all the census information systems in order to build a sample (or trace system) after the census is the next best alternative. Accordingly, we formalize and extend arguments from our letter report as follows:
Recommendation 3.4: The Census Bureau should retain sufficient input and output data, properly linked and documented, from each 2010 census operation to permit adequate evaluation of the contribution of the operation to census costs and data quality to feed into 2020 census planning. For this purpose, the Census Bureau should either establish an internal group or hire a contractor with database management expertise. This group would have the responsibility of retaining and documenting sufficient data from the 2010 census to be able to comprehensively represent the functioning of all census operations. Such a group would also have the responsibility of assisting Bureau research staff, using current database management tools, to produce research files to support the assessment of analytic questions concerning aspects of the 2010 census.
We expand on the kinds of data that need to be retained, and the analysis and linkages that should be explored using them, in discussing options for the Master Address File in the next section.
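The “spigot,” or archival-outlet, idea can be illustrated with a minimal sketch. The operation names, status codes, and case identifiers below are hypothetical; the essential properties are that each operational system appends immutable, timestamped records keyed by a common case identifier, and that the full processing history of any case can be reassembled afterward, which is what a post hoc trace sample or trace system requires.

```python
# Minimal sketch of an archival "spigot" for census operational systems.
# Field names and operations are hypothetical; the point is that each
# system appends immutable, timestamped records keyed by a common case
# ID, so a trace sample can be linked together after the census.
import datetime

class OperationalArchive:
    def __init__(self):
        self.records = []  # in practice: an append-only database table

    def emit(self, case_id, operation, payload):
        """Append-only outlet: never overwrites earlier status codes."""
        self.records.append({
            "case_id": case_id,
            "operation": operation,
            "timestamp": datetime.datetime.now(datetime.timezone.utc).isoformat(),
            "payload": payload,
        })

    def trace(self, case_id):
        """Reassemble the full processing history of one case."""
        return [r for r in self.records if r["case_id"] == case_id]

archive = OperationalArchive()
archive.emit("MAF-0001", "address_canvassing", {"status": "verified"})
archive.emit("MAF-0001", "mailout", {"form": "short"})
archive.emit("MAF-0001", "nrfu", {"attempts": 3, "outcome": "interview"})

history = archive.trace("MAF-0001")
print([r["operation"] for r in history])
# → ['address_canvassing', 'mailout', 'nrfu']
```

Because nothing is ever overwritten, a sample of cases can be drawn and linked after the fact, which is the fallback we describe when a designed up-front trace sample is infeasible.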
Examples of Research Directions: Strategic Issues for the 2020 Census
We turn in Chapter 4 to organizational aspects of a successful research program. First, though, we offer some general comments on four issues that we think to be particularly strategic concerns for the 2020 census. By this listing, we do not intend to imply that they are the only important issues that should go into developing alternative visions for 2020, nor are these comments meant to be comprehensive treatments of the topics. We merely suggest that they are sufficiently major issues that some aspects of each of them should and will pervade such visions; what follows are (admittedly incomplete) thoughts on possible directions. (One other strategic issue that rises to this same level—making effective use of testing opportunities in the American Community Survey—is discussed in Section 4–D.2.)
Better and Less Expensive Sampling Frame: Directions for the Master Address File
The development and refinement of the sampling frame for the census—currently the MAF—is clearly a strategic issue for census planning because it is a key determinant of census coverage. Inclusion or exclusion of addresses from the MAF has a strong bearing on whether housing units or people are omitted, duplicated, or misplaced in census returns. It is also a strategic issue for the decennial census because it is a likely source of hidden costs. More effective and less duplicative listing could save eventual field costs during
follow-up operations and could reduce the need for broad-brush operations like the complete precensus address canvass used in the 2000 and 2010 censuses. Finally, the effective and accurate upkeep of a MAF is a strategic issue for the Census Bureau—over and above the decennial census—because it is also used as the sampling base for the Bureau’s ACS and the major demographic surveys the Bureau conducts on behalf of other federal agencies. Hence, a more accurate MAF at any point in time (not just a census year) benefits most, if not all, of the Bureau’s survey programs.
An absolute prerequisite for further research on the MAF and its future improvement is comprehensive evaluation of a type that was impossible in 2000 due to the structure of the file itself. As it existed in 2000, individual list-building operations could overwrite source codes on the address file, so that the most recent operation to “touch” a particular address could be recovered but not the complete history of the presence (or absence) of an address across all operations. Accordingly, a great degree of detective work had to be done to approximate the unique contributions of individual operations to the file and the degree to which operations duplicated each other—detective work complicated by the overlapping schedules of such operations as the Local Update of Census Addresses and the complete block canvass. A major objective of the MAF/TIGER Enhancements Program (MTEP) of the previous decade was to rework and rebuild the format of the database itself; ideally, this has been done in such a way that address source histories are directly recoverable.
Assuming that the source recording in the MAF has been upgraded, then—as part of a master trace sample/system–building effort—an address list research database should be constructed. At a minimum, this research database should link:
Sufficient “snapshots” of the MAF, with source codes that do not overwrite each other, as to be able to parse out unique contributions of such operations as the twice-yearly U.S. Postal Service Delivery Sequence File (DSF) updates, the Local Update of Census Addresses program (and appeals), the address canvassing operation, and the update-leave operation in 2010, among other sources;
Information from the 2010 Census Coverage Measurement operations, including whether addresses were flagged as including omitted or duplicated persons (or whole households), nonexistent or nonresidential structures, or erroneously geocoded entries;
Snapshots of Census Bureau–compiled administrative records databases, such as the Statistical Administrative Records System described below;
Derived variables about the nature of the housing unit (or structure) at the addresses, such as type of unit (e.g., urban house, rural house,
large apartment building, small multiunit apartment building) and demographic area characteristics (e.g., presence in a hard-to-count area due to high prevalence of non-English-speaking households); and
Returns from the ACS, to add richness (and timeliness) of possible covariates for analysis.
Construction of such a database would be invaluable to sorting out and documenting geographic contributions to census error and the characteristics of addresses that are subject to error. In addition, identifying high degrees of overlap between list-building operations could lead to simplification or consolidation of operations, possibly permitting cost reductions.
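A minimal sketch of the kind of analysis such a research database would support follows. The operation names and tiny address sets are hypothetical illustrations; with non-overwriting source snapshots, each operation’s unique contribution and the pairwise overlap between operations reduce to simple set computations.

```python
# Sketch: measuring each address-listing operation's unique contribution
# by comparing non-overwriting MAF snapshots. Operation names and the
# tiny address sets are hypothetical illustrations.

snapshots = {  # operation -> set of address IDs present after it ran
    "dsf_spring": {"A1", "A2", "A3"},
    "luca":       {"A2", "A3", "A4"},
    "canvassing": {"A1", "A3", "A4", "A5"},
}

def unique_contributions(snapshots):
    """Addresses contributed by exactly one operation."""
    out = {}
    for op, addrs in snapshots.items():
        others = set().union(*(a for o, a in snapshots.items() if o != op))
        out[op] = addrs - others
    return out

def overlap(snapshots, op_a, op_b):
    """Degree of duplication between two listing operations."""
    a, b = snapshots[op_a], snapshots[op_b]
    return len(a & b) / len(a | b)  # Jaccard similarity

# Here canvassing is the only source of A5; every other address is
# duplicated across operations, suggesting room for consolidation.
print(unique_contributions(snapshots))
print(round(overlap(snapshots, "dsf_spring", "luca"), 2))
```

High overlap between two operations, sustained across areas and years, would be the empirical trigger for simplifying or consolidating them.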
In brief, then, some selected issues and possibilities for sampling frame or address list research over the next decade include:
Use of address list research database to study feasibility of targeted canvassing operations: A focus of research should be determining whether it is possible to reliably discriminate between those blocks that are virtually unchanged by the various address building operations (blocks that are stable in that sense) and those that are changed. This could help use the more timely ACS data available in the years prior to the census to steer targeting efforts to the highest priority areas.
Quality metrics, change detection, and improved maintenance: Having made serious investments in upgrading the technical platform of the Bureau’s geographic systems during the 2000–2010 decade, the challenge now becomes one of keeping those resources up to date in the most accurate way possible. An original focus of the MTEP was on quality metrics (methods for assessing the quality and geographic accuracy of both the MAF and the line features in the TIGER database) and update mechanisms. One of the CPEX evaluations—comparing the results of detecting whole structures using aerial photography with MAF entries and other sources—may be a useful part of a broader research program in geographic updating. Generally, the Census Bureau would benefit from a program of field spot-checks, comparison with third-party sources (including addresses drawn from administrative records data files), and the like in order to have continuous diagnostic measures of the quality of the MAF and TIGER and to detect priorities for update and maintenance.
Continuous address/geographic improvement process: Several predecessor National Research Council panels on census issues have urged the Census Bureau to make the Local Update of Census Addresses (LUCA) Program a more continuous operation rather than a one-shot (and fairly rushed) chance to review address segments. We concur and urge the Bureau to continue to study the characteristics of addresses added or deleted by LUCA partners, and we also urge the Bureau to
consider a broader approach: a more continuous local geographic partnership for both the MAF and the TIGER database. Combining opportunities for local review of portions of the MAF with the regular Boundary and Annexation Survey used to update political boundaries, together with periodic sharing of locally maintained geographic information system files (which were a major source of information in the main TIGER realignment project of the past decade), would benefit both the Census Bureau and state and local governments. Developing ways in which continuous geographic updating can be made easier for local government participants—for instance, through software interfaces that make it easier for governments to respond using their existing electronic files or through consistent use of identifier codes for MAF and TIGER features—would serve to bolster participation by a wider range of governments. A continuous program of geographic resource improvements should also be accompanied by the development of quality and coverage metrics for both the MAF and TIGER (discussed above), so that the quality and unique contribution of local update sources can be assessed and areas with particular need for updating can be identified.
Integration with the American Community Survey field staff: Another original plank of the MTEP was the Community Address Updating System (CAUS)—effectively, having field staff assigned to ACS collection make geographic updates if they encountered new addresses or streets on their rounds. CAUS never became a major presence in recent years because of funding constraints and, perhaps more fundamentally, the more pressing exigency of simply getting the ACS on a solid footing. Still, the concept is potentially sound and useful. One possibility that could be researched is periodic, systematic addition of address list or map segment verification tasks to ACS interviewer workloads, rather than simply enabling geographic updates when interviewers happen to come across new addresses or developments on their rounds. Possibilities for study might include the approach known as half-open intervals: directing interviewers to proceed in a specified direction from a household in their interviewing workload, listing every address they encounter that is not among the MAF entries, and stopping at the first unit that is on the MAF. Use of ACS field staff for such geographic update activities is certainly not a complete solution for MAF updates over the course of a decade. However, this work could help in quality measurement of MAF and TIGER and provide clues to areas where the Bureau’s geographic resources might require particular updating.
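The half-open interval rule is simple to state precisely. The following is a purely illustrative Python sketch, in which the ordered street listing and the set of MAF addresses are hypothetical stand-ins for the Bureau’s actual data structures:

```python
def half_open_interval(street_sequence, start, maf):
    """Walk the street in listing order from `start` (a unit already in
    the interviewer's workload) and collect units missing from the MAF;
    the first MAF-listed unit encountered closes the half-open interval."""
    missed = []
    for unit in street_sequence[street_sequence.index(start) + 1:]:
        if unit in maf:
            break  # interval is closed by a unit already on the MAF
        missed.append(unit)  # candidate address to add to the MAF
    return missed
```

Because every unit on the ground falls into exactly one such interval, listings of this kind can support coverage estimation for the address list as well as simple address adds.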
Position the MAF as a national resource: The MAF benefits other federal government agencies through the major demographic surveys (done under contract with the Census Bureau) that use it as a sampling frame. It is also developed, in some respects, in partnership with the U.S. Postal Service because the postal DSFs are a major input source to the MAF. It stands to reason, then, that a useful area of research concerning the MAF is whether it satisfies the needs of its major stakeholders and what insights other agencies may have into the frame-building process. In the case of the Postal Service, our review of past decades’ research programs—replete with intensive use of postal checks and information collected directly from local letter carriers—is a reminder that establishing and maintaining a research partnership with the Postal Service is vital. For example, it should be determined whether the Census Bureau’s adaptation of the regular DSF updates makes use of the full range of address information in those databases and whether other postal data could further improve the MAF or TIGER. It should be noted that broader use of the MAF would likely require action by Congress (akin to the 1994 act that permitted both LUCA and DSF updating) because of the U.S. Supreme Court’s interpretation that Census Bureau address lists fall under the confidentiality provisions of Title 13 of the U.S. Code (Baldrige v. Shapiro, 455 U.S. 345, 1982).
Interface with the commercial sector: Just as it is important to compare the address coverage from compiled administrative records files with the existing MAF, it would also be worthwhile to study the quality and coverage of commercially available mailing lists (even if such commercial lists do not become an input source to the MAF). In particular, it would be useful to learn from private-sector practices in building, maintaining, and filtering mailing lists, as well as how private-sector firms have developed other frames, such as e-mail and telephone listings.
Assess integration of the “household” MAF with group quarters: One of the improvements promised in the MTEP was the merger of MAF, TIGER, and group quarters information into a common database structure. For the 2010 census, group quarters validation will still be done as a separate operation, following up on structures and facilities labeled as “other living quarters” in the complete address canvassing operation. Study of group quarters validation will be useful for judging the effectiveness of the merger of these lists and the ability of flags in the address records to distinguish between group quarters and regular household populations. This kind of study and research is particularly important given the sometimes blurred line between conventional household and group quarters structures, such as group homes and health care facilities that may combine “outpatient,” independent living, assisted living, and full-time nursing care functions within the same structure.
Better and Less Expensive Data Collection: Toward a “Paperless” Census
An original goal of the incorporation of new technology into the census process for 2010 was to reduce the use, storage, and movement of paper in the census. The use of paper translates directly into both cost and time; although document scanning greatly reduces the time that must be spent handling paper (compared with keying information directly from forms), the reliance on paper at all stages has serious implications for the size and scope of the local census offices and data capture centers that must be equipped for the census. The use of technology is an area in which the 2010 census is likely to be remembered for some strides, but probably more for the costly and embarrassing collapse of the plans for use of handheld computers in NRFU interviewing and the reversion to paper-and-pencil methods.
That the 2010 census will fall short of its original goals for reducing paper use is not a failure of vision—the goal was a good and laudable one—but a failure to execute and fully articulate that vision. The idea itself was not enough to guarantee success: the idea had to be matched by a set of research and testing activities designed to propel the larger task of technology development forward and specify the requirements for the technology.
Going forward, it is difficult to imagine a plan for the 2020 census that can substantially reduce costs or increase quality without a major emphasis on developing and integrating new technology. As a bold statement of mission, we encourage the Census Bureau to go further than simply getting the development process for handheld NRFU computers in shape. Rather, we suggest a broader examination of all steps in the census process, with the publicly stated goal of making the 2020 census as “paperless” as is practicable.
Further reasons why an effort to move the census in a paperless direction is a critical strategic issue for 2020 include the implications for quality. Indeed, we think that experience in the general survey research community suggests that the gains in accuracy from electronic methods for data collection may be more important than cost reductions. The Census Bureau’s own work in the 2003 and 2005 tests suggested that Internet responses were typically of higher quality than responses by other modes; edit routines and skip patterns in electronic questionnaires can promote accuracy in subtle but important ways. Secondary gains in accuracy are not hard to imagine. Nonpaper formats would not have the same hard space limits as paper forms, thus reducing the need for follow-up with large households for whom reported
information simply does not fit on the paper form. Questionnaires could also be made directly accessible in a wide variety of foreign languages, without the strong filter of recent censuses in which a call to an assistance center was necessary to request a foreign-language form. The age distribution and attitudes of the population also make a higher-tech, relatively paperless census a key strategic issue; new generations are arguably more conversant with electronic media than paper media, and a “green” census (saving paper) might serve as a strong incentive to boost participation. However, one of the strongest arguments for a heightened focus on technology leading up to the 2020 census is simple perception, exactly the reason why the 2010 census looks odd relative to other national censuses and surveys that are now turning toward Internet data collection. That is, it would simply look foolish and out of step to try to force 2020 census technology into the 2010 mold rather than aggressively studying and rebuilding systems.
The guidance offered by the Panel on Research on Future Census Methods in its final report (National Research Council, 2004b) on developing and implementing a technical infrastructure remains valid. It also follows that movement toward census processes that are highly automated and as paperless as possible heightens the importance of ensuring that those processes have an audit trail—that they include outlets for the retention, archiving, and analysis of operational data, such as we recommend for the 2010 census in Section 3–B.3. Having already described many of the points raised in that report, we do not expound on them further. In brief, other selected issues and possibilities for technology research over the next decade include:
Boosting response using paperless methods: One of the valid arguments raised by the Census Bureau for not permitting online response in the 2010 census is that its experience suggests that offering an electronic response option alongside a paper questionnaire does not seem to elevate overall response. That is, it does not seem to produce new respondents: it may sway some people who would otherwise respond by mail to return the form electronically, but it does not convert probable nonrespondents into respondents. This observation is not unique to the Census Bureau’s experience; other national statistical offices and survey research organizations have encountered the same phenomenon. Developments in questionnaire design and approach strategies should be pursued by the Bureau in cooperation with these other groups.
Security and confidentiality concerns: In overcoming the concerns about computer and network security that led it to disallow online response in 2010, the Census Bureau would benefit from in-depth study of the security mechanisms used in other censuses and surveys. It would also benefit from examples in the electronic implementation of other government forms, such as tax returns.
Mode effects: A perennial concern in survey methodology when new response modes are adopted is the consistency of response across those modes. Mode differences are not inherently good or bad, but they need to be understood and documented.
Better and Less Expensive Secondary Contact: Nonresponse Follow-Up
Reexamining assumptions and strategies for NRFU operations is a key strategic issue because of the significant costs of mobilizing the massive temporary enumerator corps and making contacts at households that, for whatever reason, do not respond to lower-cost initial contacts.
Research on the possible role of administrative records in NRFU processes is particularly critical to achieving better and less expensive secondary contact with respondents. With some additional evaluative and follow-up components, the telephone-based 2010 CFU operation could provide useful insight to start such research. One planned source of household or address records submitted to the CFU operation is a search of census returns against the Census Bureau’s database of administrative records compiled from other federal agencies, a database currently known as StARS (Statistical Administrative Records System) or e-StARS. As we have also described, the retention of operational and procedural data during the 2010 census has the potential to yield very valuable information; these data snapshots should be able to support a post hoc examination—as a research question—of the impact on census costs and quality if administrative records had been used as supplemental enumerations at various stages of NRFU work. All stages—from near-original enumeration of nonresponding households to use of records as a last-resort measure in place of proxy information—should be considered and investigated.
To be clear, the use of administrative records should not be seen as a panacea for all census problems, and we do not cast it as such. Sheer numeric counts aside, the quality and timeliness of administrative records data for even short-form data items, such as race and Hispanic origin and relationship within households, remain open and important questions. Wider use of administrative records in the census also faces formidable legal hurdles, not the least of which are inherent conflicts between the confidentiality and data access provisions in census law (Title 13 of the U.S. Code) and the Internal Revenue Code (Title 26), given the prominence of tax return data in the administrative files. Still, just as it is difficult to imagine a 2020 planning effort that seriously addresses cost and quality issues without aggressive planning for use and testing of new technology, it is also difficult
to imagine such an effort without a meaningful examination of the role of administrative records.
Other key strategic issues for NRFU-related research include:
Investigation of state and local administrative data sources: As mentioned above, the Census Bureau’s current StARS administrative records database is built annually from major administrative data sources maintained by federal agencies, including the Internal Revenue Service. Particularly as the idea of using administrative records in a variety of census operations (such as geographic resource updates) is considered, the Census Bureau should explore the quality and availability of data files maintained by state and local governments, including local property records, files from “E-911” projects that convert rural non-city-style addresses to easier-to-locate addresses, and records from state and county assessors’ offices.
Optimal pacing and timing of contacts: The NRFU Contact Strategy Experiment of the 2010 CPEX varies the number of contacts allowed for nonresponding households. As we have already noted, there is something slightly off in the specification of the experiment: capping the number of visits and making the cap known to the interviewers presents opportunities for gaming the system. But the optimal number of attempted NRFU contacts—based on the yield of completed interviews and the quality of information—is an important parameter to determine. So, too, is work on the best ways to structure local census office and enumerator workloads in order to maximize the chances of successful interview completion.
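As a purely illustrative sketch of the yield analysis involved, the hypothetical Python function below tabulates, for each successive contact attempt, the completion rate among still-open cases and the implied field cost per completed interview; the attempt counts and per-attempt cost in the test data are invented for illustration:

```python
def marginal_yield(completions_by_attempt, cost_per_attempt, total_cases):
    """For each contact attempt, report the completion rate among cases
    still open and the field cost per completed interview at that attempt."""
    remaining = total_cases
    rows = []
    for attempt, completed in enumerate(completions_by_attempt, start=1):
        rate = completed / remaining if remaining else 0.0
        cost = (cost_per_attempt * remaining / completed
                if completed else float("inf"))
        rows.append((attempt, rate, cost))
        remaining -= completed  # resolved cases drop out of later attempts
    return rows
```

A cap on visits would then be set roughly where the marginal cost per completion exceeds the value of the added data, for example the cost and quality of the best alternative (proxy response, imputation, or administrative records).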
Efficacy of telephone follow-up: Along the same lines, a research question that has been touched on in past census research but that is worth revisiting in the 2020 climate is use of telephone (or electronic) follow-up rather than personal visit. The effectiveness of the telephone-based CFU operation in 2010 may provide initial insight on the feasibility of conducting such operations on an even larger scale. Much may also be learned about the effectiveness of telephone-based follow-up in the census context by studying and evaluating its use in the ACS and other Census Bureau surveys.
Reducing NRFU workload by shifting some burden: Clearly, a critical determinant of the cost of NRFU operations is the number of nonresponding households that are followed up. The U.S. Supreme Court’s 1999 decision and current wording of census law reinforce that reducing that workload by following up with only a sample of households is not permissible. But research efforts on another angle to cut into the overall NRFU workload—promoting late response to the census—may be worthwhile. This could involve, for example,
extending publicity and outreach campaigns (and possibly shifting the message to emphasize that “it isn’t too late” to respond) and “Be Counted”–type programs; such a message was employed in the 1980 census (see Table 2-5). The effectiveness of such an approach would depend on the level of automation of census operations (e.g., the ability to quickly transmit revised enumerator assignments) and the time demands for data capture from paper forms. Still, the costs and benefits are worth exploring—these efforts might not sway some truly hard-to-count respondents, but they could elicit responses from some reluctant or forgetful households.
Examining relative quality of “last resort” enumeration: In those cases in which contact simply cannot be made, the relative quality of different options for filling the blanks—for example, proxy information, imputation, and use of administrative records—should be quantified and evaluated.
Quality of interviews as a function of time from Census Day: It is generally well understood that follow-up interviews (as well as independent interviews such as the postenumeration survey that is the heart of coverage measurement operations) are best done as close as possible to the census reference date. Doing so helps to curb discrepancies due to people moving to different households or switching between “permanent” and seasonal residences. It is also generally well understood that interview quality and consistency decay with length of time from the survey. This is arguably more an issue for time-sensitive information (e.g., exact knowledge of monthly utility bills) or recall of numerous events than for the items on a short-form-only census. Still, a body of quantitative evidence on recall and decay effects for short-form items (including the number of persons in, and composition of, the household)—and for key long-form items currently collected on the American Community Survey—as they vary with time from Census Day would be very useful in revisiting such assumptions as the optimal timing of the start of NRFU operations.
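At its simplest, evidence of this kind reduces to tabulating agreement between an original response and a follow-up response by elapsed time. A purely illustrative Python sketch, with a hypothetical data layout:

```python
from collections import defaultdict

def agreement_by_lag(pairs, bin_weeks=4):
    """pairs: iterable of (weeks_from_census_day, original_answer,
    followup_answer) for matched cases. Returns, for each lag bin, the
    share of cases in which the follow-up answer matched the original."""
    hits, totals = defaultdict(int), defaultdict(int)
    for weeks, original, followup in pairs:
        b = int(weeks // bin_weeks)  # group lags into bins of bin_weeks
        totals[b] += 1
        hits[b] += (original == followup)
    return {b: hits[b] / totals[b] for b in sorted(totals)}
```

A declining agreement share across successive lag bins would quantify the decay effect described above for any short-form item.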
Rethinking the Basic Census Objective: Getting Residence Right
The basic constitutional mandate of the decennial census is to provide an accurate resident count. Accordingly, in terms of setting basic strategy for the 2020 census, the concept of residence and collecting accurate residence information is vitally important. Census residence rules and concepts have been easy to pigeonhole as a questionnaire design issue—the search for the right phrasing and ordering of words and instructions at the start of the census form in order to prod respondents to follow particular concepts. These questionnaire design matters are important, but the issues are
much broader—a thorough examination of residence merits attention to the basic unit of analysis of the census, to the implications of residence concepts for data processing design of operations, and to tailoring enumeration approaches to different levels of attachment to a single “usual” place of residence.
The National Research Council (2006) Panel on Residence Rules in the Decennial Census discussed a wide range of research ideas under the general heading of residence; these generally remain as applicable to the 2010–2020 planning period as they were to 2000–2010. In terms of questionnaire design, these include further research on replacing the current instruction-heavy approach to the basic household count question with a set of smaller, more intuitive questions and more effective presentation of the rationale for the census and specific questions. Other specific research suggestions (many of which draw from the same panel’s recommendations) include:
Quality of facility records for group quarters enumeration: Residence concepts and group quarters enumeration are impossible to disentangle because the nonhousehold, group quarters population includes major cases in which determination of a single “usual” residence is difficult: college students away from home, persons under correctional supervision, persons in nursing or other health care facilities, and so on. Hence, attention to getting residence concepts right demands attention to methods for enumerating group quarters. In the 2000 census, about half of the returns for the group quarters population were filled through reference to administrative or facility records rather than direct interview. Yet much remains unknown about the accuracy and capability of facility records and data systems to fill even the short-form data items, let alone their ability to provide the kind of alternative residence information that would be necessary to inform analysis of census duplicates and omissions. The National Research Council (2006:240) suggested that the Census Bureau study the data systems and records of group quarters facilities on a continuous basis, akin to its efforts to continuously update the MAF. This is clearly a major endeavor, but one that is particularly important because of the inclusion of group quarters in the ACS. Even the records systems of state-level correctional systems will vary, let alone the plethora of systems maintained by individual colleges or health care facilities. But research toward a continuous inventory would benefit other surveys of group quarters populations between censuses and may suggest methods for more efficient and accurate data collection from the relatively small but very policy-relevant group quarters population.
Residence standard mismatch between the census and the ACS: While the decennial census uses a de jure “usual residence” concept, the long-form-replacement ACS uses a 2-month rule—effectively, a de facto or “current residence” standard. The exact ramifications of this difference in residence standards are unknown, and, indeed, they may be relatively slight, particularly given the pooling of ACS data to produce multiyear average estimates. But a more precise empirical understanding of the possible differences introduced by differing residence standards in the Census Bureau’s flagship products would bolster the ACS’s credibility. In the next chapter (and in Recommendation 4.5 in particular) we discuss the need for integration of research between the decennial census and the ACS; a matching study of census and ACS returns near April 2010 would be an ideal step in that regard to study possible differences in residence specifications and household rostering.
Revamping “service-based enumeration”: In the 2000 census, the Census Bureau’s efforts to count transient populations, including persons experiencing homelessness, were combined into an operation known as service-based enumeration. This operation—to be repeated in 2010—relies principally on contacts at facilities on locally provided lists of shelters and providers of food or temporary shelter services. Just as group quarters enumeration would benefit from sustained research effort and attention over the decade, so, too, would outreach efforts to best cover the service-based population. Such effort should include collaboration with subject-matter experts, local government authorities, private service providers, and other agencies (such as the U.S. Department of Housing and Urban Development) that have periodically attempted to measure the number of homeless persons at points in time. It should also include focused, relatively small surveys to compare the efficacy of sample-based measures of the homeless population with that of census-type canvasses.
Revisiting the “resident” count: Because of the legal climate surrounding the 2010 census, the Census Bureau may face pressure to conduct research on components that are currently included in or excluded from the census “resident” count. It should prepare accordingly. In particular, its research program should give some thought to studying the effects on response and cooperation of including questions on citizenship or immigration status. The arguments that such questions could seriously dampen response and hurt the image of the decennial census as an objective operation are straightforward to make and, we think, basically compelling, but empirical evidence is important to building the case. With regard to counting Americans overseas, the experience of the 2004 Overseas Enumeration Test is very useful, but here, too, additional quantitative evidence would be useful. In particular, it would be useful to examine and critique information resources that
may be available from the U.S. Department of State (e.g., contacts with overseas embassies or consulates) to estimate the level of coverage in such files; it would also be useful to evaluate and assess the quality and timeliness of the data files that the Census Bureau already uses from the U.S. Department of Defense and other federal agencies with employees stationed overseas (for inclusion in apportionment totals).
Planning Now for the Census Beyond 2020
The history of past census research that we have outlined in Appendix A and described in this chapter—particularly the successful adoption of mailout-mailback methods—suggests that truly massive change in the approach to the census can take decades of planned research to be fully realized. It is vitally important in 2009 and 2010 for the Census Bureau to be thinking of and planning for the 2020 census, but it is also appropriate to think now about even broader changes that may apply still further in the future.
A sample of such broader, fundamental issues that should be considered for long-term research efforts includes the following:
A census without mail: Arguably the boldest area for research in this direction is the concept of a census without primary reliance on mailout-mailback methods. Given the difficult fiscal circumstances of the U.S. Postal Service and major effects that electronic commerce and e-mail have had on regular physical mail volume, means for making initial contact with the national population other than mailed letters or questionnaires may have to be considered in future censuses.
Change to the unit of enumeration: Since the act authorizing the 1790 census required that persons be counted at their “usual place of abode,” the decennial census has used the household as its basic unit of enumeration. In the modern census context, this has involved associating households with specific addresses and, through those addresses, with specific geographic locations. Just as the core assumption of mailout-mailback methodology should be probed and reconsidered in coming years, so too is the unit of enumeration worthy of research and examination. For example, it is important to consider how a census using the individual person as the unit of enumeration could be analyzed and tabulated, as well as the extent to which households, families, or larger constructs could be reconstructed from a person-level census.
Interaction between the census and the American Community Survey: We discuss the integration of the census and the ACS further in Chapter 4, but the topic is a critical long-term research enterprise. In the ACS’s early period of development, it is both appropriate and important to focus on its properties as a replacement for the long-form sample of previous censuses—whether ACS tabulations can satisfy both user needs and the myriad legal and regulatory demands for demographic data. Going forward, the capacity of the ACS as a unique survey platform in its own right must be explored, including ways for the census and the ACS to support each other: for example, use of parts of the ACS sample as a test bed for experimental census concepts and questions.