This appendix describes the operations of the 2000 census, noting differences from 1990 census procedures. It covers five topics:
the Master Address File (MAF) (including local review and internal checks for duplicate addresses);
questionnaire delivery and mail return (including redesign of mailings and materials and multiple response modes);
field follow-up (including nonresponse follow-up, NRFU, and coverage improvement follow-up, CIFU);
outreach efforts; and
data processing (including data capture, coverage edit and telephone follow-up, unduplication of households and people, editing and imputation, and other processing).
MASTER ADDRESS FILE
The 2000 census was conducted primarily by mailing or delivering questionnaires to addresses on a computerized mailing list—the MAF—and asking residents to fill out the questionnaires and mail them back.1 The Census Bureau first used mailout/mailback techniques with an address list in the 1970 census,2 but the procedures to develop the 2000 MAF differed in several important respects from those used in past censuses (see LUCA Working Group, 2001; Owens, 2000). The major difference from 1990 was that the 2000 MAF was constructed using more sources.
The Census Bureau used somewhat different procedures to develop the MAF for areas believed to have predominantly city-style mailing addresses (house number and street) than for areas believed to have predominantly rural route and post office box mailing addresses (see Box A-1). City-style areas were those inside the “blue line,” and non-city-style areas were those outside the “blue line.”3 For areas inside the blue line, the Bureau expected to have U.S. Postal Service carriers deliver questionnaires to most addresses on the list; for areas outside the blue line, the Bureau expected to use its own field workers to deliver questionnaires.
For remote rural areas, which have less than 1 percent of the population, Census Bureau enumerators developed the address list concurrently with enumerating households in person. For special places in which people live in nonresidential settings, such as college dormitories, prisons, nursing homes, and other group quarters, the Bureau used a variety of sources to develop an address list. About 2.8 percent of the population was enumerated in group quarters in 2000 (tabulations of Census Bureau data); the comparable 1990 figure is 2.7 percent (U.S. Census Bureau, 1996:68).
Inside the “Blue Line”
As the starting point for the MAF for city-style areas inside the blue line, the Census Bureau took the 1990 census address list for these areas and updated it from the Delivery Sequence File (DSF) of the Postal Service. The DSF contains a listing of addresses to which mail is delivered, ordered by carrier routes. It is updated regularly. Legislation passed in 1994 allows the Postal Service to share the DSF with the Bureau.
Although not part of its original plan, the Bureau determined that a complete field check of the city-style list should be conducted, which was done in a block canvass operation for all mailout/mailback areas conducted in January-May 1999. The Bureau also provided an opportunity for local review in 1998–1999 (see “Local Review,” below). Approximately 101 million addresses were included in the MAF for areas inside the blue line at the time when questionnaires were labeled and prepared for mailing in July 1999. The Postal Service conducted an intensive check of the DSF in early 2000, and updates were made to the MAF based on that check prior to questionnaire delivery.
Outside the “Blue Line”
To develop the MAF for non-city-style areas, the Bureau first conducted a block canvass operation, called address listing, in July 1998-February 1999.
BOX A-1 Basic Steps to Develop the Master Address File Prior to Census Day, 2000 and 1990
[Box contents not reproduced. Panels: 2000 census Master Address File, for city-style areas (mailout/mailback areas inside the “blue line”) and non-city-style areas (update/leave areas outside the “blue line”); 1990 Address Control File.]
The 1990 list was not used. There was also a LUCA Program conducted in 1999. Approximately 21 million addresses were included on the MAF for areas outside the blue line at the time when questionnaires were labeled and prepared for delivery. Census enumerators further updated the MAF in these areas when they delivered questionnaires in February-March 2000.
Local Review
The same legislation that made it possible for the Postal Service to share the DSF with the Census Bureau also permitted the Bureau to invite local governments—counties, places, and minor civil divisions, over 39,000 jurisdictions in all—to review the MAF for their areas and provide additions, deletions, and corrections to the Bureau (LUCA Working Group, 2001).4 The Local Update of Census Addresses (LUCA) Program was conducted separately in areas inside the blue line (LUCA98) and areas outside the blue line (LUCA99). There was also a Special Places LUCA Program.
LUCA required participating local governments to sign a pledge to treat the address list as confidential. The program involved several steps of local review, field verification by the Bureau, and appeal to the U.S. Office of Management and Budget when localities disagreed with the Bureau’s decision to reject local changes to the MAF. Due to time constraints, some planned LUCA operations were combined and rescheduled (LUCA Working Group, 2001:Fig.1-1). In response to local concerns, a New Construction LUCA Program was added to give localities inside the blue line an opportunity during January-March 2000 to identify newly constructed housing units. Addresses identified in the program were not mailed questionnaires; instead, they were visited by enumerators during the coverage improvement follow-up operation in summer 2000.
Of the total 39,051 jurisdictions that were eligible for LUCA98, LUCA99, or both, it is estimated that 25 percent participated fully in one or both programs by informing the Census Bureau of needed changes to the address list for their area (LUCA Working Group, 2001:Ch.2). Participation varied by such characteristics as geographic region of the country, population size of jurisdiction, type of government, and city-style or non-city-style area (see Chapter 4). Not yet known is what proportion of the MAF in city-style areas and non-city-style areas represented valid addresses that LUCA contributed (rather than repeating information from another operation, such as an update from the DSF).
Further Development of MAF
The MAF was a dynamic file during the operation of the census. Not only were addresses added from each stage of census field operations, they were also deleted in an effort to minimize duplicate and erroneous entries. In total, the Census Bureau estimates that about 4 million addresses were added to the MAF during census field operations: 2.3 million addresses during questionnaire delivery in update/leave, update/enumerate, and list/enumerate areas (see “Questionnaire Delivery and Mail Return,” below) and 1.7 million addresses during follow-up. About 10.4 million addresses were removed as duplicative of other addresses or nonexistent. About 5 million of these addresses were removed on the basis of two internal consistency checks, one of which was planned and the other of which was designed and implemented while the census data were being processed; the remaining addresses were deleted on the basis of field operations (see “Field Follow-up,” below). Whether the combination of internal checks and field checks reduced duplicate and erroneous addresses to a minimum or went too far or not far enough is a matter for evaluation. The final number of addresses on the MAF of occupied and vacant housing units counted in 2000 was 115.9 million (Farber, 2001a:Tables 1, 2).
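The addition and deletion figures quoted above can be checked with simple arithmetic. The following is a minimal sketch using only the rounded figures in the text; the variable and function names are illustrative, and because "about 5 million" is itself rounded, the derived field-operations figure is approximate.

```python
# Figures in millions of addresses, as quoted in the paragraph above.
ADDED_DURING_DELIVERY = 2.3       # update/leave, update/enumerate, list/enumerate
ADDED_DURING_FOLLOW_UP = 1.7
TOTAL_REMOVED = 10.4
REMOVED_BY_INTERNAL_CHECKS = 5.0  # "about 5 million", so the remainder is approximate


def maf_reconciliation():
    """Return totals implied by the quoted figures, in millions of addresses."""
    total_added = round(ADDED_DURING_DELIVERY + ADDED_DURING_FOLLOW_UP, 1)
    removed_by_field_operations = round(TOTAL_REMOVED - REMOVED_BY_INTERNAL_CHECKS, 1)
    return {
        "total_added": total_added,
        "removed_by_field_operations": removed_by_field_operations,
    }


summary = maf_reconciliation()
for label, millions in summary.items():
    print(f"{label}: {millions:.1f} million")
```

On these figures, roughly 5.4 million of the 10.4 million removals are attributable to field operations rather than to the two internal checks.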
Internal Checks for Duplicates
Anticipating that the use of multiple sources to develop the MAF could result in duplicate addresses, the Census Bureau carried out a planned internal consistency check in April 2000 in order to reduce the nonresponse follow-up workload. Subsequently, the Bureau responded promptly to evidence that the MAF still contained duplicates by designing and implementing a second internal consistency check in summer 2000.
Reducing the NRFU Workload
In April 2000 the Census Bureau conducted an internal consistency check of the MAF prior to the beginning of nonresponse follow-up in order to remove from the NRFU workload as many addresses as possible that could clearly be identified as duplicative or nonexistent (Miskura, 2000a). At the conclusion of this operation, 3.6 million addresses were dropped or merged with another MAF address.
One source of potential duplicates and errors came about because LUCA, which was essentially a new, untested program, did not run as smoothly as intended (LUCA Working Group, 2001). Because of delays in providing materials to local governments to review, the Census Bureau agreed to include every address provided by a LUCA participant on the MAF that was used to label questionnaires in July 1999, even when there had not been time to verify the address in the field. LUCA-supplied addresses that the Bureau believed likely did not exist, based on field checks after July, were flagged. Processing specifications were developed to delete many of these addresses and other addresses of doubtful existence when no questionnaire was returned for them.
In all, 2.5 million addresses that the Bureau had reason to believe did not exist were deleted from the MAF prior to nonresponse follow-up.
Also as part of this review, the Bureau attempted to identify duplicate addresses originating from LUCA or other sources. About 1.1 million addresses were merged with another address on the MAF when the addresses appeared to be exact duplicates. Follow-up was conducted either only for the one (merged) address or not at all if a questionnaire had been received for that address.5
Unduplication and Late Additions
Another important set of MAF internal checks, not previously planned, was put into place in summer 2000. From evaluations of MAF housing unit counts during January-June 2000 against estimates prepared from other sources, such as building permits, the Census Bureau determined that there was likely still a sizable number of duplicate addresses on the MAF (West and Robinson, 2001). Field verification carried out in June 2000 in a small number of localities substantiated this conclusion (Nash, 2000).
Consequently, the Bureau mounted a special operation to identify duplicate addresses and associated duplicate census returns to remove them from the MAF and the census. Software was written for this operation to match addresses and person records to identify potential duplicates. The flagged records were deleted from the census file of valid, completed returns and further examined. After examination, it was decided that a portion of the potential duplicates were likely valid returns for addresses not already in the census, and they were restored to the census file (late additions). At the conclusion of the operation, 1.4 million housing units and 3.6 million people were permanently deleted from the census file, from a total of 2.4 million housing units and 6.0 million people that had been initially flagged as potential duplicates (Miskura, 2000b).
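The text does not describe the matching rules used by the Bureau's software, so the following is only an illustrative sketch of the general idea of flagging potential duplicates by matching addresses and person records. The abbreviation table, the record layout, and the rule "same normalized address and at least one shared person" are all assumptions made for this example, not the Bureau's specification.

```python
import re
from collections import defaultdict

# Hypothetical abbreviation table, for illustration only.
ABBREVIATIONS = {"street": "st", "avenue": "ave", "road": "rd", "apartment": "apt"}


def normalize_address(raw):
    """Lowercase, strip punctuation, and apply the illustrative abbreviations."""
    tokens = re.sub(r"[^a-z0-9 ]", " ", raw.lower()).split()
    return " ".join(ABBREVIATIONS.get(token, token) for token in tokens)


def flag_potential_duplicates(returns):
    """Flag returns that share a normalized address and at least one person
    with an earlier return (an assumed rule, not the Bureau's)."""
    by_address = defaultdict(list)
    for census_return in returns:
        by_address[normalize_address(census_return["address"])].append(census_return)

    flagged = []
    for group in by_address.values():
        first = group[0]
        for other in group[1:]:
            if first["people"] & other["people"]:  # any person record in common
                flagged.append(other["id"])
    return flagged


# Hypothetical returns: A and B describe the same unit; C is a different address.
example_returns = [
    {"id": "A", "address": "12 Oak Street, Apt. 3", "people": {("j smith", 1960)}},
    {"id": "B", "address": "12 OAK ST APT 3", "people": {("j smith", 1960), ("m smith", 1962)}},
    {"id": "C", "address": "14 Oak St", "people": {("j smith", 1960)}},
]
flagged_ids = flag_potential_duplicates(example_returns)
print(flagged_ids)
```

A flag in this sketch marks a record only as a potential duplicate; as in the operation described above, flagged records would still need further examination before being deleted or restored.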
Comparison: Address List Development in 1990
The procedures used to develop the 1990 Address Control File (ACF) differed in important respects from those used to develop the 2000 MAF (see National Research Council, 1995:App.B). Overall, the Census Bureau used fewer sources in developing the 1990 ACF than it used for the 2000 MAF; also, the 1990 local review operation was considerably less extensive than the 2000 LUCA Program (see Box A-1, above).
For 1990 in areas with city-style addresses, the Census Bureau made no use of the 1980 census address list or the Postal Service DSF. Instead, the starting point for the Address Control File was two files of lists purchased from vendors, supplemented by a field listing operation carried out by census field staff in summer 1988 (precanvass). The Postal Service performed several reviews of the list in 1988–1990; Bureau staff also checked the part of the ACF that derived from the commercial lists in 1989. Governmental jurisdictions in the city-style areas were given an opportunity for review in summer 1989; however, they could not review specific addresses but only counts of addresses at the block level. About 16 percent of eligible local governments responded, adding about 400,000 housing units to the ACF (U.S. Census Bureau, 1993:6–44). By comparison, more than twice that share of eligible governments (36 percent) participated in the LUCA98 Program in city-style areas.
In areas with non-city-style addresses, the development of the 1990 address list was similar to 2000, in that census field staff conducted a prelisting operation in fall 1989. Census enumerators also checked the list in March 1990 when they delivered questionnaires in the areas in which the update/leave technique (new for the 1990 census) was used. However, there was no precensus local review program for the ACF in these areas.
QUESTIONNAIRE DELIVERY AND MAIL RETURN
The 2000 census, like the 1980 and 1990 censuses, was conducted primarily by delivering questionnaires to households and asking them to mail back a completed form. Procedures differed somewhat depending on such factors as type of addresses in an area and accessibility; in all, there were nine types of enumeration areas. Box A-2 provides brief descriptions of the nine types in 2000.
The two largest types of enumeration areas were: (1) mailout/mailback, covering almost 82 percent of the population, in which Postal Service carriers delivered questionnaires and (2) update/leave/mailback (usually termed update/leave), covering almost 17 percent of the population, in which Census Bureau field staff delivered questionnaires and updated the MAF at the same time. These two types, together with small numbers of addresses in areas (6), (7), and (9), comprised the mailback universe, covering about 99 percent of the household population (calculated from Baumgardner et al., 2001). The remaining 1 percent of the household population was enumerated in person (see areas (3), (4), (5), and (8) in Box A-2). Separate enumeration procedures were used for such special populations as homeless people, residents of group quarters, and transients (see Citro, 2000c).
Approaches to boost mail response were to redesign the questionnaire and mailing package, adapt enumeration procedures to special situations (the reason for having nine types of enumeration areas), and allow multiple modes for response. Advertising and outreach efforts were also expanded from 1990 (see “Outreach Efforts,” below).
BOX A-2 Types of Enumeration Areas (TEAs)
NOTE: For details, see U.S. Census Bureau (1999).
The mail response rate in 2000 (66%) was similar to the rate in 1990 (65%); it was also considerably higher than the rate that was budgeted (61%), which reduced the burden of field follow-up. The mail return rate in 2000 (72%) was slightly lower than the rate in 1990 (74%). This rate is a more refined measure of public cooperation than the mail response rate, which includes vacant and nonresidential addresses in the denominator in addition to occupied housing units (see Box 3-1 in Chapter 3).
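The distinction between the two rates lies in the denominator. The sketch below illustrates it with hypothetical counts (they are not census figures) and assumes, for simplicity, that every mail return comes from an occupied housing unit.

```python
def mail_response_rate(mail_returns, all_addresses):
    """Returns divided by all addresses in the mailback universe,
    including vacant and nonresidential addresses."""
    return mail_returns / all_addresses


def mail_return_rate(mail_returns, occupied_units):
    """Returns divided by occupied housing units only."""
    return mail_returns / occupied_units


# Hypothetical counts for a small area, chosen only to illustrate the definitions.
occupied, vacant, nonresidential = 900, 60, 40
mail_returns = 650

response_rate = mail_response_rate(mail_returns, occupied + vacant + nonresidential)
return_rate = mail_return_rate(mail_returns, occupied)
print(f"mail response rate: {response_rate:.1%}")
print(f"mail return rate:   {return_rate:.1%}")
```

Because vacant and nonresidential addresses cannot return a form, the return rate is always at least as high as the response rate for the same set of returns.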
Redesign of Mailings and Materials to Boost Response
To encourage mail response, a new questionnaire format was adopted for 2000. Based on extensive research (see National Research Council, 1995:Ch.6), a design was chosen that appeared as attractive and easy to fill out as possible. The use of new processing technology greatly facilitated the redesign (see “Data Processing,” below). The mailing package was also redesigned to distinguish the questionnaire from junk mail and to motivate response (e.g., the envelope noted that responses were required by law).
One design change for the questionnaire was to ask households to list all members but to limit the space for characteristics to six members—instead of seven, as in 1990—in order to reduce bulk and make the questionnaire less intimidating. It was planned to follow up households with more than six members by telephone (see “Data Processing,” below).
In the mailout/mailback area, multiple mailings were used to increase response. The first mailing was an advance letter (a new approach for 2000). The purpose of the letter was to alert residents to watch for the questionnaire, to provide a means for them to request a questionnaire in a language other than English, and to inform them of employment opportunities in census local offices. The second and third mailings were the questionnaire and a reminder postcard.
In both mailout/mailback and update/leave areas, the Bureau originally planned to deliver a second questionnaire to households not returning a form. Early testing showed that the use of a second questionnaire could increase mail response rates by as much as 10 percent (National Research Council, 1995:120). However, the Bureau determined that vendors could not process the list of nonresponding households quickly enough to be able to mail out a replacement questionnaire on the schedule required. Mailing a second questionnaire to all households, as was done in the 1998 dress rehearsal, was deemed too expensive and likely to lead to negative publicity and confusion.
The advance letter operation did not proceed as smoothly as hoped. A programming error resulted in an extra digit being inserted in every address, although the Postal Service caught the mistake and was able to deliver the letters as planned. In addition, the final version of the letter was not fully tested when it was decided after the 1998 dress rehearsal to add to the letter a way to request a foreign-language questionnaire. There was considerable public confusion about what to do with the enclosed return envelope if one did not need a special questionnaire. However, there were no apparent untoward effects of these problems on the public’s cooperation with the census, and the publicity may have been helpful in alerting people to the need to respond.
Multiple Response Modes
Another innovation for 2000 to encourage response was to allow multiple response modes. Households that received a short-form questionnaire could fill out a short form on the Internet or by telephone. To answer questions and also permit telephone response, the Bureau contracted with a commercial phone center to operate a toll-free telephone questionnaire assistance system. This system provided assistance in English, Spanish, and several other languages. Individuals could also pick up “Be Counted” forms, which were made available in six languages at various local sites throughout the country just prior to Census Day.
Because multiple response modes might not only boost return rates but also result in more duplicate responses that could not be weeded out in later processing, the Census Bureau did not promote the “Be Counted” Program vigorously. Also, it did not widely publicize the Internet response option because of concerns about being able to handle a large response and maintain security. As it turned out, of 76 million questionnaires that were returned by households, 99 percent arrived by mail and only 1 percent by other modes: 66,000 were Internet returns; 605,000 were “Be Counted” forms; and 200,000 were forms completed by telephone. Not all “Be Counted” and telephone forms were included in the census: they were not counted if they did not have a valid address or if they duplicated another return.
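The share of returns by mode follows directly from the counts quoted above. This is a minimal sketch using those rounded figures; since the 76 million total is itself rounded, the computed share is approximate.

```python
TOTAL_RETURNS = 76_000_000  # rounded total quoted in the text

NON_MAIL_RETURNS = {
    "Internet": 66_000,
    "Be Counted": 605_000,
    "telephone": 200_000,
}

non_mail_total = sum(NON_MAIL_RETURNS.values())
non_mail_share_percent = 100 * non_mail_total / TOTAL_RETURNS

for mode, count in NON_MAIL_RETURNS.items():
    print(f"{mode}: {count:,} ({100 * count / TOTAL_RETURNS:.2f}%)")
print(f"all non-mail modes: {non_mail_total:,} ({non_mail_share_percent:.2f}%)")
```

The three non-mail modes together account for a little over 1 percent of returns, consistent with the rounded 99 percent and 1 percent split given in the text.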
Comparison: 1990 Questionnaire Delivery and Return
Questionnaire delivery procedures in the 1990 census differed in some respects from those used in 2000 (National Research Council, 1995:App.B). In 1990 about 84 percent of total housing units were in mailout/mailback areas; 11 percent—less than in 2000—were in update/leave areas (update/leave was a new procedure in 1990); and 5 percent—more than in 2000—were in list/enumerate areas. The list/enumerate procedure in 1990 differed somewhat from that used in 2000: Postal Service carriers delivered unaddressed short-form questionnaires to housing units in 1990 and census enumerators then came by to pick up completed questionnaires or obtain the answers, list the housing units in an address register, and at a predesignated subset of units, collect responses to the sample (long-form) questions. In 2000 Census Bureau field staff took questionnaires with them as they listed housing units and enumerated residents at the same time.
The 1990 census mailout procedures had not included an advance letter; however, a reminder postcard was delivered to all addresses in both mailout/mailback and update/leave areas. Responding by the Internet (which did not exist) was not an option. The questionnaire and mailing package were designed not to facilitate response as much as to permit data processing with the technology used in the 1960–1980 censuses (see “Data Processing,” below).
Overall, the mailing strategies used in the 1990 census did not appear to help mail response. The mail response rate declined from 75 percent in 1980 to 65 percent in 1990; the mail return rate declined from 81 percent in 1980 to 74 percent in 1990.
FIELD FOLLOW-UP
Because not all households will mail back a form and because many addresses to which questionnaires are delivered will turn out to be vacant or nonresidential, the 2000 census, like previous censuses, included a large field follow-up operation (see Thompson, 2000). Over 500 local census offices (LCOs) were set up across the country, which reported to 12 regional census centers. The LCOs were responsible for hiring the temporary enumerators and crew leaders who would be needed to conduct follow-up operations. In update/leave areas, enumerators were hired to deliver questionnaires prior to Census Day and to return to follow up nonresponding households. LCOs also carried out operations to enumerate special groups, such as group quarters residents, transients, and the homeless.
In anticipation of possible difficulties in hiring and also the possibility that the mail response rate would decline from 1990, LCOs were authorized to recruit aggressively in advance of Census Day, to hire more enumerators than they thought would be needed, and to pay above-minimum wages (which differed according to prevailing area wages). Most offices were successful in meeting their hiring goals before the first follow-up operations began in mid-April 2000.
Follow-up operations were carried out in two separate stages, discussed below. The first stage, conducted in April-June, was the nonresponse follow-up, designed to obtain a questionnaire from every nonresponding unit in the mailback universe (or to determine that an address was vacant or nonresidential). The second stage, conducted in June-August, was coverage improvement follow-up (CIFU), which included specific operations designed to check and supplement NRFU. Several operations included in CIFU for 1990 were dropped for 2000.
Nonresponse Follow-Up
Preparation for NRFU began in early April 2000. Lists of addresses for inclusion in the NRFU workload were provided to the LCOs the week of April 11; a week later, notification was sent of late mail returns, which the LCOs deleted manually from their follow-up lists. The final workload totaled 41.7 million addresses. This total included addresses in the MAF for which a completed questionnaire was not checked in prior to April 18 and new addresses from DSF updates. It also included addresses marked for deletion in the update/leave operation and addresses for which postal carriers returned the questionnaires as not deliverable and no attempt was made to redeliver them by census staff. The purpose of NRFU for these addresses was to double-check their status and, if they were in fact occupied, to obtain an enumeration.6
In most LCOs, NRFU enumerators went into the field beginning April 27. Their first objective was to visit each household in person to try to obtain an interview, even if the residents said they had already mailed back a form.7 If unsuccessful, the enumerators were to try up to five additional times to obtain an interview, unless the residents were known to be out of town for an extended period or the housing unit was verified to be vacant or nonexistent by a proxy respondent (someone not a member of the household, such as a neighbor or landlord). Three of the follow-up attempts could be made by telephone if the enumerator could obtain a phone number. In the case of refusals, field observations indicated that some offices adhered to the six-visit rule, sometimes using different enumerators, while others allowed the use of proxy respondents without making all six visits. If no interview was obtained after the specified number of visits, then enumerators were instructed to obtain information from a proxy respondent, noting the name and address on the interview form. When an office had obtained information for 95 percent of its workload, the best enumerators were given the remaining cases to make one last attempt to obtain information from the household or a proxy, even if fewer than six visits had been made to the household. Some offices required that at least three visits be made to a household before allowing a last attempt.
Conducted concurrently with the NRFU enumeration was a quality assurance program, in which selected cases were reinterviewed to identify fabrication (“curbstoning”). A random sample of the workload of each enumerator was reinterviewed; also, cases were selected purposively for reinterview by identifying enumerators whose work did not match that of other enumerators in the area. About 6 percent of the workload was reinterviewed in all, and preliminary analysis found discrepant results in a relatively small proportion of the reinterview batches (3.0%).8
NRFU operations were completed in most LCOs by June 26, so that the entire operation took only 8 weeks, shortening the original schedule by 1 week. At the conclusion of NRFU, enumerators had classified 62.3 percent of the 41.7 million addresses in their workload as occupied, 23.4 percent as vacant, 14.3 percent as “delete” (e.g., because the unit was demolished or nonresidential), and a handful (0.01%) as “not resolved” (Baumgardner et al., 2001:Table 4).
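The percentages above can be converted into approximate address counts. The following sketch uses only the workload and shares quoted in the text; the published shares are rounded (they sum to 100.01 percent), so the counts are approximate.

```python
NRFU_WORKLOAD_MILLIONS = 41.7

# Shares of the NRFU workload by final status, in percent, as quoted in the text.
OUTCOME_SHARES_PERCENT = {
    "occupied": 62.3,
    "vacant": 23.4,
    "delete": 14.3,
    "not resolved": 0.01,
}


def outcome_counts(workload_millions, shares_percent):
    """Convert percentage shares into approximate counts, in millions of addresses."""
    return {status: workload_millions * pct / 100 for status, pct in shares_percent.items()}


counts = outcome_counts(NRFU_WORKLOAD_MILLIONS, OUTCOME_SHARES_PERCENT)
for status, millions in counts.items():
    print(f"{status}: {millions:.2f} million")
```

On these figures, roughly 26 million addresses were classified as occupied and just under 16 million as vacant or delete.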
Coverage Improvement Follow-Up
The coverage improvement follow-up effort that followed NRFU included several operations that involved about 8.7 million housing units. The largest portion of the workload comprised 6.5 million housing units that had been classified as vacant or delete in NRFU. These units, which CIFU rechecked to determine if they might have been occupied on Census Day, were only 41 percent of total addresses identified as vacant or delete in NRFU. If such an address had not already been marked vacant or delete in another operation, it was revisited, but not otherwise. Examples of vacant or deleted units not included in CIFU were those classified as vacant or delete by an update/leave enumerator and a NRFU enumerator and those marked as undeliverable by a postal carrier and classified as vacant by a NRFU enumerator.9
There were four other components of the CIFU workload to visit or reinterview: (1) 717,000 addresses that were added to the MAF in update/leave, but from which no questionnaire was mailed back; (2) 372,000 addresses that were added to the MAF from the New Construction LUCA Program (in city-style areas); (3) 539,000 addresses for which forms were essentially blank because NRFU enumerators could not determine even the number of household residents (and a small number of forms that were lost in the process of data capture); and (4) 570,000 addresses that were visited for some other reason. The fourth category included addresses that were added to the MAF from late updates from the Postal Service DSF and the LUCA appeals process. It also included verification of addresses on “Be Counted” forms and those filled out by telephone questionnaire assistance staff.
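As a consistency check, the five CIFU workload components can be summed and the rechecked units compared with the NRFU vacant and delete total. This sketch uses only figures quoted in this section; the component labels are shorthand introduced for the example, and all results are approximate because the inputs are rounded.

```python
# CIFU workload components, in thousands of addresses, as quoted in the text.
CIFU_COMPONENTS_THOUSANDS = {
    "vacant or delete in NRFU, rechecked": 6_500,
    "added in update/leave, no mail return": 717,
    "New Construction LUCA additions": 372,
    "blank or lost forms": 539,
    "other reasons": 570,
}

total_thousands = sum(CIFU_COMPONENTS_THOUSANDS.values())

# NRFU classified 23.4% of 41.7 million addresses as vacant and 14.3% as delete.
nrfu_vacant_or_delete_millions = 41.7 * (23.4 + 14.3) / 100
recheck_share_percent = 100 * 6.5 / nrfu_vacant_or_delete_millions

print(f"CIFU workload: {total_thousands / 1000:.1f} million")
print(f"NRFU vacant or delete: {nrfu_vacant_or_delete_millions:.1f} million")
print(f"share rechecked in CIFU: {recheck_share_percent:.0f}%")
```

The components sum to about 8.7 million, and 6.5 million is about 41 percent of the roughly 15.7 million addresses classified as vacant or delete in NRFU, matching the figures in the text.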
Addresses initially classified by CIFU itself as vacant or delete that had not been visited in any previous operation (e.g., an address added from the New Construction Program) were reinterviewed for quality control purposes. The entire workload for one district office, in Hialeah, Florida, was reinterviewed because of problems that came to light in that office, and selected housing units were reinterviewed in seven other offices for which problems were identified. The operations in 15 local offices were questioned by the House Subcommittee on the Census, but the Census Bureau determined, on review, that only two of these offices warranted some reenumeration. (These two offices are included in the total of seven in which partial reenumeration occurred.)
Overall, CIFU determined that 27 percent of the 8.7 million housing units visited were occupied, 43 percent were vacant, and 30 percent should be deleted. (Almost no units had an unresolved status at the end of CIFU; Baumgardner et al., 2001:Table 5.) CIFU enumerators were most likely to find occupied units among the addresses added in update/leave; they classified 45 percent of these addresses as occupied. Other categories had lower percentages of units classified as occupied: 35 percent for lost or blank returns; 27 percent for new construction addresses (53% of these addresses were not yet completed and so were deleted); and 24 percent for addresses classified as vacant or delete in NRFU that were rechecked in CIFU. The percentage of NRFU vacant and delete addresses that CIFU reclassified as occupied, however, was 2 to 3 times the percentage of vacant and delete units found to be occupied in previous censuses for which a vacancy recheck was carried out (see “1990 Coverage Improvement,” below). The reason may be that, as noted at the beginning of the section, CIFU rechecked less than half of the addresses that were classified as vacant or delete by NRFU.
Comparison: 1990 Field Follow-Up and Coverage Improvement
NRFU procedures in 1990 were similar in broad outline to the procedures used in 2000 (see U.S. Census Bureau, 1993:Ch.6). The NRFU enumerators were instructed to visit each household in person. If an enumerator could not obtain an interview but was able to obtain a telephone number, then he or she was to make up to five additional attempts to interview the household—three telephone attempts and two more personal visits at different times of the day. If the enumerator did not have a telephone number, he or she was to make two additional personal visits. When all of these attempts failed to result in an interview or if the case was a refusal or the respondent was away for an extended period of time, the enumerator was instructed to talk to someone outside the household to obtain “last resort” information. Such information was defined as three of the four characteristics of relationship to head of household, sex, race, and marital status for each household member and a description of the housing unit. When 95 percent of the caseload had been completed, the remaining cases were given to the best enumerators who were to make one last visit to try to gather “closeout” data, defined as at least two characteristics for each household member.
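The 1990 contact rules described above can be expressed as a short sketch. The function names and record layout are hypothetical, and only the person-level part of the “last resort” definition is modeled (the housing unit description is not).

```python
# The four person characteristics named in the text for "last resort" information.
LAST_RESORT_ITEMS = ("relationship", "sex", "race", "marital_status")


def follow_up_attempts_1990(has_phone_number):
    """Additional attempts after an unsuccessful first personal visit."""
    if has_phone_number:
        return ["telephone"] * 3 + ["visit"] * 2  # five additional attempts
    return ["visit"] * 2                          # two additional personal visits


def meets_last_resort(person):
    """True if at least three of the four listed characteristics are reported."""
    reported = sum(1 for item in LAST_RESORT_ITEMS if person.get(item) is not None)
    return reported >= 3


print(follow_up_attempts_1990(has_phone_number=True))
print(follow_up_attempts_1990(has_phone_number=False))
print(meets_last_resort({"relationship": "spouse", "sex": "F", "race": "White"}))
print(meets_last_resort({"sex": "F"}))
```

In this sketch a proxy report with only one characteristic per person would fall short of the “last resort” standard, and also of the lower “closeout” standard of two characteristics.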
Concurrently with NRFU enumeration, a reinterview program was carried out to detect falsification, similar to the program in 2000. The 1990 quality control program reinterviewed 4.8 percent of the NRFU workload of 34 million housing units and estimated a very low rate of falsification overall (0.09%; see U.S. Census Bureau, 1994:30–34).
In contrast to 2000, the 1990 NRFU operations fell considerably behind schedule, largely because of the Census Bureau’s failure to forecast the extent of the decline in the mail response rate from 1980 to 1990—the Bureau projected a 70 percent response rate (down from 75% in 1980), but the actual rate at the time NRFU began was 63 percent (the rate subsequently rose to 65%). The Bureau had to obtain additional appropriations and scramble to hire sufficient workers for NRFU and other follow-up activities; it raised pay rates in 140 of the 449 district offices (equivalent to LCOs) and took other steps to increase productivity. The NRFU operation was planned to take 6 weeks from when it began in late April; however, only 72 percent of the workload was completed by that time (by June 6). Another 18 percent of the workload was completed in 2 more weeks, but it took another 6 weeks—until early August—to complete the remaining 10 percent of the workload (U.S. General Accounting Office, 1992:46).
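The 1990 completion figures above imply a cumulative timeline, which can be tabulated as follows. This is a sketch based only on the weeks and percentages quoted in the text; the phase boundaries are approximate.

```python
from itertools import accumulate

# (weeks elapsed in phase, percent of NRFU workload completed in phase), 1990 census.
PHASES_1990 = [(6, 72), (2, 18), (6, 10)]

cumulative_weeks = list(accumulate(weeks for weeks, _ in PHASES_1990))
cumulative_percent = list(accumulate(percent for _, percent in PHASES_1990))
timeline = list(zip(cumulative_weeks, cumulative_percent))

for weeks, percent in timeline:
    print(f"after {weeks:2d} weeks: {percent}% of workload completed")
```

On these figures the 1990 operation took about 14 weeks against a 6-week plan, compared with 8 weeks for NRFU in 2000.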
A subsequent stage of follow-up in 1990 included several coverage improvement procedures (U.S. Census Bureau, 1993:6–37 to 6–38; 6–53 to 6–56). An operation called field follow-up, carried out in June-August, rechecked most units classified as vacant or delete in NRFU. Units that were not rechecked included those in areas with high proportions of seasonal housing or boarded-up buildings, plus units classified as delete by two precensus address update operations and a NRFU enumerator (a more stringent criterion than that used in 2000). By August 1, 5.3 percent of deleted units and 7.1 percent of vacant units that were rechecked in field follow-up were converted to occupied. (The corresponding percentages in 1980 were 7.5% of deleted units and 10% of vacant units converted to occupied.) These figures are considerably below the rate of conversion from vacant or delete to occupied in the 2000 CIFU (24%).
In addition to the recheck of vacant and delete units, the 1990 field follow-up operation revisited failed-edit mail returns. These cases were mail returns that lacked sufficient information to be processed and for which telephone follow-up was not successful (see “Data Processing,” below). For cost reasons, only a 10 percent sample of failed-edit short forms requiring field follow-up was included in the workload; in contrast, all long forms requiring field follow-up were included. The 1990 field follow-up also revisited a number of mailback cases for which there was no record of data capture.
Another 1990 coverage improvement operation was the “Were You Counted Campaign,” in which people who thought they had been missed were encouraged by media announcements in June-July 1990 to send in a special form. Those forms with addresses that could be assigned to census geography and with complete content were put through an operation to determine if they duplicated other forms. There was no field verification of the address, except in the Detroit district office, from which an unusually large number of forms were received.
Another special operation was the recanvass, carried out in July-November 1990, in which selected blocks, including those in high growth areas and those identified by postcensus local review, were relisted. The households were then reenumerated, provided the enumerator determined that the unit existed as of April 1. In all, the Bureau recanvassed more than 650,000 blocks containing about 20 million housing units (20% of all units).
Localities were able to identify blocks for recanvassing because in 1990 (though not 2000), local jurisdictions nationwide were invited to review preliminary census counts of housing units by block for their areas (U.S. Census Bureau, 1993:6–45 to 6–46). The counts were provided in August 1990, and localities had 15 days to challenge them. Responses were received from about 25 percent of all jurisdictions, including all of the 51 largest cities. All challenged blocks in which the discrepancy between the census count and that provided by the locality exceeded a specified amount were added to the recanvass operation, for which additional funding had to be obtained.
As part of the coverage improvement effort in 1990, in 24 local offices, all households for which the questionnaires reported only one household member were reenumerated. This procedure was implemented in response to allegations in late summer 1990 that enumerators in some offices during the closeout phase of NRFU had recorded households as one-person households without actually obtaining an interview (i.e., they were curbstoning). In addition, seven local offices in New Jersey were identified in which it appeared that fabrication may have occurred; households in these offices were reinterviewed when the questionnaires indicated household size but recorded no characteristics of household members.
Finally, a special program was implemented to improve the coverage of people who were on parole or probation (U.S. Census Bureau, 1993:6–55). The first step was to contact each state to ask its parole or probation officers to distribute census forms to their assignees to be filled out and mailed back. This operation had a very low response rate, so census enumerators were sent to correction departments in designated counties to obtain information for parolees and probationers from administrative records. No attempt was made to contact parolees or probationers unless their addresses could not be verified. The operation was not completed until late November-early December 1990. The forms obtained were processed through an unduplication operation (see “Data Processing,” below); however, subsequent analysis determined that many of the parolee/probationer forms that were accepted in the census count represented duplicate enumerations (Ericksen, 1991:43–46).
Summary: 1990 and 2000
The description of 2000 and 1990 follow-up procedures makes it clear that they were large-scale, complex operations, similar in broad outline but sufficiently different in detail to make it difficult to compare results across years. It is difficult, for example, to compare results from the 2000 CIFU recheck of vacant and delete units with the 1990 field follow-up vacancy check because of differences in how the workload was defined. Also, it is not clear exactly how such terms as “proxy” (2000), “last resort” (1990), “closeout,” and “non-data-defined” were similar or dissimilar, again complicating the task of comparative evaluation.
One can, however, conclude that the Census Bureau was more successful in 2000 than in 1990 in controlling field follow-up operations and keeping them on schedule. Coverage improvement operations were more focused, and programs that appeared problematic in 1990 (e.g., the parolee and probationer check) were not repeated in 2000.
To supplement field operations and special programs to improve population coverage and cooperation with the census, the Census Bureau engaged in large-scale advertising and outreach efforts for 2000. For the first time, the Census Bureau budget included funds ($167 million) for a paid advertising campaign (recommended by a National Research Council panel in 1978).10
The 2000 advertising campaign was extensive, involving a major contractor, Young and Rubicam, which contracted with four other agencies to prepare ads targeted to particular population groups and communities. The advertising ran from October 1999 through May 2000 and included a phase to alert people to the importance of the upcoming census, a phase to encourage filling out the form, and a phase to encourage people who had not returned a form to cooperate with the follow-up enumerator. Ads were placed on television (including one during the 2000 Super Bowl), radio, newspapers, and other media, using multiple languages. Based on market research, the ads stressed the benefits to people and their communities from the census, such as better targeting of government funds to needy areas for schools, day care, and other services.
In addition to the ad campaign, the Census Bureau hired partnership and outreach specialists in local census offices, who worked with community and public interest groups to develop special initiatives to encourage participation in the census. The Bureau signed partnership agreements with over 30,000 organizations, including federal agencies, state and local governments, business firms, nonprofit groups, and others. The Bureau did not fund these groups, but it provided materials and staff time to help them encourage a complete count. A special program was developed to put materials on the census in local schools to inform school children about the benefits of the census and motivate them to encourage their adult relatives to participate.
The Census Bureau director and other staff made numerous public appearances throughout the census period to stress the importance of a complete count and respond to questions and concerns. The director also put into place a program to use the Internet to challenge communities to raise their mail response rates. The 1990 response rates were posted for local areas on the Bureau’s web site beginning in mid-March, and 2000 response rates were regularly updated on the site through mid-April. Communities were challenged to exceed their 1990 rates by 5 percent. Although few communities achieved this goal, the overall response rate did not continue its decline from previous censuses.
The 1990 census had also included advertising and outreach efforts; however, their extent was less than in 2000. The advertising was prepared by a firm selected by the Advertising Council, which conducted its work on a pro bono basis. Ads were placed as public service announcements, which meant that many ads ran in undesirable times (e.g., middle of the night). The partnership program was not as extensive as in 2000.
In both censuses, perhaps more so in 2000, advertising and outreach efforts varied in intensity across the country. Some localities were more active than others in coordinating and supplementing outreach and media contacts. Whether this variability narrowed or widened the difference in net undercount rates among major population groups depends on the extent to which outreach efforts were more (or less) effective in hard-to-count areas in comparison with other areas.
Data processing for the 2000 census was a continuing, high-volume series of operations that began with the capture of raw responses and ended with the production of voluminous data products for the user community, which will be made available in 2001–2003.11 Important innovations were adopted for 2000. For the first time, the Census Bureau contracted with outside vendors for major components of data processing. Also for the first time, data capture was carried out with optical mark and optical character recognition technology. A telecommunications network linked Census Bureau headquarters in Suitland, Maryland; 12 permanent regional offices; the Bureau’s permanent computer center in Bowie, Maryland; 12 regional census centers and the Puerto Rico Area Office; the Bureau’s permanent National Processing Center in Jeffersonville, Indiana; 3 contracted data capture centers in Phoenix, Arizona, Pomona, California, and Baltimore County, Maryland; 520 local census offices; and contracted telephone centers for questionnaire assistance (U.S. Census Bureau, 1999:XI-1).
Five operations in 2000 are described in this section: data capture, coverage edit and telephone follow-up, unduplication, editing and imputation, and other data processing. Data processing operations for 1990 are also summarized.
The first step in data processing was to check in the questionnaires and capture the data on them in computerized form. The return address on mailback questionnaires directed them to one of four data capture centers—the Bureau’s National Processing Center and three run by contractors. Each questionnaire had a bar code that was scanned to record its receipt. The questionnaires were then imaged electronically, check-box data items were read by optical mark recognition (OMR), and write-in character-based data items were read by optical character recognition (OCR). Clerks keyed data from images in cases when the OMR/OCR technology could not make sense of the data. Images of the long-form items were set aside temporarily to permit the fastest possible processing of short-form data.
Coverage Edit and Telephone Follow-up
The data on the questionnaires were reviewed by computer to identify those returns that failed coverage edit specifications. These failed-edit cases were reinterviewed by telephone, using contractor-provided clerical telephone staff. The workload for the coverage edit and telephone follow-up operation totaled about 2.3 million cases. It included returns that reported more household members in question one (“How many people were living or staying in this house, apartment, or mobile home on April 1, 2000?”) than the number of members for which individual information (e.g., age, race, sex) was provided; mailed-back returns in which question one was left blank and individual information was provided for exactly six people (the limit of the space provided); mailed-back returns that reported household counts of seven people or more; and returns of four or more people that contained nonrelatives of the household head.
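The four workload criteria just listed can be read as a simple rule set. The following is a minimal sketch under stated assumptions, not the Census Bureau's edit specification: the `Return` record and its field names are hypothetical, and household size for the nonrelative rule is assumed to be the question-one count when present and otherwise the number of people with individual data.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Return:
    """Hypothetical, simplified census return (field names are illustrative)."""
    mailed_back: bool               # True for mail returns, False for enumerator returns
    reported_count: Optional[int]   # answer to question one; None if left blank
    persons_with_data: int          # people with individual information on the form
    has_nonrelatives: bool          # any nonrelative of the household head listed


def fails_coverage_edit(r: Return) -> bool:
    """Return True if the form would enter the telephone follow-up workload."""
    # Rule 1: question one reports more people than have individual data.
    if r.reported_count is not None and r.reported_count > r.persons_with_data:
        return True
    # Rule 2: mail return, question one blank, data for exactly six people
    # (six being the limit of the space provided on the form).
    if r.mailed_back and r.reported_count is None and r.persons_with_data == 6:
        return True
    # Rule 3: mail return reporting a household count of seven or more.
    if r.mailed_back and r.reported_count is not None and r.reported_count >= 7:
        return True
    # Rule 4: four or more people, including nonrelatives of the head.
    # Assumption: size falls back to persons_with_data when question one is blank.
    size = r.reported_count if r.reported_count is not None else r.persons_with_data
    return size >= 4 and r.has_nonrelatives


if __name__ == "__main__":
    print(fails_coverage_edit(Return(True, 5, 3, False)))     # True (rule 1)
    print(fails_coverage_edit(Return(True, None, 5, False)))  # False
```

Expressing the criteria as independent rules makes it easy to see that a return enters the workload if any one rule fires.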
The purpose of the edit and telephone follow-up was to reduce undercounting of people in large households and nonfamily households. There was no field follow-up for failed-edit households for which telephone follow-up was unsuccessful. Because of computer problems, the start of the coverage edit and telephone follow-up operation was delayed. Originally planned to be conducted in April-June 2000, it was carried out in May through mid-August.
Unduplication of Households and People
Two major, computer-based unduplication operations were carried out subsequent to field follow-up. One of those operations, the use of the primary selection algorithm (PSA) to unduplicate multiple returns for the same address, was planned from the outset and is described below. The other operation, the use of special software and procedures to reduce duplication of addresses in the MAF, was planned and implemented in summer 2000 to respond to evidence of duplicate addresses not eliminated by previous processing (described in “Master Address File,” above). The PSA and MAF unduplication operations were linked: final determination of which returns to delete from the census because they duplicated a return from another MAF address was not made until after the PSA had processed multiple returns for the same address.
The purpose of the PSA was to identify unique households and people to include in the census when more than one questionnaire was returned with the same census address identification number. Such duplication could occur in a number of ways: when a respondent mailed back a census form after the cutoff date for determining the NRFU workload and the enumerator then obtained a second form from the household (or perhaps identified the household as vacant); when someone was enumerated in a group quarters but provided another “usual” address to which his or her information was assigned; or when a respondent filled out a “Be Counted” form, thinking that he or she had been missed, but another member of the household also mailed back a questionnaire for the household (which might or might not contain information for the individual).
For each housing unit, returns with one or more persons in common were combined to form a single PSA household. All vacant returns for a housing unit were also combined to form a PSA household. In some cases more than one PSA household might exist for a unit. For each PSA household, the algorithm selected which return best represented the Census Day household (“basic” return) and which people from the other returns were part of that household.12
In all, 9 percent of census housing units had two returns and 0.4 percent had three or more returns. In most instances, the operation of the PSA discarded duplicate household returns or extra vacant returns. Less often, the PSA found additional people to assign to a basic return or identified more than one household at an address (see Baumgardner et al., 2001:22–27).
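The grouping step of the PSA, in which returns for one housing unit that share at least one person are combined and all vacant returns are combined, can be illustrated with a union-find structure. This is only a sketch under stated assumptions: the function name and input layout are hypothetical, people are assumed to have been matched already (each has a single key across returns), sharing is treated as transitive, and the later choice of a “basic” return is not modeled.

```python
from collections import defaultdict


def group_psa_households(returns):
    """Group returns for one housing unit into candidate PSA households.

    returns: dict mapping a return id to the set of person keys on that
    return; an empty set stands for a vacant return.
    """
    parent = {rid: rid for rid in returns}

    def find(x):
        # Follow parent links to the group representative (with path halving).
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    first_seen = {}      # person key -> first return id listing that person
    first_vacant = None  # first vacant return encountered, if any
    for rid, persons in returns.items():
        if not persons:
            # All vacant returns for the unit form a single group.
            if first_vacant is None:
                first_vacant = rid
            else:
                union(first_vacant, rid)
            continue
        for person in persons:
            if person in first_seen:
                union(first_seen[person], rid)  # shared person: same household
            else:
                first_seen[person] = rid

    groups = defaultdict(list)
    for rid in returns:
        groups[find(rid)].append(rid)
    return sorted(sorted(group) for group in groups.values())


if __name__ == "__main__":
    unit_returns = {
        "mail": {"ann", "bob"},
        "nrfu": {"bob", "cal"},
        "be_counted": {"dee"},
        "vacant_1": set(),
        "vacant_2": set(),
    }
    print(group_psa_households(unit_returns))
    # [['be_counted'], ['mail', 'nrfu'], ['vacant_1', 'vacant_2']]
```

In this toy unit, three candidate households result, which mirrors the observation that more than one PSA household might exist for a unit.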
Editing and Imputation
Editing and imputation were carried out for all data-captured questionnaires. This operation included whole person and whole household imputation, called substitution, when there was minimal or no information for the person or household; editing content items for consistency and to fill in a missing item on the basis of a related item (e.g., to calculate age when only date of birth was provided); and imputation of specific content items, called allocation, when values were missing for one or more items.
All editing and imputation were computer based; there was no clerical editing of the questionnaires as in past censuses. When it was not possible to perform an edit that used other information for the same person or household, imputation was performed with hot deck methods that made use of information for other people and households in the immediate neighborhood. First used in processing the 1960 census, the Census Bureau’s computerized hot deck procedures have been refined to search for the best match for a person or household missing one or more related data items on the basis of a large number of known characteristics. The best match search is geographically restricted to take advantage of common characteristics among small-area populations (see Box A-3; see also Citro, 2000b).
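The basic hot deck idea can be sketched in a few lines. The example is a simplified sequential hot deck, not the Bureau's best-match procedure: the function name, the record layout, and the `cold_start` fallback value are assumptions made for illustration, and records are assumed to arrive already sorted geographically so that the most recent donor is a nearby one.

```python
def hot_deck_impute(records, item, class_vars, cold_start):
    """Fill missing values of `item` from the latest matching donor.

    records:    list of dicts, assumed ordered so neighbors are adjacent
    item:       name of the field to impute (None marks a missing value)
    class_vars: fields that a donor must share with the recipient
    cold_start: value used when no donor has been seen yet for a class
    """
    deck = {}    # class key -> latest reported value (the "hot deck" matrix)
    output = []
    for rec in records:
        rec = dict(rec)  # copy so the caller's records are not modified
        key = tuple(rec[v] for v in class_vars)
        if rec.get(item) is None:
            # Missing: take the most recent value stored for this class.
            rec[item] = deck.get(key, cold_start)
            rec[item + "_imputed"] = True
        else:
            # Reported: this record becomes the current donor for its class.
            deck[key] = rec[item]
            rec[item + "_imputed"] = False
        output.append(rec)
    return output


if __name__ == "__main__":
    people = [
        {"relationship": "child", "age_group": "0-17"},
        {"relationship": "head", "age_group": "35-44"},
        {"relationship": "child", "age_group": None},
        {"relationship": "head", "age_group": "65+"},
        {"relationship": "head", "age_group": None},
        {"relationship": "spouse", "age_group": None},
    ]
    result = hot_deck_impute(people, "age_group", ["relationship"], "unknown")
    print([p["age_group"] for p in result])
    # ['0-17', '35-44', '0-17', '65+', '65+', 'unknown']
```

Because the deck is overwritten as each reported value is processed, a missing entry receives the value of the closest preceding record in the same class, which is how the method exploits similarity within small areas.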
BOX A-3 Imputation Methods and Uses
It is standard practice to process censuses and surveys to review the input data, employ editing techniques to reconcile inconsistent or anomalous answers for a person or household, and employ imputation techniques to provide values for missing responses by making use of information reported for other items, persons, or households. In surveys, reweighting is often used to adjust for cases in which there is no information for a respondent.
Why Perform Editing and Imputation?
The reason to supply values for missing data and perform other edits is that the resulting data set is more useful for its intended uses, particularly when the data have multiple purposes and serve different users. The alternative, deleting records that have any missing values, reduces the data that are available for analysis. Moreover, such a reduced data set may exhibit biases that a well-designed imputation system will moderate, or at least not make worse.
An example of a simple edit is when age or relationship is changed according to a specified rule when they are inconsistent (e.g., a child of the household head is reported as older than the head). Another example is when an item that was not supposed to be answered is changed from a reported value to a “not applicable” code (e.g., when hours worked last week is reported on the long form for someone who is unemployed).
Clerks reviewed census responses and noted errors as early as the 1830 census. By the end of the 19th century, clerical editing procedures had become quite elaborate. Computers were first used for machine editing in the 1960 census, although some clerical editing and follow-up was still conducted in censuses through 1990. Editing of data content was completely computerized in 2000.
The first use of imputation to supply values for missing data took place in the 1940 census when a method was devised to impute age for people who did not report their age. The 1960 census employed computers for imputation, using cold decks and hot decks.
The hot deck method was developed and refined so that by the 1980 census, the computer could search for the best match for a person or household missing one or more related data items on the basis of a large number of known characteristics instead of the one or two characteristics used in the past.
Sometimes imputation supplies values for all characteristics for some or all persons in a household by replicating (substituting) the record of a neighboring person or household. More often, imputation fills in the values for one or a few characteristics that are missing.
Cold and Hot Decks
Cold decks were originally sets of punched cards that contained numeric values to represent known distributions of answers to questions in a previous census or survey. (The term deck continued to be used even after the distributions were provided to computers in other media.) A cold deck might, for example, contain a random sequence of values for marital status such that a certain percentage of men would have the value for “married” assigned to them. The values in the cold deck would be assigned sequentially to men not reporting their marital status.
Hot decks, in contrast, are distributions of values that are constantly altered as questionnaires are processed and data for the latest person or housing unit are substituted for the values already in the hot deck matrix. Imputation for a missing entry is made from the latest value stored in the matrix that fits other known characteristics of the person or housing unit.
Hot decks have the advantages that they use data from the current, not past, census for imputation, that they preserve more of the variability in responses that occurs in the population, and that they take advantage of common characteristics among households in the same small geographic area.
Household and Person Imputation
There were 5.8 million people imputed or substituted in the 2000 census, amounting to 2.1 percent of the census household population count (Schindler, 2001). Substituted people broke down into three main groups:
1.172 million people (0.4% of the household population) were substituted because there was no information about the number of people living at that address or their characteristics. For units reported as occupied but for which household size was not known, the imputation process first categorized them as units at single-unit or multi-unit addresses. Then, household size was imputed from an occupied unit at a single-unit or multi-unit address with a reported population count from an enumerator-completed form. (In a refinement from 1990, mail returns that were not subject to field follow-up activities were excluded from the donor pool.) A similar process was followed for units for which occupancy status was not clear (the donor pool consisted of occupied and vacant units from enumerator-completed forms), and for units for which it was not even clear that they existed (the donor pool consisted of occupied, vacant, and deleted units from enumerator-completed forms). A potential donor record could be used as a donor only once and, in general, was selected from the same census tract as the unit requiring imputation. After imputing household size (and, if necessary, first imputing occupancy status and status as a housing unit), the computer duplicated another occupied housing unit record of the same size in the nearby area to provide characteristics for people in the household (see Griffin, 2001).
2.269 million people (0.8% of the household population) were substituted because the number of persons was known for their household but no other information was available. For these households, the computer duplicated another housing unit record in the nearby area of the same household size.
2.333 million people (0.9% of the household population) were substituted because no information was provided for them, although other members of their households had data reported. This situation could occur, for example, when a large household listed more than six people and the telephone follow-up was not successful in reaching the household to obtain information for the additional members. For these people, the computer duplicated a person from a nearby housing unit with the same characteristics as the unit with person(s) requiring substitution.13
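As a quick arithmetic check, the three groups can be summed and compared with the 5.8 million total. The snippet uses only the figures given in the text; the dictionary labels are shorthand introduced here for illustration.

```python
# Substituted people in the 2000 census, in millions (figures from the text).
groups_millions = {
    "household size unknown": 1.172,
    "size known, no characteristics": 2.269,
    "person missing within a reporting household": 2.333,
}

total = sum(groups_millions.values())
print(round(total, 3))  # 5.774, which rounds to the 5.8 million cited

# Share of all substituted people falling in each group.
shares = {name: count / total for name, count in groups_millions.items()}
for name, share in shares.items():
    print(f"{name}: {share:.1%}")
```

The first group, for which not even household size was known, accounts for roughly one-fifth of all substituted people; the other two groups are of similar size.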
Content Editing and Imputation
For short-form content items, editing and imputation rates for missing values were low: 1.1 percent for sex, 4.3 percent for age, 3.2 percent for race, 3.8 percent for Hispanic origin, 1.6 percent for household relationship, and 4.2 percent for housing tenure.14 These rates are for people who were missing one or more but not every short-form item (i.e., they exclude substituted people). In many instances, it was possible to fill in an answer from other information for the person or household, so that rates of hot deck imputation for short-form items were lower: 0.2 percent for sex, 2.9 percent for age, 3.2 percent for race, 3.4 percent for Hispanic origin, 1.3 percent for household relationship, and 3.6 percent for housing tenure. Information about editing and imputation rates for long-form content items is not yet available.
Other Data Processing
A number of other data processing steps were carried out, or are still in process, to generate data files and publications from the 2000 census records. Such steps for the short-form records include tabulating the data on various dimensions and modifying the data appropriately on files that are to be released to the public in order to protect the confidentiality of individual responses. For the long-form records, there are the added steps of coding such variables as occupation and industry and weighting the records to short-form control totals on several dimensions.
13 Terminology has not been consistent across censuses for the process of imputation. “Substitution” most often refers to cases when an entire household is imputed. When individual people are imputed into a household with other respondents, they are often referred to as “totally allocated persons,” as distinct from allocations for one or a few missing items.
14 Item edit and imputation rates are from tabulations by panel staff from U.S. Census Bureau, E-Sample Person Dual-System Estimation Output File, February 16, 2001 (weighted using TESFINWT). The rate for age excludes cases in which it was possible to estimate age from date of birth and vice versa. See also Chapter 6.
Comparison: 1990 Data Processing
The 1990 census data processing system was more decentralized than in 2000 and made more use of clerical editing (see National Research Council, 1995:App.B). There were 7 processing offices and 559 district offices. Mailback questionnaires in district offices in hard-to-enumerate areas in central cities went directly to a processing office for check-in and data capture. Mailback questionnaires in other district offices and all enumerator-obtained returns went first to the district office for check-in and editing.
Mailback questionnaires sent to processing offices were checked in by scanning bar codes. The data were then captured by using the Census Bureau’s Film Optical Sensing Device for Input to Computers (FOSDIC), first developed for the 1960 census (Salvo, 2000). The computerized records were put through edit checks to identify households that had not provided complete data or would otherwise need telephone or personal visit follow-up (see “Field Follow-Up,” above). Once any further data had been received from the field, computerized editing, allocation, and imputation routines were used to fill in remaining missing or inconsistent data.
Mailback questionnaires sent to district offices were checked in by scanning bar codes and then reviewed by clerks to identify cases that required follow-up. After completion of follow-up, the questionnaires were sent to the processing offices for data capture and computerized editing and imputation.
Another step in data processing included the search/match operation, in which forms received from various activities were checked against completed questionnaires for the same address to determine which people should be added to the household roster and which were duplicates. This operation was carried out for “Were You Counted” forms, parolee/probationer forms, and for people who sent in a questionnaire from one location with an indication that their usual home was elsewhere. Such people might have two homes, such as people who spend the winter in a southern state and the summer in a northern state. There was no way on the 2000 form to indicate usual home elsewhere.
At the conclusion of data processing in 1990, about 1.9 million people had their information imputed (substituted) from data for another person. Substituted people accounted for 0.8 percent of the household population in 1990, compared with 2.1 percent in 2000. Obtaining comparable rates of imputation of characteristics for people with partial data is difficult. It appears that rates of editing and imputation for short-form items were similar in 1990 and 2000: somewhat lower for some items and somewhat higher for others.