This appendix provides additional detail on the operations of the 2000 census, noting differences from 1990 census procedures. It covers five topics:
the Master Address File (MAF) (including local review and internal checks for duplicate addresses);
questionnaire delivery and mail return (including redesign of mailings and materials and multiple response modes);
field follow-up (including nonresponse follow-up, NRFU, and coverage improvement follow-up, CIFU);
outreach efforts; and
data processing (including data capture, coverage edit and telephone follow-up, unduplication of households and people, and other processing).
Two important parts of census data processing—editing and imputation—are described in greater detail in separate appendices for the basic (complete-count) data (Appendix G) and the long-form-sample data (Appendix H). General theory and approaches to item imputation are discussed in Appendix F.
C.1 MASTER ADDRESS FILE
The 2000 census was conducted primarily by mailing or delivering questionnaires to addresses on a computerized mailing list—the MAF—and asking residents to fill out the questionnaires and mail them back.1 The Census Bureau first used mailout/mailback techniques with an address list in the 1970 census,2 but the procedures to develop the 2000 MAF differed in several important respects from those used in past censuses (see Working Group on LUCA, 2001; Owens, 2000). The major difference from 1990 was that the 2000 MAF was constructed using more sources.
C.1.a Initial Development
The Census Bureau used somewhat different procedures to develop the MAF for areas believed to have predominantly city-style mailing addresses (house number and street) than for areas believed to have predominantly rural route and post office box mailing addresses (see Box C.1). City-style areas were those inside the “blue line,” and non-city-style areas were those outside the “blue line.”3 For areas inside the blue line, the Bureau expected to have U.S. Postal Service carriers deliver questionnaires to most addresses on the list; for areas outside the blue line, the Bureau expected to use its own field workers to deliver questionnaires.
For remote rural areas, which have less than 1 percent of the population, Census Bureau enumerators developed the address list concurrently with enumerating households in person. For special places in which people live in nonresidential settings, such as college dormitories, prisons, nursing homes and other group quarters, the Bureau used a variety of sources to develop an address list.
Inside the “Blue Line”
As the starting point for the MAF for city-style areas inside the blue line, the Census Bureau took the 1990 census address list for these areas and updated it from the Delivery Sequence File (DSF) of the Postal Service. The DSF contains a listing of addresses to which mail is delivered, ordered by carrier routes. It is updated regularly. The Census Address List Improvement Act of 1994 (P.L. 103-430) allowed the Postal Service to share the DSF with the Bureau.
Although not part of its original plan, the Bureau decided to conduct a complete field check of the city-style list: a block canvass of all mailout/mailback areas in January–May 1999. The Bureau took this step because it had determined that the DSF was not as accurate or as up to date in all areas as the MAF required. The Bureau also provided an opportunity for local review in 1998 and 1999 (see Section C.1.b). Approximately 101 million addresses were included in the MAF for areas inside the blue line at the time when questionnaires were labeled and prepared for mailing in July 1999. The Postal Service conducted an intensive check of the DSF in early 2000, and updates were made to the MAF based on that check prior to questionnaire delivery.
Outside the “Blue Line”
To develop the MAF for non-city-style areas, the Bureau first conducted a complete address listing operation in July 1998–February 1999. The 1990 list was not used. There was also a local review program for areas outside the blue line in 1999. Approximately 21 million addresses were included on the MAF for areas outside the blue line at the time when questionnaires were labeled and prepared for delivery. Census enumerators further updated the MAF in these areas when they delivered questionnaires in February–March 2000.
C.1.b Local Review
The Census Address List Improvement Act of 1994, which allowed the Postal Service to share the DSF with the Census Bureau, also permitted the Census Bureau to invite local governments to review the MAF for their areas and provide additions, deletions, and corrections to the Bureau (Working Group on LUCA, 2001).4 The Local Update of Census Addresses (LUCA) Program, covering counties, places, and minor civil divisions (over 39,000 jurisdictions in all), was conducted separately in areas inside the blue line (LUCA 98) and areas outside the blue line (LUCA 99). There was also a Special Places LUCA Program.
[Box C.1: 2000 Census Master Address File (city-style areas: mailout/mailback areas inside the “blue line”; non-city-style areas: update/leave areas outside the “blue line”) and 1990 Address Control File]
LUCA required participating local governments to sign a pledge to treat the address list as confidential. The program involved several steps of local review, field verification by the Bureau, and appeal to the U.S. Office of Management and Budget when localities disagreed with the Bureau’s decision to reject local changes to the MAF. Due to time constraints, some planned LUCA operations were combined and rescheduled (see Chapter 4, Table 4.2). In response to local concerns, a New Construction LUCA Program was added to give localities inside the blue line an opportunity during January–March 2000 to identify newly constructed housing units. Addresses identified in the program were not mailed questionnaires; instead, they were visited by enumerators during the coverage improvement follow-up operation in summer 2000.
Of the 39,051 jurisdictions that were eligible for LUCA 98, LUCA 99, or both, an estimated 25 percent participated fully in one or both programs by informing the Census Bureau of needed changes to the address list for their area (Working Group on LUCA, 2001:Ch.2). Participation varied by such characteristics as geographic region of the country, population size of jurisdiction, type of government, and city-style or non-city-style area (see Chapter 4, Table 4.4).
C.1.c Further Development of MAF
The MAF was a dynamic file throughout the census. Addresses were not only added at each stage of census field operations but also deleted in an effort to minimize duplicate and erroneous entries. In total, the Census Bureau estimates that about 4 million addresses were added to the MAF during census field operations—2.3 million addresses during questionnaire delivery in update/leave, update/enumerate, and list/enumerate areas (see Section C.2) and 1.7 million addresses during follow-up. About 10.4 million addresses were removed as duplicative of other addresses or nonexistent. About 5 million of these addresses were removed on the basis of two internal consistency checks, one of which was planned and the other of which was designed and implemented while the census data were being processed; the remaining addresses were deleted on the basis of field operations (see Section C.3). Whether the combination of internal checks and field checks reduced duplicate and erroneous addresses to a minimum or went too far or not far enough is a matter for evaluation (see Section 4-E.2). The final number of occupied and vacant housing units counted in 2000 was 115.9 million (Farber, 2001a:Tables 1, 2).
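The address counts in this section can be checked against one another. The short Python sketch below works in thousands of addresses (to avoid floating-point noise) and uses only figures quoted in this appendix. It shows that the 2.5 million deleted and 1.1 million merged addresses from the April check, plus the 1.4 million housing units removed in the summer unduplication, account for the roughly 5 million addresses removed by internal checks, leaving about 5.4 million removed through field operations. Combining the labeling-time totals from Section C.1.a with the net additions gives about 115.6 million, close to but not exactly the reported 115.9 million. The gap is presumably due to rounding and to updates not itemized here (such as the early-2000 DSF check), so it should not be read as an error.

```python
# Net change in the MAF, in thousands of addresses, using figures quoted in this appendix.
# The initial counts are snapshots taken when questionnaires were labeled, so the
# result is not expected to match the final housing unit count exactly.
INITIAL_INSIDE_BLUE_LINE = 101_000
INITIAL_OUTSIDE_BLUE_LINE = 21_000

ADDED_AT_DELIVERY = 2_300     # update/leave, update/enumerate, list/enumerate areas
ADDED_IN_FOLLOW_UP = 1_700
REMOVED_TOTAL = 10_400

DELETED_APRIL_CHECK = 2_500   # addresses of doubtful existence (Section C.1.d)
MERGED_APRIL_CHECK = 1_100    # exact duplicates merged with another address
DELETED_SUMMER_UNDUP = 1_400  # housing units removed in the summer unduplication

REPORTED_FINAL_UNITS = 115_900

added = ADDED_AT_DELIVERY + ADDED_IN_FOLLOW_UP
removed_by_internal_checks = DELETED_APRIL_CHECK + MERGED_APRIL_CHECK + DELETED_SUMMER_UNDUP
removed_by_field_ops = REMOVED_TOTAL - removed_by_internal_checks
implied_final = INITIAL_INSIDE_BLUE_LINE + INITIAL_OUTSIDE_BLUE_LINE + added - REMOVED_TOTAL

print(f"added: {added:,}K")
print(f"removed by internal checks: {removed_by_internal_checks:,}K; "
      f"by field operations: {removed_by_field_ops:,}K")
print(f"implied final: {implied_final:,}K vs reported {REPORTED_FINAL_UNITS:,}K")
```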
C.1.d Internal Checks for Duplicates
Reducing the NRFU Workload
In April 2000 the Census Bureau conducted an internal consistency check of the MAF prior to the beginning of nonresponse follow-up in order to remove from the NRFU workload as many addresses as possible that could clearly be identified as duplicative or nonexistent (Miskura, 2000a). At the conclusion of this operation, 3.6 million addresses were dropped or merged with another MAF address.
One source of potential duplicates and errors came about because LUCA—essentially, a new, untested program—did not run as smoothly as intended (Working Group on LUCA, 2001). Because of delays in providing materials to local governments to review, the Census Bureau agreed to include every address provided by a LUCA participant on the MAF that was used to label questionnaires in July 1999, even when there had not been time to verify the address in the field. LUCA-supplied addresses that the Bureau believed likely did not exist, based on field checks after July, were flagged. Processing specifications were developed to delete many of these addresses and other addresses of doubtful existence when no questionnaire was returned for them. In all, 2.5 million addresses that the Bureau had reason to believe did not exist were deleted from the MAF prior to nonresponse follow-up.
Also as part of this review, the Bureau attempted to identify duplicate addresses originating from LUCA or other sources. About 1.1 million addresses that appeared to be exact duplicates were merged with another address on the MAF. Follow-up was then conducted only for the single merged address, or not at all if a questionnaire had been received for that address.5
5 If questionnaires were received for two addresses that were deemed to be exact duplicates, the Primary Selection Algorithm checked for duplicate enumeration and determined the census household (see Section C.5.c).
Unduplication and Late Additions
Another important set of MAF internal checks, not previously planned, was put into place in summer 2000. From evaluations of MAF housing unit counts during January–June 2000 against estimates prepared from other sources, such as building permits, the Census Bureau determined that there were likely still a sizable number of duplicate addresses on the MAF (West and Robinson, 2001). Field verification carried out in June 2000 in a small number of localities substantiated this conclusion (Nash, 2000).
Consequently, the Bureau mounted a special operation to identify duplicate addresses and associated duplicate census returns and to remove them from the MAF and the census. Software was written for this operation to match addresses and person records to identify potential duplicates. The flagged records were deleted from the census file of valid, completed returns and examined further. After examination, the Bureau decided that a portion of the potential duplicates were likely valid returns for addresses not already in the census, and they were restored to the census file (late additions). At the conclusion of the operation, 1.4 million housing units and all 3.6 million people in those units had been permanently deleted from the census file, out of 2.4 million housing units and 6.0 million people initially flagged as potential duplicates (Miskura, 2000b); the remaining 1.0 million housing units and 2.4 million people were restored as late additions.
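The Bureau's matching software is not documented here. Purely as an illustration, the Python sketch below shows one simple way exact-duplicate address detection and person-level matching could work: addresses are normalized into a comparison key and grouped, and two returns are flagged when they share at least a minimum number of people with the same name and birth date. The abbreviation table, the name-plus-birth-date key, and the `min_shared` threshold are all hypothetical choices, not the Census Bureau's rules.

```python
import re
from collections import defaultdict
from itertools import combinations
from typing import Dict, List, Tuple

# Hypothetical standardization table; real address standardization is far richer.
ABBREVIATIONS = {"STREET": "ST", "AVENUE": "AVE", "ROAD": "RD", "APARTMENT": "APT"}


def normalize_address(raw: str) -> str:
    """Uppercase, replace punctuation with spaces, collapse whitespace, abbreviate."""
    tokens = re.sub(r"[^\w\s]", " ", raw.upper()).split()
    return " ".join(ABBREVIATIONS.get(tok, tok) for tok in tokens)


def find_exact_duplicate_addresses(addresses: Dict[str, str]) -> List[List[str]]:
    """Group address IDs whose normalized address strings are identical."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for addr_id, raw in addresses.items():
        groups[normalize_address(raw)].append(addr_id)
    return sorted(sorted(ids) for ids in groups.values() if len(ids) > 1)


def person_key(name: str, birth_date: str) -> Tuple[str, str]:
    """Hypothetical person match key: normalized name plus birth date."""
    return (" ".join(name.upper().split()), birth_date)


def find_duplicate_returns(
    returns: Dict[str, List[Tuple[str, str]]], min_shared: int = 2
) -> List[Tuple[str, str]]:
    """Flag pairs of returns that share at least `min_shared` matching people."""
    keys = {rid: {person_key(n, d) for n, d in people} for rid, people in returns.items()}
    flagged = []
    # Compares all pairs: fine for a sketch, far too slow for a national file.
    for a, b in combinations(sorted(keys), 2):
        if len(keys[a] & keys[b]) >= min_shared:
            flagged.append((a, b))
    return flagged


addresses = {"A1": "12 Oak Street, Apt 3", "A2": "12 OAK ST APT 3", "A3": "14 Oak Street"}
returns = {
    "A1": [("Maria Lopez", "1961-04-02"), ("Jose Lopez", "1990-07-15")],
    "A2": [("MARIA  LOPEZ", "1961-04-02"), ("JOSE LOPEZ", "1990-07-15")],
    "A3": [("Ann Lee", "1975-01-30")],
}
print(find_exact_duplicate_addresses(addresses))  # [['A1', 'A2']]
print(find_duplicate_returns(returns))            # [('A1', 'A2')]
```

As in the 2000 operation, output like this identifies only candidates; flagged pairs would still need review, since some apparent duplicates turn out to be valid returns.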
C.1.e Comparison: Address List Development in 1990
The procedures used to develop the 1990 Address Control File (ACF) differed in important respects from those used to develop the 2000 MAF (see Box C.1). Overall, the Census Bureau used fewer sources in developing the 1990 ACF than it used for the 2000 MAF; also, the 1990 local review operation was considerably less extensive than the 2000 LUCA Program (see National Research Council, 1995b:App.B).
For 1990 in areas with city-style addresses, the Census Bureau made no use of the 1980 census address list or the Postal Service DSF. Instead, the starting point for the ACF was two address lists purchased from vendors, supplemented by a field listing operation carried out by census field staff in summer 1988 (precanvass). The Postal Service performed several reviews of the list in 1988–1990; Bureau staff also checked the part of the ACF that derived from commercial lists in a block canvass in 1989. Governmental jurisdictions in the city-style areas were given an opportunity for review in summer 1989; however, they could review only counts of addresses at the block level, not specific addresses. About 16 percent of eligible local governments responded, adding about 400,000 housing units to the ACF (Bureau of the Census, 1993:6-44). By comparison, more than twice that share of eligible governments, 36 percent, participated in the LUCA 98 Program in city-style areas.
In areas with non-city-style addresses, the development of the 1990 address list was similar to that in 2000; census field staff conducted an address listing operation in fall 1989. Census enumerators also checked the list in March 1990 when they delivered questionnaires in the areas in which the update/leave technique (new for the 1990 census) was used. However, there was no precensus local review program for the ACF in these areas.
C.2 QUESTIONNAIRE DELIVERY AND MAIL RETURN
The 2000 census, like the 1980 and 1990 censuses, was conducted primarily by delivering questionnaires to households and asking them to mail back a completed form. Procedures differed somewhat depending on such factors as type of addresses in an area and accessibility; in all, there were nine types of enumeration areas. Box C.2 provides brief descriptions of the nine types in 2000.
The two largest types of enumeration areas were: (1) mailout/mailback, covering almost 82 percent of the population, in which Postal Service carriers delivered questionnaires, and (2) update/leave/mailback (usually termed update/leave), covering almost 17 percent of the population, in which Census Bureau field staff delivered questionnaires and updated the MAF at the same time. These two types, together with small numbers of addresses in areas (6), (7), and (9), comprised the mailback universe, covering about 99 percent of the household population (calculated from Baumgardner et al., 2001). The remaining 1 percent of the household population was counted by census enumerators (see areas (3), (4), (5), and (8) in Box C.2). Separate enumeration procedures were used for such special populations as homeless people, residents of group quarters, and transients (see Citro, 2000c).
Approaches to boost mail response were to redesign the questionnaire and mailing package, adapt enumeration procedures to special situations (the reason for having nine types of enumeration areas), and allow multiple modes for response. Advertising and outreach efforts were also expanded from 1990 (see Section C.4).
The final mail response rate in 2000 (67 percent) was slightly higher than the rate in 1990 (65 percent); it was also considerably higher than the rate that was budgeted (61 percent), which reduced the burden of field follow-up. The mail return rate in 2000 (78 percent) was higher than the rate in 1990 (75 percent). This rate is a more refined measure of public cooperation than the mail response rate, which includes vacant and nonresidential addresses in the denominator in addition to occupied housing units (see Chapter 4, Box 4.1).
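The difference between the two measures lies entirely in the denominator. The Python sketch below uses made-up counts for a hypothetical block of 1,000 mailback addresses (they are not census figures) and simplified definitions; the official definitions are in Chapter 4, Box 4.1.

```python
def mail_response_rate(mail_returns: int, mailback_addresses: int) -> float:
    """Returns divided by all mailback addresses, including vacant and nonresidential ones."""
    return mail_returns / mailback_addresses


def mail_return_rate(mail_returns: int, occupied_units: int) -> float:
    """Returns divided by occupied housing units only (simplified definition)."""
    return mail_returns / occupied_units


# Hypothetical example, not census data.
addresses = 1_000
vacant_or_nonresidential = 120
occupied = addresses - vacant_or_nonresidential   # 880
returns_from_occupied = 660                       # assume vacant units return nothing

print(f"response rate: {mail_response_rate(returns_from_occupied, addresses):.0%}")  # 66%
print(f"return rate:   {mail_return_rate(returns_from_occupied, occupied):.0%}")     # 75%
```

In this simplified setup, vacant and nonresidential addresses cannot respond, so the return rate is never lower than the response rate and more directly reflects the cooperation of occupied households.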
[Box C.2 note: For details, see U.S. Census Bureau (1999b).]
C.2.a Redesign of Mailings and Materials to Boost Response
To encourage mail response, a new questionnaire format was adopted for 2000. Based on extensive research (see National Research Council, 1995b:Ch.6), a design was chosen that appeared as attractive and easy to fill out as possible. The mailing package was also redesigned to distinguish the questionnaire from junk mail and to motivate response (e.g., the envelope noted that responses were required by law).
One design change for the questionnaire was to ask households to list all members but to limit the space for characteristics to six members—instead of seven, as in 1990—in order to make the questionnaire less intimidating. It was planned to follow up households with more than six members by telephone (see Section C.5).
In the mailout/mailback area, multiple mailings were used to increase response. The first mailing was an advance letter (a new approach for 2000). The purpose of the letter was to alert residents to watch for the questionnaire, to provide a means for them to request a questionnaire in a language other than English, and to inform them of employment opportunities in census local offices. The second and third mailings were the questionnaire and a reminder postcard.
In both mailout/mailback and update/leave areas, the Bureau originally planned to deliver a second questionnaire to households not returning a form. Early testing showed that the use of a second questionnaire could increase mail response rates by as much as 10 percent (National Research Council, 1995b:120). However, the Bureau determined that vendors could not process the list of nonresponding households quickly enough to be able to mail out a replacement questionnaire on the schedule required. Mailing a second questionnaire to all households, as was done in the 1998 dress rehearsal, was deemed too expensive and likely to lead to negative publicity and confusion.
The advance letter operation did not proceed as smoothly as hoped. A programming error added an extra digit to the beginning of every street address in the mailing; however, the barcode included on the letter used the correct address information, so the Postal Service’s sorting machines processed the letters properly. Hence, the addressing error did not stop the letters from being delivered.6 In addition, the final version of the letter had not been fully tested, because the option to request a foreign-language questionnaire was added to it only after the 1998 dress rehearsal. There was considerable public confusion about what to do with the enclosed return envelope if one did not need a special questionnaire. However, these problems had no apparent adverse effect on the public’s cooperation with the census, and the publicity may have been helpful in alerting people to the need to respond.
C.2.b Multiple Response Modes
Another innovation for 2000 to encourage response was to allow multiple response modes. Households that received a short-form questionnaire could fill out a short form on the Internet or by telephone. To answer questions and also permit telephone response, the Bureau contracted with a commercial phone center to operate a toll-free telephone questionnaire assistance system. This system provided assistance in English, Spanish, and several other languages. Individuals could also pick up “Be Counted” forms, which were made available in six languages at various local sites throughout the country just prior to Census Day.
Because multiple response modes might not only boost return rates but also result in more responses that would require address verification and unduplication, the Census Bureau did not promote the Be Counted Program vigorously. Also, it did not widely publicize the Internet response option because of concerns about being able to handle a large response and maintain security. As it turned out, of 76 million questionnaires that were returned by households, 99 percent arrived by mail and only 1 percent by other modes: 66,000 were Internet returns; 605,000 were Be Counted forms; and 200,000 were forms completed by telephone. Not all Be Counted and telephone forms were included in the census: they were not counted if they did not have a valid address or if they duplicated another return.
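The split by mode can be checked with a few lines of arithmetic. The Python sketch below uses the rounded totals given in the text and assumes the 76 million household returns include the non-mail forms. It shows that the three non-mail modes together amount to about 871,000 forms, roughly 1.1 percent of all returns, consistent with the 1 percent stated above. Because not every Be Counted or telephone form was counted in the census, these are shares of forms received, not of census enumerations.

```python
TOTAL_HOUSEHOLD_RETURNS = 76_000_000   # "76 million", rounded in the text
non_mail_returns = {
    "Internet": 66_000,
    "Be Counted": 605_000,
    "telephone": 200_000,
}

non_mail_total = sum(non_mail_returns.values())
for mode, count in non_mail_returns.items():
    print(f"{mode:<10} {count:>8,} ({count / TOTAL_HOUSEHOLD_RETURNS:.2%})")
print(f"all non-mail {non_mail_total:,} ({non_mail_total / TOTAL_HOUSEHOLD_RETURNS:.1%})")
```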
See Prepared Statement of Kenneth Prewitt, Director, U.S. Bureau of the Census, before the Subcommittee on the Census of the House Committee on Government Reform, March 8, 2000; http://www.census.gov/dmd/www/3-8-00.html [9/20/03].
C.2.c Comparison: 1990 Questionnaire Delivery and Return
Questionnaire delivery procedures in the 1990 census differed in some respects from those used in 2000 (National Research Council, 1995b:App.B). In 1990 about 84 percent of total housing units were in mailout/mailback areas; 11 percent—less than in 2000—were in update/leave areas (update/leave was a new procedure in 1990); and 5 percent—more than in 2000—were in list/enumerate areas. The list/enumerate procedure in 1990 differed somewhat from that used in 2000: Postal Service carriers delivered unaddressed short-form questionnaires to housing units in 1990, and census enumerators then came by to pick up completed questionnaires or obtain the answers, list the housing units in an address register, and at a predesignated subset of units, collect responses to the sample (long-form) questions. In 2000 Census Bureau field staff took questionnaires with them as they listed housing units and enumerated residents at the same time.
The 1990 census mailout procedures did not include an advance letter; however, a reminder postcard was delivered to all addresses in both mailout/mailback and update/leave areas. Responding via the Internet, which was not then available to the general public, was not an option. The questionnaire was designed less to facilitate response than to ease data capture (see Section C.5).
Overall, the mailing strategies used in the 1990 census did not appear to help mail response. The mail response rate declined from 75 percent in 1980 to 65 percent in 1990; the mail return rate declined from 81 percent in 1980 to 75 percent in 1990.7
C.3 FIELD FOLLOW-UP
Because not all households mail back a form and because many addresses to which questionnaires are delivered turn out to be vacant or nonresidential, the 2000 census, like previous censuses, included a large field follow-up operation (see Thompson, 2000). Over 500 local census offices (LCOs), reporting to 12 regional census centers, were set up across the country. The LCOs were responsible for hiring the temporary enumerators and crew leaders needed to conduct follow-up operations. In update/leave areas, enumerators were hired to deliver questionnaires prior to Census Day and to return to follow up nonresponding households and vacant units. LCOs also carried out operations to enumerate special groups, such as group quarters residents, transients, and the homeless.
In anticipation of possible difficulties in hiring and also the possibility that the mail response rate would decline from 1990, LCOs were authorized to recruit aggressively in advance of Census Day, to hire more enumerators than they thought would be needed, and to pay above-minimum wages (which differed according to prevailing area wages). Most offices were successful in meeting their hiring goals before the first follow-up operations began in mid-April 2000.
Follow-up operations were carried out in two separate stages, discussed below. The first stage, conducted in April–June, was the nonresponse follow-up, designed to obtain a questionnaire from every nonresponding unit in the mailback universe, occupied and vacant (or to determine that an address was nonresidential). The second stage, conducted in June–August, was called coverage improvement follow-up (CIFU), which included specific operations designed to check and supplement NRFU. Several operations included in CIFU for 1990 were dropped for 2000.
C.3.a Nonresponse Follow-Up
Preparation for NRFU began in early April 2000. Lists of addresses for inclusion in the NRFU workload were provided to the LCOs the week of April 11; a week later, notification was sent of late mail returns, which the LCOs had to delete manually from their follow-up lists. Also, the LCOs had to add information about surnames to their follow-up lists. The surname information was intended to help enumerators collect data accurately in situations in which questionnaires were misdelivered in multiunit structures and rural areas with clustered mailboxes. Because of a programming error, the surname information had to be sent separately to the LCOs (U.S. General Accounting Office, 2000c:11).
The final NRFU workload totaled 41.7 million addresses. This total included addresses in the MAF for which a completed questionnaire had not been checked in prior to April 18, as well as new addresses from DSF updates. It also included addresses marked for deletion in the update/leave operation and addresses for which postal carriers returned the questionnaires as undeliverable and census staff made no attempt to redeliver them. The purpose of NRFU for these addresses was to double-check their status and, if they were in fact housing units, to obtain an enumeration.8
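The rule for building the NRFU workload can be expressed as a filter over MAF records. The Python sketch below is a simplification under stated assumptions: the `MafAddress` record and its field names are hypothetical, and the only inclusion test is whether a completed questionnaire was checked in before April 18, 2000. Flags for update/leave deletions and undeliverable questionnaires are deliberately ignored by the filter, mirroring the point that such addresses were still visited. A second step removes late mail returns, as the LCOs did by hand.

```python
from dataclasses import dataclass
from datetime import date
from typing import Iterable, Optional, Set

# Cutoff from the text: addresses with no questionnaire checked in before April 18, 2000.
CHECK_IN_CUTOFF = date(2000, 4, 18)


@dataclass(frozen=True)
class MafAddress:  # hypothetical record layout
    maf_id: str
    checked_in: Optional[date] = None  # None means no completed questionnaire received
    # Flags from earlier operations. They do NOT exclude an address from NRFU;
    # enumerators were sent to double-check these addresses instead.
    marked_delete_in_update_leave: bool = False
    undeliverable_not_redelivered: bool = False
    added_from_dsf_update: bool = False


def initial_nrfu_workload(addresses: Iterable[MafAddress]) -> Set[str]:
    """Addresses with no completed questionnaire checked in before the cutoff."""
    return {
        a.maf_id
        for a in addresses
        if a.checked_in is None or a.checked_in >= CHECK_IN_CUTOFF
    }


def remove_late_returns(workload: Set[str], late_return_ids: Set[str]) -> Set[str]:
    """Mimic the LCOs' manual deletion of addresses whose mail returns arrived late."""
    return workload - late_return_ids


maf = [
    MafAddress("A1", checked_in=date(2000, 4, 1)),        # timely mail return
    MafAddress("A2"),                                      # no return
    MafAddress("A3", marked_delete_in_update_leave=True),  # still sent to NRFU
    MafAddress("A4", undeliverable_not_redelivered=True),  # still sent to NRFU
    MafAddress("A5", checked_in=date(2000, 4, 20)),        # late mail return
    MafAddress("A6", added_from_dsf_update=True),          # new DSF address
]
workload = initial_nrfu_workload(maf)
print(sorted(workload))                               # ['A2', 'A3', 'A4', 'A5', 'A6']
print(sorted(remove_late_returns(workload, {"A5"})))  # ['A2', 'A3', 'A4', 'A6']
```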
In most LCOs, NRFU enumerators went into the field beginning April 27. Their first objective was to visit each nonrespondent housing unit in person to try to obtain an interview, even if the residents said they had already mailed back a form,9 or to obtain selected housing characteristics for vacant housing units. If unsuccessful, the enumerators were to try up to five additional times to obtain an interview, unless the residents were known to be out of town for an extended period or the housing unit was verified to be vacant or nonexistent by a proxy respondent (someone not a member of the household, such as a neighbor or landlord). Three of the follow-up attempts could be made by telephone if the enumerator could obtain a phone number. In the case of refusals, field observations indicated that some offices adhered to the six-visit rule, sometimes using different enumerators, while others allowed the use of proxy respondents without making all six visits. If no interview was obtained after the specified number of visits, then enumerators were instructed to obtain information from a proxy respondent, noting the name and address on the interview form. When an office had obtained information for 95 percent of its workload, the best enumerators were to be given the remaining cases to make one last attempt to obtain information from the household or a proxy, even if fewer than six visits had been made to the household. Some offices required that at least three visits be made to a household before allowing a last attempt.
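The contact rules above amount to a small decision procedure. The Python sketch below encodes one reading of them; the function name, the action labels, and the handling of the optional minimum-visit rule are hypothetical, and, as noted above, offices applied the rules differently in practice (for example, for refusals).

```python
MAX_ATTEMPTS = 6           # initial personal visit plus up to five more attempts
MAX_PHONE_ATTEMPTS = 3     # up to three follow-up attempts could be by telephone
CLOSEOUT_THRESHOLD = 0.95  # share of office workload resolved before closeout


def next_nrfu_action(
    attempts: int,
    phone_attempts: int,
    has_phone_number: bool,
    away_extended_period: bool,
    office_completion: float,
    min_attempts_before_closeout: int = 0,
) -> str:
    """Return 'visit', 'phone', 'proxy', or 'closeout' for an unresolved case (hypothetical)."""
    if attempts == 0:
        return "visit"      # first contact is always in person
    if away_extended_period:
        return "proxy"      # no further attempts at the household itself
    if attempts >= MAX_ATTEMPTS:
        return "proxy"      # attempts exhausted: ask a neighbor, landlord, etc.
    if (office_completion >= CLOSEOUT_THRESHOLD
            and attempts >= min_attempts_before_closeout):
        return "closeout"   # one last attempt by one of the best enumerators
    if has_phone_number and phone_attempts < MAX_PHONE_ATTEMPTS:
        return "phone"
    return "visit"


print(next_nrfu_action(1, 0, True, False, 0.50))  # phone
# An office requiring at least three attempts before closeout:
print(next_nrfu_action(2, 1, False, False, 0.96, min_attempts_before_closeout=3))  # visit
```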
Conducted concurrently with the NRFU enumeration was a quality assurance program, in which selected cases were reinterviewed to identify fabrication (“curbstoning”). A random sample of the workload of each enumerator was reinterviewed; in addition, cases were selected purposively for reinterview by identifying enumerators whose work did not match that of other enumerators in the area. About 6 percent of the workload was reinterviewed in all, and preliminary analysis found discrepant results in 3 percent of the reinterview batches. The quality assurance reinterview process was delayed in some LCOs. Also, some reinterview forms were lost or not filled out correctly, so the reinterview results must be interpreted with caution (see Baumgardner et al., 2001:17; see also Morganstein et al., 2003, who describe flaws in the quality assurance efforts).
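The two selection routes described above (a random sample of every enumerator's work plus purposive selection of enumerators whose results look unusual) can be sketched as follows. The text does not say which statistic was used to compare enumerators, so the vacancy rate used here, the 5 percent sample fraction, and the 0.15 outlier gap are all hypothetical parameters.

```python
import random
from statistics import mean
from typing import Dict, List, Tuple


def select_for_reinterview(
    cases_by_enumerator: Dict[str, List[str]],
    vacancy_rate_by_enumerator: Dict[str, float],
    sample_fraction: float = 0.05,  # hypothetical
    outlier_gap: float = 0.15,      # hypothetical
    seed: int = 0,
) -> Tuple[Dict[str, List[str]], List[str]]:
    """Draw a random sample of each enumerator's cases and flag outlier enumerators.

    An enumerator is flagged for purposive review when their vacancy rate differs
    from the area average by more than `outlier_gap` (an assumed criterion).
    """
    rng = random.Random(seed)
    area_rate = mean(vacancy_rate_by_enumerator.values())
    random_sample = {}
    for enum_id, cases in cases_by_enumerator.items():
        # At least one case per enumerator, but never more than they have.
        k = min(len(cases), max(1, round(sample_fraction * len(cases))))
        random_sample[enum_id] = sorted(rng.sample(cases, k))
    outliers = sorted(
        e for e, rate in vacancy_rate_by_enumerator.items()
        if abs(rate - area_rate) > outlier_gap
    )
    return random_sample, outliers


cases = {e: [f"{e}-{i:02d}" for i in range(40)] for e in ("E1", "E2", "E3")}
rates = {"E1": 0.10, "E2": 0.12, "E3": 0.45}
sample, flagged = select_for_reinterview(cases, rates)
print({e: len(s) for e, s in sample.items()})  # {'E1': 2, 'E2': 2, 'E3': 2}
print(flagged)                                 # ['E3']
```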
NRFU operations were completed in most LCOs by June 26, so that the entire operation took only 8 weeks, shortening the original schedule by 1 week. At the conclusion of NRFU, enumerators had classified 62.3 percent of the 41.7 million addresses in their workload as occupied, 23.4 percent as vacant, 14.3 percent as “delete” (e.g., because the address identified a demolished structure or was nonresidential), and a handful (0.01 percent) as “not resolved” (Baumgardner et al., 2001:Table 4).
C.3.b Coverage Improvement Follow-Up
The coverage improvement follow-up effort that followed NRFU included several operations involving about 8.9 million housing units. The largest portion of the workload comprised 6.5 million housing units that had been classified as vacant or delete in NRFU. CIFU checked these units to determine whether they might have been occupied on Census Day; they represented only 41 percent of all addresses identified as vacant or delete in NRFU. An address classified as vacant or delete in NRFU was revisited only if it had not also been marked vacant or delete in another operation. Examples of vacant or deleted units not included in CIFU were those classified as vacant or delete by both an update/leave enumerator and a NRFU enumerator and those marked as undeliverable by a postal carrier and classified as vacant by a NRFU enumerator.10 In addition, vacant units that NRFU enumerators had classified as “seasonal” were not checked in CIFU.
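Working from the NRFU outcome shares reported in Section C.3.a, the Python arithmetic below shows that the 6.5 million rechecked units are about 41 percent of the roughly 15.7 million addresses that NRFU classified as vacant or delete (23.4 plus 14.3 percent of 41.7 million). All inputs are rounded, so the result is approximate.

```python
NRFU_WORKLOAD = 41_700_000
nrfu_shares = {"occupied": 0.623, "vacant": 0.234, "delete": 0.143}  # from Section C.3.a
CIFU_NRFU_RECHECKS = 6_500_000

vacant_or_delete = NRFU_WORKLOAD * (nrfu_shares["vacant"] + nrfu_shares["delete"])
share_rechecked = CIFU_NRFU_RECHECKS / vacant_or_delete

print(f"NRFU vacant or delete: {vacant_or_delete / 1e6:.1f} million")  # 15.7 million
print(f"share rechecked in CIFU: {share_rechecked:.0%}")                # 41%
```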
There were five other components of the CIFU workload to visit or revisit: (1) 775,000 addresses that were added to the MAF in update/leave and urban update/leave, but from which no questionnaire was mailed back; (2) 372,000 addresses that were added to the MAF from the New Construction LUCA Program (in city-style areas); (3) 540,000 blank mail returns (including a small number of forms that were lost in the process of data capture); (4) 547,000 addresses that were added to the MAF from late updates from the Postal Service DSF; and (5) 86,000 addresses that were visited for some other reason. The fifth category included 62,000 addresses that were reenumerated in Hialeah, Florida; 17,000 addresses from the LUCA appeals process, and 7,000 other addresses (Moul, 2003:Table 2). A separate field operation was conducted to verify addresses on Be Counted forms and those filled out by telephone questionnaire assistance staff.
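Adding up the listed components is a useful consistency check. In the Python sketch below (thousands of housing units), the five smaller categories sum to 2.32 million and, with the 6.5 million NRFU rechecks, to about 8.82 million, a little below the 8.9 million total. The difference appears to fall within what rounding of the 6.5 million and 8.9 million figures could explain, although the text does not say how those totals were rounded.

```python
# CIFU workload components, in thousands of housing units (rounded figures from the text).
cifu_components = {
    "NRFU vacant or delete, rechecked": 6_500,
    "update/leave additions, no return": 775,
    "New Construction LUCA additions": 372,
    "blank mail returns": 540,
    "late DSF additions": 547,
    "other reasons": 86,
}
other_reasons = {"Hialeah reenumeration": 62, "LUCA appeals": 17, "other": 7}

# The three itemized "other" reasons should add up to the fifth category.
assert sum(other_reasons.values()) == cifu_components["other reasons"]

listed_total = sum(cifu_components.values())
print(f"listed components: {listed_total:,}K vs about 8,900K reported")
```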
Addresses initially classified by CIFU itself as vacant or delete that had not been visited in any previous operation (e.g., an address added from the New Construction Program) were revisited for quality control purposes. The entire NRFU workload for one district office, in Hialeah, Florida, was reenumerated because of problems that came to light in that office (content was not being collected and sometimes not even the population count). Selected housing units were reenumerated in seven other offices for which problems were identified. The operations in 15 local offices were questioned by the House Subcommittee on the Census, but the Census Bureau determined, on review, that only two of these offices warranted some reenumeration. (These two offices are included in the total of seven in which partial reenumeration occurred.)
Overall, CIFU determined that 27 percent of the 8.9 million housing units visited were occupied, 43 percent were vacant, and 30 percent should be deleted. (Almost no units had an unresolved status at the end of CIFU; Moul, 2003:Table 3.) CIFU enumerators were most likely to find occupied units among the addresses added in update/leave; they classified 45 percent of these addresses as occupied. Other categories had lower percentages of units classified as occupied: 30 percent for blank mail returns, 27 percent for new construction addresses (52 percent of these addresses were not yet completed and so were deleted), and 23 percent for addresses classified as vacant or delete in NRFU that were checked in CIFU (Moul, 2003:Table 5). The percentage of NRFU vacant and delete addresses that CIFU reclassified as occupied, however, was 2 to 3 times the percentage of vacant and delete units found to be occupied in previous censuses for which a vacancy recheck was carried out (see below). The reason may be that, as noted at the beginning of the section, CIFU rechecked fewer than half of the addresses that were classified as vacant or delete by NRFU, excluding those already confirmed as vacant or delete by another operation; the rechecked addresses were thus presumably those whose status was least certain.
C.3.c Comparison: 1990 Field Follow-Up and Coverage Improvement
NRFU procedures in 1990 were similar in broad outline to the procedures used in 2000 (see Bureau of the Census, 1993:Ch.6). The NRFU enumerators were instructed to visit each household in person. If an enumerator could not obtain an interview but was able to obtain a telephone number, then he or she was to make up to five additional attempts to interview the household—three telephone attempts and two more personal visits at different times of the day. If the enumerator did not have a telephone number, he or she was to make two additional personal visits. When all of these attempts failed to result in an interview or if the case was a refusal or the respondent was away for an extended period of time, the enumerator was instructed to talk to someone outside the household to obtain “last resort” information. Such information was defined as three of the four characteristics of relationship to head of household, sex, race, and marital status for each household member and the number of units in the structure for each housing unit. When 95 percent of the caseload had been completed, the remaining cases were given to the best enumerators who were to make one last visit to try to gather “closeout” data, defined as at least two characteristics for each household member.
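The 1990 “last resort” and “closeout” standards are simple completeness tests. The Python sketch below expresses them over a hypothetical representation in which each household member is a dictionary of characteristics, with None for an item not obtained. The text does not say which characteristics counted toward the closeout minimum, so the sketch counts any recorded item; that choice is an assumption.

```python
from typing import Dict, List, Optional

LAST_RESORT_ITEMS = ("relationship", "sex", "race", "marital_status")

Member = Dict[str, Optional[str]]  # hypothetical: characteristic name -> value or None


def meets_last_resort(members: List[Member], units_in_structure: Optional[int]) -> bool:
    """At least 3 of the 4 items for every member, plus units in structure."""
    if not members or units_in_structure is None:
        return False
    return all(
        sum(m.get(item) is not None for item in LAST_RESORT_ITEMS) >= 3
        for m in members
    )


def meets_closeout(members: List[Member]) -> bool:
    """At least two recorded characteristics for every member (assumed: any item counts)."""
    return bool(members) and all(
        sum(value is not None for value in m.values()) >= 2 for m in members
    )


householder = {"relationship": "householder", "sex": "F", "race": "White", "marital_status": None}
child = {"relationship": "child", "sex": "M", "race": None, "marital_status": None}
print(meets_last_resort([householder], units_in_structure=1))         # True
print(meets_last_resort([householder, child], units_in_structure=1))  # False
print(meets_closeout([householder, child]))                           # True
```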
Concurrently with NRFU enumeration, a reinterview program was carried out to detect falsification, similar to the program in 2000. The 1990 quality control program reinterviewed 4.8 percent of the NRFU workload of 34 million housing units and estimated a very low rate of falsification overall (0.09 percent; see Bureau of the Census, 1994:30–34).
In contrast to 2000, the 1990 NRFU operations fell considerably behind schedule, largely because of the Census Bureau’s failure to forecast the extent of the decline in the mail response rate from 1980 to 1990: the Bureau projected a 70 percent response rate (down from 75 percent in 1980), but the actual rate at the time NRFU began was 63 percent (the rate subsequently rose to 65 percent). The Bureau had to obtain additional appropriations and scramble to hire sufficient workers for NRFU and other follow-up activities; it raised pay rates in 140 of the 449 district offices (equivalent to LCOs) and took other steps to increase productivity. The NRFU operation was planned to take 6 weeks from its start in late April; however, only 72 percent of the workload was completed by that time (June 6). Another 18 percent of the workload was completed in the next 2 weeks, but it took another 6 weeks, until early August, to complete the remaining 10 percent (U.S. General Accounting Office, 1992:46).
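The GAO completion figures show how sharply the pace fell. This is an arithmetic sketch only. It assumes that the GAO percentages apply to the 34 million-unit NRFU workload cited for the reinterview program and that work proceeded at a uniform pace within each period.

```python
WORKLOAD = 34_000_000  # 1990 NRFU workload in housing units (assumed base for GAO percentages)

# (period, weeks, percent of workload completed), from the text above
PERIODS = [
    ("late April to June 6", 6, 72),
    ("following 2 weeks", 2, 18),
    ("to early August", 6, 10),
]


def completion_schedule(workload=WORKLOAD, periods=PERIODS):
    """Return (period, cumulative weeks, cumulative percent, units, units per week).

    Units per week is an average that assumes a uniform pace within each period.
    """
    rows, cum_weeks, cum_pct = [], 0, 0
    for label, weeks, pct in periods:
        cum_weeks += weeks
        cum_pct += pct
        units = workload * pct // 100
        rows.append((label, cum_weeks, cum_pct, units, units // weeks))
    return rows


for row in completion_schedule():
    print(row)
```

Under these assumptions, the final 10 percent (about 3.4 million units) took as long as the first 72 percent. It proceeded at roughly one-seventh of the initial weekly pace.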
A subsequent stage of follow-up in 1990 included several coverage improvement procedures (Bureau of the Census, 1993:6-37 to 6-38; 6-53 to 6-56). An operation called field follow-up, carried out in June–August, rechecked most units classified as vacant or delete in NRFU. Units that were not rechecked included those in areas with high proportions of seasonal housing or boarded-up buildings, plus units classified as delete by two precensus address update operations and by a NRFU enumerator (a more stringent criterion than that used in 2000). By August 1, 5.3 percent of deleted units and 7.1 percent of vacant units that were rechecked in field follow-up had been converted to occupied. (The corresponding percentages in 1980 were 7.5 percent of deleted units and 10 percent of vacant units.) These figures are considerably below the rate of conversion from vacant or delete to occupied in the 2000 CIFU (24 percent).
In addition to the recheck of vacant and delete units, the 1990 field follow-up operation revisited failed-edit mail returns. These cases were mail returns that failed computer or clerical review with regard to completeness of coverage and content (Bureau of the Census, 1995b:8-10) and for which telephone follow-up was not successful (see Section C.5). Because of backlogs in the telephone follow-up operation for questionnaires handled by processing offices (those from central city areas), after mid-June most questionnaires in these offices that failed the content review and were not resolved by telephone were not sent to field follow-up. The 1990 field follow-up also revisited a number of mailback cases for which there was no record of data capture.
Another 1990 coverage improvement operation was the “Were You Counted” campaign, in which people who thought they had been missed were encouraged by media announcements in June–July 1990 to send in a special form. Those forms with addresses that could be assigned to census geography and with complete content were put through a search and matching operation to determine if they duplicated other forms. There was no field verification of the address, except in the Detroit district office, from which an unusually large number of forms were received.
Another special operation was the recanvass, carried out in July–November 1990, in which selected blocks, including those in high growth areas and those identified by postcensus local review, were relisted. The households were then reenumerated, provided the enumerator determined that the unit existed as of April 1. In all, the Bureau recanvassed more than 650,000 blocks containing about 20 million housing units (20 percent of all units).
Localities could identify blocks for recanvassing because, in 1990 (though not in 2000), local jurisdictions nationwide were invited to review preliminary census counts of housing units by block for their areas (Bureau of the Census, 1993:6-45 to 6-46). The counts were provided in August 1990, and localities had 15 days to challenge them. Responses were received from about 25 percent of all jurisdictions, including all of the 51 largest cities. All challenged blocks in which the discrepancy between the census count and the locality’s count exceeded a specified amount were added to the recanvass operation, for which additional funding had to be obtained.
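The selection rule for recanvassed blocks is a threshold test on challenged blocks. This is a hypothetical sketch. The text gives neither the "specified amount" nor whether it was absolute or relative, so the `threshold` parameter and the use of an absolute difference are assumptions. Treating a challenged block that is missing from the preliminary counts as zero is also an assumption.

```python
def blocks_to_recanvass(census_counts: dict, local_counts: dict, threshold: int) -> list:
    """Return challenged blocks whose housing-unit discrepancy exceeds the threshold.

    census_counts: block id -> preliminary census housing-unit count
    local_counts:  block id -> count supplied by the locality (challenged blocks only)
    threshold:     hypothetical stand-in for the unspecified "specified amount"
    """
    return sorted(
        block
        for block, local in local_counts.items()
        # Absolute difference is an assumption; a block absent from the census list counts as 0.
        if abs(local - census_counts.get(block, 0)) > threshold
    )
```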
As part of the coverage improvement effort in 1990, in 24 local offices, all households for which the questionnaires reported only one household member were reenumerated. This procedure was implemented in response to allegations in late summer 1990 that, during the closeout phase of NRFU, enumerators in some offices had recorded households as one-person households without actually obtaining an interview (i.e., they were curbstoning). In addition, seven local offices in New Jersey were identified in which it appeared that fabrication may have occurred; households in these offices were reinterviewed when the questionnaires indicated household size but recorded no members (Bureau of the Census, 1993:6-55).
Finally, a special program was implemented to improve the coverage of people who were on parole or probation (Bureau of the Census, 1993:6-55). The first step was to contact each state to ask its parole or probation officers to distribute census forms to their assignees to be filled out and mailed back. This operation had a very low response rate, so census enumerators were sent to correction departments in designated counties to obtain information for parolees and probationers from administrative records. No attempt was made to contact parolees or probationers unless their addresses could not be verified. The operation was not completed until late November–early December 1990. The forms obtained were processed through an unduplication operation (see Section C.5); however, subsequent analysis determined that many of the parolee/probationer forms that were accepted in the census count represented erroneous enumerations (Ericksen et al., 1991:43–46).
C.3.d Summary: 1990 and 2000
The description of 2000 and 1990 follow-up procedures makes it clear that they were large-scale, complex operations, similar in broad outline but sufficiently different in detail to make comparisons across years difficult. For example, results from the 2000 CIFU recheck of vacant and delete units cannot readily be compared with results from the 1990 field follow-up vacancy check, because the workloads were defined differently. Also, it is not clear exactly how such terms as “proxy” (2000), “last resort” (1990), “closeout,” and “non-data-defined” were similar or dissimilar, which further complicates comparative evaluation.
One can, however, conclude that the Census Bureau was more successful in 2000 than in 1990 in controlling field follow-up operations and keeping them on schedule. Coverage improvement operations were more focused in 2000, and programs that appeared problematic in 1990 (e.g., the parolee and probationer check) were not repeated.
C.4 OUTREACH EFFORTS
To supplement field operations and special programs to improve population coverage and cooperation with the census, the Census Bureau engaged in large-scale advertising and outreach efforts for 2000. For the first time, the Census Bureau budget included funds ($167 million) for a paid advertising campaign (recommended by National Research Council, 1978). In previous censuses, the Advertising Council arranged for advertising firms to develop ads and air them on a pro bono, public service basis (Anderson, 2000).
The 2000 advertising campaign was extensive, involving a major contractor, Young and Rubicam, which contracted with four other agencies to prepare ads targeted to particular population groups and communities. The advertising ran from November 1, 1999, to June 5, 2000, and included a phase to alert people to the importance of the upcoming census, a phase to encourage filling out the form, and a phase to encourage people who had not returned a form to cooperate with the follow-up enumerator. Ads were placed on television (including one during the 2000 Super Bowl), radio, newspapers, and other media, using multiple languages. Based on market research, the ads stressed the benefits to people and their communities from the census, such as better targeting of government funds to needy areas for schools, day care, and other services.
In addition to the ad campaign, the Census Bureau hired partnership and outreach specialists in local census offices, who worked with community and public interest groups to develop special initiatives to encourage participation in the census. The Bureau signed partnership agreements with over 100,000 organizations, including federal agencies, state and local governments, business firms, nonprofit groups, and others. The Bureau did not fund these groups, but it provided materials and staff time to help them encourage a complete count. A special program was developed to put materials on the census in local schools to inform school children about the benefits of the census and motivate them to encourage their adult relatives to participate.
The Census Bureau director and other staff made numerous public appearances throughout the census period to stress the importance of a complete count and respond to questions and concerns. The director also put into place a program to use the Internet to challenge communities to raise their mail response rates. The 1990 response rates were posted for local areas on the Bureau’s Web site beginning in mid-March, and 2000 response rates were regularly updated on the site through mid-April. Communities were challenged to exceed their 1990 rates by 5 percent. Although few communities achieved this goal, the overall response rate did not continue its decline from previous censuses.
The 1990 census also included advertising and outreach efforts, but on a smaller scale than in 2000. The advertising was prepared by a firm selected by the Advertising Council, which conducted its work on a pro bono basis. Ads were placed as public service announcements, so many of them ran at undesirable times. The partnership program was also less extensive than in 2000.
In both censuses, perhaps more so in 2000, advertising and outreach efforts varied in intensity across the country. Some localities were more active than others in coordinating and supplementing outreach and media contacts. Whether this variability narrowed or widened the difference in net undercount rates among major population groups depends on the extent to which outreach efforts were more (or less) effective in hard-to-count areas in comparison with other areas.
C.5 DATA PROCESSING
Data processing for the 2000 census was a continuing, high-volume series of operations that began with the capture of raw responses and ended with the production of voluminous data products for the user community, which were made available in 2001–2003.11 Important innovations were adopted for 2000. For the first time, the Census Bureau contracted with outside vendors for major components of data processing. Also for the first time, data capture operations were carried out using optical character recognition technology in addition to optical mark recognition. A telecommunications network linked Census Bureau headquarters in Suitland, Maryland; 12 permanent regional offices; the Bureau’s permanent computer center in Bowie, Maryland; 12 regional census centers and the Puerto Rico Area Office; the Bureau’s permanent National Processing Center in Jeffersonville, Indiana; 3 contracted data capture centers in Phoenix, Arizona, Pomona, California, and Baltimore County, Maryland; 520 local census offices; and contracted telephone centers for questionnaire assistance (U.S. Census Bureau, 1999b:XI-1).
Five operations in 2000 are described in this section: data capture, coverage edit and telephone follow-up, unduplication, editing and imputation, and other data processing. Data processing operations for 1990 are also summarized.
C.5.a Data Capture
The first step in data processing was to check in the questionnaires and capture the data on them in computerized form. The return address on mailback questionnaires directed them to one of four data capture centers—the Bureau’s National Processing Center and three run by contractors. Each questionnaire had a bar code that was scanned to record its receipt. The questionnaires were then imaged electronically, check-box data items were read by optical mark recognition (OMR), and write-in character-based data items were read by optical character recognition (OCR). Clerks keyed data from images in cases when the automated technology could not make sense of the data. Keying of the additional long-form-sample items was deferred until fall 2000 to permit the fastest possible processing of the basic (complete-count) data from short and long forms.
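The capture flow described above routes each field to automated reading or to clerical keying. The sketch below is hypothetical and shows only that routing decision. The `Field` record, its per-field `confidence` score, and the 0.9 acceptance threshold are illustrative assumptions, not the contractors' actual system. Deferral is modeled only for long-form items that would otherwise go to keying.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Field:
    name: str
    kind: str                  # "checkbox" (read by OMR) or "write_in" (read by OCR)
    value: Optional[str]       # what the automated reader returned, if anything
    confidence: float          # hypothetical reader confidence in [0, 1]
    long_form_only: bool = False


def capture(fields, threshold=0.9, defer_long_form=True):
    """Split fields into automatic captures, a keying queue, and deferred keying."""
    captured, keying_queue, deferred = {}, [], []
    for f in fields:
        method = "OMR" if f.kind == "checkbox" else "OCR"
        if f.value is not None and f.confidence >= threshold:
            captured[f.name] = (f.value, method)
        elif defer_long_form and f.long_form_only:
            deferred.append(f.name)      # long-form keying waits until later
        else:
            keying_queue.append(f.name)  # a clerk keys this field from the image
    return captured, keying_queue, deferred
```

Deferring long-form keying lets the complete-count items flow through first. This matches the priority the text describes for the basic data.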
C.5.b Coverage Edit and Telephone Follow-Up
The data on mailed-back questionnaires were reviewed by computer to identify those returns that failed coverage edit specifications. These failed-edit cases were reinterviewed by telephone, using contractor-provided clerical telephone staff. The workload for the coverage edit and telephone follow-up operation totaled about 2.3 million cases. It included returns that reported more, or fewer, household members in question one (“How many people were living or staying in this house, apartment, or mobile home on April 1, 2000?”) than the number of members for which individual information (e.g., age, race, sex) was provided; returns in which question one was left blank and individual information was provided for exactly six people (the limit of the space provided on the mail questionnaires); and returns that reported household counts of seven people or more.
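The three coverage edit criteria can be written as one function. This is a sketch that assumes each return is summarized by its question-one count (`None` if blank) and the number of people with individual data. It also reads "household counts of seven people or more" as referring to the question-one count.

```python
from typing import Optional

MAIL_FORM_PERSON_SLOTS = 6  # person columns on the mail questionnaire


def coverage_edit_reasons(q1_count: Optional[int], persons_reported: int) -> list:
    """Return the reasons (if any) why a mail return fails the coverage edit.

    q1_count:         answer to question one, or None if it was left blank
    persons_reported: number of people with individual data (age, race, sex, ...)
    """
    reasons = []
    if q1_count is None:
        if persons_reported == MAIL_FORM_PERSON_SLOTS:
            reasons.append("question one blank, data for exactly six people")
    else:
        if q1_count != persons_reported:
            reasons.append("question one count differs from people reported")
        if q1_count >= 7:
            reasons.append("household count of seven or more")
    return reasons
```

A return fails the edit, and goes to telephone follow-up, when the list is non-empty. A count of seven or more always triggers both checks, because the form holds data for at most six people.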
The purpose of the edit and telephone follow-up was to reduce coverage errors in the households selected for follow-up and to obtain basic characteristics for household members who did not fit in the space provided on the form. Missing items were not collected for household members for whom only some characteristics were reported. There was no field follow-up for failed-edit households for which telephone follow-up was unsuccessful. Because of computer problems, the start of the coverage edit and telephone follow-up operation was delayed: originally planned for April–June 2000, it was carried out from May through mid-August.
C.5.c Unduplication of Households and People
Two major, computer-based unduplication operations were carried out subsequent to field follow-up. One of those operations, the use of the primary selection algorithm (PSA) to unduplicate multiple returns for the same address, was planned from the outset and is described below. The other operation, the use of special software and procedures to reduce duplication of addresses in the MAF, was designed and implemented in summer 2000 to respond to evidence of duplicate addresses not eliminated by previous processing (described in Section C.1.d). The special unduplication operation used the results of the PSA; final determination of which returns to delete from the census because they duplicated a return from another MAF address was not made until after the PSA had processed multiple returns for the same address.
The purpose of the PSA was to identify unique households and people to include in the census when more than one questionnaire was returned with the same census address identification number. Such duplication could occur in a number of ways: when a respondent mailed back a census form after the cutoff date for determining the NRFU workload and the enumerator then obtained a second form from the household (or perhaps identified the household as vacant); when someone was enumerated in a group quarters but provided another “usual” address to which his or her information was assigned; or when a respondent filled out a Be Counted form, thinking that he or she had been missed, but another member of the household also mailed back a questionnaire for the household (which might or might not contain information for the individual).
For each housing unit, returns with one or more persons in common were combined to form a single PSA household, retaining only one response for each household member who was reported on more than one return, as well as the responses for household members who were reported on only one return. All vacant returns for a housing unit were also combined to form a PSA household. In some cases more than one PSA household might exist for a unit. For each PSA household, the algorithm selected which return best represented the Census Day household (“basic” return) and which people from the other returns were part of that household.12
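The grouping step of the PSA can be sketched with a union-find structure over the returns for one address. This is an illustration under stated assumptions, not the Bureau's algorithm. The person-matching key is hypothetical. "Persons in common" is applied transitively. The rule that picks the "basic" return (the return listing the most people) is a placeholder, because the actual selection criteria are described elsewhere (footnote 12).

```python
from collections import defaultdict


def psa_households(returns):
    """Group the returns for one census address ID into PSA households.

    Each return is a dict with 'id' (str), 'vacant' (bool), and 'persons'
    (dict mapping a hypothetical person-matching key to that person's response).
    """
    parent = {r["id"]: r["id"] for r in returns}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a, b):
        parent[find(a)] = find(b)

    first_seen = {}  # person key -> first return that listed the person
    vacant_ids = []
    for r in returns:
        if r["vacant"]:
            vacant_ids.append(r["id"])
            continue
        for key in r["persons"]:
            if key in first_seen:
                union(r["id"], first_seen[key])  # a person in common: same household
            else:
                first_seen[key] = r["id"]
    for vid in vacant_ids[1:]:
        union(vid, vacant_ids[0])  # all vacant returns form one PSA household

    groups = defaultdict(list)
    for r in returns:
        groups[find(r["id"])].append(r)

    households = []
    for members in groups.values():
        # Hypothetical selection rule: the return listing the most people is "basic".
        basic = max(members, key=lambda r: len(r["persons"]))
        roster = dict(basic["persons"])
        for r in members:
            for key, response in r["persons"].items():
                roster.setdefault(key, response)  # keep one response per person
        households.append({
            "basic": basic["id"],
            "returns": sorted(r["id"] for r in members),
            "roster": roster,
        })
    return households
```

A Be Counted form that shares no one with the mail return yields a second PSA household at the same unit. This is the case the text mentions.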
In all, 9 percent of census housing units had two returns that were eligible for the PSA operation, and 0.4 percent had three or more eligible returns. (Extra returns for an address that had no useful information were not included in the operation.) In most instances, the operation of the PSA discarded duplicate household returns or extra vacant returns. Less often, the PSA found additional people to assign to a basic return or identified more than one household at an address (see Baumgardner et al., 2001:22–27).
C.5.d Editing and Imputation
Editing and imputation were carried out for all data-captured questionnaires that were retained in the census after the PSA operation. The editing and imputation process included whole-household imputation, called substitution, when there was minimal or no information for the housing unit; editing content items for consistency and to fill in (assign) values for missing items on the basis of related items (e.g., to calculate age when only date of birth was provided); and imputation of content items using values reported for another person or household, called allocation, when values were missing for one or more items. See Box 4.2 in Chapter 4 for types of whole-household imputation; Table 4.1 for whole-household imputation rates by type in the 1980–2000 censuses; and Chapter 7 for basic (complete-count) and long-form-sample item imputation rates.
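The age example above is an assignment edit: a missing item is derived from a related item on the same record. The sketch below assumes that age means completed years as of Census Day, April 1, 2000. The record layout (`age`, `dob` keys) is hypothetical.

```python
from datetime import date

CENSUS_DAY = date(2000, 4, 1)


def age_on_census_day(dob: date, ref: date = CENSUS_DAY) -> int:
    """Completed years of age on the reference date."""
    had_birthday = (ref.month, ref.day) >= (dob.month, dob.day)
    return ref.year - dob.year - (0 if had_birthday else 1)


def edit_age(person: dict) -> dict:
    """Assign a missing age from a reported date of birth; otherwise leave the record alone."""
    if person.get("age") is None and person.get("dob") is not None:
        person = {**person, "age": age_on_census_day(person["dob"])}
    return person
```

Because the value comes from the person's own record, no donor is needed. Allocation is used only when no such related item exists.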
All editing and imputation were computer based; unlike in past censuses, there was no clerical review or editing of any items. When it was not possible to perform an edit that used other information for the same person or housing unit, imputation was performed using hot-deck methods that made use of information for other people and households, generally in the immediate neighborhood. First used in processing the 1960 census, the Census Bureau’s computerized hot-deck procedures have since been refined and elaborated. The donor pool is geographically restricted to take advantage of common characteristics among small-area populations (see Appendix G; see also Citro, 2000b).
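A sequential hot deck keeps the most recent reported value for each imputation cell as records are read in geographic order. As a result, donors tend to come from nearby households. The sketch below is a simplified version of this technique and assumes the records are already sorted geographically. The cell definition used in the example (by sex) and the cold-deck starting values are hypothetical. Appendix G describes the Bureau's actual procedures.

```python
def sequential_hot_deck(records, item, cell_of, cold_deck):
    """Fill missing values of `item` from the most recent reported donor in the same cell.

    records:   list of dicts, assumed already sorted in geographic order
    cell_of:   function mapping a record to its imputation cell
    cold_deck: starting value for every cell, used until a donor has been seen
    """
    deck = dict(cold_deck)
    out = []
    for rec in records:
        cell = cell_of(rec)
        if rec.get(item) is None:
            out.append({**rec, item: deck[cell], "allocated": True})
        else:
            deck[cell] = rec[item]  # this reported value becomes the next donor
            out.append({**rec, "allocated": False})
    return out


records = [
    {"block": 1, "sex": "M", "tenure": "owner"},
    {"block": 1, "sex": "F", "tenure": None},
    {"block": 2, "sex": "M", "tenure": None},
    {"block": 2, "sex": "M", "tenure": "renter"},
    {"block": 2, "sex": "M", "tenure": None},
]
filled = sequential_hot_deck(records, "tenure", lambda r: r["sex"], {"M": "owner", "F": "renter"})
```

The last record takes "renter" from its immediate neighbor rather than "owner" from the first block. This is the geographic restriction the text describes.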
C.5.e Other Data Processing
A number of other data processing steps were carried out to generate data files and publications from the 2000 census records. Such steps for the complete-count records included tabulating the data on various dimensions and modifying the data appropriately on files that were to be released to the public in order to protect the confidentiality of individual responses. For the long-form-sample records, there were the added steps of coding such variables as occupation and industry and weighting the records to complete-count control totals on several dimensions.
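Weighting sample records to complete-count control totals on several dimensions at once is commonly done by raking (iterative proportional fitting). The sketch below shows that technique. It is not a description of the Bureau's exact long-form weighting procedure. The dimensions, the initial weight of 6, and the control totals in the example are hypothetical.

```python
def rake(weights, categories, controls, max_iter=100, tol=1e-9):
    """Raking (iterative proportional fitting) of sample weights to control totals.

    weights:    initial weights, one per sample record
    categories: one dict per record mapping dimension -> category
    controls:   dimension -> {category: complete-count control total}
    """
    w = [float(x) for x in weights]
    for _ in range(max_iter):
        largest_adjustment = 0.0
        for dim, totals in controls.items():
            sums = {}
            for wi, cats in zip(w, categories):
                sums[cats[dim]] = sums.get(cats[dim], 0.0) + wi
            for i, cats in enumerate(categories):
                factor = totals[cats[dim]] / sums[cats[dim]]
                largest_adjustment = max(largest_adjustment, abs(factor - 1.0))
                w[i] *= factor
        if largest_adjustment < tol:
            break  # every margin already matches its control total
    return w


cats = [
    {"sex": "M", "tenure": "own"}, {"sex": "M", "tenure": "rent"},
    {"sex": "F", "tenure": "own"}, {"sex": "F", "tenure": "rent"},
]
controls = {"sex": {"M": 10, "F": 14}, "tenure": {"own": 9, "rent": 15}}
weights = rake([6, 6, 6, 6], cats, controls)
```

Each pass scales the weights to match one dimension's totals in turn. The loop stops once no further adjustment is needed.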
C.5.f Comparison: 1990 Data Processing
The 1990 census data processing system was more decentralized than in 2000 and made more use of clerical review (see Bureau of the Census, 1995b:Ch.8; National Research Council, 1995b:App.B). There were seven processing offices and 559 district offices. Mailback questionnaires in district offices in hard-to-enumerate areas in central cities went directly to a processing office for check-in by scanning bar codes, data capture using the Census Bureau’s Film Optical Sensing Device for Input to Computers (FOSDIC), and computerized review to identify cases that failed to meet the edit specifications for completeness of coverage and content. Failed-edit cases went to telephone follow-up, and those cases that could not be contacted were sent to a district office for field follow-up. However, backlogs in the telephone follow-up operation necessitated curtailment of field follow-up for cases that could not be contacted by telephone, and, for cost reasons, only a 10 percent sample of mailed-back short forms that failed the content review (and not also the coverage review) was sent for telephone follow-up. Enumerator returns for central city offices were checked in at the district office and then sent to the processing office for data capture, computerized review of coverage and content, and telephone follow-up as needed. Enumerator returns were not eligible for field follow-up. Once any further data had been received from follow-up, computerized editing, whole-household imputation, and item imputation routines were used to fill in remaining missing or inconsistent data.
Mailback and enumerator returns in other district offices went first to the district office for check-in, clerical review of coverage and content, telephone follow-up as needed, and field follow-up of failed-edit mail returns for which telephone follow-up was unsuccessful. After completion of follow-up, the questionnaires were sent to the processing offices for data capture and computerized editing and imputation.
Another step in data processing included the search/match operation, in which forms received from various activities were checked against microfilm images of questionnaires for the same address to determine which people should be added to the household roster and which were duplicates. This operation was carried out for “Were You Counted” forms, for parolee/probationer forms, and for people who sent in a questionnaire from one location with an indication that their usual home was elsewhere. Such people might have two homes, such as people who spend the winter in a southern state and the summer in a northern state. There was no way on the 2000 form to indicate usual home elsewhere.
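The core decision in the search/match operation is whether each person on an incoming form is already on the household roster. In 1990 this decision was made against microfilm images. The computerized sketch below is only an analogy of that decision. The match key (normalized name plus age) is a hypothetical simplification.

```python
def match_key(person: dict) -> tuple:
    """Hypothetical match key: normalized name plus age."""
    return (person["name"].strip().lower(), person["age"])


def search_match(roster: list, form_people: list, key=match_key):
    """Compare people from an incoming form against the household roster.

    Returns the updated roster and the people judged to be duplicates.
    """
    seen = {key(p) for p in roster}
    added, duplicates = [], []
    for person in form_people:
        k = key(person)
        if k in seen:
            duplicates.append(person)
        else:
            added.append(person)
            seen.add(k)
    return roster + added, duplicates
```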
The FOSDIC technology used for data capture was originally developed by Census Bureau staff for the 1960 census and reengineered and enhanced in 1970, 1980, and 1990 (see Salvo, 2000). It involved two main stages: microfilming the questionnaires and using the FOSDIC equipment to scan the microfilm and read the filled-in answer circles for each item and output responses to a computer file (the answer dots showed up as light images on a dark background). In 1990, FOSDIC processed over 130 million questionnaires; about 900,000 forms had to be “repaired” by clerks and remicrofilmed before they could be read (e.g., because the forms were torn or folded improperly and so were out of alignment for scanning). The FOSDIC equipment could read answer dots and sense the presence of write-in entries but not capture such entries directly. Write-in responses were keyed by clerks using the paper questionnaires for long-form-sample items and a microfilm access device for keying of write-in responses to the race question. After keying, the write-in responses were coded by a combination of computer and clerical review.
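FOSDIC was special-purpose hardware, so the sketch below is only an analogy to its decision rule. On the negative microfilm image, a filled answer circle appears light against a dark background. Reading a circle therefore comes down to a brightness threshold, and a write-in area can be sensed as non-empty without its content being read. The 0-255 brightness scale and both thresholds are hypothetical.

```python
def marked_circles(circle_brightness, threshold=128):
    """Indices of answer circles read as filled.

    On the negative microfilm image a filled circle shows up light on a dark
    background, so a high mean brightness (hypothetical 0-255 scale) means marked.
    """
    return [i for i, b in enumerate(circle_brightness) if b >= threshold]


def write_in_present(area_brightness, threshold=40):
    """Sense whether anything was written in a write-in area (not what was written)."""
    return max(area_brightness, default=0) >= threshold
```

A downstream edit would treat an empty result as a blank item and more than one index as a multiple mark.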