Modernizing Geographic Resources
A DECENNIAL CENSUS IS FUNDAMENTALLY an exercise in geography. The root constitutional mandate of the census explicitly links it to the nation’s electoral geography, as the census serves as the basis for shifting states’ representation in the U.S. House of Representatives every 10 years to match population shifts over the decade. Each new decennial census also offers new perspectives on the nation’s civic geography, providing rich information on how and where the American public lives and how the characteristics of small geographic areas and population groups have changed with time. In order to produce this information, the Census Bureau requires a great deal of accurate, raw geographic data—a full and complete address list and a mechanism by which those addresses can be associated with specific locations. Without this raw information, it would be impossible for the census to achieve its goal of counting each resident once and only once and within a precise geographic boundary. As the panel stated in its first interim report, “the address list may be the most important factor in determining the overall accuracy of a decennial census” (National Research Council, 2000a:35).
The “three-legged stool” strategy outlined by the Census Bureau for the 2010 census calls for modernization of the Bureau’s primary geographic resources:
the Master Address File (MAF), the source of addresses not only for the decennial census but also for the Census Bureau’s numerous survey programs; and
the Topologically Integrated Geographic Encoding and Referencing System (TIGER), a database describing the myriad geographic boundaries that partition the United States.
The specific set of activities that the Census Bureau has described to achieve this modernization is known as the MAF/TIGER Enhancements Program (MTEP), an “8-year, roughly $500 million undertaking” (U.S. Department of Commerce, Office of Inspector General, 2003).
Given its nominal goal, the MTEP may be of paramount importance in terms of its potential impact on the quality of the 2010 census. However, the critical word in that statement is “nominal” since the term “MAF/TIGER Enhancements Program” suggests significant enhancements to both the MAF and TIGER. We do not argue that TIGER is unimportant; it is a critical geographic resource for census needs and it is in dire need of modernization. However, the MAF/TIGER Enhancements Program is oriented overwhelmingly toward TIGER and does little to enhance—to improve—the MAF. The Census Bureau’s strategy for dealing with the MAF is, to an unfortunate degree, little more than routine maintenance—seemingly deferring active attention to the MAF until a complete block canvass very late in the census cycle (thus repeating a costly operation from 2000 that had been implemented as an eleventh-hour fix). The panel’s unease regarding the Bureau’s prospects for making material progress in improving its geographic resources for 2010 is further heightened by the apparent lack of comprehensive and realistic plans and schedules for the TIGER modernization effort.
In this chapter, we briefly review the development of both the MAF and TIGER (Section 3–A) before discussing the details of the MAF/TIGER Enhancements Program (Section 3–B). Our general assessment of the program follows (Section 3–C), with our particular call for attention to MAF improvement discussed separately in Section 3–D. Our recommendations, including designation of a MAF coordinator, strengthened geographic partnerships, and empirical justification of potential address sources, are detailed in Section 3–E.
3–A DEVELOPMENT AND CURRENT STATE OF THE MAF AND TIGER
Before we discuss the specific enhancements program that has been initiated by the Census Bureau, it is useful to first briefly review the nature and status of the two geographic systems addressed by the package.
3–A.1 The Master Address File
Purpose and Scope
The Master Address File (MAF) is the Census Bureau’s complete inventory of known living quarters and business addresses in the United States and its island areas. The MAF contains a mailing address for each of those living quarters, if one exists. For housing units or living quarters without mail addresses, descriptive addresses (e.g., “2-story colonial with dormer windows”) may be coded.
The MAF also includes an intricate set of flags and indicators that denote the operations that added or edited each address. It does not, however, record the date or time when an address was entered in the file or when it was modified. In principle, the MAF is a constantly evolving and continually updated resource; the “snapshot” of the MAF that is extracted and used to conduct the census is called the Decennial Master Address File, or DMAF.
Construction of the 2000 Census Master Address File
The concept of a continuously maintained MAF is a relatively new one; in the 1990 and earlier censuses, address lists were compiled from multiple sources prior to the census (e.g., lists were purchased from commercial vendors) and were not retained after the census was complete. The practice of maintaining the address list (to support not only the decennial census but also the Census Bureau’s other survey programs) was initiated after the 1990 census. In part, writes Nash (2000:1), “a major impetus for this change was the undercounts experienced in the 1990 and earlier decennial censuses, nearly a third of which was attributed to entirely missing housing units.” An initial MAF was constructed using the city-style addresses1 on the Address Control File (ACF) developed for the 1990 census (Hirschfeld, 2000).
To populate the MAF, the Census Bureau “devised a strategy of redundancy using a variety of sources for addresses,” thus “[assuming] responsibility for developing a comprehensive, unduplicated file of addresses” (Nash, 2000:1). Most prominent of the update sources were two that were endorsed by one of our predecessor Committee on National Statistics (CNSTAT) panels on the decennial census (National Research Council, 1995:5), which recommended that the Census Bureau “develop cooperative arrangements with states and local governments to develop an improved master address file” and that the U.S. Postal Service be given “an expanded role” in census address list operations. Both these recommendations were significant in that they required legislative authority in order to operate within the prohibition on release of confidential data codified in U.S. Code Title 13, the legal authority for census operations.2 Congress granted this authority in the Census Address List Improvement Act of 1994 (Public Law 103-430).
The Delivery Sequence File. One provision of the Census Address List Improvement Act authorized the Census Bureau to enter into a data-sharing arrangement with the U.S. Postal Service, under which the Postal Service would regularly share its Delivery Sequence File (DSF) with the Census Bureau.3 The DSF is the Postal Service’s master list of all delivery addresses served by postal carriers.4 The name of the file derives from the Postal Service-specific data coded for each record along with a standardized address and ZIP code: namely, codes that indicate how the address is served by mail delivery (e.g., carrier route and the sequential order in which the address is serviced on that route). The DSF record for a particular address also includes a code for delivery type that is meant to indicate whether the address is business or residential.
Because the census is conducted largely through mailed questionnaires—most of which are subsequently mailed back—the U.S. Postal Service is a crucially important conduit in the census process. Moreover, the Postal Service is a constant presence in the field, servicing existing and emerging routes on a daily basis. For these reasons, securing access to the DSF was a major accomplishment. But while the DSF is an undoubtedly vital source of address information, it is incomplete for census purposes both because the list of mail delivery addresses is only a subset of the complete list of housing units in the United States and because it does not always properly distinguish multiple housing units within the same structure.
The Postal Service began sharing the DSF with the Census Bureau in the mid-1990s. Currently, as part of the Bureau’s ongoing Geographic Base Support Program, new versions of the DSF are shared with the Bureau twice per year and updates or “refreshes” to the MAF are made at those times.
3. Specifically, the legislation text indicates that “the Postal Service shall provide to the Secretary of Commerce for use by the Bureau of the Census such address information, address-related information, and point of postal delivery information, including postal delivery codes, as may be determined by the Secretary to be appropriate for any census or survey being conducted by the Bureau of the Census. The provision of such information under this subsection shall be in accordance with such mutually agreeable terms and conditions, including reimbursability, as the Postal Service and the Secretary of Commerce shall deem appropriate.”
4. The list does not include general delivery addresses. Additional information on the DSF and commercial programs under which private companies are able to match their own address lists against the DSF can be found on the U.S. Postal Service Web site at http://www.usps.com/ncsc/addressservices/addressqualityservices/deliverysequence.htm [3/1/04].
Local Update of Census Addresses. The Census Address List Improvement Act of 1994 also authorized the secretary of commerce and the Census Bureau to “provide officials who are designated as census liaisons by a local unit of general purpose government with access to census address information for the purpose of verifying the accuracy of the address information of the bureau for census and survey purposes.” The act obligated the Census Bureau to “respond to each recommendation made by a census liaison concerning the accuracy of address information, including the determination (and reasons therefor) of the bureau regarding each such recommendation.” The act thus permitted the Census Bureau to share the address data it had on file for a locality with that local or tribal government for review and update.
To preserve Title 13 confidentiality, the information to be disclosed to any particular locality was limited to address information and to the set of addresses for that area. Ultimately, the address information would be shared with local or tribal governments only if they signed an agreement to keep it confidential and to dispose of it when finished with review.
In August 1996, the Census Bureau initiated a program to acquire address list information from local governments. The Program for Address List Supplementation (PALS) contacted local and tribal governments (along with regional planning agencies) and solicited whatever lists of city-style addresses they maintained for their jurisdictions. However, the Bureau quickly concluded that the program was troubled: local address lists were not necessarily in computer-readable format, or were not formatted in such a way (including apartment and unit designators) as to match with the emerging coding system for the MAF. More significantly, response by local governments to an open-ended query for local address lists—ideally coded to the appropriate census block—was low. The program was officially terminated in September 1997 (U.S. Census Bureau, Geography Division, 1999).
The Census Bureau’s next attempt at local geographic partnerships followed more closely the Address List Improvement Act by releasing parts of the Census Bureau’s MAF for review rather than requesting entire address lists. The resulting program became known as the Local Update of Census Addresses (LUCA), though it is also occasionally referred to as the Address List Review Program. LUCA was conducted in two waves:
LUCA 98. In 1998, local and tribal governments in areas with predominantly city-style addresses were given the opportunity to review the Census Bureau’s address list. Bureau cartographers used blue lines to distinguish city-style from non-city-style address areas on the maps that defined eligibility for LUCA. As a result, LUCA 98 was said to target localities lying “inside the blue line.”
LUCA 99. In 1999, attention turned to areas outside the “blue line,” those with non-city-style addresses.5 Local and tribal governments were again invited to review Census Bureau materials, but this time the offer was to review block-level counts of housing units rather than actual addresses.
To participate in LUCA, local and tribal governments were required to identify liaisons who would handle the address list materials and take an oath of confidentiality. Materials were then sent to the governments, which had a specified time period to review them and submit any proposed changes. These changes were then reviewed by the Census Bureau, which often opted to reject part or all of the localities’ suggested additions or deletions to the address list. An appeals process was set up under the auspices of the Office of Management and Budget (OMB), giving local and tribal governments a final opportunity if they found grounds to quarrel with the Census Bureau’s judgments.
The Working Group on LUCA commissioned jointly by this panel and the Panel to Review the 2000 Census conducted an extensive review of the LUCA process from the participants’ (local government) perspective (Working Group on LUCA, 2001). The working group’s principal findings are summarized in Box 3.1.
Box 3.1 Results of LUCA Working Group Study
The Working Group on LUCA commissioned jointly by this panel and the Panel to Review the 2000 Census was composed of state and local government personnel who had been involved in their area’s participation in the Local Update of Census Addresses (LUCA) program. The working group conducted a sample survey of LUCA-participant governments, inquiring about the techniques and resources they employed in order to complete a review of their local MAF segment. The working group report also provides detailed case study reports of LUCA participation, ranging in scope from rural communities to efforts at the state level to coordinate localities’ participation in LUCA. The working group also analyzed available data on local and tribal government participation, including the numbers of addresses submitted by governments and accepted or rejected by the Bureau. However, available data did not allow for assessment of the number of completed census enumerations obtained using addresses added uniquely or in part by LUCA. The working group issued its final report in 2001.
The working group’s analysis (Working Group on LUCA, 2001) led it to identify three principal barriers to effective local government participation in LUCA:
The timing of the LUCA program leading to the 2000 census was also a concern to participants. Even large local governments with complete local geographic information files found it difficult to meet the turnaround time required for submission of addresses to the Census Bureau. The problem may have been compounded for local governments with less-developed geographic resources and in cases where manual review of address lists was the best or only available option; indeed, tight timelines combined with the requisite investment of resources may have dissuaded some governments from participation.
The working group found some evidence of increased participation and local cooperation in cases where state, regional, or county organizations worked to coordinate responses by multiple governments, sometimes providing a valuable “LUCA education” function. Improved training and guidance on the expectations of the program were identified as possible factors for increasing participation in a LUCA-style program for the 2010 census.
Block Canvass. In the 1990 and earlier censuses, when address lists were not maintained from census to census but rather assembled before the decennial enumeration, a complete field canvass of the city-style addresses in designated mailout/mailback areas was a standard but costly operation. The Census Bureau had hoped to avoid a complete block canvass before the 2000 census; in introducing the Address List Improvement Act of 1994, U.S. Representative Thomas Sawyer expressed hope that “collection and verification of address information in primarily electronic format” from the Postal Service and local governments “will greatly reduce the amount of precensus field canvassing,” an activity that he indicated had proven “expensive and often inaccurate.”6 Rather than a complete block canvass, the Census Bureau planned to target specific areas with coverage gaps and focus field canvass activities on those areas.
In spring and summer 1997, as a continuous MAF began to take shape, optimism about the completeness of DSF updates gave way to doubts when it also became clear that PALS was not proving an effective means to obtain address information from local and tribal governments. Internal evaluations convinced the Bureau that relying on DSF and LUCA alone could leave gaps in MAF coverage; in particular, the Bureau was concerned that “the DSF file missed too many addresses for new construction and was not updated at the same rate across all areas of the country” (National Research Council, 1999:39).
Accordingly, the Census Bureau opted to change course and conduct a full canvass of addresses in mailout/mailback areas “in a manner similar to the traditional, blanket canvassing operations used in prior censuses.” The Bureau noted that the change would incur a large expense, but, recognizing the Bureau’s concerns, a previous CNSTAT panel “strongly endorse[d] this change in plans” (National Research Council, 1999:25,39).
Plans for the complete block canvass overlapped with the emerging plans for the LUCA program. The Bureau originally planned for LUCA 98 to obtain feedback in early 1998, so that resulting changes to the MAF would be ready for the block canvass in late 1999. However, delivery of MAF segments to most participating LUCA 98 localities was delayed. This led to a revised plan that LUCA 98 changes would be compared to the MAF after block canvassing was complete. Further delays led to abandonment of a reconciliation operation in which discrepancies between LUCA and block canvass observations would have been reviewed with localities; instead, localities received a list of accepted and rejected addresses in LUCA’s “final determination” phase and were given 30 days to submit appeals to OMB’s address list appeals office (Working Group on LUCA, 2001).
3–A.2 The TIGER Database
Purpose and Scope
The TIGER database is, effectively, a cartographic resource that defines a complete digital map of the United States and its territories. It is intended to capture not only visible features—the centerlines of streets, rivers, and railroads, and the outlines of lakes, for instance—but the myriad political and administrative boundaries that may not correspond exactly with visible physical locales. Accordingly, the TIGER database includes the political geography of 3,232 counties or county-level equivalents, more than 30,000 county subdivisions or minor civil divisions, and more than 20,000 named places, among other political units.
Of the many geography types defined by the TIGER database, the most important are the boundaries of census blocks. Census blocks are the smallest unit of geography for which basic population data are tabulated in the census, and these block-level data are aggregated to form political and other administrative boundaries. TIGER’s primary function in census operations is geocoding, the matching of a given address or location to the census block in which it lies. Once a location has been matched to the correct census block, its location in higher-level geographic aggregates constructed from blocks is also known, and so census returns may be properly tabulated by geographic unit.
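The geocoding step described above can be illustrated with a minimal sketch. It assumes a simplified street-segment record that carries one address range and a census block code for each side of the street, loosely in the spirit of the address ranges published in the TIGER/Line files. The record layout, the odd/even side rule, the sample block codes, and the 15-character code structure (state, county, tract, block) are all assumptions made for illustration; none of this reflects the Bureau's actual schema or matching software.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Segment:
    """Hypothetical street segment: one address range, one block per side."""
    street: str
    low: int          # lowest house number on the segment
    high: int         # highest house number on the segment
    odd_block: str    # block code for odd-numbered addresses (assumed rule)
    even_block: str   # block code for even-numbered addresses (assumed rule)


def geocode(house_number: int, street: str, segments: list) -> Optional[str]:
    """Return the block code for an address, or None if no segment matches."""
    name = street.strip().upper()
    for seg in segments:
        if seg.street == name and seg.low <= house_number <= seg.high:
            return seg.odd_block if house_number % 2 else seg.even_block
    return None


def parent_areas(block_code: str) -> dict:
    """Derive higher-level areas from a block code.

    Assumed layout: 2-digit state + 3-digit county + 6-digit tract
    + 4-digit block, so each parent area is a prefix of the block code.
    """
    return {
        "state": block_code[:2],
        "county": block_code[:5],
        "tract": block_code[:11],
        "block": block_code,
    }


# Illustrative data only; these segments and codes are made up.
SEGMENTS = [
    Segment("MAIN ST", 100, 198, "010010201001001", "010010201001002"),
    Segment("MAIN ST", 200, 298, "010010201001003", "010010201001004"),
]

block = geocode(151, "Main St", SEGMENTS)
areas = parent_areas(block) if block else {}
```

Because every higher-level area in this sketch is a prefix of the block code, a single successful block match is enough to place the address in all of the aggregates built from blocks, which is the property the paragraph above relies on.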
In addition to the geocoding function, the Census Bureau has relied on TIGER for three other major uses (O’Grady and Godwin, 2000; U.S. Census Bureau, 2001b):
geographic structure and relational analysis: the definition of how one geographic area relates to another, which is important for being able to aggregate small units like blocks into coherent higher-level geographic entities;
geographic definitions: a repository for the current definitions of geography levels recognized by the Bureau; and
map production: the basis for printed maps used by census enumerators, and other geographic products.
The Census Bureau’s full TIGER database contains both point and line features; in particular, points define the location of known housing units in areas without city-style addresses. However, most public exposure to TIGER comes via TIGER/Line files, a public excerpt of the TIGER database that contains only linear features such as roads, rails, and political boundaries (not specific housing unit locations). The TIGER/Line files, which contain complete street coverages with address ranges, helped facilitate the emergence and growth of the geographic information systems (GIS) industry.
The TIGER database is one part of a larger TIGER system, which includes the support structure of hardware and software necessary for maintaining the database. TIGER was initially created using a unique, home-grown language developed by the Census Bureau, and various software programs to update the database and to produce maps were similarly written to accommodate this customized internal language. As we will discuss, the proposed MAF/TIGER enhancements make changes in both the database and system senses, improving the content of the database as well as overhauling its support machinery.
How the TIGER Database Began
The TIGER database was developed by the Census Bureau, with assistance from the U.S. Geological Survey (USGS), to support the 1990 census. “TIGER began life as a patchwork quilt of data sources” (O’Grady and Godwin, 2000:6), two of which were primary. One of these sources was the Geographic Base File/Dual Independent Map Encoding (GBF/DIME) files used by the Census Bureau to do address matching to street segments in the 1980 census. The GBF/DIME files foreshadowed TIGER in that they applied topological principles in piecing together points, lines, and polygons (Hirschfeld, 2000); they also began the move toward including more than streets and roads in census maps, adding features such as water, rail, and invisible boundaries. But the files were limited in scope, covering only the urban centers of 276 metropolitan areas—“less than 2 percent of the land area but 60 percent of the people in the United States” (Carbaugh and Marx, 1990). To complete the geographic coverage of the nation, the address reference information in the GBF/DIME files was merged with computer-coded versions of the water and transportation features defined by the USGS series of 1:100,000-scale topographic maps (Marx, 1986).
As O’Grady and Godwin (2000) note, “accuracy was crucial” when TIGER was first assembled “but only in a relational sense.” “The coordinate information presented in the TIGER/Line files is provided for statistical analysis purposes only,” wrote Carbaugh and Marx (1990); “it is only a graphic representation of ground truth.” Put another way, the priority in early TIGER was to achieve basic functionality for census purposes, which meant favoring relational accuracy (describing how geographic features relate to each other, such as whether census blocks are adjacent) over positional or locational accuracy (precise location of geographic features relative to a chosen standard). O’Grady and Godwin (2000:5–6) recall that the Census Bureau drew on properties of the USGS maps in publishing the following positional accuracy statement in the documentation for TIGER/Line files released in 1995:
The positional accuracy varies with the source materials used, but at best meets the established National Map Accuracy standards (approximately ± 167 feet) where 1:100,000-scale maps from the USGS are the source. The Census Bureau cannot specify the accuracy of feature updates added by its field staff or of features derived from the GBF/DIME-Files or other map sources. Thus, the level of positional accuracy in the 1995 TIGER/Line files is not suitable for high-precision measurement applications such as engineering problems, property transfers, or other uses that might require highly accurate measurements of the [Earth’s] surface.
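The figure of roughly 167 feet in the statement above follows from simple scale arithmetic. The sketch below assumes that the relevant National Map Accuracy Standards tolerance is 1/50 inch at publication scale, the value commonly cited for maps at scales smaller than 1:20,000; the function name is illustrative only.

```python
def ground_tolerance_feet(scale_denominator: int,
                          map_tolerance_inches: float = 1 / 50) -> float:
    """Convert a tolerance measured on the printed map into feet on the ground."""
    ground_inches = map_tolerance_inches * scale_denominator
    return ground_inches / 12  # 12 inches per foot


tolerance = ground_tolerance_feet(100_000)
print(f"{tolerance:.1f} feet")
```

At 1:100,000, 1/50 inch on the map corresponds to 2,000 inches on the ground, or about 166.7 feet, which rounds to the 167 feet quoted in the accuracy statement.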
The overall positional accuracy of early TIGER was also limited by shortcomings in the GBF/DIME files, which were also oriented toward relational accuracy. In particular, Census Bureau enumerators and staff later found that “hydrographic features are not represented well” in TIGER database segments derived from the GBF/DIME files (Rosenson, 2001:1).
Updates to TIGER
During the 1990s, the TIGER database was updated using additional sources, each with unique (and often unknown) levels of positional accuracy. Among those sources are the following programs that are likely to continue during and after the MAF/TIGER Enhancements Program, although exactly how and when the resulting information will be incorporated—and how the programs might be restructured—is as yet unspecified:
Boundary and Annexation Survey (BAS): an ongoing voluntary survey in which TIGER-generated boundary maps are sent to local and tribal governments for review and update.
MAF Geocoding Office Resolution (MAFGOR): a program in which city-style address records from the Postal Service Delivery Sequence File (DSF) that cannot be geocoded in TIGER are referred to census regional offices for review.
Targeted Map Update (TMU): a regular program in which census field staff update address ranges, add new streets, and update feature names in selected areas.
Digital Exchange (DEX): a system that draws on local and tribal geographic database files.
Of these, the DEX system (Rosenson, 2001) developed in the late 1990s is of particular interest as improvements to its capabilities will be a major part of TIGER realignment. DEX does not directly manipulate local and tribal geographic files but rather a processed extract known as an “exchange file.” The system is strictly limited to working with road features and the attributes associated with them, including ZIP codes. The exchange file derived from a local geographic file is a street centerline database coded in TIGER format. This exchange file is then matched to the TIGER file based on both spatial location and attribute information (e.g., street name), beginning with matches on the intersection points between named road features in each file.
After matching, one of the files is “rubber-sheeted”—meaning that its features are adjusted to better match attributes in the other file, with neighboring attributes being adjusted simultaneously, as necessary. As Rosenson (2001) notes, this “rubber-sheeting” can be done to either file but, at least in early DEX implementation, the process could introduce topological errors such as lines that cross each other without a system-defined point marking their intersection. Thus, in order to preserve TIGER’s topological structure, DEX manipulates the local “exchange file” to match certain TIGER features.
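The matching step that precedes rubber-sheeting can be sketched as follows. This is a minimal illustration, not the DEX algorithm: it assumes each intersection is reduced to planar coordinates in meters plus the set of street names that meet there, and it uses a hypothetical 50-meter search tolerance. It pairs intersections on both attribute information (identical street names) and spatial location (distance), then computes the offsets that would move each local point onto its TIGER counterpart. The rubber-sheeting adjustment itself, which would spread those offsets across neighboring features, is not implemented here.

```python
import math
from typing import NamedTuple


class Intersection(NamedTuple):
    x: float            # easting in meters (assumed planar coordinates)
    y: float            # northing in meters
    streets: frozenset  # names of the roads that meet at this point


def match_intersections(tiger, local, tolerance_m=50.0):
    """Pair each TIGER intersection with the nearest local intersection that
    has the same set of street names and lies within tolerance_m."""
    matches = []
    for t in tiger:
        candidates = [
            (math.hypot(t.x - loc.x, t.y - loc.y), loc)
            for loc in local
            if loc.streets == t.streets
        ]
        candidates = [c for c in candidates if c[0] <= tolerance_m]
        if candidates:
            dist, best = min(candidates, key=lambda c: c[0])
            matches.append((t, best, dist))
    return matches


def displacement_vectors(matches):
    """Offsets (dx, dy) that would move each local point onto its TIGER match."""
    return [(t.x - loc.x, t.y - loc.y) for t, loc, _ in matches]


# Illustrative data only.
tiger_pts = [
    Intersection(0.0, 0.0, frozenset({"MAIN ST", "ELM ST"})),
    Intersection(100.0, 0.0, frozenset({"MAIN ST", "OAK AVE"})),
]
local_pts = [
    Intersection(3.0, 4.0, frozenset({"ELM ST", "MAIN ST"})),
    Intersection(400.0, 0.0, frozenset({"MAIN ST", "OAK AVE"})),
]

matches = match_intersections(tiger_pts, local_pts)
offsets = displacement_vectors(matches)
```

In the sample data only the Main/Elm intersection is paired; the Main/Oak candidate shares the street names but lies well outside the tolerance, so it is left unmatched rather than forced into a pairing. The offsets point from the local file toward TIGER, consistent with the statement above that DEX adjusts the local exchange file rather than TIGER.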
Though some DEX capability had been developed and selected local geographic files were obtained prior to the 2000 census, active TIGER updating using DEX was deferred during the actual conduct of the 2000 census.
The Need to Modernize
The development of TIGER is a milestone of which the Census Bureau should be extremely proud. A homegrown database management system constructed to manipulate an enormously complex network of visible and invisible boundaries, TIGER became an exemplar of what a GIS can do. The example of TIGER (and, significantly, the public availability of TIGER/Line files, a full and fine-scale public atlas of the United States) touched off a commercial GIS revolution. Businesses and organizations of all sizes are continuing to learn the power of spatial data analysis, and the work of TIGER to bring together and make publicly available base geographic layers helped make that possible. TIGER successfully satisfied the operational demands of two decennial censuses. The coding system may be (in computer years) old and the structures arcane, but it is a rare in-house software product that can successfully cope with a production cycle of billions of printed maps and millions of addresses for geocoding as TIGER did in the 1990 and 2000 censuses.
But, as is the case with some revolutions, the first entrant ushers in tremendous change and then is unable to keep pace with the new world thus created; so it is with TIGER. Though the text-based TIGER/Line files are parsable by commercial GIS applications, the native TIGER database structure is not compatible with modern database tools. As a result, it has not been possible to directly update TIGER’s street coverages using the GIS files updated and maintained by local and tribal governments. The Census Bureau’s unique role in delineating census blocks—the base units that are aggregated to form most political districts—and ongoing programs such as the Boundary and Annexation Survey (BAS) give the Census Bureau advantages in defining the invisible political boundaries that cross-cut the nation. But commercial GIS has made it possible for external companies and local and tribal governments to build on the TIGER/Line base, realigning features when errors are found and making updates to street, rail, water, and other features to a degree that Census Bureau resources have not permitted.
3–B THE MAF/TIGER ENHANCEMENTS PROGRAM
The Bureau has set forth five objectives as essential steps in a comprehensive MAF/TIGER modernization:
improve address/street location accuracy and implement automated change detection;
implement a modern processing environment;
expand and encourage geographic partnership options;
launch the Community Address Updating System (CAUS), which has also been known as the American Community Survey Coverage Program; and
implement periodic evaluation activities and expand quality metrics.
They are spelled out with subtasks in the following sections.
3–B.1 Objective One: Address/Street Location Accuracy
Objective One—the actual realignment of TIGER geographic features—is the centerpiece of the MTEP, enough so that it has acquired an acronym of its own. The contract to carry out Objective One—also known as the MAF/TIGER Accuracy Improvement Project (MTAIP)—was awarded to the Harris Corporation of Melbourne, Florida, in June 2002.
As described in documentation provided to the panel, the basic subtasks envisioned under Objective One are as follows:
correct (in TIGER) the location of every street and other map feature used by field staff and governmental partners for orientation, as well as the location of every boundary used for tabulation of decennial census and household survey data;
correct (in the MAF) the location of every housing unit and group quarters from which the decennial census and the household surveys collect data; and
implement an effective change detection methodology to document the location of every new street and living quarters, along with the street name and address for each.
Means of Updating Accuracy
As it has been explained to the panel, the basic idea of Objective One is to perform a single, extensive update of TIGER for each county based on an external source with, presumably, more current and accurately positioned feature information. These outside sources may include GIS files developed and maintained by local or tribal governments, commercial GIS files, or digital orthophotography/aerial photography. Once the TIGER data for a county are realigned, they can be continually updated through change detection; for instance, features may be added as a result of comparison of TIGER to newer aerial photographs of a region. Through this strategy (extensive initial realignment, followed by change detection) the Census Bureau hopes to maintain TIGER so that its features are current to within one year.
This general framework provides great flexibility for the Census Bureau and its contractor to implement the TIGER update; at present, however, to the extent that plans have been shared with the panel, this flexibility translates into little specificity.
The Census Bureau has established a cartographic accuracy standard for the realigned TIGER database: 7.6 meters CE95, meaning that, for a sample of control points measured on the ground and the corresponding locations in the geographic database, at least 95 percent of the database-recorded points should lie within a 7.6-meter radius of the corresponding ground-recorded points. According to the Census Bureau’s presentation to another National Research Council committee, the 7.6 meters CE95 resolution was chosen because it is the minimum required accuracy “to support use of GPS equipped handheld computers to achieve 99.6% geocoding accuracy for tabulations”; it is also said to be “based on accuracy of enumerator’s GPS-equipped hand-held computer and relationship of enumerator to street centerline” (LaMacchia, 2003).
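The CE95 test described above can be illustrated with a short calculation. The following is a minimal sketch, not the Census Bureau's or its contractor's procedure: it assumes that control points and database points have already been projected to a planar coordinate system measured in meters, and the function name `ce95_pass` and its parameters are hypothetical.

```python
import math


def ce95_pass(pairs, radius_m=7.6, required_share=0.95):
    """Check a sample of control points against a CE95-style standard.

    pairs: list of ((x_db, y_db), (x_ground, y_ground)) tuples, in meters,
    in a projected coordinate system (an assumption of this sketch).
    Returns (passes, share_within_radius).
    """
    if not pairs:
        raise ValueError("at least one control point pair is required")
    # Count database points lying within the radius of their ground point.
    within = sum(
        1
        for (x_db, y_db), (x_gnd, y_gnd) in pairs
        if math.hypot(x_db - x_gnd, y_db - y_gnd) <= radius_m
    )
    share = within / len(pairs)
    return share >= required_share, share
```

Under these assumptions, a file with 96 of 100 sampled points inside the 7.6-meter radius would pass, while one with 90 of 100 would not.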
The Census Bureau informed the panel in September 2003 that it expects 1,200 of 3,232 total counties to be able to provide local files meeting this accuracy standard (Jackson, 2003:2). The request for proposals (RFP) issued to solicit contractor bids to perform Objective One indicates the Census Bureau’s strong preference to use local or tribal government GIS files as the update source whenever possible. (The RFP and other documents related to Objective One, the MAF/TIGER Accuracy Improvement Project, are archived at http://www.census.gov/geo/mod/maftiger.html [3/1/04].) But, based on the information known to the panel, no approach has been specified for the balance of counties for which local GIS files are not available or do not meet the Bureau’s accuracy standard. The Census Bureau has conducted experiments using subcontractors to perform updates based on digital orthophotographs and other image sources. Other potential means of collecting the geographic information include buying commercially available GIS files or using field staff to collect GPS trace data while driving or walking streets. It is as yet unclear which of these (or other) mechanisms the Census Bureau and the Harris Corporation will favor in the absence of local files (or when local files are of insufficient quality) to perform the initial, global realignment. In the omnibus appropriations bill for fiscal 2004, House and Senate appropriators “[direct] the Secretary of Commerce to take all necessary measures to reduce the payment for information currently available from certain governments” and “to utilize global positioning system technology and aerial photography to update existing information only if these measures are shown to be cost effective” (H. Rept. 108-141, citing H. Rept. 108-221).
As it is unclear what exact source will be used for the initial realignment in particular counties, it is even less clear what source will be used to update TIGER files in the change detection process, and with what frequency this will be done.
Franz (2002) described the following priority structure that the Census Bureau has identified for carrying out Objective One realignment, with the first being the top priority:
linear feature realignment across all areas;
establishing/correcting structure locations in areas outside the 2000 census mailout/mailback area;
establishing/correcting structure locations inside the 2000 census mailout/mailback area; and
establishing/correcting locations for residential structures over nonresidential structures, in carrying out the previous two steps.
Under plans developed in 2002, the Census Bureau and the Harris Corporation are supposed to realign counties on the following timetable: 250 in fiscal year 2003; 600 in 2004; 700 in 2005; 700 in 2006; 600 in 2007; and 382 in 2008. In principle, change detection to make further alterations is supposed to begin when counties are complete, so that 250 counties are slated for change detection in fiscal year 2004, 850 in fiscal 2005, and so forth, until all counties are handled using change detection methods in 2009.
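The arithmetic behind this timetable can be laid out explicitly. The sketch below uses only the annual figures quoted above; the variable names are illustrative, and it assumes (as the text implies) that counties completed through one fiscal year enter change detection in the next.

```python
from itertools import accumulate

# Planned initial realignments per fiscal year, as quoted in the text.
planned = {2003: 250, 2004: 600, 2005: 700, 2006: 700, 2007: 600, 2008: 382}

years = sorted(planned)
# Running total of counties realigned by the end of each fiscal year.
cumulative = dict(zip(years, accumulate(planned[y] for y in years)))

# Counties completed through fiscal year y are slated for change
# detection in fiscal year y + 1.
change_detection = {y + 1: cumulative[y] for y in years}

for year in sorted(change_detection):
    print(year, change_detection[year])
```

The running total reaches 3,232 counties at the end of fiscal 2008, which matches the county total cited earlier, so every county falls under change detection in fiscal 2009.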
3–B.2 Objective Two: Modern Processing Environment
Objective Two of the Enhancements Program targets TIGER in the systems sense, modernizing the structure of the database. The current homegrown TIGER system suffers from key limitations, prominent among them the inability to directly link with commercial GIS packages (and hence local and tribal GIS files maintained using those packages) and the limitation that only one module (county) of TIGER may be “checked out” for updating at a time. Changes to the database structure also require that the suite of support software used to generate products from TIGER—for instance, to print maps for field enumeration—must be reauthored and tested.
The Census Bureau’s stated subtasks for Objective Two are as follows:
make maximum possible use of commercial off-the-shelf (COTS) and geographic information systems (GIS) tools to allow for rapid development of new applications; and
customize the COTS/GIS tools to the minimum extent possible to avoid schedule and cost obstacles when the COTS/GIS vendors deploy new versions of their software.
Under original timelines specified by the Census Bureau, fiscal 2003 was to be the peak year of Objective Two work, with some slight drop-off in fiscal 2004. Residual effort was expected in fiscal 2005 and 2006, with Objective Two not listed as an activity in 2007 or later years.
We discuss the tasks to be accomplished under Objective Two in greater detail in Section 6-C.
3–B.3 Objective Three: Geographic Partnerships
Objective Three acknowledges the crucial role of state, local, and tribal governments in maintaining geographic resources, not only for the TIGER realignment of Objective One but for continued update of the MAF, as in the LUCA program.
Subtasks of Objective Three identified by the Census Bureau are as follows:
devise and deploy new strategies to communicate more effectively with governments to increase the level at which they participate in MAF/TIGER review and update activities;
devise and deploy new ways to integrate more effectively the address list review, street update, and boundary reporting activities that now exist as separate programs; and
establish new partnerships with other federal agencies and private-sector firms that have GIS and address files with information of value to an accurate and complete MAF/TIGER.
Under original timelines shared with the panel, fiscal 2004 was scheduled to be the peak year of Objective Three work. The levels of effort expected on this objective in each of the years 2003 and 2005 through 2010 are to be roughly equivalent.
3–B.4 Objective Four: Community Address Updating System
Briefly known as the ACS Coverage Program, the Community Address Updating System (CAUS) is the address list update component of the proposed American Community Survey (ACS). The basic idea of the program is to make use of the continued field presence that would be necessary to conduct the ACS, allowing ACS enumerators the opportunity to provide geographic updates. One hope is that the ACS enumerators might be particularly helpful in identifying geographic and housing changes in rural areas, where local and tribal files might be less detailed (or unavailable).
The Census Bureau has identified the following subtasks for Objective Four:
focus predominantly on rural areas, in which the Census Bureau has concluded that the U.S. Postal Service’s Delivery Sequence File (DSF) does not effectively identify the existence or location of new housing units; and
provide address list (and street) updates beyond what can be identified through the current twice-yearly DSF “refresh” process to ensure a uniformly accurate sampling frame nationwide for the ACS and the other household surveys.
Through contractors, the Census Bureau has developed prototype Automated Listing and Mapping Instrument (ALMI) software, making use of a GPS receiver and a laptop computer. The ALMI system could permit ACS enumerators who encounter a new street that is undefined in TIGER to record a GPS trace as they drive along the street and to note the location of houses along that street; these inputs could later be converted to TIGER.
The anticipated level of effort that the Census Bureau expects to expend on Objective Four is roughly equivalent during each of the fiscal years 2003–2010.
3–B.5 Objective Five: Evaluation and Quality Metrics
Finally, Objective Five concerns the assessment of progress and quality; subtasks identified by the Census Bureau for this Objective include the following:
provide quality metrics information that will guide (target) areas in need of corrective action beyond the changes identified in the change detection and CAUS activities;
document progress toward improving the accuracy and completeness of the street, address, and boundary information in MAF/TIGER; and
ensure the availability of accurate and comprehensive metadata that meet federal standards for the information in MAF/TIGER.
The anticipated level of effort that the Census Bureau expects to expend on Objective Five is roughly equivalent during each of the fiscal years 2003–2010.
3–B.6 Update on Enhancements Program Progress
The Census Bureau’s goal for fiscal 2003 was to complete Objective One TIGER realignment for 250 counties. At the panel’s September 2003 meeting, the Census Bureau reported that it was set to meet that goal, with 244 counties already completed. Only 60 of those completed were realigned by the Harris Corporation, which holds the Objective One contract and is responsible for realigning TIGER data for 600 counties in fiscal 2004; the remainder represent work from other contractors on earlier pilot projects. Eight of the 60 files were said to have been returned to Harris for “rework” because of unspecified problems (Jackson, 2003:3).
As of September 2003, the Census Bureau had collected 1,038 GIS files from local and tribal governments and was testing them to see whether they met the Bureau-imposed 7.6 meters CE95 accuracy standard. In September, the Bureau reported that it had collected ground control points with GPS receivers for 777 of the files; results showed an equal divide, with 390 meeting or exceeding the 7.6-meter standard and 387 failing (Jackson, 2003:3). In October, the Bureau submitted an update to the panel, now stating that 826 files had been tested, with 461 of these meeting the standard and 365 failing (U.S. Census Bureau, 2003e:1). Noting that many of the 365 files that fell short of the uniform standard nonetheless appeared to be more positionally accurate “in the densely settled extent of their coverage,” the Bureau and the Harris Corporation are said to be developing “a method for utilizing the accurate sub-extent of local GIS files (with Harris supplying and utilizing an accurate source for the balance area) by the end of fiscal year 2004” (Jackson, 2003:3).
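The test results quoted above can be expressed as pass rates for comparison with the Bureau's expectation that 1,200 of 3,232 counties will supply usable files. This is a simple worked calculation using only figures from the text; the helper name `pass_rate` is hypothetical.

```python
def pass_rate(passed, failed):
    """Return (files tested, percentage meeting the standard)."""
    tested = passed + failed
    return tested, 100.0 * passed / tested


# September 2003 report: 390 files met the standard, 387 failed.
sept_tested, sept_rate = pass_rate(390, 387)
# October 2003 update: 461 files met the standard, 365 failed.
oct_tested, oct_rate = pass_rate(461, 365)
# Share of all counties expected to supply files meeting the standard.
expected_share = 100.0 * 1200 / 3232

print(sept_tested, round(sept_rate, 1))   # 777 files, about 50.2 percent
print(oct_tested, round(oct_rate, 1))     # 826 files, about 55.8 percent
print(round(expected_share, 1))           # about 37.1 percent of counties
```

The 461 passing files amount to roughly 38 percent of the 1,200 counties the Bureau expects to rely on, assuming one file per county, which the text does not state.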
Objective One of the Enhancements Program faces a heavier workload in fiscal 2004, with the goal of realigning 600 counties. The Bureau expects that the Harris Corporation will use local or tribal GIS files to update 350 of those counties and that, “for the remaining 250 counties, Harris will acquire, evaluate, and use sources such as commercial GIS files, imagery, and field-collected GPS road centerline data” (U.S. Census Bureau, 2003e:2). As we discuss in Section 3-C.1, the Bureau has provided no indication as to which counties will be targeted for update in 2004.
After TIGER files have gone through initial realignment, they are then supposed to be subject to updating using change detection—that is, using a newer-vintage local GIS file or aerial photography to automatically find new streets or structures. According to the Bureau, “requirements and methodology for detecting change (growth) for areas that have been realigned” are to be drawn up in fiscal 2004 (Jackson, 2003:4). To what extent, if at all, delays in finalizing these requirements will delay updating of the 250 files realigned in 2003 or the 600 files slated for realignment in 2004 remains to be seen.
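Because the Bureau's change detection requirements had not been drawn up at the time of writing, the following is only an illustrative sketch of the general idea, not a description of any Bureau or contractor method. It assumes street segments are straight lines in planar coordinates measured in meters, and it borrows the 7.6-meter figure as a matching tolerance purely for illustration. A segment in the newer file is flagged as a candidate new street when no existing segment lies within tolerance of its endpoints and midpoint.

```python
import math


def point_segment_distance(p, a, b):
    """Shortest planar distance from point p to the segment from a to b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    vx, vy = bx - ax, by - ay
    length_sq = vx * vx + vy * vy
    if length_sq == 0.0:
        return math.hypot(px - ax, py - ay)
    # Project p onto the segment and clamp to its endpoints.
    t = max(0.0, min(1.0, ((px - ax) * vx + (py - ay) * vy) / length_sq))
    return math.hypot(px - (ax + t * vx), py - (ay + t * vy))


def detect_new_segments(newer, existing, tolerance_m=7.6):
    """Return segments in `newer` with no close counterpart in `existing`.

    Both inputs are lists of ((x1, y1), (x2, y2)) segments in meters.
    The tolerance is an assumption of this sketch.
    """
    flagged = []
    for a, b in newer:
        mid = ((a[0] + b[0]) / 2.0, (a[1] + b[1]) / 2.0)
        matched = any(
            all(point_segment_distance(p, ea, eb) <= tolerance_m
                for p in (a, mid, b))
            for ea, eb in existing
        )
        if not matched:
            flagged.append((a, b))
    return flagged
```

A production approach would need to handle curved features, split segments, and structures as well as streets; this sketch covers only the simplest case.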
We will discuss the status of Objective Two, the database redesign and conversion, in Section 6-C. Objectives Three and Four (partnerships and CAUS, respectively) remain largely unplanned; a “program master plan for geographic partnerships” is slated to be developed during fiscal 2004 and CAUS implementation (like the ACS) was postponed due to late closure on the fiscal 2003 budget.
3–C ASSESSMENT OF GEOGRAPHIC MODERNIZATION EFFORTS
3–C.1 Locational Accuracy of TIGER
Problems with the positional accuracy of TIGER have been apparent to the Census Bureau and its users for some time; anecdotal experiences of problems with TIGER representations were reported by field enumerators during the 2000 census and in feedback from local and tribal governments that participated in LUCA (Working Group on LUCA, 2001). Quantitative evidence of TIGER discrepancies can be found in Liadis (2000), the report of a Census Bureau experiment that collected GPS readings for approximately 6,700 “anchor points” spread across selected census tracts in eight counties. Distances were computed between these “ground truth” coordinates and the longitude/latitude combination coded in TIGER. The results show evidence of considerable local variation, even across tracts within the same county. The distance between TIGER representation and ground truth varied according to the method used to introduce the point into TIGER. Somewhat ironically, more recent update programs—which added features by digitally inserting them as freehand drawings—accounted for the largest deviations from ground truth, while pre-1990 sources (e.g., GBF/DIME files) and programs involving direct use of local and tribal geographic files (e.g., DEX) generally came closest to true locations. The Census Bureau’s Geography Division also conducted pilot experiments comparing TIGER coordinates for small geographic samples to a combination of GPS coordinates and commercially available cartographic databases (U.S. Census Bureau, Geography Division, 2000) and to digital orthophotos giving an aerial view of ground features (O’Grady, 2000).
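A comparison of this kind, measuring the distance between a GPS reading and the coordinates coded in TIGER and then summarizing by the method that introduced each point, can be sketched as follows. This is not the method of Liadis (2000), which is not described in detail here; the haversine formula, the spherical earth radius, the record layout, and the function names are all assumptions made for illustration.

```python
import math
from collections import defaultdict
from statistics import median

EARTH_RADIUS_M = 6_371_000.0  # mean spherical radius, an approximation


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two lat/lon points (degrees)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2.0) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2.0) ** 2)
    return 2.0 * EARTH_RADIUS_M * math.asin(min(1.0, math.sqrt(a)))


def median_error_by_source(records):
    """Median GPS-to-TIGER distance for each source category.

    records: iterable of (source, gps_lat, gps_lon, tiger_lat, tiger_lon),
    a hypothetical layout chosen for this sketch.
    """
    errors = defaultdict(list)
    for source, gps_lat, gps_lon, tiger_lat, tiger_lon in records:
        errors[source].append(
            haversine_m(gps_lat, gps_lon, tiger_lat, tiger_lon))
    return {source: median(values) for source, values in errors.items()}
```

Grouping by source in this way is what allows a finding such as the one reported above, that features added by freehand digitizing deviate more from ground truth than features drawn from local files.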
Though the full extent of TIGER inaccuracy may be unknown, there is enough evidence available that the panel endorses the aims of Objective One. Errors in the placement of roads, boundaries, and other geographic features are sufficiently serious and numerous that the TIGER database is in need of a comprehensive update. Moreover, raw TIGER/Line files cannot be fully trusted for routine GIS- and non-GIS-related tasks.
Given that locational error in TIGER is extensive enough to require correction, it follows naturally that accomplishing the basic task envisioned under Objective One is essential to the modernization of the census. GPS coordinates collected by PCDs are useful only to the extent that they can be accurately placed on base maps with streets and other key features. An accurately aligned TIGER, faithful to polygonal features such as municipal boundaries, can be passed along to localities and made available on the Internet, thereby allowing local and tribal entities the opportunity to report changes made to both linear (e.g., road and railroad) and polygonal features (e.g., administrative borders collected by the Boundary and Annexation Survey) in a more efficient and accurate way. If localities can readily utilize an aligned TIGER for geocoding their own address files, comparisons with (and updating of) the MAF may eventually become routine.
Hence, the panel supports Objective One of the Enhancements Program and is heartened by the general steps taken to accomplish the objective. In particular, the panel views the decision to engage an outside contractor, rather than keeping the process of TIGER updating a purely in-house operation, as a sign of significant progress. As Census Bureau staff noted in an interview, it is indeed a “very major departure for us” to seek external help in retooling TIGER, but “we’ve come to the conclusion [that] we need to take advantage of [vendors’] expertise and understanding” (O’Hara and Caterinicchia, 2001).
In the panel’s assessment, the Census Bureau deserves high grades for its determination to fix a major problem as well as for the boldness of the approach outlined in Objective One. That said, concerns about the work remain, and the plausibility of the Census Bureau’s ambitious realignment timetable would be bolstered considerably through attention to the following:
a detailed work plan, including the order in which counties will be initially updated;
realistic estimates of the number of available state and local GIS files that meet, in part or in full, the Census Bureau’s chosen positional accuracy standard for the realigned TIGER;
a clear plan for the evaluation of initially realigned TIGER files in order to inform future realignment as well as to recalibrate the Objective One timeline and budget; and
specification of plans for the postrealignment change detection program.
A point of some contention between the panel and the Census Bureau has been the order in which Objective One realignment will be performed. Aside from indicating that jurisdictions involved in mid-decade census tests or dress rehearsals will be given priority, the Census Bureau has not given a clearer idea of how it expects the flow of county-by-county processing to proceed. The notion of ordering is understandably somewhat sensitive, since no locality would relish being last in the queue. However, the ambitious timetable laid out earlier in this chapter is unrealistic—at best—without some sense of ordering. The alternative—effectively starting 3,232 independent updating efforts simultaneously and hoping that 850 fall into realignment by the end of 2004—does not inspire confidence. There is no right answer to the question of ordering—conceivable mechanisms include starting with urban counties or rural counties, starting with original GBF/DIME areas, sequencing by population, or sequencing by some assessment of how out of alignment TIGER is for an area. But providing some structure to the task seems essential for measuring progress toward complete realignment and could add plausibility to the hypothesized timetable.
At the panel’s September 2003 meeting, the Census Bureau acknowledged this concern, noting that “it has been a challenge to balance the desire to establish a firm and detailed county-by-county schedule for the realignment effort on the one hand, and [maintain] the flexibility to take advantage of newly emerging tribal and local source data on the other hand.” The Census Bureau now indicates that the listing of local source files to be realigned “will be firmed up quarterly, 30 days prior to the start of the quarter” (Jackson, 2003:3).
In addition, a subtle point raised in our earlier discussion of the Census Bureau’s Digital Exchange (DEX) program deserves fuller explication. Given two GIS files (a local file and the TIGER data), a “rubber-sheeting” process manipulates certain matched features in one file to conform to the other, shifting related features automatically. The Census Bureau’s early DEX system altered the local file to follow known features in TIGER in order to avoid topological bugs that might result otherwise—a justifiable choice, perhaps, but one that runs counter to the purpose of updating the presumably misaligned TIGER based on presumably accurate local files. We hope and trust that this approach has been rectified as the Bureau has developed procedures with its contractor; the Bureau noted in its update of Enhancements Program progress that “Harris is required to align TIGER road features exactly to the source data (which, again, must meet or exceed the 7.6 meter accuracy standard) as well as maintain TIGER’s topological integrity” (U.S. Census Bureau, 2003e:2).
Further empirical information on discrepancies between local file content and existing TIGER topology (and their resolution), along with additional detail on how the Harris Corporation’s alignment tools handle topological gaps and generally manage the conflation between local and TIGER files, could strengthen confidence in the finished product.
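The direction of adjustment in rubber-sheeting, discussed above, can be made concrete with a simplified sketch. This is not the algorithm used by DEX or by the Harris Corporation, neither of which is documented here; it substitutes a basic inverse-distance-weighted displacement, assumes planar coordinates in meters, and uses hypothetical names throughout. Matched control pairs run from the TIGER position to the local-file position, so that TIGER features move toward the presumably more accurate local source.

```python
import math


def rubber_sheet(points, control_pairs, power=2.0):
    """Shift points using inverse-distance-weighted control displacements.

    points: list of (x, y) features to adjust (e.g., TIGER vertices).
    control_pairs: list of ((x_from, y_from), (x_to, y_to)) matched
    features, giving the current and target position of each.
    """
    if not control_pairs:
        raise ValueError("at least one control pair is required")
    moved = []
    for x, y in points:
        weight_sum = shift_x = shift_y = 0.0
        exact = None
        for (fx, fy), (tx, ty) in control_pairs:
            d = math.hypot(x - fx, y - fy)
            if d == 0.0:
                # A matched feature moves exactly onto its target.
                exact = (tx, ty)
                break
            w = 1.0 / d ** power
            weight_sum += w
            shift_x += w * (tx - fx)
            shift_y += w * (ty - fy)
        if exact is not None:
            moved.append(exact)
        else:
            moved.append((x + shift_x / weight_sum, y + shift_y / weight_sum))
    return moved
```

Reversing each control pair would instead pull the local file toward TIGER, which is the behavior attributed to the early DEX system. Note that this sketch makes no attempt to preserve topology, which is precisely the complication the text raises.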
3–C.2 Balance of the MAF/TIGER Enhancements Program
The panel applauds the Census Bureau’s efforts to adopt GPS technologies and a modern processing environment using COTS products to achieve Objectives One and Two. We comment on Objective Two—discussing major points of concern—in Section 6-C, in the context of the census technical architecture.
We also note that the Census Bureau has made some steps toward establishing metrics to evaluate improvements in accuracy, as called for by Objective Five. Work with contractors has brought about an image-based rough assessment system that allows accuracy checks on incoming files, as well as progress on evaluation of files on the basis of control points, and a soon-to-be-installed system for quantifying and tracking TIGER errors over time. It is essential, in our view, that quality assessment through such metrics be an ongoing and well-timed process so that updating of the database achieves the apparent goal: information in TIGER maintained to a currency of one year or less at all times.
As elaborated in Chapter 8, the panel emphatically believes that Objective Five is a crucial part of the Enhancements Program and should lead to the development of general quality metrics for all of the Census Bureau’s geographic programs. However, with respect to progress on Objective Five, two comments must be made. First, it is possibly telling that neither the Census Bureau’s presentation to the panel in September 2003 (Jackson, 2003) nor the subsequent update in October (U.S. Census Bureau, 2003e) addressed progress on Objective Five. Beyond the diagnostic function for local files to be included in Objective One realignment, general progress on metrics for TIGER quality and coverage is not known. Second, and more fundamentally, all discussion of Objective Five activities—see, for instance, the Bureau-identified subtasks in Section 3-B.5—has focused almost exclusively on quality metrics for geographic coordinates, not for addresses. That is to say, to the extent that Objective Five is defined at present, it is focused on TIGER and TIGER realignment; it is not focused on the MAF, a fact that we believe is symptomatic of a larger lack of attention on the Bureau’s part.
3–D WEAKNESS: ENHANCING THE MAF
It is clear that the MAF/TIGER Enhancements Program has the potential to enhance TIGER, making necessary improvements given known problems with TIGER accuracy. But, for the sake of census accuracy, a more important question is how the program will enhance the MAF—that is, how it will add new addresses, screen for duplicates, and generally ensure that address rosters are as complete and accurate as possible. On this score the Enhancements Program falls seriously short, in our view, due to the lack of development of Objectives Three and Four. More generally, the Census Bureau’s current strategy shows relative inattention to MAF improvement and, worse, shows signs of repeating costly errors from the 2000 experience.
The magnitude of the Objective One task of realigning TIGER features—and the monetary cost associated with it—give the Enhancements Program a TIGER-centric feel. And Objectives One, Two, and Five seem to speak to the MAF largely as it inherits its quality from TIGER. Indeed, the Bush administration’s budget messages to Congress for both fiscal 2004 and 2005 describe the geographic leg of the Census Bureau’s 2010 strategy as a plan for “enhancing the Census Bureau’s geographic database and associated address list” (emphasis added). In line with our comments in opening this chapter, the MAF is too critical to the quality of the census and other survey programs to be treated merely as an add-on.
3–D.1 Current Plans for MAF Updates for 2010
The Census Bureau argues that the combination of three activities—“the ongoing MAF/TIGER updating using the Delivery Sequence File, CAUS, and enhancements included in the proposed MAF/TIGER modernization initiative”—“should result in an up-to-date address list for the entire United States” (U.S. Census Bureau, 2003c:11). More specifically, the update strategy is based on a rough urban/rural dichotomy:
The Postal Service’s DSF is intended to be the address update source “in areas where DSF addresses can be assigned a physical location, such as urban areas with city-style addresses” (U.S. Census Bureau, 2003c:9).
“In rural areas with non-city-style addresses,” the Bureau argues that the DSF updating process “cannot be used,” and so the Census Bureau intends to update this segment through CAUS. The Bureau indicates that the areas for which DSF updates cannot be used “encompass the majority of the Nation’s land area and about 15 percent of the population” (U.S. Census Bureau, 2003c:9).
These update sources are to be supplemented in the MAF/TIGER Enhancements Program, which we interpret to mean a successor to the 2000 census LUCA program under Objective Three.
The backbone of the Census Bureau’s update strategy is the twice-yearly “refresh” that comes from the Postal Service’s Delivery Sequence File. These regular updates are considered to be part of the Bureau’s Geographic Support Base Program, not the MAF/TIGER Enhancements Program. While the DSF is certainly an important source of address information, reliance on the DSF as the principal source of address updates for (by the Bureau’s estimate) 85 percent of the household population raises at least two concerns:
Historical precedent in the 2000 MAF-building process. As we indicated earlier, DSF updates were previously viewed by the Census Bureau as a primary address source after the 1994 passage of legislation that enabled sharing of this information with the Postal Service. However, the Bureau perceived problems with the level of DSF coverage in fast-growth and new construction areas and had to initiate a costly complete block canvass (National Research Council, 1999) in an attempt to ensure comprehensive coverage.
Limitation of DSF to mail delivery population. Again, by definition, the DSF is intended to document mail delivery addresses, which is not equivalent to the complete list of housing units in the United States.
The Census Bureau’s planned activity to update addresses in rural areas is CAUS, which—to briefly review—is an associated program of the American Community Survey (ACS). Under CAUS, ACS field representatives would list addresses (and update streets, using traces from a GPS receiver) through a laptop computer-based tool known as the Automated Listing and Mapping Instrument (ALMI). However, general concerns raised by dependence on CAUS as an address update source include the following:
Linkage to ACS funding. Full and sustained funding for the ACS has not yet been secured; consequently, the budgetary viability of CAUS is not known. Implementation of CAUS must also await full mobilization of ACS support staff (which will presumably entail more time as well, as the establishment of ACS operations takes priority), which will add to the delay in the possible receipt of CAUS updates. Finally, the number of CAUS field personnel will be linked to the number of ACS enumerators. While it is hoped that budget commitments to ACS will not oscillate, the effectiveness of CAUS could be impaired if ACS funding is not stable over the years.
ACS workload management. It is unclear how much time and manpower ACS managers will commit to the side work of the address listing given the ambitious timetable of ACS data collection.
Unclear/unspecified mechanism for targeting areas for update. The plans for deployment of CAUS representatives to collect information in particular geographic areas are as yet unspecified. One approach might be for enumerators to list new streets or developments they find by happenstance in carrying out their regular ACS work, but that is surely an unreliable means of covering the entire rural population. The draft ACS operations plan indicates that “ACS planners [will] use various methods for identifying where coverage is insufficient,” including “work with community officials to acquire information about new addresses, new streets, and/or areas of significant growth” (U.S. Census Bureau, 2003c:10). But, again, the mechanics of this targeting are uncertain.
The third element in the address update strategy—a LUCA-type program—is a topic we will discuss in greater detail in Section 3-E.5. But, for the purpose of the argument at hand, the major concern regarding a new local address review program is simply that no prototype plans have yet been developed.
3–D.2 Block Canvassing
In our second interim report, the panel commented (National Research Council, 2003a:66):
We assume that the Bureau hopes to avoid a complete block canvass prior to the 2010 census, given the cost of that operation and that it was treated as a last resort in 2000.
Our supposition was that the Census Bureau would pursue targeted block canvassing—identifying selected geographic areas with sufficiently fast growth or other characteristics to warrant a thorough precensus address list check.
In responding to the interim report at our September 2003 meeting, the Census Bureau expressed surprise at this statement, maintaining that a full block canvass was always part of the Census Bureau’s 2010 plan. We respectfully disagree; part of the tenor we recall in early discussion of the MAF/TIGER Enhancements Program was the need for continuous address updating over the next decade in order to avoid a block canvass. The Census Bureau’s document on projected life-cycle costs of the 2010 census suggests the desire to replace a last-minute canvass with continuous updating. “While address building and TIGER updating occurred to a limited extent over the decade leading to Census 2000,” the document says, “the major updating activities occurred during 1998–99 and involved expensive, complex, labor-intensive field operations.” As a result of regular DSF updates and local and tribal updates, “the 2010 Census will be armed with a far more comprehensive, timely, and accurate address list—one of the best predictors of a successful census—without the complexity, risk, end-of-the-decade costs, and last minute address building costs” (U.S. Census Bureau, 2001a:3–4).
Regardless of when the idea reemerged, the panel acknowledges with some concern that a full block canvass now appears to be part of the Census Bureau’s plan, though no detailed schedule or specifications are known to us, nor have any changes to the operation from its 2000 census implementation been described.
We understand the draft nature of the current 2010 planning documents and are thus hesitant to quote from them extensively. But the Census Bureau’s comments on address list issues in its draft baseline design document (Angueira, 2003b:3) suggest an emerging direction that could potentially be so damaging to a quality census that they merit detailed examination. The comments begin:
When address list updating gets underway in 2009, census geographers and field staff will be working with an address list unprecedented in its accuracy and completeness.
Were nothing to be done with the Master Address File between now and 2010, the statement would hold by virtue of the fact that (unlike censuses before 1990) the 2000 MAF was not discarded following the census. Just as the 1990 Address Control File was the lead contributor of addresses on the 2000 MAF (Vitrano et al., 2003), so too is it reasonable to expect that the 2000 MAF will contribute the core set of addresses to the 2010 MAF.
Still, having the 2000 MAF in hand does not give license to defer active address list updating to 2009. We believe—and sincerely hope—that the sentence is a misstatement; indeed, in later text, the baseline design document strikes a more reasonable note, pledging “work with USPS, local, and tribal partners” through the decade and saying that, “whenever we identify new housing units or those that no longer exist, we will update our files” (Angueira, 2003b:4). A different interpretation of the first sentence is that the general term “address list updating” is being used to describe a more specific operation, most likely block canvassing.
The draft baseline design continues (Angueira, 2003b:3–4):
As part of the MAF/TIGER Enhancements Project, the Local Update of Census Address program (LUCA) will have been streamlined and improved based on lessons learned from the Census 2000 LUCA experience, and the address list for the entire universe will have been maintained and updated on a continuing basis. … There will be an address updating operation in 2009 in areas that we believe have experienced significant changes. … The streamlined, ongoing LUCA program will culminate with a final opportunity for local governments to review their address lists, which will occur prior to address canvassing. We will then validate any LUCA adds during address canvassing. We will have a New Construction operation, and will attempt to include those addresses in questionnaire delivery. The New Construction adds will be validated during a later operation.
The implications of these statements are disturbing in two key respects. First, the passage lists several different address updating mechanisms (considering updates from the DSF and CAUS as part of the MAF being “maintained and updated on a continuing basis”) but provides a very weak sense of their order and scheduling. That a block canvass would not overlap a LUCA-type program—as it did in 2000—is an improvement. But how all the activities fit into a coherent timeline is not clear—particularly if 2009 is the start date. Second, the casting of the block canvassing operation as a validation step for LUCA is troubling as it imparts to block canvassing a “most trusted” authority. We do not argue that local and tribal knowledge of addresses is fool-proof, and there is need for some sort of validation. However, it is unclear whether empirical evidence supports the assertion that block canvassing is more likely than other operations to correct addresses. Creating the impression that, near the end of the decade, the Census Bureau will make a major deployment of field staff to perform block canvassing because local input on address information is somehow less trustworthy may only serve to further hinder participation by local and tribal authorities in Census Bureau activities.
3–E.1 Plan MAF Improvements Independent of MAF/TIGER Enhancements
The Census Bureau needs to outline goals pertaining directly to the MAF independent of the goals for TIGER—for example, in the development of quality metrics and the identification of housing unit duplication. Overall milestones and tasks need to be specifically set for Objectives Three and Four, to determine how these objectives may work to control housing unit duplication and to more accurately identify and account for multiunit housing structures. It is also vitally important that MAF improvements be coordinated with efforts to list and enumerate the population living in special places and group quarters; we will describe both group quarters and multiunit structures in Chapter 5.
The Panel to Review the 2000 Census discusses the problems of the 2000 MAF in great detail (National Research Council, 2004:Ch.4), and argues that the process for updating the MAF during the years leading to the 2010 census is in need of serious revision. We concur, and accordingly stress the following recommendation (a synthesis and extension of both Recommendation MAF–1 from our second interim report and National Research Council (2004:Rec. 4.1)):
Recommendation 3.1: The Census Bureau must devise a plan and develop effective procedures for updating and correcting the Master Address File (MAF). A complete and accurate Master Address File is critical not only to the success of the 2010 census but also to the effective implementation of the American Community Survey, the other household surveys conducted by the Census Bureau, and the 2008 dress rehearsal. Because the 2000 MAF was not simply discarded following the 2000 census (as occurred in censuses prior to 1990), the 2010 census will have as a base an address file of unprecedented completeness, but that does not obviate the need for continual updating, filtering, unduplicating, and cleaning of the MAF during the years leading to the 2010 census.
The plan for a continually updated 2010 MAF must include, but not be limited to, the following:
A clear articulation of how the MAF/TIGER Enhancements Program and other Census Bureau activities will add missing housing unit addresses, remove duplicate addresses, and generally correct the Master Address File, independent of benefits derived from being cross-referenced to an updated TIGER database;
More effective definitions of housing units and methods to obtain accurate address listings for structures containing multiple housing units, as it is not sufficient to know only the address or geographic coordinates of the structure location;
Detail on the temporal sequencing and adequacy of address updates from the U.S. Postal Service’s Delivery Sequence File, the Census Bureau’s Community Address Updating System, and as-yet unspecified local partnership programs;
More effective means to define, list, and enumerate group quarters living arrangements, which should be done in coordination with the development and maintenance of the MAF; and
A detailed plan for Objective Five (quality metrics) of the MAF/TIGER Enhancements Program, including a program of evaluation and assessment of MAF coverage and input to the MAF/TIGER redesign (Objective Two), so that the revised database structure includes appropriate address source codes and other useful variables for evaluation.
3–E.2 Coordinate Responsibility for the MAF
In Chapter 6, we advocate the creation of a new position within the Census Bureau—a system architect for the decennial census—with the primary goal of integrating and coordinating work on architecture remodeling. We believe that improving the MAF is likewise an area that would benefit greatly from focused staff effort. At least three major divisions within the Bureau (Geography, Field, and Decennial Management; see Box 2.2) have a strong stake in the maintenance and use of the MAF as it pertains to the decennial census, and the Demographic Surveys division also has a stake given MAF use in conducting the Bureau’s household surveys. Given the legitimate (but sometimes competing) interests of the various divisions, it would be useful to vest responsibility for coordinating MAF improvement and research in one office with both the connections and the ability to work with all relevant divisions.
We reiterate a recommendation from our second interim report (National Research Council, 2003a:Rec. MAF–2):
Recommendation 3.2: The Census Bureau should create and staff a position to oversee the development and maintenance of the MAF as a housing unit inventory, with a focus on improving methods to designate, list, and update units. This position should be responsible for development and implementation of plans drawn up consistent with Recommendation 3.1.
Census Bureau staff expressed skepticism about this recommendation in their reaction to the second interim report at the panel’s final public meeting in September 2003, arguing that the Bureau’s organization is not given to the creation of centralized “czar” positions. That argument, however, underscores the point of this and several other recommendations in this report: real integration in achieving census objectives will require some thinking outside the lines of existing organizational trees. In our assessment, the Census Bureau’s approach of handling MAF issues by committee is ineffective and leads to serious underutilization of the Bureau’s existing staff and resources; MAF development should be supported with a clear structure of organization and accountability, as the Enhancements Program has done for TIGER.
3–E.3 Improve Research on the Delivery Sequence File
Our next four recommendations call for the development of an empirical, research-based approach to MAF updating efforts throughout the decade. Each prospective address input source should be carefully examined, weighing strengths, weaknesses, and costs, and reasonable estimates of the source’s potential contribution to the 2010 MAF should be produced.
The U.S. Postal Service’s Delivery Sequence File provides twice-yearly “refreshes” to the MAF under the Census Bureau’s current geographic support system. The efficacy of these updates—in general, and differentially by geography and urban/rural status—has not yet been fully demonstrated.
Recommendation 3.3: The Census Bureau should pursue more effective partnership and research collaboration with the U.S. Postal Service, including but not limited to further work on “undeliverable as addressed” items from the 2000 census, assessment of the address coverage quality of the Delivery Sequence File (DSF), and possibilities for more accurate translation of post office box listings and other DSF entries to street addresses and geographic coordinates.
3–E.4 Define the Role of the Community Address Updating System
Objective Four of the MAF/TIGER Enhancements Program—CAUS—has been delayed in implementation due to the lack of initial funding for the ACS. The expectations for CAUS have never been entirely clear. As we noted in Section 3-D.1, the system has been described as vital to securing address updates “in rural areas with non-city-style addresses,” which represent approximately 15 percent of the population (U.S. Census Bureau, 2003c:9). However, at the panel’s last public meeting in September 2003, senior Census Bureau staff commented that not much would be lost if CAUS were not fully implemented, contingent as it is on funding of the ACS.
At the same time, the Census Bureau has indicated that it trained 400 field representatives on CAUS methodology in August 2003, began listing operations in October 2003, and has continued to refine the ALMI GPS-equipped laptop computer used to collect CAUS data (U.S. Census Bureau, 2003e:4). In light of these investments, as 2010 planning proceeds, the Bureau needs to make clear the expectations for CAUS, including assessment of the long-term feasibility of the activity and of its potential contribution to the 2010 MAF. If CAUS is indeed crucial to securing updates from rural areas, then, given the uncertainty about the program’s implementation, the Bureau needs to consider whether alternate sources could provide the information.
Recommendation 3.4: The Census Bureau should assess how critical the Community Address Updating System (CAUS) is to providing address updates in rural, non-city-style address areas. Such an assessment should include not only estimates of the number of addresses that could be provided and the workload that could be handled by CAUS/American Community Survey staff, but also empirical evidence on coverage gaps in the U.S. Postal Service’s Delivery Sequence File by geographic area or type.
3–E.5 Plan Local Geographic Partnerships and Implement Early
To its credit, the Census Bureau has recognized the importance of partnerships with local and tribal governments by designating their creation and maintenance as Objective Three of the Enhancements Program. The Bureau’s RFP for the TIGER realignment of Objective One makes this clear, noting that “the success of [Objective One], and the continuous update of the information in MAF/TIGER, requires ongoing interaction between the Census Bureau and its federal, state, local, and tribal government geographic partners.” However, the Bureau has not provided a clear indication of how such partnerships would work.
While the panel acknowledges that the funds available for expanding and encouraging geographic partnership options have been limited, the cryptic descriptions of Objective Three that we have received do not make clear how and when the Bureau intends to involve local and tribal partners in these programs.
A major stated role for local and tribal geographic partners is to contribute to Objective One by sharing their current GIS files with the Census Bureau to support TIGER realignment. But in this matter, and in past geographic interactions such as LUCA, the Census Bureau has often approached “partnership” as a one-sided exchange: “partners” expend resources and turn information over to the Bureau. The principal reward to a local or tribal government for entering into such a partnership is definitely not trivial: the prospect of a more accurate census count. The panel recognizes that the Census Bureau is not a fund-granting organization and hence cannot directly subsidize local or tribal governments to improve and submit their geographical resources. That said, the Bureau should aim for partnerships that are true exchanges of information: for instance, by giving census field and regional staff an increased role in interacting with local and tribal authorities and collecting information updates. At the very least, steps should be taken to lessen the burden of partnership on the local and tribal governments—for example, by conducting LUCA-like address list reviews electronically with submissions via the Internet, and coordinating the various geographic data collection programs so that localities are not asked for similar information in different formats by different divisions of the Census Bureau.
The Census Bureau needs to articulate a plan for communication with localities that takes advantage of existing structures, including the State Data Center Network, the Federal-State Cooperative Program for Population Estimates, state and regional councils of governments, and other local governmental entities. The role of the Census Regional Office Geographic Coordinators relative to these entities and to Census Bureau headquarters needs to be spelled out.
The ability and willingness of different governments to join forces with the Census Bureau vary widely. It is inevitable that local efforts will be differentially expressed in different areas of the country, whether such efforts involve mapping, address listing, or the nurturing of partnerships. While all areas should receive equal treatment in the spirit of fairness, local interest, feasibility, and cost-effectiveness might well dictate otherwise. Moreover, although geographic partnerships with local and tribal governments can be useful to tap the knowledge and expertise of those closest to the field, variations in GIS usage may affect the accuracy of local and tribal government geographic resources and may introduce errors when combined with census resources. In the interest of effectiveness, we recommend careful analysis of the successes and failures of prior LUCA programs in order to properly conduct future community participation programs. Close evaluation of the 2000 address file by type of enumeration area, by dwelling type, by the contribution of geographic update programs like LUCA, and by region of the country—highlighting areas where elicitation of local and tribal information may be most beneficial—is surely required if the Census Bureau is going to maintain the MAF in a cost-effective manner in the years leading to the 2010 census. The Bureau’s future plans for LUCA and other partnership programs should also provide for evaluation of those partnerships, not only to inform the effectiveness of local contributions from the census perspective but also to give feedback to participating local and tribal governments.
We reiterate a recommendation from our second interim report (National Research Council, 2003a:Rec. MAF–3) and add two other points on the nature of partnerships:
Recommendation 3.5: The Census Bureau should immediately develop and describe plans for partnerships with state, local, and tribal governments in collecting address list and geographic information. Such plans should include a focus on adding incentives for localities to contribute data to the census effort, making it easier for localities and the Bureau to exchange geographic information. Accordingly, plans for partnerships should include:
clear articulation of realistic schedules for local input and review;
definition and clear presentation of benchmark standards for local data to be submitted to the Bureau;
mechanisms for providing effective feedback to local and tribal governments, detailing and justifying the Bureau’s decisions to use or not use the information provided; and
coordination of efforts across the Bureau so that calls for local and tribal entities to supply input to the Master Address File, TIGER, the Boundary and Annexation Survey, and other Bureau programs are not unduly redundant and burdensome.
3–E.6 Justify the Complete Block Canvass
In Section 3-D.2, we commented on Census Bureau reaction to the assumption, stated in our second interim report, that the Bureau hoped to forestall a complete block canvass in the 2010 census. Our commentary in the interim report continued (National Research Council, 2003a:66):
In the absence of evidence that the combination of DSF and LUCA leading up to 2010 can overcome the last-minute doubts that arose in the late 1990s and without a clearer plan for CAUS—it is difficult to see how a full block canvass can be averted.
We continue to stand by this assertion, and have called for development of empirical evidence on possible DSF, CAUS, and LUCA contributions to the 2010 MAF. Likewise, we believe that the Census Bureau’s decision to proceed with a full block canvass should also be justified with empirical evidence.
We do not suggest that block canvassing is an idea that lacks merit. The evaluations of the 2000 census suggest that the effort contributed many addresses to the MAF (Burcham, 2002; Vitrano et al., 2003) and was generally very good at verifying existing units. However, evidence also suggested relatively high rates of inconsistency (22–24 percent) between addresses added or deleted by the block canvassing operation and results in the census; an example of an inconsistency is a housing unit added by block canvassing but then found during the census operations to be an invalid housing unit (Burcham, 2002:38–39).
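The inconsistency measure cited above can be made concrete with a small calculation. The following is only a minimal sketch: the record layout, the field names, and the sample records are hypothetical and do not reflect the Census Bureau's actual files or results. It also assumes, by symmetry with the example given in the evaluation, that a deletion counts as inconsistent when census operations later found a valid housing unit at the address.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class CanvassAction:
    """One block canvassing action on an address (hypothetical layout)."""
    address_id: str
    action: str            # "add" or "delete", as recorded by block canvassing
    valid_in_census: bool  # did census operations later find a valid housing unit?


def is_inconsistent(rec: CanvassAction) -> bool:
    # An add is inconsistent if the census later found the unit invalid.
    if rec.action == "add":
        return not rec.valid_in_census
    # Assumption: a delete is inconsistent if the census later found the unit valid.
    if rec.action == "delete":
        return rec.valid_in_census
    raise ValueError(f"unknown action: {rec.action!r}")


def inconsistency_rate(records: Iterable[CanvassAction]) -> float:
    """Share of canvass actions contradicted by later census results."""
    records = list(records)
    if not records:
        return 0.0
    return sum(is_inconsistent(r) for r in records) / len(records)


# Made-up records, for illustration only.
sample = [
    CanvassAction("A1", "add", True),      # consistent
    CanvassAction("A2", "add", False),     # inconsistent
    CanvassAction("A3", "delete", False),  # consistent
    CanvassAction("A4", "delete", True),   # inconsistent
]
print(inconsistency_rate(sample))
```

Computing the rate separately for adds and deletes, and by type of area, would indicate where a targeted canvass might perform differently from a full one.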
We believe it rash to commit to such an expensive operation as full block canvassing absent both a compelling base in empirical evidence and a determination that targeted canvasses in specific (e.g., fast-growth suburban) areas are infeasible. It is decidedly a mistake to consider a full block canvass without early attention to effective canvass techniques for all types of housing stock, particularly small multiunit structures (see Section 5-C.1). The panel is also concerned that reliance on a block canvass may send unfortunate mixed messages about the relative quality of the address list needed for different purposes—that special operations are needed to derive an address list of presumably higher quality than that needed for the Census Bureau’s other surveys and, particularly, the ACS. We therefore recommend:
Recommendation 3.6: The Census Bureau should evaluate the necessity of its plans to conduct a complete block canvass shortly before the 2010 census. Such justification must include analysis of extant census operational data and should include, but not be limited to, the following:
arguments as to why selective targeting of areas for block canvass is either infeasible or inadequate, and as to how the costs of the complete block canvass square with the benefits; and
analysis of how a full block canvass fits into the Census Bureau’s cost assumptions for the 2010 census.
If plans proceed for a complete canvass, the Bureau should also consider how such a mass field deployment prior to 2010 could be used to achieve other improvements or efficiencies, such as the collection of GPS trace data as supplement to or as quality control for the TIGER realignment.
3–E.7 Exploit 2000 MAF Data, and Redesign MAF for Evaluation in 2010
A recurrent theme in our preceding remarks is that there is a strong need for empirical assessment of the quality of potential address sources for the 2010 MAF. The natural starting place for such an evaluation would be the Census Bureau’s MAF Extract. Based on the 2000 census Decennial Master Address File—the “snapshot” of the MAF used to generate census mailing labels and to monitor mail response—the MAF Extract includes “flags” that indicate which of several sources contributed the address to the MAF. The MAF Extract also contains selected outcome measures, such as whether the address record was actually used in the 2000 census and whether it was tagged as a potential duplicate during the ad hoc duplicate screening program of early to mid-2000 (Nash, 2000).
The MAF Extract has certain liabilities, chief among them that the system of flags used to indicate the source of an address does not show the complete history of an address in the MAF. Other than a rough temporal ordering of the input sources themselves, it is usually impossible to determine which source first contributed the address. Nonetheless, the extract is critical to answering key questions about the MAF-building process, and the panel continues to urge that the data resource be tapped for as much information as possible.
Analyses of the MAF Extract should consider the type of enumeration area for each address in the 2000 census (e.g., mailout/mailback or update/leave) as well as geographic region. The main objective of the analysis is not to highlight how different areas of the country may have fared under various programs, but rather to obtain knowledge of how people in those areas respond to and interact with census activities in order to improve planning for future census programs.
Some key questions to address through Census 2000 evaluations are the following:
Why were addresses included in the MAF but not in the 2000 census?
This question provides perspective for the others on this list and is a good starting place.
How useful were the DSF updates in the identification of new units, especially in high-growth areas of the nation?
The goal is to examine how much of the newest housing was picked up in a timely fashion by the U.S. Postal Service. The answers can provide valuable clues about the effort the Census Bureau should put into other avenues (e.g., new construction programs) as sources of information on new housing.
How effective were LUCA inputs relative to what was already known (or was promptly seen) in a DSF update? Of those contributions that can be determined as “unique,” how many governments were represented and what kind of housing do these addresses represent?
While LUCA must be conducted as part of the preparation for the 2010 census, the resources the Census Bureau chooses to expend on it can vary dramatically. The answer to this question can also inform strategies for the LUCA program for 2010.
What were the original sources of address records that were deleted in the ad hoc duplicate identification and removal process conducted in 2000?
Duplication related to address listing anomalies can be rectified once the specific problems with the duplicate addresses have been identified. Identifying the original source of the affected addresses is a prime means for doing that.
What were the original sources of addresses that were flagged as potential duplicates but later reinstated?
This question addresses the hypothesis that some addresses, originally considered as potential duplicates, were put back into the census in error. The Census Bureau already has an estimate of this number. By identifying the original sources of these addresses, the Bureau will have valuable clues about what produced this problem and how to avoid it in the future.
What were the original sources of addresses for housing units where an interview was not obtained in nonresponse follow-up (NRFU)?
One hypothesis about the shortfall of long-form data in the 2000 census posits that NRFU enumerators encountered high levels of resistance from respondents who were being enumerated for the first time (some of whom escaped detection in 1990). Where did the addresses of these tough-to-enumerate units fall? (Of course, this is not the only or most likely hypothesis to explain problematic long-form data, but the question warrants attention and the Census Bureau’s MAF Extract data may be able to provide useful information.)
What were the original sources of addresses for housing units that were subsequently declared nonexistent or were not found in NRFU?
NRFU enumerators had the option of entering codes for “cannot locate,” “duplicate,” and “nonresidential,” among others, as reasons for listing a unit as “nonexistent.” Were these potential duplicates added back in, were erroneous addresses brought in from LUCA that were not detected by the Census Bureau, or were these problem addresses disproportionately from some other original source?
For cases where a unit was determined not to exist in coverage improvement follow-up (CIFU; the final follow-up stage during the actual fielding of the census), what was the original source of the address? How many addresses were erroneously kept in the census and then deleted when the Bureau went out to check in CIFU?
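Most of the questions above reduce to the same tabulation: for each address source flag, how were the flagged addresses ultimately resolved? The sketch below illustrates that tabulation under stated assumptions. The flag names, outcome codes, and sample records are all hypothetical; the actual MAF Extract layout is not described here and will differ.

```python
from collections import Counter, defaultdict

# Hypothetical extract records: a set of source flags plus one outcome code.
records = [
    {"sources": {"ACF_1990", "DSF"}, "outcome": "in_census"},
    {"sources": {"DSF"}, "outcome": "in_census"},
    {"sources": {"LUCA"}, "outcome": "nrfu_nonexistent"},
    {"sources": {"LUCA", "BLOCK_CANVASS"}, "outcome": "in_census"},
    {"sources": {"BLOCK_CANVASS"}, "outcome": "duplicate_deleted"},
    {"sources": {"LUCA"}, "outcome": "duplicate_deleted"},
]


def tabulate_outcomes_by_source(records):
    """Count outcomes for every source flag set on each address."""
    table = defaultdict(Counter)
    for rec in records:
        for source in rec["sources"]:
            table[source][rec["outcome"]] += 1
    return table


def outcome_share(table, source, outcome):
    """Share of addresses flagged by `source` that ended in `outcome`."""
    total = sum(table[source].values())
    return table[source][outcome] / total if total else 0.0


table = tabulate_outcomes_by_source(records)
for source in sorted(table):
    print(source, dict(table[source]))
```

Because an address can carry several flags, it is counted once under each flagged source. Shares within a source are therefore meaningful, but the counts do not sum to the number of addresses, and the table cannot by itself show which source contributed an address first. Adding type of enumeration area and region as further keys would support the breakdowns suggested earlier.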
The Census Bureau’s topic report on address list development (Vitrano et al., 2003) is a step toward answering these questions. In particular, it makes strides toward managing the poor and confusing MAF codes indicating operations that added or edited the address in order to ascertain the original source of each address record. But it is only a step. Accordingly, we recommend:
Recommendation 3.7: The Census Bureau must:
fully exploit the address source information in the MAF Extract in order to complete 2000 census evaluations, fill gaps in knowledge remaining from the 2000 census evaluations, and assess causes of duplicate and omitted housing units; and
build the capability for timely and accurate address evaluation into the revised MAF/TIGER data architecture, including better ways to code address source histories and to format data sets for independent evaluation.
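One way to picture “better ways to code address source histories” is an append-only log of source events kept with each address, in place of a set of flags. The following is a minimal sketch under that assumption only; the class names, action codes, and dates are hypothetical and are not drawn from the Census Bureau's MAF/TIGER design.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class SourceEvent:
    """One operation touching an address (hypothetical codes)."""
    source: str       # e.g., "DSF", "LUCA", "BLOCK_CANVASS"
    action: str       # e.g., "add", "verify", "correct", "delete"
    event_date: date


@dataclass
class AddressRecord:
    address_id: str
    history: List[SourceEvent] = field(default_factory=list)

    def record(self, source: str, action: str, event_date: date) -> None:
        # Events are only appended, never overwritten, so the history is kept.
        self.history.append(SourceEvent(source, action, event_date))

    def original_source(self) -> Optional[str]:
        """Source of the earliest 'add' event, or None if there is none."""
        adds = [e for e in self.history if e.action == "add"]
        if not adds:
            return None
        return min(adds, key=lambda e: e.event_date).source


# Made-up history, for illustration only.
addr = AddressRecord("A1")
addr.record("LUCA", "add", date(1998, 6, 1))
addr.record("DSF", "add", date(1997, 11, 1))
addr.record("BLOCK_CANVASS", "verify", date(1999, 3, 15))
print(addr.original_source())
```

With a dated log of this kind, the original source of an address can be read directly from the record, so it no longer has to be inferred from a rough temporal ordering of the input sources.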