Technical Infrastructure and Business Process
MASSIVE AMOUNTS OF INFORMATION—from the answers to every question on every returned questionnaire to the personnel and payroll records for hundreds of thousands of temporary employees—must be managed and processed in order to conduct a successful decennial census. To process all this information, the census relies on a complex technical architecture—the collection of people, computer hardware and software, and telecommunication networks that supports the complete workings of the census. Included in this technical infrastructure are subsystems to track personnel hires and fires, monitor caseload and make enumerator assignments, capture and synthesize data, generate maps, and perform myriad other functions. These subsystems must operate not only at Census Bureau headquarters but also at regional offices, data collection centers, and hundreds of temporary local census offices.
The 2000 census relied on several major systems (U.S. Census Bureau, 2000; Titan Corporation, 2003), including the following:
Geographic Support System (GSS): a facility for deriving extracts from MAF/TIGER as necessary and printing enumerator maps;
Pre-Appointment Management System/Automated Decennial Administrative Management System (PAMS/ADAMS): a system to support the hiring, processing, and payment of temporary employees, as well as administrative data archiving;
Operations Control System (OCS 2000): a caseload management system to define and track enumerator assignments, as well as to monitor duplicate and missing addresses;
Data Capture System (DCS 2000): a system for the check-in and scanning of completed questionnaires;
Telephone Questionnaire Assistance/Coverage Edit Follow-Up (TQA/CEFU): a program to provide support for respondents requiring assistance or additional forms, as well as follow-up data collection from respondents by phone;
Internet Data Collection/Internet Questionnaire Assistance (IDC/IQA): a system for the support of limited-scale Internet response to short-form questionnaires;
Accuracy and Coverage Evaluation (ACE): a program to provide support for a follow-up survey to assess possible undercount (including maintenance of laptop computers used by enumerators and the Matching and Review Coding System [MaRCS] used in matching the survey responses to census returns);
Management Information System (MIS 2000): a system for senior management planning and information tracking, including schedule and budget planning and tracking;
Headquarters (HQ) Processing: the analysis and processing of final data, including production of reapportionment and redistricting population counts, as well as other data products; and
Data Access and Dissemination System (DADS): a system for dissemination of census data to the public, most notably through the American FactFinder Web site (http://factfinder.census.gov).
In the end, the information systems of the 2000 census achieved the desired results. “Operationally, most agree that this decennial census was a success—participation was higher than anticipated … and operations concluded on time,” notes an assessment prepared by the U.S. Department of Commerce, Office of Inspector General (2002:iii). However, the assessment continues, the means by which it was achieved—including the patchwork of information systems—led to other descriptions: “costly, complex, and high risk.”1
The technical infrastructure of the 2000 census was generated without reference to an overall blueprint; individual systems were pieced and linked together, often having been developed quickly and without full opportunity for testing. Though these efforts are not as well publicized as the Census Bureau’s major proposed initiatives for the 2010 census, the Bureau has taken steps toward a more rigorous development process for the 2010 census technical infrastructure. Specifically, efforts are under way to model the logical infrastructure of the census—the complete mapping of information flows through the entire decennial census. Properly executed, logical infrastructure models allow for alternative organizational structures and assumptions to be tested in the abstract. Alternative models can be compared before deciding on a model; that finished model then serves as blueprint, specification, and template for constructing the physical (hardware/software) technical systems. Full use of logical architecture modeling has the potential to greatly reduce risk in system development and ensure that the various information subsystems of the census communicate effectively with each other.
In this chapter, we examine this modeling effort as well as the Bureau’s broader effort to develop its technical infrastructure. Section 6-A describes the basic concepts of an architectural model and discusses the Census Bureau’s initial implementation; our assessment of the modeling effort is given in Section 6-B. In Section 6-C, we address a major, specific piece of the broader architecture for 2010: namely, the revised database structure for the Master Address File (MAF) and TIGER system. We close the chapter in Section 6-D by outlining major challenges faced by the Bureau in managing and finalizing the technical infrastructure of the 2010 census.
6–A TOWARD A “BUSINESS PROCESS” OF THE DECENNIAL CENSUS
Past experience with reengineering and upgrading information technology operations within corporations and government agencies suggests that the most prudent and productive approach is to proceed in well-thought-out stages or steps:
Define a “logical architecture” or “business process” model. A first step is to articulate the set of activities and functions currently performed by the organization and the informational dependencies among them. This model of activities and functions is called a logical architecture. It may also be called a business process model because it defines the ways in which operations are carried out to accomplish the intended objectives of an organization. In the census context, the current business process would be the information flows and tasks associated with the 2000 census. We will explain the nature of logical architecture or business process models in greater detail in the following section.
Reengineer the logical architecture. The completed logical architecture may be viewed as an “as-was” model; again, in this case, the as-was model would describe the activities of the 2000 census. Using the as-was model as a base, the next step is to produce one or more “to-be” models—that is, to identify new assumptions and objectives and to adjust the as-was logical architecture model as necessary to find the optimal way to structure functions under the new demands. Different to-be models can then be compared against each other in order to reach a final architecture model.
Construct the physical technical infrastructure using the reengineered logical architecture as a guide. The finished logical architecture/business process model is then used as the template and specification for a new physical technical infrastructure—the actual network of hardware and software systems assembled to carry out the organization’s work.
Any other approach—such as failing to map business functions in terms of overall objectives or rushing to make decisions on technical infrastructure too early—serves only to allow the organization to make more mistakes, albeit (probably) faster than before.
The Census Bureau has begun the task of reengineering the decennial census infrastructure in this manner because it fits into the objective of early planning and testing envisioned as part of its broad strategy for the 2010 census and because it brings the Bureau and the Department of Commerce into fuller compliance with the Information Technology Management Reform Act of 1996 (also known as the Clinger-Cohen Act).2 This act called for federal agencies to reexamine their information technology (IT) structures, requiring greater attention to how IT furthers the agency’s goals and to modeling current and modernized IT structures as a business process. The Chief Information Officers (CIO) Council, created by executive order, subsequently developed the Federal Enterprise Architecture Framework (FEAF), a set of minimum standards for description of IT programs and modernizations.
6–A.1 Baseline: Logical Architecture of the 2000 Census
The Census Bureau contracted with the Centech Group, an IT company based in Arlington, Virginia, to develop its baseline for infrastructure reengineering: namely, a business process model of the operational flows underlying the 2000 census. Lockheed Martin was subsequently brought in as a subcontractor. The result of this first stage of work is a map of the logical architecture of the 2000 census, and it is summarized in a report by the contractor (Centech Group, Inc., 2002a). A more detailed companion volume examines each logical segment of the model in greater detail (Centech Group, Inc., 2002b). The model developed in this contract does not cover every decennial census operation but concentrates on what the Census Bureau identified as major business process areas.
The logical architecture models developed by the Census Bureau under this contract adhere to the Integration Definition for Function Modeling (IDEF0) language, a method that has been adopted as a federal standard for representing organizational functions and flows.3 IDEF0 models use simple graphical structures to organize information. Functions (activities) of an enterprise are rendered as boxes, which are connected by arrows representing information constraints. For large enterprise models, a high-level diagram is typically produced as a guide or road map for the analyst; smaller pieces are then indexed based on this high-level map and are available in full detail on separate pages.
A logical architecture model is a blueprint of the workflow of a particular enterprise. It describes the nature of information that must be passed from point to point at various phases of the operation and, in doing so, highlights information interfaces—points of connection both within the system and with external entities. The model thus defines the baseline capability that must be present when a physical technical infrastructure is constructed. It may also convey a rough sense of where, geographically or organizationally, groups of activities should be clustered.
To better understand what a logical architecture model of the decennial census is, it is also important to be clear about what it is not. The main purpose of an IDEF0-based logical architecture model is to emphasize process and function. To that end, the model effectively disregards two variables that are of some natural concern. First, it does not assign completion times to any function or process. Rather, it describes forward information flow through a business process without delineating a timeline or schedule of the process. Individual segments of the model may be completely distinct in terms of their execution time or may overlap extensively. Second, IDEF0 models are not based on existing organizational boundaries; logical segments are partitioned strictly based on function and purpose, without respect to internal work divisions that may already exist within an enterprise.
An important question in building IDEF0 models is the level of detail required in the diagrams in order to facilitate effective process reengineering. FIPS Publication 183, which defines IDEF0 structures, suggests that each parent box (function) be decomposed until it can be expressed in 3 to 6 child boxes (Part B.2.1.4). The arrows representing information constraints should be expressed in the same level of detail as the boxes (Part B.2.2.2); that is, a rule of thumb is that activities are not adequately decomposed if boxes have more than 6 arrows on any side.
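The two rules of thumb above lend themselves to a mechanical check. The following is only an illustrative sketch, not part of FIPS 183 or of the Bureau's models: the `Box` class and `check_box` function are hypothetical names, and the grouping of arrows into inputs (left), controls (top), outputs (right), and mechanisms (bottom) follows the usual IDEF0 convention for the four sides of a box.

```python
from dataclasses import dataclass, field

# Rule-of-thumb limits taken from the guidance summarized in the text.
MIN_CHILDREN, MAX_CHILDREN = 3, 6
MAX_ARROWS_PER_SIDE = 6


@dataclass
class Box:
    """One IDEF0 function box (hypothetical structure for illustration)."""
    name: str
    inputs: list = field(default_factory=list)      # arrows on the left side
    controls: list = field(default_factory=list)    # arrows on the top side
    outputs: list = field(default_factory=list)     # arrows on the right side
    mechanisms: list = field(default_factory=list)  # arrows on the bottom side
    children: list = field(default_factory=list)    # child boxes, if decomposed


def check_box(box):
    """Return rule-of-thumb warnings for a box and all of its descendants."""
    warnings = []
    sides = {"input": box.inputs, "control": box.controls,
             "output": box.outputs, "mechanism": box.mechanisms}
    for side, arrows in sides.items():
        if len(arrows) > MAX_ARROWS_PER_SIDE:
            warnings.append(
                f"{box.name}: {len(arrows)} {side} arrows "
                f"(max {MAX_ARROWS_PER_SIDE})")
    # Only decomposed boxes are checked for the 3-to-6 child guideline.
    if box.children and not MIN_CHILDREN <= len(box.children) <= MAX_CHILDREN:
        warnings.append(
            f"{box.name}: {len(box.children)} child boxes "
            f"(expected {MIN_CHILDREN}-{MAX_CHILDREN})")
    for child in box.children:
        warnings.extend(check_box(child))
    return warnings
```

A box with seven arrows on one side, or a parent decomposed into only two children, would each produce one warning; a leaf box with few arrows produces none.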
Finally, since the concepts may be confused, it is important to emphasize that a logical architecture is not equivalent to a physical computing or technical architecture. Properly executed, a logical architecture does not define the specific computing platform or database structure to be used, and it certainly does not presume to dictate the specific variables or records to be saved in particular databases. However, the logical architecture can provide a template for the physical trappings; the diagrammed flows and constraints of the model give shape to and provide baseline specifications for the types of activity that physical systems must be able to perform. Moreover, although a logical architecture documents work, it should be invariant to specific operational decisions—whether certain data are input at one computer or at twenty or, in the context of the census, whether operations take place in 500 local census offices or 600.
After defining operational flows, the Census Bureau began to render diagrams and logical flows captured in the logical architecture model for the 2000 census using System Architect, a software package developed by Popkin Software, Inc. This work was done in support of a limited pilot “reengineering exercise.”
6–A.2 Reengineering Exercise
Between August and October 2002, Census Bureau staff performed a logical architecture reengineering exercise, again contracting with the Centech Group, which issued the final results in a report (Centech Group, Inc., 2002c). To keep the exercise manageable, given the Bureau’s newness to the process, reengineering activities were narrowed in scope to focus on the census process steps from data collection through data processing. Candidate areas for retooling were proposed and considered for inclusion in the exercise, which ultimately concentrated on adapting the as-was model of the 2000 census to reflect three potential areas of change:
Control of follow-up procedures: make nonresponse followup assignments dynamically, based on regular updates of response status for all housing units during census conduct and on the progress of individual enumerators;
Centralized data capture and formatting for all response modes: ensure that data provided to headquarters are in uniform format regardless of response type (mail, telephone, Internet); and
Redistribution of “undeliverable as addressed” questionnaires: adapt sorting and screening processes to streamline handling of questionnaires returned by the U.S. Postal Service, for easier identification of vacant housing units.
Architecturally, adaptation of the as-was 2000 census model to incorporate these operations included many changes in followup information processing as well as the addition of data centers4 to perform processing and formatting tasks.
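To make the first of these changes concrete, dynamic follow-up assignment can be pictured as a small workload-balancing routine. This is a sketch under stated assumptions and not the Bureau's procedure: the function name, the inputs, and the "fewest open cases first" rule are all hypothetical, chosen only to show how updated response status and enumerator progress could jointly drive assignments.

```python
import heapq


def assign_followup(housing_units, responded, open_cases):
    """Assign each nonresponding unit to the enumerator with the fewest open cases.

    housing_units: iterable of unit identifiers in the follow-up universe
    responded:     set of units whose returns have already been checked in
    open_cases:    dict mapping enumerator name -> current number of open cases
                   (assumed to contain at least one enumerator)
    """
    # Min-heap keyed on open caseload, so the least-loaded enumerator is on top.
    heap = [(count, name) for name, count in open_cases.items()]
    heapq.heapify(heap)
    assignments = {}
    for unit in housing_units:
        if unit in responded:
            continue  # late returns drop out of the follow-up workload
        count, name = heapq.heappop(heap)
        assignments[unit] = name
        heapq.heappush(heap, (count + 1, name))
    return assignments
```

Rerunning the routine after each update of response status would remove newly responding units and shift work toward enumerators who have cleared their caseloads.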
As part of the exercise, Census Bureau staff developed a list of sixteen principles to guide the logical architecture as the three selected changes were incorporated in a to-be design. As the contractor report notes, individual architectural principles may, by design, oppose each other—“optimization for one principle may cause non-compliance with another principle” (Centech Group, Inc., 2002c). The hope is to find alternative architectural flows that best balance the demands of the entire set of principles.
In the Bureau’s exercise, two of the architectural principles are “consider the needs of the respondent” and “facilitate counting everyone once, only once, and in the right place.” These principles can be weighed against each other by the degree to which they contribute to overall goals of the enumeration. They can also be used to evaluate competing “to-be” logical architecture models. For instance, a higher number of response modes available to respondents under one plan might be considered evidence in its favor with respect to the “consider the needs of the respondent” principle, but not in its favor with respect to the “once, only once, and in the right place” principle due to the potential for duplication. In the reengineering exercise, Census Bureau staff identified a number of such measures (quantitative and qualitative), which serve as evaluation criteria to compare the baseline as-was model (the 2000 census structure) with the proposed initiatives for the 2010 census.
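One simple way to picture how such evaluation criteria might be combined is a weighted score per model. The source does not say how the Bureau aggregated its measures, so the weighted sum below is purely an assumption, and the function names, weights, and scores are hypothetical.

```python
def score_model(measures, weights):
    """Weighted sum of per-principle scores; principles with no measure count as 0."""
    return sum(weight * measures.get(principle, 0.0)
               for principle, weight in weights.items())


def rank_models(models, weights):
    """Return model names ordered from highest to lowest weighted score."""
    return sorted(models,
                  key=lambda name: score_model(models[name], weights),
                  reverse=True)


# Hypothetical weights and 0-to-1 scores, for illustration only.
weights = {"respondent_needs": 0.4, "count_once": 0.6}
models = {
    "as_was_2000": {"respondent_needs": 0.5, "count_once": 0.7},
    "to_be_more_modes": {"respondent_needs": 0.9, "count_once": 0.5},
}
ranking = rank_models(models, weights)
```

With these made-up numbers the model offering more response modes gains on the respondent principle and loses on the duplication principle, which is exactly the tension described above; a different choice of weights could reverse the ranking.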
6–A.3 After the Pilot: Steps Toward an Architecture
Work on the pilot reengineering exercise ended in October 2002, and in January 2003 Census Bureau staff began work on other architectural products. Initial work on an activity model for the 2010 census was completed in October 2003 (U.S. Census Bureau, 2003b).
The panel enthusiastically endorses and supports the Census Bureau’s work on its pilot logical architecture project and strongly urges its continuation.
Completion of a logical architecture model for the 2000 decennial census and of a redesigned model for the 2010 census would be major accomplishments and deserve recognition for their potential utility. As the contractor’s report notes, the Census Bureau has traditionally put “little emphasis on assessment of the entire ‘end-to-end’ decennial census process” (Centech Group, Inc., 2002a:vii). Hence, the Bureau’s efforts in working toward that complete model are indeed very encouraging. As we noted in our second interim report (National Research Council, 2003a), the Bureau’s selection of modeling products and paradigms has thus far been quite sound.
6–B.1 The Need for Institutional Commitment
The Census Bureau’s emerging plans for the 2010 census are laden with new initiatives and new technologies: a parallel data process in the ACS; more extensive ties to an updated MAF/TIGER system; data capture and transmissions from PCDs; Internet transactions; use of administrative records systems; and in-time collection and archival of information for immediate use in quality control and quality assurance. Each of these activities will require care when incorporated into a logical architecture for the 2010 census.
Constructing an extensively reconfigured logical architecture—and, more importantly, using the resulting model as a template for building the actual physical infrastructure for the 2010 census—is an arduous task. And though the effort of using a completely realized logical architecture to build the physical technical architecture will ultimately reduce operational risk in census conduct, the architecture-building process is not without risks of its own. In terms of general recommendations as the Census Bureau continues with its architecture work, the panel’s suggestions are generally consistent with those of an earlier National Research Council panel on which members of the current panel also served. The earlier panel was charged to advise the Internal Revenue Service on the modernization of its internal systems (National Research Council, 1996), a task similar in certain respects to reconfiguration of the decennial census. Accordingly, our lead recommendations are similar. First, successful reengineering efforts typically require active “champions” at the highest management levels, and the Bureau must seek champions for its architecture construction process.
Second, in order to conduct a successful reengineering process, the Census Bureau will need to bolster its technical expertise in enterprise modeling.
6–B.2 Management “Champions”
The major technological enhancements envisioned under the Census Bureau’s proposed plan for the 2010 census are distinctive not only for their range but also for the manner in which they cut across long-standing organizational divisions within the Bureau. For example, PCDs with GPS receivers are a field data collection tool, and therefore many requirements for the devices will have to be driven by field personnel needs; however, they are of limited use if the positional accuracy of TIGER is not improved. Additionally, computer-assisted questionnaires contained on the devices would benefit from cognitive and usability testing.
The approach of enterprise or logical architecture modeling is to concentrate on function and information flow rather than on preexisting work conditions, though indeed the finished result of modeling may suggest more efficient ways to structure operational workload. However, experience in carrying out similar infrastructure remodelings suggests that it will be vitally important to have strong support at the highest levels of management at the Bureau—in effect, to have influential “champions” of architecture reengineering. These people can effectively convey the importance of the task and encourage all divisions to “buy in” to modeling activities, and can then coordinate and integrate the emerging system.
Recommendation 6.1: In order to achieve the full benefit of architecture modeling, the highest management levels of the Census Bureau should commit to the design and testing of a redesigned logical architecture, so that the most promising model can facilitate the implementation of an efficient technical infrastructure for the 2010 census.
6–B.3 Establishing a System Architect
The development of an adequate business process model for the 2010 census will require a serious effort that must be well staffed and well supported. Although the support and commitment of top-level management are necessary, the panel believes that authority for coordinating and developing that model should be vested in one person—a system architect for the 2010 decennial census. We recommend that such a position be created as soon as possible and that a well-qualified candidate be hired to fill the job.
The system architect should be supported by a full-time staff of reasonable size in order to ensure the expertise necessary for a modeling methodology that is new to the Census Bureau. The system architect and related staff have a primary role as information gatherers, tapping the expertise of other Bureau staff to build and revise architecture models. But another important role is outreach, in a sense—helping to build commitment to architectural principles by informing other parts of the Census Bureau of modeling results and demonstrating their usefulness.
As we will discuss in Section 6-C, a system architect has been appointed to oversee the MAF/TIGER database redesign (Objective Two of the MAF/TIGER Enhancements Program). In our assessment, this is a positive development; the database redesign is a critical 2010 census activity, and strong coordination is helpful. We urge that the decennial census architecture and MAF/TIGER database redesign teams not work in isolation from each other; rather, their activities should be coordinated through regular interaction between the appointed system architects. The development of PCDs and other field systems is also a sufficiently major piece of the broader 2010 census architecture that we believe appointment of a subsystem architect could be beneficial.
Recommendation 6.2: To ensure the successful integration of new technologies and techniques in the census process, the Census Bureau should create and staff the position of system architect for the decennial census. The selected candidate should have expertise in modeling business processes, designing large-scale systems, and conducting reengineering activities. The system architect must be given the authority to work with and coordinate efforts among the organizational divisions within the Census Bureau and should serve as a champion of the importance of architecture reengineering at the highest levels of management within the Bureau.
The Census Bureau should also consider designating a subsystem architect for portable computing devices and related field systems, as has already been done for the MAF/TIGER redesign. The efforts of the MAF/TIGER redesign and PCD subsystem architects should be coordinated in partnership with the system architect for the decennial census.
6–B.4 Cautionary Note: Breadth and Difficulty of Task
We wish to make clear our view that it is both important and appropriate that the Census Bureau is pursuing enterprise architecture modeling of the decennial census. Proper execution of this modeling will facilitate the testing and evaluation of alternative system structures in the abstract, adding rigor to the development of census hardware and software support systems and reducing overall operational risk. But, to underscore the recommendations made in the previous sections, we also wish to make it clear that the difficulty of the task should not be underestimated, nor should the importance of championship and commitment to the modeling activity at all levels of the Bureau.
We have reviewed U.S. Census Bureau (2003b), the Bureau’s “Business Architecture 1.0,” as well as the reports from the earlier pilot logical architecture and reengineering studies. We have heard initial plans for the MAF/TIGER database modernization (Section 6-C) and have seen basic operational workflows for PCDs as they will be implemented for nonresponse follow-up in the 2004 census test (Section 5-A.1). That said, given the experience of some panel members in working on architectural reengineering of major systems, our impression—and it is admittedly only an impression—is that the components we have seen make up a rather small share (perhaps 20 percent or less) of the real architecture required to support the 2010 decennial census. It is also possible that—given the inherently limited nature of pilot activities to date—the products we have seen may reveal only some 20 percent of that 20 percent.
The 2010 decennial census, as a whole, must be viewed as a complex system; integration of that system has often been stated as a primary goal of the reengineered census. It must be recognized that all information systems employed during the decennial census are parts of the overall technical infrastructure that is necessary to support the census. But technical infrastructure “integration” cannot mean just providing a means for moving information back and forth among information subsystems—all systems are “integrated” by that limited definition. Rather, effective integration involves careful analysis of the distribution of functionality among subsystems, their informational interdependencies, and ultimately their geographical replication and distribution; it means careful examination for efficiencies and reduction of redundancies in task.
Our reading of U.S. Census Bureau (2003b) suggests that it represents a good start to building architectural models but one that can be improved with experience. In particular, the diagrams in the document are good at modeling the fine-level detail of various activities but are less good at giving a sense of context and placement within the system as a whole. They are not rendered at the level of detail that is appropriate for effective process reengineering (see guidelines in Section 6-A.1). For instance, a single diagram covering “Infrastructure” (basically, the actual building of the Bureau’s information technology systems) shows 8 main activities; 81 information products are shown to be used or produced by these activities, and 24 supporting tools or systems are identified in the diagram. Of the 8 activities, only one—“Perform Logistics Support”—is decomposed in a finer-level diagram, while other activities such as “Manage Public Communication Program” and (particularly) “Manage Temporary Workforce” should probably be decomposed by another level or two before real reengineering of this model segment can fruitfully proceed (U.S. Census Bureau, 2003b:Tab 18, p. 7, Diagram A2). As it stands now, the diagram is too cluttered to convey high-level flows but not decomposed enough to uniquely identify all information flows. [Other similar examples can be found in the draft architectural documents, including U.S. Census Bureau (2003b:Tab 18, p. 9, Diagram A3), which depicts 8 activities, 163 information products, and 52 supporting tools and systems—far too busy as a high-level summary.] As the Census Bureau becomes more familiar with enterprise architecture modeling capabilities, we encourage it to consider a slightly more “top-down” approach in its modeling, revisiting and revising the high-level connections with activities and working down to the finer-activity details.
6–C THE ARCHITECTURE OF CRUCIAL SUBSYSTEMS: THE TIGER REDESIGN
Before we list additional comments and recommendations on the emerging technical infrastructure plans, we think it is useful to devote some attention to a specific, major piece of that larger puzzle.
Objective Two of the MAF/TIGER Enhancements Program is to convert the current database structure underlying the Master Address File and the TIGER geographic database to a modern processing environment. As we discussed in Section 3-A.2, the TIGER database was a considerable technological achievement when it was developed in the mid-1980s in support of the 1990 census. It was created by the Census Bureau using homegrown structures, in large part because commercial database applications available at the time were not well suited to managing the required topological integrity—the complex interrelations of various points, lines, and polygons that make up a national map. In the decades since, database software has made considerable advances while the TIGER database structure has remained largely the same. As a result, TIGER now suffers from archaic restrictions on file access and from difficulties in training staff to use the custom software. As an added complication, the Master Address File and the TIGER database have previously been maintained as separate structures, connected when necessary by geocoding (literally, referencing to find where address entries are located relative to TIGER-defined lines and polygons). The major aims in modernizing the MAF/TIGER database environment are to make the databases easier to maintain and use, as well as to establish a more rigorous link between the two by housing them in the same data structure.
During the panel’s early interactions with the Census Bureau regarding the MAF/TIGER Enhancements Program, the difficulty of this Objective Two modernization seemed to be consistently underestimated. In those early discussions, the conversion was characterized as a fairly easy step: a new database structure would be identified and new support software would be written (and tested and certified error free). Work on the TIGER database could then be suspended for a period of a few days, information ported over to the new structure, and the task would be done. All experience with such upgrades suggests that such a rosy scenario is misguidedly optimistic. In our more recent discussions, the Census Bureau has, we are pleased to note, moved away from this earlier position and has made progress in defining and articulating the conversion task.
The Census Bureau’s current plans are to complete the database conversion in fiscal 2006. To that end, fiscal 2004 will be particularly critical: decisions are slated to be made on the commercial off-the-shelf software packages to be adopted for the project, and requirements and specifications for both the hardware and the software are scheduled to be completed. Under the Census Bureau’s current plan, fiscal 2005 would involve software development and testing and the installation and testing of necessary computer hardware; fiscal 2006 would involve continued testing and, ultimately, migration of the data.
It is important to note that Objective One—the realignment of TIGER features—is not contingent on the completion of Objective Two. The Census Bureau and its Objective One contractor are intending to maintain updated and realigned TIGER files in the old TIGER database format until Objective Two is complete and the new structure is ready. To our knowledge, there is no expectation that—should the Objective Two database conversion be completed on schedule in 2006—the Objective One contractor would be required to switch formats and begin providing realigned files in the new TIGER format.
We articulated some principal advantages of the TIGER database conversion in Section 3-B.2 and throughout Chapter 3. Among these are easier potential data interface with state and local governments, given that the current native TIGER database structure is inconsistent with modern geographic information systems (GIS) software. Also, a modern system built on commercial database software makes it easier to recruit and train employees, rather than requiring extensive retraining in an old and site-specific software environment.
However, the project entails risks in several key respects. MAF and TIGER are both so central to census operations—given their use in creating maps, extracting census and survey address frames, and geocoding—that severe risk and cost could be incurred if the conversion is delayed or cannot be completed in a timely fashion. Failure to adequately test the new MAF/TIGER hardware and software—or the lack of adequate time to perform such testing—could easily lead to serious bugs and errors that may only be detected after the conversion is complete and therefore could be very costly. Moreover, even if the Bureau’s timetable for the modernization holds true and the project is completed in fiscal 2006, that will be too late for a fully implemented modern MAF/TIGER system to be incorporated into the 2006 census test. The Census Bureau has already confirmed to one of its oversight authorities that a completed MAF/TIGER redesign and implementation cannot be tested in 2006 (U.S. Department of Commerce, Office of Inspector General, 2003).
Another potentially serious risk lies in the ability of selected commercial off-the-shelf software to meet requirements. The Census Bureau has already indicated that it has chosen to implement the core of the new MAF/TIGER system using Oracle Spatial database tools. Our understanding is that Oracle Spatial is a relatively new addition to Oracle’s database products. In addition, implementation of the new MAF/TIGER will depend heavily on an Oracle add-on product, Topology Manager, which we understand is still in beta testing. While there is no reason at this time to suspect that the Oracle tools will be problematic, given their newness, the risk must certainly be acknowledged. The importance of maintaining the topological integrity of TIGER cannot be overstated.
As is the case with the balance of the MAF/TIGER Enhancements Program, the panel supports the goal of Objective Two. Modernization of the technical underpinnings of the MAF and TIGER databases is essential to the continued usability of these critical data resources, as well as to facilitate more seamless interactions between the Census Bureau and state, local, and tribal government partners. In terms of content, echoing comments in Chapters 3 and 8, we strongly recommend that the Census Bureau carefully consider the data items it includes in the new MAF/TIGER database system rather than simply porting existing data into a new shell. In particular, greater attention should be paid to storage of metadata and changelog information in order to facilitate quick and effective evaluation—so that, for instance, it is possible to reconstruct rather than approximate the history of a particular address as it appears in various update sources or to determine the update history of particular street centerlines in TIGER.
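To make the metadata and changelog suggestion concrete, the following is a minimal sketch of an append-only change log from which the history of a single address can be reconstructed exactly. It does not describe any Census Bureau design: the record fields, the class names, and the source labels ("source_a" and so on) are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class AddressChange:
    """One immutable changelog entry (all field names are hypothetical)."""
    address_id: str
    effective: date
    source: str            # label of the update source that supplied the change
    action: str            # "add", "update", or "delete"
    value: Optional[str]   # address text after the change; None after a delete


class AddressChangeLog:
    """Append-only log: entries are never edited or removed once recorded."""

    def __init__(self) -> None:
        self._entries: List[AddressChange] = []

    def record(self, change: AddressChange) -> None:
        self._entries.append(change)

    def history(self, address_id: str) -> List[AddressChange]:
        """All changes for one address, ordered by effective date."""
        matches = [c for c in self._entries if c.address_id == address_id]
        return sorted(matches, key=lambda c: c.effective)

    def as_of(self, address_id: str, when: date) -> Optional[str]:
        """Replay the log to recover the address text on a given date."""
        state: Optional[str] = None
        for change in self.history(address_id):
            if change.effective > when:
                break
            state = change.value
        return state


log = AddressChangeLog()
log.record(AddressChange("A1", date(2000, 3, 1), "source_a", "add", "12 Elm St"))
log.record(AddressChange("A1", date(2002, 6, 15), "source_b", "update", "12 Elm Street"))
log.record(AddressChange("A1", date(2003, 1, 10), "source_c", "delete", None))
```

Because every update is kept along with its source and date, a question such as "what did this address look like at the end of 2002, and which source last touched it?" is answered by replaying the log rather than by approximation.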
The panel believes that the Census Bureau will be best served by an incremental development approach in redesigning the MAF/TIGER database—that is, that modernization should not be attempted on the entire database structure at once but rather divided into smaller, achievable subtasks. Each subtask would then be carried out—and rigorously tested—in turn. A major goal in approaching the work in this manner is to have available at all times a database structure for MAF and TIGER that is operable and capable of achieving all of its census missions. That system will be a hybrid, gradually evolving into the completed new design as work increments are completed. A hybrid that is continually operable and capable is preferable to the development and implementation en masse of a completely new database structure, which could easily be jeopardized and rendered a complete loss by changes in budget or resources.
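The incremental approach can be sketched in a few lines. The example below is only an illustration under stated assumptions: the two stores are plain dictionaries standing in for the old and new database structures, and the class name, method names, and validation check are hypothetical. It shows one subtask at a time being copied, tested, and cut over, while reads continue to be served by whichever structure currently holds the validated copy.

```python
from typing import Callable, Dict, Iterable, List, Set


class HybridStore:
    """Serves reads from the new store for migrated keys, else from the old one."""

    def __init__(self, legacy: Dict[str, str], modern: Dict[str, str]) -> None:
        self.legacy = legacy
        self.modern = modern
        self.migrated: Set[str] = set()
        self.completed: List[str] = []   # names of subtasks already cut over

    def migrate(self, subtask: str, keys: Iterable[str],
                check: Callable[[str, str], bool]) -> bool:
        """Stage one subtask's records; cut over only if every record passes."""
        staged = {key: self.legacy[key] for key in keys}
        if not all(check(key, value) for key, value in staged.items()):
            return False                 # nothing changes; old store keeps serving
        self.modern.update(staged)
        self.migrated.update(staged)
        self.completed.append(subtask)
        return True

    def read(self, key: str) -> str:
        store = self.modern if key in self.migrated else self.legacy
        return store[key]


def non_empty(key: str, value: str) -> bool:
    """Stand-in for rigorous per-subtask testing."""
    return bool(value.strip())


legacy = {"addr:1": "12 Elm St", "road:1": "Main St", "road:2": ""}
store = HybridStore(legacy, modern={})
addresses_ok = store.migrate("addresses", ["addr:1"], non_empty)
roads_ok = store.migrate("roads", ["road:1", "road:2"], non_empty)
```

In this sketch the "roads" subtask fails its check, so those records stay in the old structure and remain readable; the system as a whole is operable after every step, whether or not the step succeeds.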
As we noted in Section 6-B.3, we are encouraged by the designation of a system architect for the MAF/TIGER database redesign, and strongly urge that this person’s work be done in conjunction with a system architect for the decennial census as a whole. That said, we are concerned about aspects of the relationship between the Census Bureau and its principal Objective Two contractor, since our understanding is that a contractor employee will serve in the system architect role. In panel members’ experience with projects of this type, it is unwise to place that much design, management, and decision authority in the hands of a contractor: a system architect should have a deep understanding of the existing and proposed systems, their history, and their interconnections and relations to other parts of the enterprise, and contractors typically do not have that depth of institutional knowledge. We believe that the Census Bureau should have a Bureau employee paired with contractor personnel for every key task and skill, especially for senior management and decisionmaking roles. Bureau staff should certainly learn from contractors, but they should not be dependent on them in the long term for skills or knowledge. Key decisions should be made by Census Bureau staff.
We note that making the transition from the existing TIGER database structure to Oracle-based systems will necessarily mean a switch to object-oriented programming, design, and testing. In our experience with computer science projects, the first-time adoption of object-oriented programming approaches is particularly tricky, fraught with unanticipated difficulties and surprises. Consistent with recommendations we make in the next section regarding software engineering approaches, we strongly encourage the Bureau to develop a small review team of experienced computer scientists and software developers to monitor and facilitate the Bureau’s move into this new development paradigm.
Our final comment on the MAF/TIGER database redesign is that we believe it could present a unique opportunity to build ties to the software development community. Development of the original TIGER in the mid-1980s helped spawn the geographic information systems industry. We believe that the new database structure—and the attendant rewriting of the support software currently used to update TIGER, create maps, and match addresses to geographic coordinates—gives the Census Bureau the chance, at little cost, to again influence the industry if it pursues the redesign with a measure of openness. By publishing the technical description and offering public access to the code (but not, obviously, the complete and Title 13-protected MAF and TIGER data) of its support software, the Bureau would give developers the chance to scrutinize, modify, mimic, and improve the software. Those developer contributions could, in turn, be adopted or rejected by the Bureau as it pursues its own development, but at least partners in the broader community will have had the opportunity to participate and contribute.
Recommendation 6.3: As part of the MAF/TIGER redesign, the Census Bureau should consider ways to make its application code for mapping, geocoding, digital exchange, map editing, and other functions openly available in order to facilitate continued ties to and improvement in geographic information systems software applications and to tap the feedback of the broader computer science/software development community.
6–D CHALLENGES IN TRANSITION FROM LOGICAL TO PHYSICAL INFRASTRUCTURE
A business process or logical architecture model will define the activities and the informational interfaces and dependencies required to carry out the 2010 census. Between now and the dress rehearsal in 2008 (with an opportunity to do related testing in 2006), an integrated information system—a physical technical infrastructure—must be put into place to support those activities and satisfy their informational requirements. In preparation for the refinement of the 2010 logical architecture and the transition to a physical infrastructure, we offer some further comments based on past experience with reconfiguring information systems. We raise these points—some of them cautionary in nature—not to deter the Census Bureau from proceeding with architecture modeling efforts but to emphasize the difficulty and importance of the task.
6–D.1 Potential Pitfall: Locking in Physical Infrastructure Too Early
A major danger in making the transition from retooled logical infrastructure to completed physical infrastructure is a “rush to judgment”—a decision to finalize physical structures too early. Moore’s Law—the adage that computing power tends to double roughly every 18 months—is well known; the rate of change in the computer technology world is indeed astounding. Thus, in settling on the purchase of a particular computer or software package, the Census Bureau runs the same risk faced by millions of personal computer buyers in the past several years: namely, nearly instant obsolescence, as the capabilities of the chosen product are bested shortly thereafter by the next generation of product.
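The scale of the obsolescence risk follows directly from the doubling adage. As a rough illustration only, the sketch below takes the 18-month doubling period at face value (it is an adage, not a measured property of any particular product) and computes how far available computing power would have moved past a device chosen some number of years before it is used. The function name and the sample intervals are our own.

```python
def capability_multiple(months_elapsed: float, doubling_months: float = 18.0) -> float:
    """Factor by which computing power grows under a fixed doubling period."""
    return 2.0 ** (months_elapsed / doubling_months)


# Growth factor for a purchase decision locked in 2, 4, or 6 years before use.
for years in (2, 4, 6):
    print(f"{years} years ahead: about {capability_multiple(12 * years):.1f}x")
```

Under this assumption, equipment selected six years ahead of fieldwork would be roughly a factor of 16 behind what is then available, which is the arithmetic behind the caution against finalizing physical choices too early.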
As discussed further in Section 5-A, the selection of PCDs is a particular area where the Census Bureau should remain cognizant of the dangers of deciding on physical form too early. At present, small-scale tests of basic skills are being conducted—navigation using a map displayed on a palm-sized screen, administration of a computerized questionnaire on a small computing device, and so forth. It is important that the Census Bureau continue to conduct prototype testing of this nature in order to get some sense of current capabilities. However, it is likely to be a mistake to draw final conclusions on qualities like desired PCD weight, size, and memory capacity based on early test results. PCDs are relatively simple computing devices with reliable storage and text input facilities. Additional features that may be desired include: a color display with good resolution, a GPS latitude-longitude acquisition device, electronic communication facilities such as a landline modem, and perhaps encryption and decryption capabilities. However, the most important product of early PCD testing is not so much a checklist of desired features but a clearly articulated plan of the workflows and information flows that must be satisfied by PCDs, as they fit into the broader technical infrastructure of the census.
6–D.2 Enterprise Architecture as Learning Tool and Guide to Organizational Change
The end goal of business process or logical architecture reengineering is the production of a smoothly functioning finished physical architecture—an amalgam of people, software, computer systems, and telecommunications systems. Given this purpose, it is perhaps too easy to cast the effort as purely technical and technological, but this would be a highly inaccurate impression. We strongly encourage the Census Bureau to take full advantage of the exercise of architecture reengineering by viewing the effort not merely as the means to reengineer the Bureau’s computer systems but also as a key information tool to reengineer its own organization and operations.
As indicated above, IDEF0 logical architecture models emphasize function and process independent of labor and departmental boundaries within an organization. Large organizations that develop rigid internal divisions over time can benefit from—and find refreshing—the exercise of stepping back and specifying basic flows of information, without the need to consider which division performs a given function or to which directorate it may report. For the Census Bureau, this logical architecture modeling represents a “new, and very different, perspective on decennial census operations,” one “based on logical groupings of functions [and highlighting] the commonality across similar processes that were developed independently for different operations” (Centech Group, Inc., 2002a:vii). Accordingly, this new approach represents a potential step away from the “compartmentalized thinking” the panel warned against in its letter report (National Research Council, 2001c).
By these comments, we do not suggest the need for wholesale change in the way the Census Bureau is currently structured. What we do suggest is that the Bureau could benefit greatly from the development of a task-based project management approach. The analysis of information flows in architecture models may suggest logical clusterings of activities—or redundancy in activities—and provide clues for how parts of the Bureau may best be mobilized to carry out each task.
6–D.3 Changing Architecture and Methods Simultaneously
Reengineering the Census Bureau’s information systems is a very large and complex project in its own right. However, it is made vastly more difficult because the Bureau will be reengineering a very large and complex integrated system at the same time as it attempts to make substantial changes in the tools and methods it plans to use—for instance, the migration of the MAF/TIGER system to a commercial off-the-shelf database system, the development (in the ACS) of a complete data system parallel to the census, and the implementation of new response modes. The added difficulty involved in developing new methods simultaneously with new architecture argues all the more for a strong, coordinated system architect for the census, as synchronized efforts will be key to successful implementation.
6–D.4 Improving Software Engineering and Development
The Census Bureau has indicated that, as it pursues the TIGER database modernization, it has also taken on the goal of improving the Bureau’s Capability Maturity Model (CMM) score, a measure of an organization’s maturity in software engineering (Franz, 2002). This is certainly a worthwhile goal, but one that we caution should not be approached casually. Experience suggests that, even when the effort is pursued in isolation as a single goal, organizations take approximately 2–3 years to move up one CMM level. The fact that the Census Bureau is simultaneously undertaking broader systems engineering and major technology projects in the TIGER redesign and PCD implementation may further extend the time needed to increase the score or to complete the systems projects under development.
As with our other cautionary notes in this chapter, we raise the difficulty of the task not to discourage the Census Bureau from taking action but rather to state that it is more complicated and time-consuming than may be expected. Allowing one of these paths—improving software engineering capability or designing system architecture—to proceed in isolation from the other could be a critical and costly error if time and resources elapse without both contributing jointly to census objectives.
Recommendation 6.4: The Census Bureau should generally improve its software engineering processes and should pursue its goal of raising its Capability Maturity Model score in software development. In particular, the Bureau should focus on available tools and techniques in rigorously developing and tracking software requirements and specifications. In beginning the task of improving its software practices, though, the Census Bureau must recognize that the effort is a difficult one, requiring high-level commitment in the same manner as architecture reengineering.
On a related note, and consistent with the Bureau’s broader efforts to improve software engineering practices, we urge the Census Bureau to assess its standards and planning assumptions related to hardware and software testing. It is well known in the software development community that it is vastly more expensive to detect and correct bugs and operational errors after hardware and software have been fielded than to catch those bugs during prerelease testing; this lesson has also been learned by other survey organizations as they have moved increasingly into computer-assisted interviewing methods (National Research Council, 2003b). For the census—a technologically intensive survey of grand scale with a strict timeline—catching software errors early is particularly important to smooth operations.
Recommendation 6.5: The Census Bureau should evaluate and improve its protocols for hardware and software testing, drawing on expertise from the computer science and software development communities. Rigorous hardware and software testing should be factored into census operational schedules, in addition to the field testing performed in the 2006 proof-of-concept test, the 2008 dress rehearsal, or such other formal census tests as may arise.