Auxiliary Data Systems
The Census Bureau has created several data systems for the 2000 census to monitor and control various aspects of address list formation, mailout-mailback, nonresponse follow-up, data capture, and the compensation of enumerators and other field employees. While the primary goals of these systems are monitoring and control, they also can be used to examine census processes, and so they are useful for identifying areas in which alternative methods might have advantages for the 2010 census. The panel examined these systems to determine whether data that were being collected, often on a temporary basis, could be retained for later analysis.
The panel's proposals for retaining auxiliary census processing data focus on additions to the master trace sample. Because that planned database is so rich, it is the preferred environment for analysis. Therefore, rather than suggest separate retention of, say, information on the primary selection algorithm, the panel has suggested that information from relevant census processes be folded into the master trace sample database. The panel has no other suggestions concerning retention of census data. It does, however, have suggestions on how the data systems themselves might be modified to better serve their primary purpose.
The panel was briefed on nine main data systems:
- the decennial master address file (DMAF), which includes every residential address on the master address file;
- the decennial response file 1 (DRF1);
- the decennial response file 2 (DRF2);
- the census unedited file (CUF), which is essentially a composite of information from the DMAF and DRF2 files;
- the census edited file (CEF), which is also essentially a composite of information from the DMAF and DRF2 files;
- the operational control system 2000 (OCS 2000), which monitors field data collection activities, including nonresponse follow-up, in the regional census centers and local census offices;
- the management information system (MIS 2000), which tracks daily the status, cost, and timing of each census operation against the completion dates and budget anticipated in the master activities schedule;
- the preappointment management system/automated decennial administrative management system (PAMS/ADAMS), which covers enumerator hiring and compensation; and
- the data capture system (DCS 2000), which monitors the receipt of census forms and tracks which questionnaires have been successfully processed at one of the data capture centers.
In addition, summaries from the management information system will feed into a data warehouse that Census Bureau staff will use to adjust resource allocation and make other day-to-day decisions.
These systems represent an impressive collection of software developed specifically for running the 2000 census. Many of them are updates of versions used for the 1990 census, and most were successfully tested in the dress rehearsal.
The panel has two proposals for future censuses. First, the Census Bureau has decided to discard the visual images of the census questionnaires.9 Even for the entire census, and certainly for the master trace sample, the visual images represent a relatively modest amount of data that could provide important information for planning the 2010 census. They could shed light on the technology needed for data capture, the errors made in data capture, and issues concerning ease of response, such as problems with the use of foreign-language questionnaires. Therefore, the Census Bureau should reconsider its decision, particularly for the master trace sample cases.

9 One of the first steps in data capture for the 2000 census will be to “photograph” each census form, i.e., to obtain a visual image. (The physical form will be destroyed early in the process.) Optical mark recognition and optical character recognition operate on this visual image of the census form. Once the data have been captured from these visual images, the images themselves will also be discarded. Some groups are concerned about the plan not to save these electronic image files. They argue that discarding them would be a major change from past policy (see Federal Register, 1999). The panel did not discuss this aspect of the issue for the current report.
Second, the software systems were developed in-house by Census Bureau staff. This approach was certainly the correct decision in the 1980s, given how large census data sets were relative to typical commercial applications at that time. Custom software also has obvious benefits: it is targeted specifically to the various census applications, and it gives the Census Bureau a deeper understanding of how the software works, which facilitates modification and maintenance. However, continuing advances in technology have made large data sets more common, and commercial products have been developed to handle them, so there are now considerable benefits to using standardized commercial software to the extent possible. The greatest disadvantage of custom software is that it makes the Census Bureau too dependent on the small number of employees who fully understand it. Maintaining custom software also requires that the Census Bureau retain expertise in both using and modifying it, and people for either task are difficult to hire because little outside expertise exists (and because the skills acquired are not easily transferred to other jobs). Therefore, the panel proposes that the Census Bureau reevaluate its decision to use custom software for the 2010 census.