The Master Trace Sample

BASIC STRUCTURE

The master trace sample is a probability sample of addresses that will be selected during the various stages of formation of the final 2000 master address file. The plan is to keep all of the relevant information on census processes for each address and its occupants in the sample by retaining the full history of values for each data field as the census progresses, in addition to other information related to field enumeration. Thus the information will include both the data collected from the respondents and data related to census operations, such as number of follow-up attempts, thereby showing how the various stages of data collection, processing, and treatment work. The resulting database is intended to be used to examine a wide variety of questions concerning census operations, including evaluation of potential alternatives to current census procedures for use in planning for 2010.3
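The idea of retaining the full history of values for each data field can be illustrated with a small record structure. This is a minimal sketch only: the class names, the stage labels (`data_capture`, `edit_and_allocation`), the field name, and the address identifier are all hypothetical, and the actual layout of the master trace sample database is not described in the source.

```python
from dataclasses import dataclass, field


@dataclass
class FieldHistory:
    """Every value a data field has held, in the order it was recorded."""
    versions: list = field(default_factory=list)  # (stage, value) pairs

    def record(self, stage, value):
        # Append rather than overwrite, so earlier values are never lost.
        self.versions.append((stage, value))

    def current(self):
        # Most recent value, or None if nothing has been recorded yet.
        return self.versions[-1][1] if self.versions else None


@dataclass
class TraceRecord:
    """One sampled address: respondent data plus operational data."""
    address_id: str
    fields: dict = field(default_factory=dict)  # field name -> FieldHistory
    followup_attempts: int = 0                  # operational data, not a response

    def set_value(self, name, stage, value):
        self.fields.setdefault(name, FieldHistory()).record(stage, value)


# Hypothetical usage: an item left blank at capture and filled in by allocation.
rec = TraceRecord(address_id="A-0001")
rec.set_value("household_size", "data_capture", None)
rec.set_value("household_size", "edit_and_allocation", 3)
rec.followup_attempts += 1
```

Because values are appended rather than overwritten, an analyst could later compare what the respondent reported with what editing and allocation produced for the same field.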

The master trace sample will have two main components: a 0.5 percent systematic sample of addresses and all addresses from a sample (of unknown rate) of block clusters in, and not in, the Accuracy and Coverage Evaluation (ACE) program.4 The sample of block clusters will facilitate

3 The current plans for the master trace sample are not fully documented; therefore, it is possible that some of the panel's suggestions may already be included in the design.

4 The ACE survey is the 2000 census version of the postenumeration survey used in the 1990 census.




DESIGNING THE 2010 CENSUS: First Interim Report

analysis of effects that occur at a local level, such as those deriving from the local administration of the census (mainly nonresponse follow-up, coverage improvement programs, and other local field work).

The 0.5 percent systematic sample of 600,000 addresses will contain roughly 500,000 housing units that receive the short-form questionnaire and 100,000 housing units that receive the long-form questionnaire (ignoring “Be Counted” forms, which are all short-form questionnaires). Of the 100,000 long forms, 25,000 will be subsampled for inclusion in the Content Reinterview Survey (a sample of housing units, not in ACE block clusters, that originally received the long form and are asked to fill it out a second time to measure response variance). Of the systematic sample, 0.3 percent (1,800 addresses) are expected to fall in ACE block clusters (0.3 percent being the overall ACE sampling rate).

In the current plans (see Bureau of the Census, 1999), the master trace sample selection has five separate samples: a sample of addresses in the mailback universe on the Decennial Master Address File (DMAF); a sample of addresses from update/leave areas; a sample of addresses from list/enumerate areas; a sample of addresses discovered during various field operations, especially nonresponse follow-up; and a sample of addresses generated from other sources, such as “Be Counted” forms.

The information collected for these addresses will include input from five sources: files used to track and control census operations; files generated as a result of data capture and processing; files generated as a result of final editing and allocation (filling in of missing responses) of the response data; files from Accuracy and Coverage Evaluation responses; and data from the Content Reinterview Survey.
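The sample-size figures quoted for the 0.5 percent systematic sample are internally consistent, which a few lines of arithmetic confirm. The sketch below also shows one common way to draw a systematic sample (a random start followed by a fixed interval). The source does not describe the Census Bureau's actual selection procedure, so the function `systematic_sample` and the implied frame size are illustrative assumptions derived only from the quoted numbers.

```python
import random


def systematic_sample(n_units, interval, seed=None):
    """Return 0-based indices of a 1-in-`interval` systematic sample.

    Hypothetical helper: picks a random start in [0, interval) and then
    takes every `interval`-th unit from an ordered list of n_units.
    """
    rng = random.Random(seed)
    start = rng.randrange(interval)
    return list(range(start, n_units, interval))


# Figures quoted in the text.
SAMPLE_ADDRESSES = 600_000
SHORT_FORMS = 500_000
LONG_FORMS = 100_000
REINTERVIEW_SUBSAMPLE = 25_000
SYSTEMATIC_RATE = 0.005   # 0.5 percent
ACE_RATE = 0.003          # 0.3 percent overall ACE sampling rate

# Derived quantities (not stated in the source, only implied by it).
interval = round(1 / SYSTEMATIC_RATE)                       # every 200th address
implied_frame = round(SAMPLE_ADDRESSES / SYSTEMATIC_RATE)   # addresses on the frame
expected_in_ace = round(SAMPLE_ADDRESSES * ACE_RATE)        # matches the quoted 1,800
reinterview_share = REINTERVIEW_SUBSAMPLE / LONG_FORMS      # share of long forms

# Toy draw from a 1,000-unit frame to show the mechanics.
toy = systematic_sample(n_units=1_000, interval=interval, seed=1)
```

On these figures the short-form and long-form counts sum to the full 600,000 addresses, and one quarter of the sampled long forms go to the Content Reinterview Survey.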
A sample of questionnaires will be double-keyed, with differences reconciled, to measure errors due to the data capture process.

The information collected in the master trace sample will be linked to two additional databases that will provide information on field operations. The master trace sample database will make use of enumerator-level data extracts from two databases, PAMS/ADAMS and OCS 2000.5 The former manages the compensation of enumerators and other field staff; the latter monitors the completeness of field enumeration at the level of the local census office. The PAMS/ADAMS database is planned to be linked to the master trace sample database through use of the enumerator's social security number. This link will facilitate research on the cost and quality of the nonresponse follow-up operation at various stages (initial, closeout, final attempt) and may show whether there is any relationship between enumerator characteristics (e.g., enumerator productivity or experience, wage rate, local turnover rate) and data accuracy. Another link to an administrative records database will permit comparisons of census data with administrative records data for the master trace sample addresses; in this role, this administrative records database will be considered a component of the master trace sample database.

5 PAMS is the preappointment management system; ADAMS is the automated decennial administrative management system.

The Panel on Decennial Census Methodology (National Research Council, 1988) recommended that the Census Bureau collect a master trace sample in conjunction with the 1990 census. The Panel on Alternative Census Methodologies (National Research Council, 1999) repeated this recommendation for application to the 2000 census. We believe that the master trace sample database has the potential to be the single most useful source of information for assessing alternative designs for the 2010 census. This is due to its integration, at the individual household level, of information from all key census processes, so that the interactions of various factors can be examined. Therefore, we strongly endorse the recommendation of the Panel on Alternative Census Methodologies for the 2000 census, and we note that the current design is much broader than described by the previous panels, especially with regard to the plans to incorporate information on enumerators and from administrative records.
Since even more limited versions of a master trace sample are difficult to implement, the Census Bureau deserves strong praise for its efforts toward making this version of the master trace sample a reality. In addition, the planned sample size will greatly facilitate detailed analysis of factors underlying census accuracy. Because of its breadth, it would be useful if the master trace sample could be overrepresented in various postcensus studies, as is being done with the Content Reinterview Survey.

However, some relevant processes of the 2000 census may not be covered by the current plans for a master trace sample database, and these processes could be represented through use of relatively modest additions to the current plans.

Recommendation: The current plans for the master trace sample database should be augmented so that data for all key steps in the process, starting with address assignment and ending with a final disposition for each case, are included in the master trace sample database.

In particular, the panel recommends that the master trace sample be modified to include information on “Be Counted” forms that cannot be geocoded, on the individual components of the primary selection algorithm,6 on specific aspects of ACE operations and processing (e.g., the effects of imputation routines used), and on any other census processes that are not currently represented, possibly because they are not linked to an address on the DMAF. Some of these other processes that may not be represented include coverage improvement activities and components of telephone questionnaire assistance.

“Be Counted” forms (census questionnaires that are widely distributed in public places) must have an address that can be located in census geography (i.e., geocoded) or they are rejected and are not included in the census. It would be useful to be able to assess the process of determining which addresses can and cannot be geocoded and the types of households or individuals that are on forms that cannot be geocoded.

The primary selection algorithm is used to determine which information is to be used for a census address when that address has multiple submissions, and it is also used to identify duplicate entries. While the results of the primary selection algorithm are included in the plans for the master trace sample, it is not clear that the operations of the components of this algorithm can be examined so that improvements to the algorithm can be identified.

Finally, by ACE operations, we mean the initial ACE interview, the ACE follow-up interview, various stages of computer and clerical matching, and imputations for nonresponse and unresolved matches.

6 The results of the primary selection algorithm are included in the master trace sample through the differences between the decennial response file 1 (DRF1), which contains every response record received, and the decennial response file 2 (DRF2), which contains addresses with assigned identification numbers and links all data records into one “form.”

OTHER MODIFICATIONS

In addition to its major recommendation for augmenting the master trace sample, the panel proposes several other modifications to the plans for the sample.

Use of a Two-Stage Sample Design

Methods of statistical inference that make full use of two sampling plans, one a systematic sample at the level of the individual household and one a sample of block clusters, are not straightforward. Analyses that would make full use of both samples would be greatly facilitated through use of a two-stage sample design: first sampling block clusters and then sampling individual housing units from within a block cluster. Also, a change to a two-stage sample design could facilitate the use of hierarchical models that incorporate random effects associated with cluster membership (see, e.g., Goldstein, 1995; Sedransk et al., 1997), which could provide a more thorough understanding of factors that affect the data. While the panel understands that time is short, the Census Bureau should consider implementing this design change.

Oversampling ACE Blocks and Stratification

The panel has three suggestions concerning the design of the master trace sample. First, the current plans are to apply proportional sampling to both ACE and non-ACE block clusters. Given the additional information collected in ACE block clusters, there is much to be gained from oversampling ACE block clusters in the master trace sample. Second, some block clusters represent more difficult challenges to census processes. It would be useful to stratify the master trace sample of block clusters on some measures related to enumeration difficulties and then to oversample those strata that represent greater difficulties. This stratification would provide a larger sample size for the areas where it is probably most needed. Third, given the possibly greater nonresponse and undercoverage for list/enumerate and update/leave areas and other difficulties with “Be Counted” forms, the Census Bureau should consider oversampling of addresses for these areas and for people returning “Be Counted” forms.
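The two-stage design the panel describes (block clusters first, then housing units within each selected cluster) can be sketched as follows. This is an illustration under stated assumptions, not the Census Bureau's procedure: it uses simple random sampling without replacement at both stages, the function name `two_stage_sample` and the toy cluster identifiers are hypothetical, and the design weight is simply the inverse of the overall inclusion probability.

```python
import random


def two_stage_sample(clusters, n_clusters, units_per_cluster, seed=None):
    """Draw a two-stage sample and attach design weights.

    clusters: dict mapping cluster id -> list of housing-unit ids.
    Returns a list of (cluster_id, unit_id, weight) tuples, where
    weight = 1 / (P(cluster selected) * P(unit selected | cluster)).
    """
    rng = random.Random(seed)
    # Stage 1: simple random sample of block clusters.
    chosen = rng.sample(sorted(clusters), n_clusters)
    p_cluster = n_clusters / len(clusters)

    sample = []
    for cid in chosen:
        units = clusters[cid]
        # Stage 2: simple random sample of housing units within the cluster.
        m = min(units_per_cluster, len(units))
        p_unit = m / len(units)
        for uid in rng.sample(units, m):
            sample.append((cid, uid, 1.0 / (p_cluster * p_unit)))
    return sample


# Toy frame: six hypothetical block clusters with 10 to 15 housing units each.
clusters = {f"c{i}": [f"c{i}-u{j}" for j in range(10 + i)] for i in range(6)}
sample = two_stage_sample(clusters, n_clusters=3, units_per_cluster=4, seed=0)
```

Because every sampled unit carries its cluster identifier, the output is already in the grouped form that hierarchical models with cluster-level random effects expect. Oversampling a stratum would amount to raising its first-stage selection probability, with the weights shrinking accordingly.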
As in the case of the previous proposal, the panel recognizes that the data collection will begin very soon, so that this change may no longer be feasible.

Improving Information on Enumeration

Important information on the number and timing of enumeration attempts at each address will be collected in the master trace sample. This information could be used to examine important issues concerning nonresponse follow-up, such as the benefits of sampling for nonresponse follow-up.7 This information could also help determine the optimal number of attempts at enumeration before accepting proxy information, and it might suggest strategies for selecting optimal times and days for field enumeration. If there are differences in accuracy associated with the number of attempts (or the mode, whether field, telephone, or Internet), they might have important implications for changes in census processes.

If this enumeration information is generally missing or is collected in a haphazard manner, analysis will be substantially limited. The panel is aware that some field staff resist this request for enumeration information as an additional burden on the already difficult job of nonresponse follow-up. If this attitude is widespread, the accuracy of the data will suffer. This attitude may be most prevalent in those areas in which a full understanding of census processes is most crucial: the areas in which the mailout-mailback return rate is lowest.

While the job of nonresponse follow-up enumeration is difficult, with limited opportunities for training and supervision, the panel's proposal is for a very small amount of information from each enumerator. Given the high value of this information, the panel strongly urges that the Census Bureau examine methods for encouraging field offices to give this a high priority. A memorandum to each local census office explaining why this is being added to the list of enumerator responsibilities might increase the response rate and overall accuracy for the data.8 Unfortunately, this and related proposals may not be able to be fully considered at this stage of planning for the 2000 census. If so, every effort should be made to give the collection of this information by enumerators a much higher priority for the 2010 census, making its collection a routine part of each enumerator's duties.

7 Although this approach is currently not permitted by Title XIII, it might be possible if the law is amended before the 2010 census.

8 However, unusual attention to these data might result in enumerators providing falsified responses consistent with a “perfect” performance. To avoid this effect, the reason for collecting this information should be made clear.

Retaining the Master Trace Sample Input Files

The retention of input files for use in the master trace sample database needs to be given a high priority. The immediate objective must be to collect and preserve all of the information that will be put into this database. The panel believes that the Census Bureau agrees with this objective, so we offer this proposal mainly to provide emphasis and reduce the chances of an omission. It would be natural during the intense activity that is typical of decennial censuses to overwrite or erase a data file due to the need to collect more contemporaneous information for monitoring some census process. To avoid this possibility, the panel suggests that more emphasis and planning be given to the need to capture perishable files as soon as their immediate utility ends.

Setting Priorities for Structure, Access, and Research

The Census Bureau has postponed final consideration of the database layout for the master trace sample database in favor of final planning for the data collection. The panel thinks that this is sensible. A database structure that can accommodate all of the planned links will be a challenge, though we are confident that a satisfactory format will be identified. Toward that end, the structure that is ultimately selected should be designed to facilitate the most likely analyses. Efforts should also be made to keep the structure as uncomplicated as possible so that the database is available and accessible to Bureau staff for analysis as soon as possible to help guide 2010 planning. To help meet this goal, the Census Bureau needs to be specific about which needs, in order of priority, will be met by the master trace sample and then operationalize those needs through the database design.
An assessment of priorities is a preliminary step for the master trace sample database that will help in many areas, such as in deciding which additional inputs should be included. The panel hopes to be able to consider the issue of needs in its future work.

Individual data files feeding into the master trace sample database should be made immediately available inside the Census Bureau for “univariate” analysis to help in evaluating the 2000 census. Finally, creation of a public-use master trace sample data file should be explored, possibly addressing confidentiality concerns through a variety of masking techniques. One advantage of doing this is that it would effectively extend the Census Bureau's analytic capabilities. If the confidentiality concerns are nonstandard, the creation of a public-use file should be given low priority.

Increasing Resources

The panel stresses that the master trace sample database may be the single most useful source of information for assessing alternative designs for the 2010 census. The panel believes that insufficient resources have been allocated to this important and difficult undertaking. The group within the Census Bureau that is currently responsible for collecting the data files to support the master trace sample database and designing the database is relatively small, and most if not all of its members have other duties. To the extent possible, additional people should be allocated to this effort, or, barring that, more of the people currently charged with this work should have it as their sole responsibility.

Collecting Information for a Model of Total Census Error

The panel believes that the Census Bureau currently plans to make use of a model of total census error to assist with census evaluations and decisions, including the decision on whether and how to make use of the 2010 analog to ACE to support adjustment of census counts. Without a priori support for or use of any particular version of this model, evidence is needed to show that the plans for the master trace sample, augmented by the planned evaluation studies for the 2000 census, contain information that will support reliable estimation of all components of error for the 2000 census.