

The National Academies | 500 Fifth St. N.W. | Washington, D.C. 20001
Copyright © National Academy of Sciences. All rights reserved.




OCR for page 178
6 Alternatives for Long-Form Data Collection

Early chapters of this report primarily concern new methodologies for producing the census counts, that is, the official population totals used in reapportionment, redistricting, and the allocation of federal program funds. Attention in this chapter is given to alternative methods for collecting the detailed sociodemographic data that since 1960 have been gathered by distributing a census "long form" to a national sample of households. Sampling was first used to gather additional information (or content) in 1940; in preceding decades, all questions were asked of all households in the census (Goldfield, 1992). In 1990, as in other recent censuses, the long form contained questions about education, occupation, income, journey to work, ethnicity, and housing.

The long form is completed by a sample of census respondents; in 1990, the national sampling rate was one-sixth of all households, although the fraction was as large as one-half in some small jurisdictions. The long form includes all content that is contained in the short form completed by all other households. The 1990 long form contained a total of 33 questions for each household member and 26 questions about the housing unit; for the short form, the respective numbers were 7 questions per household member and 7 housing questions. The long form thus requires considerably more information from a given household than does the short form.

Some advocates of census reform have questioned the collection of this additional information as part of the decennial census. One current argument is that the accuracy of the decennial population figures would improve and census costs would decline if long-form data collection were eliminated, reduced, or displaced in time from the effort to enumerate the population. Others have

suggested that some of the data gathered in the 1990 census could be collected through alternate methods, such as sample surveys or tabulations of administrative records (see Chapter 5), and made available for use in a more timely manner (see, e.g., Sawyer, 1993; U.S. House of Representatives, 1993). Still others have challenged the quality of data collected on the long form, noting the high rates (relative to the decennial short form) at which this information is gathered indirectly, either by imputation or from someone outside the household, particularly for minority populations (see, e.g., Ericksen et al., 1991).

These four criticisms of the decennial long form (effect on population coverage, cost of collection, quality of data, and infrequency of supply) are important factors to weigh in evaluating alternative data collection vehicles. (We interpret the call for more timely data to be a request for more frequent supply, and we reserve use of the word timeliness to mean the speed with which results are published after data collection has been completed.) We discuss these considerations further in subsequent sections. First, however, we pause to comment briefly on the merits of the first two arguments: that the long form has negative effects on census coverage and cost.

A comparative analysis of mail return rates from the 1990 census suggests that dropping the decennial long form would yield a trivial improvement in overall census coverage (Keeley, 1993). Further study would be useful, but, at present, there is also no evidence that differential coverage would be reduced by eliminating the long form. The presence of the long form therefore does not appear to diminish to any meaningful degree the accuracy of the decennial population totals used for reapportionment, redistricting, and resource allocation.
Adoption of the one-number concept and integrated coverage measurement in the 2000 census would, in any case, compensate for any small impact the long form might have on coverage.

The potential cost savings associated with eliminating the long form from the decennial census should be weighed against the costs of alternative methods of gathering comparable information. The decennial long form meets a wide range of user needs, often mandated by law, for information on the characteristics of small geographic areas and subpopulations (see Bureau of the Census, 1994a, for a review of the legal status of census content). Such information can be used, for example, in allocating federal program funds, and distributing a longer census form to a national sample of households has historically been regarded as a cost-effective means of providing the required data.

Two contradictory pressures in census reform (one to obtain better information more frequently and the other to reduce the respondent burden of the decennial long form) have lent support to two proposed alternatives for long-form data collection: continuous measurement and matrix sampling. In the sections that follow, we address these two proposals under which the long-form data burden might be reduced or eliminated from the census enumeration while the means to collect comparable sample data for small areas and subpopulations is

COUNTING PEOPLE IN THE INFORMATION AGE

retained. The Census Bureau is conducting ongoing research on these alternative methods, and we review the results of that work to date.

CONTINUOUS MEASUREMENT

There have been proposals in the past that collection of small-area sample data, such as those typically provided by the census, be conducted throughout the decade, rather than once every 10 years. Kish (1981, 1990), Horvitz (1986), and Herriott et al. (1989b) have proposed a variety of data collection schemes that involve this key concept of extending data collection in a more or less continuous fashion. As a part of its planning activities for the 2000 census, the Census Bureau has included an evaluation of this type of process and, on the basis of preliminary research, has indicated a commitment to investigating fully the feasibility of continuous measurement as part of the 2000 census development process (Bureau of the Census, 1993b). Recently, Alexander (1993) proposed a way in which a continuous measurement program might be instituted in conjunction with the 2000 census.

These proposals for continuous measurement share two main features: (1) virtually continuous data collection operations instead of starting and stopping every 10 years, with presumed benefits for data quality through the maintenance of a permanent enumeration staff and improvement through continuous experience, and (2) more current small-area sample data throughout the decade (except for the smallest geographic units, for which updates of small-area sample data might be based on a 5- or 10-year moving average of sample data).

A distinguishing feature of the plan currently being developed and evaluated by the Census Bureau is that it calls for continuous measurement to be conducted in connection with a complete enumeration of the population at one point every decade, whereas most earlier proposals would replace the traditional census completely with a continuous measurement program.
The present proposal assumes that a decennial enumeration is required in order to meet constitutional requirements. This view is supported by a legal review prepared by the Congressional Research Service (Lee, 1993) and subsequent work by the Panel on Census Requirements (Committee on National Statistics, 1993a). Thus, the objective in proposing continuous measurement in these circumstances is twofold: (1) to reduce the cost and burden of the decade enumeration by removing the need to collect small-area sample data as part of the decennial census and (2) to improve the frequency, timeliness, and quality of small-area sample data.

In the remainder of this section, we describe the Census Bureau's research on continuous measurement and review progress in the development of a prototype system. We examine methodological and operational issues associated with implementation, and we discuss other considerations (such as accuracy, cost, acceptability to census data users, and effect on the decennial enumeration) that are important in evaluating the continuous measurement proposal.

Overview of the Census Bureau's Continuous Measurement Program

Alexander (1993) identified three goals for the Census Bureau's research program on continuous measurement:

1. Determine the basic prototype design for data collection and estimation;
2. Estimate the cost of the operation, or at least give useful upper and lower bounds; and
3. Make general statements about the quality and utility of the data (including coverage and content) from the continuous measurement system compared with alternative systems.

Alexander goes on to identify some decisions that were taken initially in the research process concerning the form of a continuous measurement design:

- The continuous measurement prototype will include a complete year-zero (i.e., end-decade) enumeration for reapportionment and redistricting, rather than a "rolling enumeration."
- The frame for intercensal samples will be the Master Address File (MAF).
- The prototype assumes implementation in time to replace sample data for the 2000 census, rather than waiting for 2010. The development plan is based on the assumption that a decision as to whether to replace sample data from the 2000 census with a continuous measurement operation will be made in 1997.
- The continuous measurement prototype must produce data for most 1990 long-form characteristics for small areas (census tracts/block-numbering areas and block groups) with more or less the same reliability requirements as the 1990 long-form sample.
- The prototype assumes direct sample-based estimates for small areas, rather than relying on model-based indirect or synthetic estimates or administrative records.
- The basic small-area (tract/block-numbering area or block group) estimates will be rolling accumulations (moving averages) of 5 years of data. Current plans call for a 3-year moving average for 1999-2001 (with a corresponding increase in the monthly sample size).
- Data collection will be spread evenly across the year and across the nation.
- The survey will be a separate survey rather than an expansion of any current federal household survey.
- The design will include a combination of mail, telephone, and personal visit interviews.

The Census Bureau has developed a schedule for implementing a research program on the feasibility of conducting a continuous measurement operation. This plan is shown in Table 6.1, which is extracted from Alexander (1994). It indicates an expanding effort at developing a full system of continuous measurement over the next six years. This plan is characterized by three features: (1) a steadily increasing level of resources over time, from a relatively modest research effort in 1994 to the full system (as currently envisaged) in 1999; (2) a series of decision points at which the results of the research to date and other developments are evaluated and a decision is made whether to proceed with plans for a continuous measurement operation in place of long-form data collection in connection with the 2000 census; and (3) parallel efforts on data collection capabilities, the estimation system, reliable cost estimates, and user needs.

TABLE 6.1 Accelerated MAF-Based Continuous Measurement: Data Collection Activities

Fiscal 1994
  Data collection activity: Research, planning, outreach only.

Fiscal 1995
  Data collection activity: RDD test with 2,000/month total in 3-4 sites, starting November 1994. Convert to split-sample questionnaire test in July 1995. Small mail pretest.

Fiscal 1996
  Data collection activity: Address-list-based test with 4,000/month total in 4 sites, starting October 1995.

Objectives, fiscal 1994-1996 (listed in source order; the extracted text does not preserve which fiscal year each objective belongs to):
  - Win over a few key federal users
  - Contact nonfederal users
  - Remove feasibility doubts
  - Get commitment to $10 million for 1996 testing
  - Get demonstration file of cumulative estimates
  - Test alternative versions
  - Get user acceptance of testing/decision process
  - Get commitment to fiscal 1997 funding and decision process
  - Better demonstration file of cumulative estimates
  - Develop/test field procedures
  - Get user input and decide whether to proceed further
  - Decide to eliminate 2000 long form if fiscal 1997 results successful
  - Get commitment to fiscal 1998-2002 funding conditional on decision process

Fiscal 1997
  Data collection activity: MAF-based "development survey" for congressional-district-level estimates, full speed in January 1997. Rural sample clustered in PSU.
  Objectives:
  - Demonstrate actual procedures
  - Produce actual high-level estimates
  - Measure coverage, quality of estimates
  - Close scrutiny of 1995, 1996, 1997 data by all users
  - Final decision to drop 2000 long form in December 1997

Fiscal 1998
  Data collection activity: Expand MAF-based sample size; change procedures and questionnaire to fix problems found in fiscal year 1997. Better rural spread.
  Objectives:
  - Final content determination
  - Final procedures
  - Further evaluation of quality
  - More actual high-level estimates

Fiscal 1999
  Data collection activity: Full MAF-based system. Complete rural spread.
  Objectives:
  - Collect small-area data to replace 2000 long form

Note: MAF = Master Address File; RDD = random-digit dialing; PSU = primary sampling unit.
SOURCE: Based on materials provided to the panel by C. Alexander, Bureau of the Census.

Expansion of Research and Development

Staff at the Census Bureau have been pursuing research and development of a prototype system for continuous data collection. Proposals for the prototype have undergone several revisions, particularly with regard to such characteristics as sample size and date of initiation. The current proposal (Alexander, 1994) involves a random-digit dialing (RDD) survey starting in November 1994 (fiscal 1995) at three to four geographic sites, totaling approximately 2,000 households per month. The sites may be the same as those for the 1995 census test, but this has not been determined.

The Census Bureau plans to use the data from the survey to produce 6- or 9-month cumulations, in the form of data tapes for prospective users, in mid-1995. These data tapes would have the same format as that proposed for the mature continuous measurement program in order to provide users with a sense of what estimates and data products would be available.

In October 1995, the RDD survey would be replaced by an address-list-based test in four sites involving a total of about 4,000 households per month. The sample size corresponds to the level of sampling that these sites would receive under the national system as currently proposed.
Experience from actual field operations would be used to refine cost estimates.

The Census Bureau has begun to work with census data users at some federal agencies to study the implications of continuous measurement. In particular, the Census Bureau has established contact with officials at the Department of Transportation and the Department of Housing and Urban Development. Data users in state and local government and in the private sector will also be consulted; such efforts will intensify when prototype data products become available.

Program Milestones

Three key decision points for the continuous measurement program are:

1. October 1995: The Census Bureau could stop before increasing to the proposed level of effort (for example, if cost estimates increase substantially), but the program does not call for a large investment at this stage.

2. October 1996: Extensive data collection would begin in order to provide estimates for congressional districts and to work on remaining problems. Users will have had time to consider the demonstration files from the fiscal 1995 program and some of the fiscal 1996 results. A conditional decision would be made to drop the long form from the 2000 census if the fiscal 1997 program is successful.
3. September 1997: A final decision would be made about whether to retain the long form for the 2000 census or replace it with the continuous measurement program. One possible complication in reaching this decision point is that, in early 1997, the Census Bureau must provide, for congressional review and approval, a list of topics to be included in the 2000 census. It is unclear whether the means of collecting data on these topics must also be determined by early 1997.

If continuous measurement is not implemented for the 2000 census, then the Census Bureau would try to maintain program activities, such as updating the Master Address File, using current survey interviewers. The MAF could serve as a frame for household surveys, such as the Current Population Survey. We note that a decision to use a long form as part of the 2000 census does not necessarily imply that continuous measurement would not be implemented during the next decade.

If continuous measurement is implemented in place of a long form in 2000, then the sample size of the monthly survey would increase from approximately 80,000 households in 1997 to 100,000 in 1998 and, finally, to 325,000 in 1999. In 1997, estimates would be available for states and other large areas. A systematic (and geographically sequenced) sample of households within census blocks would begin in 1999.
Current Initiatives

The Census Bureau has now established a project team to carry out the research on the practical issues in developing a system of estimates obtained from a continuous measurement program. This group will be responsible for the development of the RDD telephone survey at the four test sites and the subsequent transition to an address-list-based survey. The group will examine various aspects of questionnaire and content issues, working with interested parties and advisory panels, evaluating cost components, developing operational systems, developing the survey design and estimation procedure, and evaluating the impact on current household survey programs throughout the federal statistical system.

The panel is impressed by the breadth of this early thinking in developing the plans to examine the desirability and feasibility of a continuous measurement system. We recognize the need to develop the various aspects of the research effort in parallel. This is essential if an effective continuous measurement system is to be developed. The evaluation of each of the aspects of the

system depends on the research efforts in the other areas. To study whether user needs will be met effectively by continuous measurement, we believe that it is imperative that simulated products be provided to users at an early stage in the investigation, and that user comments be solicited to guide the process (see below). These user responses may well influence the requirements for data collection and estimation. Preliminary data collection and estimation procedures must also be in place to develop the simulated products. Thus, all features of the system must be developed in a synchronized fashion.

We emphasize the need to maintain a strong commitment to the principle of synchronized development over time. Some aspects of the research are likely to be easier to develop, to have a clearer path, and to be under the Census Bureau's control to a greater extent than others. A situation could quite conceivably develop in which research into the data collection methods is progressing rapidly and reaching an advanced stage with many decisions finalized, whereas essential research into user needs is lagging and hence not appropriately informing the decisions on data collection methods. Close monitoring will be needed to ensure that uniform and timely progress is made on all fronts.

Recommendation 6.1: The panel endorses further research and evaluation of a continuous measurement program. In conducting this work, the Census Bureau should establish, and continually reinforce, a commitment to simultaneous research and development of cost estimation, data collection and processing methods, estimation procedures, and user needs.
Key Operational Features

The Census Bureau's current prototype combines three main components: (1) continuous updating of a national MAF; (2) a large periodic sample survey to collect intercensal data, using the MAF as a frame; and (3) an integrated estimates program, which produces estimates from the periodic sample survey, using other data sources such as the decennial census, the master address list, and administrative records to enhance the estimates. The data and estimates from the continuous measurement program would in turn be used to enhance estimates from other national household surveys and the demographic estimates program.

In current plans, the intercensal long form (ILF) would sample about 250,000 addresses nationally each month, drawn from all geographic areas. Different addresses would be included each month. The initial sample would be mailed a data collection form. A subsample (possibly all) of the mail nonreturns would be followed up by telephone whenever possible. A further subsample of those who cannot be contacted successfully by telephone (for whatever reason) would be followed up in person.

Annual average estimates would be produced for large geographic areas in

which the population exceeds 250,000 (e.g., states, large metropolitan statistical areas, and groups of counties). For small areas, such as tracts and block groups, a moving 5-year average would be produced annually using data from the previous 5 years.

The survey frame of mailing addresses would be updated quarterly using Postal Service mail delivery lists and possibly lists from local governments. City-style addresses would be geocoded to the block level. For areas without city-style addresses (rural delivery routes, post office boxes, general delivery, physical description only), additional efforts would be needed (see Alexander, 1993). The extent to which these efforts are needed depends on the extent to which city-style addresses are prevalent in the MAF in 1997. The development of cost-effective procedures for handling such addresses is an important step in establishing the viability of a monthly national ILF survey.

Once the data have been collected, survey weights would be applied. Initially, each household record would be weighted by the inverse of the selection probability. This weight would depend on whether the data for the unit were collected by mail, telephone, or personal visit. These base weights would then be adjusted for nonresponse, as a means of accounting in the estimation system for those nonvacant households that fail to provide data through any of the mail, telephone, or personal-visit modes. For cases of missing item-level data from respondents, imputation would be used to compensate for the missing data. Finally, some type of poststratification would be used to ensure that the survey weights agree, at some level of geography, with accurate independent estimates of the population size.
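The three weighting steps just described (base weights, nonresponse adjustment, poststratification) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the Census Bureau's procedure: the follow-up subsampling rates, the adjustment cells, the post-strata, and the control totals below are all hypothetical, and item-level imputation is left out.

```python
from collections import defaultdict

# Hypothetical follow-up subsampling rates by collection mode (illustrative only).
FOLLOWUP_RATE = {"mail": 1.0, "telephone": 0.5, "visit": 0.25}

def base_weight(p_select, mode):
    """Inverse of the overall selection probability for a unit handled by `mode`."""
    return 1.0 / (p_select * FOLLOWUP_RATE[mode])

def adjust_for_nonresponse(records):
    """Within each adjustment cell, scale respondent weights up so that they
    also carry the weight of the nonresponding households in that cell."""
    total = defaultdict(float)
    responding = defaultdict(float)
    for r in records:
        total[r["cell"]] += r["weight"]
        if r["responded"]:
            responding[r["cell"]] += r["weight"]
    return [
        {**r, "weight": r["weight"] * total[r["cell"]] / responding[r["cell"]]}
        for r in records
        if r["responded"]
    ]

def poststratify(records, controls):
    """Scale weights so each post-stratum sums to its independent control total."""
    sums = defaultdict(float)
    for r in records:
        sums[r["stratum"]] += r["weight"]
    return [
        {**r, "weight": r["weight"] * controls[r["stratum"]] / sums[r["stratum"]]}
        for r in records
    ]

# Four hypothetical sampled households, each selected with probability 1/100.
sample = [
    {"id": "A", "mode": "mail", "responded": True, "cell": "c1", "stratum": "s1"},
    {"id": "B", "mode": "telephone", "responded": True, "cell": "c1", "stratum": "s1"},
    {"id": "C", "mode": "visit", "responded": False, "cell": "c1", "stratum": "s1"},
    {"id": "D", "mode": "mail", "responded": True, "cell": "c2", "stratum": "s2"},
]
for r in sample:
    r["weight"] = base_weight(0.01, r["mode"])

adjusted = adjust_for_nonresponse(sample)
final = poststratify(adjusted, {"s1": 1000.0, "s2": 150.0})
```

In this toy run the nonrespondent C drops out, its weight is redistributed to A and B within the same cell, and the final weights in each post-stratum sum to the assumed control totals.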
Because the exact nature and benefit of the poststratification procedures are unclear at present, the Census Bureau is evaluating the reliability of the ILF estimates under the conservative assumption that there would be no benefit from such a procedure.

User Support for Continuous Measurement Data Products

Although a change to continuous measurement requires considerable methodological and operational changes, the major question is how well it meets the requirements of small-area data users. Continuous measurement would provide users with more current data (for all but the smallest geographic areas) and with greater frequency than data collected every 10 years. More frequent estimates could be especially valuable to those making decisions about the distribution of funds or who are concerned with measuring characteristics of populations that change considerably during a 10-year period.

However, the notion of replacing single-point-in-time data for small areas with 5-year rolling averages may be troublesome for some users. Although some social and economic statistics are collected over long periods of time, most census long-form data users are likely to be unfamiliar or uncomfortable with the concept of moving averages and may have to reexamine their use of the data.
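For readers unfamiliar with moving averages, the following minimal Python sketch shows how a 5-year rolling estimate for a small area could be formed from annual estimates. The years and values are made up for illustration, and the equal weighting of the five years is an assumption; the chapter does not specify how the years would be combined.

```python
def moving_average(annual, window=5):
    """Return {year: mean of the `window` annual estimates ending in that year}.

    Assumes `annual` holds one estimate for each of a run of consecutive years.
    """
    years = sorted(annual)
    result = {}
    for i in range(window - 1, len(years)):
        span = years[i - window + 1 : i + 1]
        result[years[i]] = sum(annual[y] for y in span) / window
    return result

# Hypothetical annual estimates (e.g., a percentage) for one small area.
annual = {2001: 10.0, 2002: 11.0, 2003: 12.0, 2004: 13.0, 2005: 14.0, 2006: 20.0}
rolling = moving_average(annual)
```

Here the 5-year average ending in 2005 is 12.0 and the one ending in 2006 is 14.0: the jump in the 2006 annual value moves the rolling estimate only part of the way, which is the kind of smoothing and lag that users of single-point-in-time data would need to take into account.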

The Census Bureau should therefore proceed vigorously with an outreach program to explain cumulative estimates and to discuss possible applications with current and prospective data users. As noted above, the Census Bureau has begun to work with census users at some federal agencies to acquaint them with the type of data product that would be produced by a continuous measurement program. The Bureau of Transportation Statistics is arranging a small workshop on the potential value of continuous measurement data for transportation planning. Discussions with data users in state and local governments and with private sector users are also planned.

Assessing the potential demand for an innovative data collection system poses a strategic challenge: it may be difficult to interest prospective data users without a data product. Continuous measurement is new and different. Skepticism about its value among long-form data users is understandable and should be anticipated. It is easy for current users of the decennial long form to see what information would be lost by dropping the long form, but the benefits of a new program or new methods may not always be obvious.

Simulated data products should be an excellent tool for engaging long-form data users and for measuring potential demand and acceptance of a continuous measurement program. Simulated data products can be developed from census and current survey data. Also, as mentioned above, the Census Bureau will begin a small random-digit dialing telephone survey of 2,000 households per month in four sites starting in November 1994 and continuing for 6 to 9 months. One purpose of this test survey is to provide demonstration files of cumulative estimates to distribute to census data users. The estimates will be accumulations of cross-section or snapshot surveys.
The RDD survey will be followed in October 1995 by a MAF-based test survey in four sites of at least 4,000 households per month and will provide data users with additional experience using cumulative estimates. The demonstration files are intended to get user reactions to moving averages and to determine the level of demand and acceptance of continuous measurement data. In addition to providing data in the development of simulated data products, the test surveys will also help to identify and define operational issues.

Recommendation 6.2: The Census Bureau should initiate discussions with all potential users of continuous measurement data, including state and local governments and private-sector users. A research program should be developed to answer user questions. The Census Bureau should also develop a program to inform data users of the simulated data products emerging from the test surveys and to get their reactions.

Total Error and Frequency of Data Products

Will data users regard more frequent cumulative estimates as superior or inferior to single-point-in-time data once per decade? Ultimately, the question to

be addressed is the extent to which continuous measurement improves accuracy relative to its cost. Because continuous measurement estimates will be based, for each 5-year moving average, on the same sample size as the 1990 census long-form sample estimates, the sampling errors for these two different types of estimates should be comparable. If continuous measurement estimates are subject to so much less bias than once-per-decade estimates, on average across the decade, that their total errors are substantially smaller, then it is likely that continuous measurement will be cost-effective. If the bias that results from outdatedness over time of once-per-decade estimates is modest compared with the standard error of sampling, then continuous measurement will offer little relative advantage.

Clearly the situation will vary with the level of geography and the characteristics being estimated. At broader geographic levels, sampling errors (for 5-year moving averages) will be small; hence, the biases that result from outdatedness will be relatively major. Certain characteristics (e.g., housing characteristics) change relatively slowly over a 10-year period for most geographic areas, and in these cases the benefits of continuous measurement will probably be modest. Conversely, the benefits may be considerable for characteristics that change relatively quickly in many geographic areas in the years after the decennial census. To reach conclusions about the benefits of continuous measurement, it is necessary to weigh the importance of accuracy (i.e., mean square error) across different levels of geography and across estimates of different characteristics.

Recommendation 6.3: The Census Bureau should evaluate the gains in accuracy that may be offered by continuous measurement for estimates of various characteristics at varying levels of geography.
In making accuracy assessments, the Census Bureau should take full advantage of simulations, based on existing census and survey data, to provide realistic scenarios for the changes in estimates over time. As part of its outreach program, the Census Bureau should provide long-form data users with accompanying estimates of bias and precision for various geographic levels and aggregations of one to five years of data.

Costs of Long-Form Data Collection

Cost estimates of operating a continuous measurement program and the potential savings from eliminating long-form questions are not yet well defined. Various assumptions about the intercensal long-form survey relating to such matters as the cost of frame maintenance, response rates, follow-up effort required, and the percentage of interviews completed using various data collection modes (mail, telephone, and personal visit) are based on very limited knowledge. The prototype test surveys could be very helpful in refining assumptions used for cost estimates and determining the direct costs of various operations.
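The total-error comparison discussed under "Total Error and Frequency of Data Products" can be made concrete with a small Python sketch. It rests on assumptions that are not in the chapter: the true value of a characteristic is taken to drift linearly by a fixed amount per year, the once-per-decade estimate is used unchanged for each of the 10 years after the census, and the 5-year moving average is treated as lagging the current year by 2 years on average. Following the text, both estimates are given the same sampling variance. Mean square error is computed as variance plus squared bias.

```python
def mse(variance, bias):
    """Mean square error = sampling variance + squared bias."""
    return variance + bias ** 2

def decennial_avg_mse(variance, drift, years=10):
    """Average MSE over the decade when a single census-year estimate is reused.

    In year t after the census the estimate is outdated by t years, so its
    bias under linear drift is drift * t.
    """
    return sum(mse(variance, drift * t) for t in range(years)) / years

def moving_avg_mse(variance, drift, window=5):
    """MSE of a moving average over the `window` years ending in the current year.

    Under linear drift the average lags the current year by (window - 1) / 2 years.
    """
    return mse(variance, drift * (window - 1) / 2)

# Hypothetical inputs: sampling variance 4.0 (standard error 2.0), two drift rates.
fast = (decennial_avg_mse(4.0, 1.0), moving_avg_mse(4.0, 1.0))
slow = (decennial_avg_mse(4.0, 0.1), moving_avg_mse(4.0, 0.1))
```

With the fast drift of 1.0 per year, the decade-average MSE is 32.5 for the once-per-decade estimate against 8.0 for the moving average; with the slow drift of 0.1 per year, the two are about 4.29 and 4.04. This mirrors the qualitative argument in the text: the advantage of continuous measurement is large when outdatedness bias dominates sampling error and small when it does not.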

In summary, then, although it is possible that the decennial count could be improved by dropping the long form, the panel believes that the extent of such improvement is likely to be small, with little effect on differential coverage. Furthermore, any effects on net or differential coverage would be corrected by the use of integrated coverage measurement procedures in producing the final census population totals.

Regardless of whether the long form is eliminated or reduced in size, however, continuous measurement could have several positive consequences for the decennial census. Operation of a continuous measurement program would improve the ability of the Census Bureau to maintain a continuous presence in local areas over the decade, which would enable those concerned with response and coverage improvement to conduct more effective outreach programs and improve public response to the census. Also, a higher-quality MAF would be available for the decennial census operation because of the more extensive regular updating that would be required to support continuous measurement. A higher-quality MAF should result in fewer missed dwellings and fewer erroneous address inclusions, thus reducing the costs of mail nonresponse follow-up and decreasing the level of undercoverage due to missed housing units.

Data Quality

As noted above, response rates are a major factor in determining the quality of data from census and survey operations. Lower mail response rates (relative to the decennial short form) and difficulties in nonresponse follow-up, especially for minority populations, have been cited as evidence of significant problems in the quality of data on sociodemographic characteristics provided by the decennial long form.
In the 1990 census, data were gathered indirectly, either by imputation or from someone outside the household, for 14.4 percent of black non-Hispanic households and 10.2 percent of Hispanic households that were mailed the long form. The corresponding percentages for the short form were 4.9 percent and 3.3 percent, respectively (Ericksen et al., 1991).

In a continuous measurement program, improvements in data quality should result from a greater ability to develop and retain a well-trained field and operations staff, a uniform workload for managing and controlling operations, and more cost-effective use of hardware and software for data collection and management systems. With a continuing operation such as would be needed to conduct a continuous measurement program, repeated opportunities are available to refine and improve the design, field and data processing procedures, survey instruments, and estimation procedures. The continuing presence of an experienced staff and an established operation would lead to a greatly reduced risk (compared with a once-a-decade collection) that a serious unforeseen problem will arise and would introduce efficiencies into the operations that are not possible with a major effort mounted once a decade.

However, as noted in the preceding section on costs, potential improvements in data quality must be weighed against the potential challenges of operating in a noncensus environment. Poor response rates and difficulties in follow-up could result in lower-quality data from continuous measurement. As in matters of cost, large uncertainties prevent reaching definitive conclusions about data quality at this time. Further progress in the Census Bureau's research and development program should provide sufficient evidence on which to make informed comparative assessments of data quality from the decennial long form and from an intercensal long-form survey.

Changes in Survey Form and Content

A concern often raised during discussions of continuous measurement as an alternative to the census long form is that pressures for changes in the continuous measurement content and design throughout the decade (whether through budget reductions or emerging new data needs) would lead to a loss in comparability over time. Although the panel agrees that such pressures are inevitable, we do not believe that they are unmanageable or that the possibility of such problems should weigh heavily in the decision on whether to proceed with the continuous measurement research and testing program.

Significant fluctuations in budget allocations to a continuous measurement program would create serious difficulties for the planned output of such a program, but, should such fluctuations occur, changes in sample size (rather than content) would be the most likely response. Such changes would have an impact on geographic and subpopulation detail and accuracy and would reduce the frequency of the output.
Demands for changes in content would be most likely to take the form of requests for new questions or additional detail on existing topics, and the response to such demands could most easily be handled through the conduct of supplementary surveys.

Other Potential Benefits of a Continuous Measurement Program

In addition to providing data more frequently, a continuous measurement program could potentially offer other direct benefits to the Census Bureau and the federal statistical system.

Supplements to Monthly Collections

A continuous measurement program, conducted on a monthly basis, would provide an excellent vehicle for conducting supplementary surveys, i.e., additional questions on specific topics. Such questions could be asked at the same time as the regular set or could be posed in the form of a subsequent follow-up targeted at households or individuals with particular characteristics as determined by the standard questionnaire. A supplement could be included for one month only, to give estimates for a particular topic of interest at a broad geographic level. The same supplement could be repeated annually or at longer periods to give national and major subnational estimates of change over time. Alternatively, the same supplement could be repeated for several successive months to give estimates at finer geographic levels.

The value of such supplementary inquiries, whether directed at completely new topics or at obtaining deeper insights into topics covered in the regular questionnaire, has been demonstrated clearly in other continuing household surveys in the United States and abroad. The panel believes that careful exploration of supplementary survey capabilities should be included as part of the research and testing plan for the continuous measurement program.

In using the continuous measurement program to collect supplementary data or as a screening device (see below), care would be needed to ensure that the presence of these extra components did not negatively affect the response rates or change the respondents' answers to the core questions, thus affecting estimates of movement over time for these core components.

Sample Frame for Current Demographic Surveys

The Master Address File itself might constitute a high-quality, cost-effective frame for other demographic surveys. The Census Bureau conducts a number of periodic household surveys (for example, the Current Population Survey, the National Health Interview Survey, and the Survey of Income and Program Participation) as well as one-time surveys, and the benefits of having an up-to-date MAF as a sampling frame for these surveys are potentially great.
Depending on the legal requirements with regard to the confidentiality of the MAF data, perhaps this frame could be used by other federal statistical agencies for their own surveys (not conducted by the Census Bureau). If Title 13 of the United States Code is amended to permit sharing of address lists with federal, state, and local officials, as has been proposed (see Recommendation 2.4 of this report and Bureau of the Census, 1994g:March), then not only would the use of the MAF as a frame be possible at a federal level, but also it could be used by state and local agencies for conducting surveys.

Screening Device for New Demographic Surveys

A second feature would be the use of the intercensal long-form (ILF) survey itself as a screening device for rare populations, which would be subsequently surveyed at a later time via telephone or personal visit. This feature would make feasible surveys that otherwise would have prohibitive screening costs. Thus, a continuous measurement program has the very real potential to enhance the data collection capability of the Census Bureau (and the federal statistical system more generally) to include areas that heretofore have not been practical because of cost considerations. It would be difficult to quantify the value of such capabilities and to credit the continuous measurement system with potential cost savings. Nevertheless, this important potential benefit should be recognized when evaluating the prototype continuous measurement system.

Support for Research and Development Initiatives

Continuous measurement would create a more conducive environment for statistical use of administrative records by providing the opportunity for periodic checks on the quality of administrative records to be incorporated in the integrated estimates program.

Summary

The panel believes that the ongoing thorough review of census requirements, costs, and methods presents an opportunity to undertake a full evaluation of the viability and desirability of instituting a permanent continuous data collection program to obtain traditional census data on population and housing characteristics. We believe that the efforts by the Census Bureau to develop a prototype for continuous measurement provide a very promising start to this process. We are especially encouraged that a project team has been established and has begun to carry out work on a program of evaluation for continuous measurement. Our position is that considered judgments about the merits and drawbacks of continuous measurement can be made only after extensive study to describe exactly how such a system would operate, what it would produce, and how much it would cost. The Census Bureau's initiation of a project team, along with a development plan, promises that such information will become available through an active program of research and development.
Much of the interest and discussion surrounding continuous measurement to date has concerned its cost when fully implemented (Committee on National Statistics, 1993c). As we stated in our interim report, this panel is not convinced that continuous measurement would provide a less costly alternative to the traditional long form. What continuous measurement would offer is greater frequency of small-area sample data and, possibly, improved data quality. Benefits for the decennial enumeration of the population might result from the removal of the requirements to collect and tabulate sample data as part of the decennial census operation, but the evidence for such benefits is not well documented.

Many interesting issues surround an evaluation of the merits and feasibility of a continuous measurement program to collect small-area sample data. Many aspects of this program need to be developed and evaluated in concert, and a great deal more needs to be tested and learned on all fronts before rational decisions about the prospects for such a scheme can be reached. In addition to the question of whether a continuous measurement program such as the one proposed has merit over the long term, there are additional questions relating to the feasibility of introducing a continuous measurement program of sufficient quality and scope in time to act as a viable replacement for any small-area data collection (i.e., a long form) as part of the 2000 census.

Another important consideration in judging the merits of continuous measurement is the relationship between the decennial enumeration and statutory requirements for information. A recent review (Bureau of the Census, 1994a) found that legislative mandates exist for most of the items collected on the 1990 census long form, but the statutes do not specify that the data must be collected in the decennial census. The key question is whether the decennial census is the most appropriate vehicle for collecting this information. Considerable research and development must occur to answer that question and thus determine the extent to which continuous measurement might replace, rather than supplement, data collection using the decennial census long form.

Continuous measurement represents a fundamental change in methodology for obtaining data of the type traditionally collected on the decennial census long form. It has implications that extend far beyond issues of coverage, cost, quality, and frequency. In particular, the relationship of a continuous measurement program to other federal government surveys and to state and local governments is a very important topic that lies beyond the scope of the panel's work. Careful evaluation and widespread consideration of its implications will be needed to clarify the merits of this proposal.
MATRIX SAMPLING

The more modest of the two proposed alternatives to long-form data collection is to change the nature of the decennial collection of small-area sample data by using a technique known as matrix sampling to reduce the respondent burden on individual households. In this section, we discuss the general approach and the Census Bureau's plans for research on matrix sampling.

As indicated at the beginning of this chapter, there is concern that the respondent burden imposed by the use of the census long form, as in the 1990 and previous censuses, will give rise to reduced mail return rates in the 2000 census. Matrix sampling is a technique designed to spread and reduce the response burden, while meeting small-area data needs.

Overview of Matrix Sampling

Matrix sampling is a technique to spread the respondent burden associated with collecting a given quantity of data across a larger group of respondents than would be the case without its use. Thus, rather than collecting data on a set of $m$ data items from $n$ respondents, a group of somewhat more than $n$ respondents is included (generally, severalfold $n$), with each asked to respond to fewer than $m$ items (generally, substantially fewer). This is done in such a way that the reliability requirements for estimates and tabulations are equivalent (or even superior) to those that would be achieved using $n$ respondents to all items, while collecting just $nm$ total item responses, or perhaps somewhat fewer (so that total burden is in fact reduced, as well as spread).

The use of matrix sampling obviously implies the use of several different data collection forms, each to be administered to a random subsample of the full respondent sample. In any application this increases the complexity of administering the data collection, processing the data, and analyzing the data. To be worthwhile, therefore, a matrix sampling plan must achieve benefits through the increased spread, and possible overall reduction, in respondent burden. Later, we discuss the kinds of circumstances in which these benefits would be realized.

The Census Bureau is considering the use of matrix sampling for the collection of sample data (long-form data) in the 2000 census. By spreading the response burden, this methodology reduces the maximum burden on any single responding household. The aim is to increase the mail response rate for forms containing long-form data, thus increasing the quality and reducing the cost of census data collection.

For this to be successful, it is necessary that the reduction in nonresponse achieve a critical threshold through the reduction in content asked of each respondent. Otherwise, matrix sampling may actually act to increase cost. Suppose that, as in the 1990 census, a proportion $p$ of the household population are asked to complete a long form, with the remaining households completing a short form.
Suppose that the short-form response rate is $r_s$ and the long-form rate is $r_l$, where $r_l < r_s$. Then the overall response rate is

$$p\,r_l + (1 - p)\,r_s = r_l + (1 - p)(r_s - r_l).$$

Now suppose that with matrix sampling the proportion of the population asked to complete a form with something other than basic short-form data is $kp$ (where $k > 1$ and $kp \le 1$), and that the response rate for the matrix sampling forms is $r_m$, where $r_l \le r_m < r_s$. Then the overall response rate under matrix sampling is

$$kp\,r_m + (1 - kp)\,r_s = r_m + (1 - kp)(r_s - r_m).$$

Hence, matrix sampling can be beneficial only if

$$r_m + (1 - kp)(r_s - r_m) > r_l + (1 - p)(r_s - r_l).$$

This is achieved provided that

$$r_s < r_m + \frac{r_m - r_l}{k - 1}.$$

Since $r_s > r_m$, the quantity $(r_m - r_l)/(k - 1)$ must be sizable to make the gains worthwhile. In other words, the difference between the response rates for the matrix form and the long form must be more than $(k - 1)$ times as great as the difference between the response rates for the short form and the matrix form.
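The algebra linking the comparison of overall response rates to the final condition is not spelled out in the text. The following is a sketch of one way to fill in the steps, writing $r_s$, $r_l$, and $r_m$ for the short-form, long-form, and matrix-form response rates, $p$ for the long-form sampling fraction, and $kp$ for the matrix-form fraction. It assumes only $p > 0$ and $k > 1$, as stated above.

```latex
\begin{align*}
kp\,r_m + (1 - kp)\,r_s &> p\,r_l + (1 - p)\,r_s \\
kp\,(r_m - r_s) &> p\,(r_l - r_s)
  && \text{(subtract } r_s \text{ from both sides)} \\
k\,(r_m - r_s) &> r_l - r_s
  && \text{(divide by } p > 0\text{)} \\
r_m - r_l &> (k - 1)(r_s - r_m)
  && \text{(collect terms)} \\
r_s &< r_m + \frac{r_m - r_l}{k - 1}
  && \text{(divide by } k - 1 > 0\text{)}
\end{align*}
```

The fourth line is the verbal form of the condition: the gain of the matrix form over the long form must exceed $(k - 1)$ times the remaining shortfall of the matrix form relative to the short form.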

For example, suppose that instead of using a single long form given to 16.7 percent of households ($p = 0.167$), three different matrix forms were constructed, each containing one-third of the total content of sampled items. Suppose each form is given to 16.7 percent of households, so that overall 50 percent of households receive a matrix form ($k = 3$), with the other 50 percent receiving a short form. Suppose that the mail return rate for the short form is 80 percent ($r_s = 0.8$), whereas the return rate for the long form is 70 percent ($r_l = 0.7$). Then matrix sampling will increase mail response overall provided that the average mail return rate for the matrix forms ($r_m$) satisfies the inequality $0.8 < r_m + (r_m - 0.7)/(3 - 1)$; that is, the matrix form response rate must exceed 76.7 percent ($r_m > 0.767$), so that it is almost equal to the short-form return rate of 80 percent. If matrix forms were to achieve an overall response rate of 79 percent ($r_m = 0.79$), almost equal to the short-form rate of 80 percent, then the overall mail return rate would be 79.5 percent, only a modest increase from the 78.3 percent overall rate that would be achieved with a long form and no matrix sampling.

The example described above would not permit any cross-tabulations to be produced among items from the different matrix forms. This is likely to be such an important requirement that it seems very probable that, if there were to be three matrix forms, on average each would have to include substantially more than one-third of the items. In such a case the return rate $r_m$ is likely to be closer to the long-form response rate ($r_l$) than to the short-form rate ($r_s$). Simply put, to be useful matrix sampling will have to eliminate a very large proportion of the differential response rates between the short and long forms. Current evidence suggests that this is unlikely to happen.
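The figures in this example can be checked with a few lines of code. The sketch below is illustrative only: the function names are hypothetical, and the long-form sampling fraction is taken as exactly one-sixth (the text rounds it to 16.7 percent) so that the matrix-form fraction is exactly one-half.

```python
def overall_rate_long_form(p, r_s, r_l):
    """Overall mail response rate when a share p of households gets the long form."""
    return p * r_l + (1 - p) * r_s


def overall_rate_matrix(p, k, r_s, r_m):
    """Overall mail response rate when a share k*p of households gets a matrix form."""
    return k * p * r_m + (1 - k * p) * r_s


def break_even_matrix_rate(k, r_s, r_l):
    """Matrix-form rate above which matrix sampling raises the overall rate.

    Rearranges r_s < r_m + (r_m - r_l) / (k - 1) into
    r_m > ((k - 1) * r_s + r_l) / k.
    """
    return ((k - 1) * r_s + r_l) / k


if __name__ == "__main__":
    # Values from the worked example; p = 1/6 is an assumed exact value.
    p, k, r_s, r_l = 1 / 6, 3, 0.80, 0.70
    print("break-even matrix rate:", round(break_even_matrix_rate(k, r_s, r_l), 3))
    print("long form only:", round(overall_rate_long_form(p, r_s, r_l), 3))
    print("matrix forms at 79%:", round(overall_rate_matrix(p, k, r_s, 0.79), 3))
```

Under these assumptions the three printed values correspond to the 76.7 percent threshold, the 78.3 percent baseline, and the 79.5 percent matrix-sampling rate quoted in the example.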
Before reviewing the thinking behind the use of matrix sampling for the 2000 census, and considering how to evaluate its possible use, we will review the general circumstances under which matrix sampling is an effective data collection device.

Conditions Favorable to Matrix Sampling

The following conditions lend themselves to matrix sampling:

1. Collection of all data items from a single given respondent is impossible or impracticable or leads to very low response rates.
2. There is interest in statistics that are aggregated across items.
3. There is little interest in cross-tabulations, or at least only a subset of cross-tabulations are of interest.
4. Different sampling rates are desirable for different items.
5. There is a strong relationship among responses to various subsets of the items, which permits reductions in variance through estimation techniques that model (directly or indirectly) the relationships among variables.

Sample surveys to assess the educational achievement of populations often meet with circumstances very favorable to the use of matrix sampling. Often a large pool of test items is developed to cover the full range of materials to be assessed. This makes it infeasible to think of giving the whole battery of test items to a given student. To make the data collection workable, each student must be assessed using a subset of the test. The prime statistics of interest are those that aggregate across the test items. The mean score on a particular item is of much less interest than the mean score on the whole test. Every responding student contributes data to such an estimate. Intercorrelations among test items are of some interest, and the matrix forms are developed so that those of particular interest can be estimated, but they are of less interest than the overall score. Finally, results for the individual test items are generally highly correlated, and this can be used in developing the estimation procedure for the statistics on the distribution of the test scores. A student's achievement on a well-chosen but relatively small subset of the test items is highly predictive of the student's score on the whole test. For an example of the use of matrix sampling in an assessment of educational achievement, see Beaton and Zwick (1992).

It is notable that, of the five conditions outlined above, four of them in general do not apply strongly in the case of census long-form data. The mail response rate for the long form in 1990 was lower than that of the short form, but not greatly so (Keeley, 1993). There is considerable interest in many of the cross-tabulations, not only two-way but also higher-dimensional. In fact, these may well be the statistics of prime interest, both at the small-area level and for larger areas. For the most part, there is little or no interest in statistics aggregated across items.
In those areas where there is such interest (total income derived from components, for example), it would seem very problematic to spread these items across different forms (asking some households to report some components of income and other households to report different components), and even this is of little or no use for statistics other than mean income. Finally, although there may be possible gains in efficiency that can be obtained through estimation procedures that utilize the intercorrelations among the variables, this is not immediately evident and has not yet appeared in the Census Bureau's plans for evaluating matrix sampling.

The one property favorable to the use of matrix sampling in its broadest sense that might apply in the 2000 census is that of having different sample rates for different items. There might be requirements to have some data at a finer level of geographic detail than others (but without the need for full enumeration in either case). In the absence of the other conditions favorable to matrix sampling, such a requirement is probably best served by having a sequence of nested long forms, with each form containing a subset of the items contained on each longer form. As noted in the panel's interim report (p. 36), this variant of matrix sampling has been used in the 1950, 1960, and 1970 censuses.

Matrix Sampling in the 1995 Census Test

The Census Bureau has proposed the use of some form of matrix sample design in its 1995 census test and has identified two purposes for using matrix sampling in the test. The first is to obtain information about the impact of the operational complexities that arise from using several different forms (in addition to the short form) with varying sampling rates. The second is to obtain information about the relative response rates for the different forms. Two alternative plans have been proposed by the Census Bureau in its discussion with the panel.

The first plan involves four different versions of the long form. One contains all of the long-form items, and the other three each include all items from two of the three broad areas of interest: social, economic, and housing. One possible implementation of this design calls for 10 percent of those receiving other than a short form to receive the full long form, and 30 percent of such households to receive each of the three "medium" forms. It has been proposed that for the 1995 census test, 80 percent of households receive a short form, 6 percent receive each of the three medium forms, and the remaining 2 percent receive the long form. Thus, any given long-form question will be asked in 14 percent of households.

The second proposal involves a series of nested forms. There would be three forms, in addition to the short form. The longest of these would include the full content to be collected. The second extended form would contain a subset of the items on the full form, and the third would be shorter still, containing a subset of the items from the second form. As indicated above, this variant of matrix sampling has been used in the 1950 and 1960 censuses for housing items and was used extensively for both population and housing items in 1970, albeit with only two different forms.
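The 14 percent figure for the first plan follows from the fact that each content area appears on the long form and on two of the three medium forms. The sketch below reproduces that arithmetic; the medium-form labels and the function names are hypothetical, chosen only for illustration, while the percentages are those quoted in the text.

```python
# Percentage of all households receiving each form (figures from the text).
FORM_SHARE_PCT = {
    "short": 80,
    "medium_social_economic": 6,
    "medium_social_housing": 6,
    "medium_economic_housing": 6,
    "long": 2,
}

# Content areas carried by each form; the medium-form labels are hypothetical.
FORM_AREAS = {
    "short": set(),
    "medium_social_economic": {"social", "economic"},
    "medium_social_housing": {"social", "housing"},
    "medium_economic_housing": {"economic", "housing"},
    "long": {"social", "economic", "housing"},
}


def area_coverage_pct(share_pct, areas_by_form):
    """Percentage of households asked the questions in each content area."""
    all_areas = set().union(*areas_by_form.values())
    return {
        area: sum(pct for form, pct in share_pct.items() if area in areas_by_form[form])
        for area in sorted(all_areas)
    }


def share_among_sample_forms(share_pct):
    """Each non-short form's share of the households that get a sample form."""
    sample_total = sum(pct for form, pct in share_pct.items() if form != "short")
    return {form: pct / sample_total for form, pct in share_pct.items() if form != "short"}


if __name__ == "__main__":
    print(area_coverage_pct(FORM_SHARE_PCT, FORM_AREAS))
    print(share_among_sample_forms(FORM_SHARE_PCT))
```

Each content area is covered in 6 + 6 + 2 = 14 percent of households, and among the 20 percent of households receiving a sample form, the long form accounts for one-tenth and each medium form for three-tenths.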
We encourage the Census Bureau to review carefully the history of the use of matrix sampling and to ensure that the information to be obtained from the census test has not actually been well established in the past. The designs proposed for the 1995 census test may be effective in evaluating the two aspects targeted by the Census Bureau. It is very important that there be a realization both inside and outside the Census Bureau that the information to be obtained from the census test will not be adequate in itself to determine whether matrix sampling is a suitable approach to long-form data collection in 2000. If the use of matrix sampling in the 1995 census test demonstrates its operational feasibility (and its use in three previous censuses suggests that it should be feasible), then a substantial research program is needed to establish the worth of proceeding with it. This is discussed in the next section.

The panel does not think it is likely that the first of the above matrix sampling plans proposed for the 1995 census test will demonstrate substantial improvement in response rates. The small difference between the long- and short-form mail response rates in 1990, and the fact that the three medium-length forms require, on average, two-thirds as long to complete as the long form (relatively longer if one includes the short-form data in such reckoning), combine to make it appear unlikely that the medium-length forms will achieve response rates noticeably higher than for the long form itself. The second approach of using nested forms seems much more likely to give useful information about the effect of form length and content on response rates.

Matrix Sampling for the 2000 Census

If the 1995 census test indicates that matrix sampling is operationally feasible, then the panel believes that it is important that the Census Bureau undertake fundamental research on three fronts before the case for using matrix sampling in the 2000 census can be sufficiently established. The three areas in which research is required are: (1) establishing the relationship of form length and content to mail response rates, (2) ascertaining requirements for cross-tabulations, at the small-area level, of the data to be captured on a sample basis, and (3) investigating possible gains in estimation efficiency by using the intercorrelation structure of the data (i.e., composite estimation). The results of this research can be combined with the findings from the 1995 census test and past censuses concerning the operational costs and complexity of matrix sampling.

To establish the relationship between form length and composition and mail nonresponse, a program of cognitive research and experimental studies will be needed. The cognitive research should provide guidance as to which aspects of the long form give rise to total mail nonresponse and the extent to which the length of the form per se is a factor. Experimental studies can be used to evaluate the effect of various proposed forms on mail response rates, although, as with other studies of mail response rates, the interpretation of the findings will be hampered by the fact that the tests are not conducted in the atmosphere of public awareness that surrounds a census.
On the basis of experience both in the census and in other survey settings, one aspect that is likely to have a substantial impact on the long-form response rate is the presence and format of the questions on income. Thus, in investigating mail response rates and matrix sampling, we urge the Census Bureau to research fully the exact requirements for the income data at the small geographic level. It would seem likely that mail response rates might be improved somewhat by ensuring that income is not asked any more often than is necessary. If other data items are required at a finer geographic level than is income, these could be included on a shortened version of the long form that does not include income. That said, and without detailed insight into census data requirements, the panel does not really expect that there will be any data items requiring a finer level of geographic breakdown than does income. If indeed the presence of income questions does prove to be a major determinant of long-form response rates, then it will be important to intensify cognitive and experimental research into the best format for asking income questions, so as to have the least negative impact on mail response rates.

At the same time, the Census Bureau needs to conduct an evaluation of the requirements for cross-tabulations from the 2000 census, especially at the small-area level. The establishment of sets of cross-tabulations that are required, and at what level of precision, will drive and define the determination of possible alternative census forms for use in matrix sampling. Perhaps not all cross-tabulations require the same level of accuracy. This might lead to the development of a long form and one or more medium-length forms containing a subset of items from the long form, the subset being those items for which more precise cross-tabulations are required.

The third avenue of research is to examine the possible development of composite estimation and other techniques that make use of the correlation between two items that are both present on some census forms, but only one of which is available on other forms. These estimators will utilize this correlation to give estimates that are more precise than those obtained by just weighting the data using the inverse of selection probabilities and perhaps poststratifying to some control totals. It would be particularly worthwhile to see if such procedures can be developed that improve precision for estimates of cross-tabulations. If a nested sequence of forms is used, for example, then a stratified (or ratio) multiphase sample estimator might be used to give greater reliability for tabulations of those characteristics included only on the longest of the forms.

All of this research into the effectiveness of matrix sampling makes sense only in the context of a given set of content. Thus, the Census Bureau will be limited in its ability to evaluate matrix sampling appropriately until such time as the content requirements for the 2000 census become reasonably clear. Consequently, the plan to investigate the operational complexities in the 1995 census test, and then proceed to investigate the other aspects of matrix sampling as content requirements become clearer, seems a sound one.
Finding sufficient developmental lead time to evaluate plans for matrix sampling fully may be difficult as a result of the dependence on content.

Recommendation 6.5: The panel endorses the Census Bureau's plan to investigate the impact of form length and content on mail response rates in the 1995 census test. Even if the operational feasibility of multiple sample forms is confirmed in the 1995 census test, the Census Bureau should not introduce matrix sampling without undertaking further research. Such research should be assigned low priority relative to other decennial census research projects.

On the basis of the evidence that we have seen to date, the panel judges it unlikely that matrix sampling will present an effective alternative to long-form data collection in 2000. Given that content is unlikely to be increased substantially beyond that of 1990, it does not appear likely that the conditions will exist that are needed to make matrix sampling an effective option for the census. The most likely possibility is that there would be a long form and a medium form containing that subset of the long-form data items for which the most precise cross-tabulations are required.