c. The constraint of maintaining the current budget needs to be considered, particularly because moving data capture from a respondent recall–based approach to one that relies more on technology and on data extracted from receipts, scanners, and administrative sources could increase collection costs.

d. The CE program produces two main data products: published tables and microdata files. The current design is based on the idea that two surveys are needed to get a complete picture of spending: the Interview for large or regular purchases and the Diary for small or difficult-to-recall items. In the CE production process, data from the Interview and Diary are integrated at an aggregate level for publication tables; they are not integrated for the microdata files. (It may be possible to create synthetic households from the two surveys at the micro-level, but the CE program has not attempted this because of the difficulty of the project, limited resources, and the fact that historically the microdata were viewed as secondary to the publications.) This lack of micro-level integration presents a problem for microdata users, who usually use data from one survey or the other, but not both. If a redesigned CE captures data from many sources (e.g., scanners, receipts, diaries, recall interviews, and administrative sources), this problem may be exacerbated. It is important that the CE program continue to make quality microdata files available to the public. These data files may include synthetic data that account for missing data in order to give a complete picture of spending at the household level. The redesigned CE must allow for a straightforward integration of the various data sources into one complete picture of spending at the microdata level. (Note: this is a new requirement that is not met by the current design/processing system.)
As with any work that involves imputation and synthetic data, practical implementation of this approach will require a complex balance of multiple factors, including (a) implicit or explicit modeling assumptions; (b) the extent to which those assumptions are consistent with the data for specific subpopulations and expenditure types; (c) bias and variance effects arising from (a) and (b); (d) costs and complexity for the statistical agency; and (e) costs and complexity for the final data user.



Copyright © National Academy of Sciences. All rights reserved.