ADDITIONAL COMMUNICATION TO PANEL VIA E-MAIL FROM KATHY DOWNEY ON NOVEMBER 11, 2011
This is to follow up on our conference call this afternoon. As Adam and I stated, Mike Horrigan (OPLC associate commissioner), Jay Ryan (CE division chief), and John Layng (CPI division chief) were concerned about comments overheard at the 10/27 CNSTAT meeting regarding the difficulty in designing a survey that meets all of the CPI requirements. After some discussion, the program managers decided that the CE data requirements document is sufficient.
Therefore, contrary to previous direction to the panel that the CPI Requirements of the CE (William Casey, June 17, 2010) and the CE Data Requirements (Henderson, Passero, Rogers, Ryan, Safir, May 24, 2011) together form the requirements for the survey, the program managers ask that the panel members treat the CE Data Requirements as the mandatory requirements for the survey. The CPI data requirements document is still helpful in providing larger context for data usage, but it does not set requirements that the panel’s recommendations need to meet. We hope that this relaxation of constraints provides the panel with greater flexibility in considering their recommended design changes.
1. CE mission statement:
a. The mission of the Consumer Expenditure Survey program (CE) is to collect, produce, and disseminate information that presents a statistical picture of consumer spending for the Consumer Price Index, government agencies, and private data users. The mission encompasses analyzing CE data to produce socio-economic studies of consumer spending and providing CE data users with assistance, education, and tools for working with the data. CE supports the mission of the Bureau of Labor Statistics, and therefore CE data must be of consistently high statistical quality, relevant, and timely, and it must protect respondent confidentiality.
2. The technical aspects of CNSTAT’s task from the Summary Statement of Work in the proposal are as follows:
a. The National Research Council, through its Committee on National Statistics, will convene an Expert Panel to contribute to the planned redesign of the Consumer Expenditure (CE) Surveys by the U.S. Bureau of Labor Statistics (BLS).
b. The Panel will review the output of a data users’ needs forum and a methods workshop, both convened by BLS.
c. The Panel will conduct a household survey data producer workshop to ascertain the experience of leading survey organizations in dealing with the types of challenges faced by the CE surveys.
d. The Panel will conduct a workshop on redesign options for the CE surveys.
e. The redesign options workshop will be based on papers on design options the Panel commissions from one or more organizations.
f. Based on the workshops and its deliberations, the Panel will produce a consensus report at the conclusion of a 24-month study with findings and recommendations for BLS to consider in determining the characteristics of the redesigned CE surveys.
3. What CE expects from the report:
a. The report should synthesize information gathered through the BLS data users’ needs forum, BLS methods workshop, CNSTAT household survey data producer workshop, CNSTAT CE redesign
i. The design recommendations should include a menu of comprehensive design options with the highest potential, not one specific all-or-nothing design.
ii. The design recommendations should be flexible to allow for variation in program budget, staffing resources and skills, ability of the data collection contractors to implement, legal agreements to be obtained (e.g., access to other data sources), etc.
b. The report will include recommendations about future research that needs to be done, but that is not the focus. As much as possible, the focus should be on concrete design proposals that could be implemented.
c. It should focus on a comprehensive design, and include an approximate timeline for development, pilot testing, and implementation. This timeline should allow no more than 5 years for development and pilot testing, with a new survey in the field within 10 years.
d. In the recommendation, the Panel should focus special attention on addressing issues with the current CE surveys:
i. Underreporting of expenditures
ii. Fundamental changes in the social environment for collection of survey data
iii. Fundamental changes in the retail environment (e.g., online spending, automatic payments)
iv. The potential availability of large amounts of expenditure data from a relatively small number of intermediaries such as credit card companies
v. Declining response rates at the unit, wave and item levels
e. The Panel should develop a carefully balanced evaluation of the prospective benefits, costs, and risks of their proposed design recommendations compared to the current CE surveys. The evaluation should include a consideration of the following factors:
i. The evaluation should be based on an extensive and carefully balanced evaluation of literature and industry knowledge on methodology and practice that is currently available or likely to be available in practical form in the next five years;
ii. Data collection technologies currently available or likely to be available in practical form within the next five years;
iii. Administrative record and external data sources and technologies currently available or likely to be available in practical form within the next five years; and
iv. The evaluation should reflect the tradeoff between cost and improvement in measurement error.
4. “Yes, there are two paths you can go by, but in the long run there’s still time to change the road you’re on.” (Jimmy Page and Robert Plant, Stairway to Heaven)
a. CE is pursuing two roads to the redesign:
i. a redesign from scratch, and
ii. changes within the current design
b. The focus of the Panel should be on the redesign from scratch (4.a.i). In doing so, BLS would like the Panel to keep the following considerations in mind:
i. The Panel should be aware of the research that CE is undertaking to improve the current design.
ii. In considering new design options, CE is particularly interested in proactive approaches to gathering expenditure data, whether from records, receipts, etc., or by giving respondents the ability to easily record purchases in real time. While retrieval of data from memory in a standard reactive interview is appropriate for a number of data elements, CE views a proactive data collection methodology for expenditure data as a high priority.
c. As mentioned, CE is currently researching or is planning to research a number of ideas for improving the current design, including a Web diary, individual diaries, streamlining the Interview survey, reducing the length of the bounding interview, double placement of diaries, reconciliation of expenditures and income/assets, etc. (4.a.ii).
5. Constraints:
a. Maintain same budget.
b. Maintain value of the survey to taxpayers and data users.
6. What we know:
a. CE needs to support CPI needs.
b. CE needs to support other data users as much as possible, provided that a design meeting those needs also fulfills the core CE mission.
c. What makes CE unique is the complete picture of spending, in all categories, at the household level, with household income, assets, and demographics.
7. What we don’t know:
a. The final level of expenditure detail needed to support CPI’s needs after redesign.
i. CE has a very detailed set of current technical requirements from CPI (http://www.bls.gov/cex/duf2010casey2.pdf).
1. For example, there are cases where the level of detail in the CE is not sufficient for CPI, such as for gasoline, food away from home, and medical care procedures.
2. Also, there are cases where the CE sample size is not sufficient for CPI’s purposes such as in calculating Entry Level Index selection probabilities at the PSU level or calculating base period weights using annual calendar periods.
iii. CE is currently looking anew at its own data requirements and in that process will attempt to clearly state where it can and cannot meet CPI’s needs in terms of CPI’s current detailed technical requirements. A report will be completed by the end of April, in advance of the award of the contract for the redesign option RFP.
iv. As the redesign process develops, it is critical that ongoing dialog be maintained between CE and CPI about how the redesign options would affect or change the CPI’s current detailed technical requirements.
v. In particular, CPI will need to make assessments as to the efficacy of the inputs received from CE, along with possible alternative approaches, to meet its technical requirements.
vi. BLS views this dialog as an iterative process that must accompany the evaluation of redesign options.
b. What importance should we place on possible future CPI information needs that a redesigned CE could meet?
i. Rob Cage’s presentation, along with the document in 7.a.i above, will outline some possible future CPI information needs that a redesigned CE could meet.
ii. Please note: These possible future CPI information needs are not requirements of the redesigned CE. CE views these future information needs as ones to be evaluated in terms of the following prioritized goals:
1. Does the redesign meet the data needs of CE?
2. Does the redesign meet the current requirements of CPI, an assessment of which includes an evaluation by CPI of the efficacy of alternative approaches in the cases where the redesigned CE does not meet its current detailed technical requirements?
3. Within the framework of the redesigned CE, is there sufficient flexibility, especially with respect to time and cognitive burden, to collect additional data from respondents that could meet possible future information needs of CPI?
d. All of the feasible technological solutions for data collection.
e. Data users’ reaction to collecting less than the complete picture of spending and using more imputed/modeled data to fill in the missing data.
i. That is, would they find it acceptable to collect fewer data, either as part of a multiple matrix design, or because there are some expenses we won’t collect, either because they are too hard to collect (like tolls on trips) or because they are such a small percentage of total spending (like reading materials)?
ii. Whether an approach to impute/model for a much larger amount of missing data is feasible depends on the reaction of data users and issues related to staffing and implementing a much larger statistical modeling system into production.
iii. Would a split sample and data collection design be feasible, that is, one based on a smaller sample for which all expenditures are collected and a larger sample that takes advantage of matrix sampling and greatly reduces the burden of any given interview?
8. Consensus on the design so far:
a. CE needs to publish a complete picture of spending, but we do not need to collect all of those data directly from respondents.
b. To reduce burden and improve data quality, CE is interested in moving away from a retrospective recall-based design to one that is more proactive.
i. The current Interview design calls for collecting almost all categories of spending from all households (the Diary is used to collect some small frequently purchased items, food, and clothing). For the most part, this collection is done through a three-month recall.
ii. The proposed design should not be based on a retrospective recall survey, but instead should focus on features that are proactive in collecting information from respondents or other sources. These design elements would be fundamentally different from those of the current CE surveys, and potentially include innovative features such as the use of mobile devices (e.g., smart phones, PDAs, tablets), financial software, electronic purchase records, receipt scanning, and auxiliary data.
iii. Retrospective recall may be incorporated into the proposed design as a method of “filling in gaps” or collecting information not otherwise provided.
c. The constraint of maintaining the current budget needs to be considered, particularly since moving data capture from a respondent recall–based approach to one involving greater use of technology and data extraction from receipts, scanners, and administrative sources has the potential to increase collection costs.
d. The CE program produces two main data products: published tables and microdata files. The current design is based on the idea that two surveys are needed to get a complete picture of spending: the Interview for large or regular purchases and the Diary for small or difficult-to-recall items. In the CE production process, data from the Interview and Diary are integrated at an aggregate level for publication tables; they are not integrated for the microdata files. (It may be possible to create synthetic households from the two surveys at the micro-level, but the CE program has not attempted this due to the difficulty of the project, limited resources, and the fact that historically the microdata were viewed as secondary to the publications.) This presents a problem for microdata users, who usually end up using data from one survey or the other, but not both. With the possibility that a redesigned CE may capture data from many sources (e.g., scanners, receipts, diaries, recall interviews, and administrative sources), this problem may be exacerbated. It is important that the CE program continue to make quality microdata files available to the public. These data files may include synthetic data that account for missing data in order to give a complete picture of spending at the household level. The redesigned CE must allow for a straightforward integration of the various data sources into one complete picture of spending at the microdata level. (Note: this is a new requirement that is not met by the current design/processing system.)
As with any work that involves imputation and synthetic data, practical implementation of this approach will require a complex balance of multiple factors, including (a) implicit or explicit modeling assumptions; (b) the extent to which those assumptions are consistent with the data for specific subpopulations and expenditure types; (c) bias and variance effects arising from (a) and (b); (d) costs and complexity for the statistical agency; and (e) costs and complexity for the final data user.