To judge the reliability of an observation, it is important that data users be given a clear sense of how extensively the original data have been manipulated. At a minimum, a strange-looking data value might be more credible if it carried a data flag indicating that it had been reviewed with the respondent or reconciled with other variables in the record. Although users will often want to benefit from the expertise of those who process a large technical database like ARMS, they should have the opportunity to make their own decisions about how to treat the data.
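
A minimal sketch of what such a flag might look like, assuming a simple enumerated edit status attached to each published value (the status names and the `FlaggedValue` structure are illustrative, not part of any existing ARMS product):

```python
from dataclasses import dataclass
from enum import Enum

class EditStatus(Enum):
    AS_REPORTED = "as_reported"            # taken directly from the respondent
    REVIEWED = "reviewed_with_respondent"  # verified in a follow-up contact
    RECONCILED = "reconciled"              # adjusted for consistency with other variables

@dataclass
class FlaggedValue:
    value: float
    status: EditStatus

# A strange-looking value gains credibility when its flag shows it was checked.
harvested_acres = FlaggedValue(value=12500.0, status=EditStatus.REVIEWED)
print(harvested_acres.value, harvested_acres.status.value)
```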

Data users also need to track and understand the impact of imputations for missing data. The relatively simple conditional mean imputation used for much of ARMS generates data that are not appropriate for sophisticated multivariate analysis; for such work, users would need to perform their own imputations or apply techniques for analyzing partially observed data. Within the metadata framework, ERS can signal to users which values were reported and which were imputed.
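
A sketch of conditional mean imputation paired with an imputation flag, assuming pandas and a single grouping variable (the column names and the grouping choice are hypothetical):

```python
import pandas as pd

# Toy records: operating expenses are missing for two farms.
df = pd.DataFrame({
    "region":   ["A", "A", "A", "B", "B", "B"],
    "expenses": [100.0, 120.0, None, 200.0, None, 220.0],
})

# Record which values were reported before anything is filled in.
df["expenses_flag"] = df["expenses"].notna().map({True: "reported", False: "imputed"})

# Conditional mean imputation: replace missing values with the group (region) mean.
df["expenses"] = df.groupby("region")["expenses"].transform(lambda s: s.fillna(s.mean()))

print(df)
```

Because the flag is preserved, users who prefer their own imputation methods can drop the filled values and work from the reported data alone.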

Another critical area in which such knowledge is not recorded in usable form in ARMS is the management history of individual survey cases. Systematic collection and organization of data on attempted contacts with respondents, together with relevant data on interviewer, respondent, and neighborhood characteristics, are particularly important for understanding and potentially improving case administration, as well as for understanding nonresponse and detecting nonresponse bias. In light of the relatively high nonresponse rate in ARMS, making such data available should be a high priority.
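
A sketch of a case-level contact-history record of the kind such paradata might contain; all field and identifier names here are hypothetical:

```python
from dataclasses import dataclass, field
from datetime import datetime

@dataclass
class ContactAttempt:
    when: datetime
    mode: str      # e.g., "in_person" or "phone"
    outcome: str   # e.g., "no_answer", "refusal", "completed"

@dataclass
class CaseHistory:
    case_id: str
    interviewer_id: str
    attempts: list[ContactAttempt] = field(default_factory=list)

    def attempt_count(self) -> int:
        """Number of contact attempts, one basic input to nonresponse analysis."""
        return len(self.attempts)

case = CaseHistory(case_id="FARM-001", interviewer_id="INT-42")
case.attempts.append(ContactAttempt(datetime(2024, 2, 1, 9, 30), "in_person", "no_answer"))
case.attempts.append(ContactAttempt(datetime(2024, 2, 3, 14, 0), "phone", "completed"))
print(case.attempt_count())
```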

Highlighting these two issues should in no way be taken to diminish the importance of collecting and organizing other metadata and paradata. In particular, efforts should begin to collect systematic information on interviewers, to document the processes underlying questionnaire design, to document changes in interviewing practices, to note the types of records respondents use, to record any special efforts or incentives used to gain the cooperation of respondents, and to capture other such factors.

As ARMS moves toward computerization—a step we advocate in this report and that seems inevitable in the long run—it makes sense to build the capabilities for capturing and organizing metadata and paradata as an integral component of ARMS data collection, processing, and products. The need for metadata and paradata makes the transition to digital data collection much more urgent.


Recommendation 5.5: NASS and ERS should develop a program to define metadata and paradata for ARMS so that both can be used to identify measurement errors, facilitate analysis of data, and provide a basis for improvements to ARMS as part of the broader research and development program the panel recommends.


