precisely only by using batteries of many questionnaire items has a solid conceptual and theoretical justification. Any research participant’s report of a past behavior, mood state, action tendency, hope, attitude, or goal will, of necessity, contain some random measurement error, both because of ambiguity in the memories and other internal psychological cues consulted when making the rating, and because of ambiguity in the meanings of the words used in the question and in the offered answer choices. As captured by the Spearman-Brown prophecy formula, the greater the number of questions asked to tap a construct, the more effectively the random measurement error in each item is cancelled out by that in the others, yielding a precise assessment of the variance shared across the items. This is why the measurement of a personality attribute, an attitude, or any other such construct is routinely accomplished by asking respondents to answer a remarkably large set of questions tapping the same thing. This practice frustrates some researchers, because participants can answer only a limited number of questions within the time and budget constraints of a study’s data collection, so the more questions that tap a single construct, the fewer constructs can be gauged in a single study. Researchers have routinely preferred this enhanced precision of assessment at the expense of breadth of construct sets and of participant fatigue in answering what may appear to be the same question over and over.
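The Spearman-Brown prophecy formula mentioned above can be computed directly: if a single item has reliability rho, an index built from k parallel items has predicted reliability k·rho / (1 + (k − 1)·rho). A minimal sketch (the single-item reliability of 0.30 is an illustrative value, not one drawn from the text):

```python
def spearman_brown(rho_single: float, k: int) -> float:
    """Spearman-Brown prediction: reliability of a k-item index
    built from parallel items each with reliability rho_single."""
    return k * rho_single / (1 + (k - 1) * rho_single)

# Adding items to a battery steadily raises predicted reliability:
for k in (1, 2, 5, 10, 20):
    print(f"{k:2d} items -> reliability {spearman_brown(0.30, k):.3f}")
```

With rho = 0.30 per item, the predicted reliability rises from 0.30 (one item) to about 0.81 at ten items, which is the formula's rationale for long batteries.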

However, the above logic ignores an important fact of assessment: systematic measurement error. It has long been recognized that any measuring instrument may be biased, so the error in its assessments may not all be random. And if the same bias is present in a set of questions all measuring the same construct, combining responses will not cancel out that error. Indeed, combining responses will cause the shared error to represent an increasingly prominent proportion of the variance of the final index, as the number of items combined increases and the amount of random error decreases. Thus, averaging or adding together responses to large sets of items to create indices is not the solution to all measurement problems.
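The point about shared bias can be made precise with a simple variance decomposition. If each item's score is true score + shared bias + independent noise, then averaging k items shrinks the noise variance by 1/k but leaves the bias variance untouched, so the bias's share of total error grows with k. A sketch under assumed, illustrative variances (0.25 for bias, 1.0 for noise, neither drawn from the text):

```python
def index_error_variances(var_bias: float, var_noise: float, k: int):
    """Error variance of the mean of k items, where each item is
    true + bias + noise: the shared bias survives averaging intact,
    while independent noise shrinks as 1/k."""
    return var_bias, var_noise / k

for k in (1, 5, 20, 100):
    b, n = index_error_variances(0.25, 1.0, k)
    print(f"k={k:3d}  bias var={b:.2f}  noise var={n:.3f}  "
          f"bias share of error={b / (b + n):.0%}")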

Ironically, much of the shared bias in questions used to build indices is created unwittingly by the very researchers who seek to minimize measurement error. Even more strikingly, there appears to be a remarkably simple and practical solution to these problems, one that will make researchers and participants happier with the process and outcomes of their efforts. By avoiding question formats that create random and systematic measurement error, researchers may be able to replace long batteries with sets of just two or three items that are well written, clear in meaning, and easy to answer. Such short batteries can yield psychometrics comparable to or better than those of long batteries while allowing a much broader array of constructs to be measured in a single questionnaire.

The National Academies | 500 Fifth St. N.W. | Washington, D.C. 20001
Copyright © National Academy of Sciences. All rights reserved.